New VS Returning Users| Website Analytics

5.2 Analyze New vs Returning Website Users – Understand Your Core Audience Segments

Welcome to Phase 5 2 Understanding Your Core Audience Segments

You have successfully analyzed user journeys and identified drop off points in your funnels. That gave you valuable insights into conversion paths. Now it is time to differentiate between two fundamental groups of your audience. In this phase we will analyze the behavior of new versus returning users. Understanding the differences between these two groups is like knowing whether you are talking to a first time visitor or a loyal customer. It helps you tailor your website content marketing and overall strategy to better serve each segment.
This step is crucial for developing targeted engagement strategies and optimizing your website for different user needs.


Why Analyze New vs Returning Users

Distinguishing between new and returning users gives you unique and useful insights Here is why it matters so much

Tailor First Impressions New users need clear navigation and interesting content to keep them engaged Returning users may look for updates or deeper info

Optimize Conversion Paths Returning users usually convert more often Knowing their journey helps you make it easier for them to complete actions

Assess Marketing Effectiveness New user data shows how well your marketing brings fresh visitors Returning user data shows how good you are at keeping them coming back

Content Strategy Insights This analysis helps you find out what content attracts new visitors and what keeps returning users engaged

Personalize Experiences You can offer different content or calls to action depending on whether a visitor is new or returning

Analyzing these two segments helps you build a more effective and user centric website.


Graph showing comparison of new and returning website users' engagement metrics

Key Metrics for New vs Returning User Analysis

We will compare these core metrics for both new and returning user segments.

Total Users The number of unique users in each segment

Sessions The total number of sessions started by each segment

Page Views The total number of pages viewed by each segment

Average Session Duration (Estimate) The average time spent per session for each segment

Bounce Rate The percentage of sessions where only one page was viewed for each segment


Python Code to Analyze New vs Returning Users

We will write a Python script to fetch your event and user data from SQL Server. It will then categorize sessions as new or returning and calculate key metrics for each group.

Practical Python Code Example
Here is a basic example of the Python code you will write. This code will connect to your SQL Server database. It will fetch event and user data. It will then classify sessions and compute metrics for new and returning users.

# SCRIPT SETUP 1: Import necessary libraries import pandas as pd import pyodbc from datetime import datetime, timedelta # SCRIPT SETUP 2: Database connection details for SQL Server # Replace with your actual server name database name and credentials DB_CONFIG = { 'driver': '{ODBC Driver 17 for SQL Server}', 'server': 'YOUR_SQL_SERVER_NAME', # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS' 'database': 'YOUR_DATABASE_NAME', 'uid': 'YOUR_USERNAME', 'pwd': 'YOUR_PASSWORD' } # FUNCTION 1: Connect to the database def connect_to_db(): """Establishes a connection to the SQL Server database.""" conn_str = ( f"DRIVER={DB_CONFIG['driver']};" f"SERVER={DB_CONFIG['server']};" f"DATABASE={DB_CONFIG['database']};" f"UID={DB_CONFIG['uid']};" f"PWD={DB_CONFIG['pwd']};" ) try: conn = pyodbc.connect(conn_str) print("Successfully connected to SQL Server.") return conn except pyodbc.Error as ex: sqlstate = ex.args[0] print(f"Database connection error: {sqlstate}") print(ex) return None # FUNCTION 2: Fetch event and user data def fetch_analytics_data(conn): """Fetches event and user data from SQL Server.""" query_events = """ SELECT user_pseudo_id, session_id, event_timestamp, event_name, page_location FROM events ORDER BY user_pseudo_id, session_id, event_timestamp; """ query_users = """ SELECT user_pseudo_id, first_visit_timestamp FROM users; """ try: df_events = pd.read_sql(query_events, conn) df_users = pd.read_sql(query_users, conn) print(f"Fetched {len(df_events)} event records and {len(df_users)} user records.") return df_events, df_users except Exception as e: print(f"Error fetching analytics data: {e}") return None, None # FUNCTION 3: Analyze New vs Returning Users def analyze_new_vs_returning(df_events, df_users): """Categorizes sessions as new or returning and calculates metrics for each group.""" if df_events is None or df_events.empty or df_users is None or df_users.empty: print("Missing event or user data for analysis.") return None # STEP 3.1: Convert timestamps to datetime objects df_events['event_datetime'] = pd.to_datetime(df_events['event_timestamp'], unit='us') df_users['first_visit_datetime'] = pd.to_datetime(df_users['first_visit_timestamp'], unit='us') # STEP 3.2: Merge events with user first visit data df_merged = pd.merge(df_events, df_users[['user_pseudo_id', 'first_visit_datetime']], on='user_pseudo_id', how='left') # STEP 3.3: Classify each session as 'New' or 'Returning' # A session is 'New' if its start date matches the user's first visit date # This requires finding the first event timestamp for each session session_first_event_time = df_merged.groupby(['user_pseudo_id', 'session_id'])['event_datetime'].min().reset_index(name='session_start_datetime') df_merged = pd.merge(df_merged, session_first_event_time, on=['user_pseudo_id', 'session_id'], how='left') df_merged['user_type'] = df_merged.apply( lambda row: 'New User' if row['session_start_datetime'].date() == row['first_visit_datetime'].date() else 'Returning User', axis=1 ) # STEP 3.4: Calculate core metrics for each user type metrics_by_user_type = df_merged.groupby('user_type').agg( total_users=('user_pseudo_id', 'nunique'), total_sessions=('session_id', 'nunique'), total_page_views=('event_name', lambda x: (x == 'page_view').sum()) ).reset_index() # STEP 3.5: Calculate Bounce Rate for each user type # Get event count per session for bounce rate calculation events_per_session = df_merged.groupby(['user_pseudo_id', 'session_id']).size().reset_index(name='event_count') df_sessions_with_counts = pd.merge(df_merged.drop_duplicates(subset=['user_pseudo_id', 'session_id', 'user_type']), events_per_session, on=['user_pseudo_id', 'session_id'], how='left') bounced_sessions_by_type = df_sessions_with_counts[ (df_sessions_with_counts['event_count'] == 1) & (df_sessions_with_counts['event_name'] == 'page_view') ].groupby('user_type')['session_id'].nunique().reset_index(name='bounced_sessions') metrics_by_user_type = pd.merge(metrics_by_user_type, bounced_sessions_by_type, on='user_type', how='left').fillna(0) metrics_by_user_type['bounce_rate'] = (metrics_by_user_type['bounced_sessions'] / metrics_by_user_type['total_sessions']) * 100 metrics_by_user_type['bounce_rate'] = metrics_by_user_type['bounce_rate'].fillna(0).round(2) # STEP 3.6: Calculate Average Session Duration (estimation) # For each session, find the min and max timestamp session_duration_df = df_merged.groupby(['user_pseudo_id', 'session_id', 'user_type'])['event_datetime'].agg( session_start='min', session_end='max' ).reset_index() session_duration_df['duration_seconds'] = (session_duration_df['session_end'] - session_duration_df['session_start']).dt.total_seconds() # Filter out sessions with 0 duration (single event sessions) or very short ones if desired # For simplicity, we'll include all for average but note that single-event sessions have 0 duration avg_session_duration = session_duration_df.groupby('user_type')['duration_seconds'].mean().reset_index(name='avg_session_duration_seconds') avg_session_duration['avg_session_duration_minutes'] = (avg_session_duration['avg_session_duration_seconds'] / 60).round(2) metrics_by_user_type = pd.merge(metrics_by_user_type, avg_session_duration[['user_type', 'avg_session_duration_minutes']], on='user_type', how='left').fillna(0) print("\n--- New vs Returning User Metrics ---") print(metrics_by_user_type[[ 'user_type', 'total_users', 'total_sessions', 'total_page_views', 'avg_session_duration_minutes', 'bounce_rate' ]]) return metrics_by_user_type # MAIN EXECUTION 1: This block runs when the script starts if __name__ == "__main__": # MAIN EXECUTION 2: Connect to the database conn = connect_to_db() if conn: # MAIN EXECUTION 3: Fetch event and user data events_df, users_df = fetch_analytics_data(conn) # MAIN EXECUTION 4: Analyze new vs returning users if events_df is not None and users_df is not None: user_type_metrics = analyze_new_vs_returning(events_df, users_df) if user_type_metrics is not None: print("\n--- New vs Returning User Analysis Completed ---") # You can now use 'user_type_metrics' DataFrame for plotting or further analysis # For example, to save to a new CSV: # user_type_metrics.to_csv('E:/SankalanAnalytics/data/processed/new_vs_returning_metrics.csv', index=False) # MAIN EXECUTION 5: Close the database connection conn.close() print("Database connection closed.") else: print("Could not establish database connection. Exiting.")

Important Notes on This Code

This script connects to your SQL Server database to pull event and user data. It then uses Pandas to classify each session as either a new user session or a returning user session. A session is marked as new if its start date matches the user's first visit date. All other sessions for that user are considered returning.

The average session duration is estimated by calculating the time between the first and last event in a session. For single event sessions this duration will be zero. The bounce rate calculation is consistent with previous phases. Remember to fill in your actual SQL Server connection details in the DB CONFIG section. This includes your server name database name username and password.


Understanding Your Python New vs Returning User Analysis Script

This Python script helps you gain valuable insights by comparing how new visitors behave versus returning visitors on your website. It pulls data from your SQL Server database and performs a detailed analysis for each group. Let’s break down each part of the code.

1. Setting Up Your Tools and Connections

At the top of the script, you’ll see some import lines bringing in the tools needed for this work. First is import pandas as pd which lets us work with data tables and do calculations. Next is import pyodbc which allows Python to connect to your SQL Server database. Then from datetime import datetime, timedelta helps work with dates and time differences. After that, you’ll see DB_CONFIG where you enter your actual server name, database name, username, and password to connect to your database.

2. Connecting to Your Database

The connect_to_db function handles connecting to your database. It builds a connection string using the info in DB_CONFIG and tries to open the connection. It will print messages to confirm if the connection was successful or if there was an error.

3. Fetching Event and User Data

The fetch_analytics_data function fetches the raw event and user data from your database. It runs two SQL queries: one to get event data like user IDs, sessions, timestamps, and page views; the other to get user info including their first visit timestamp. These results are loaded into two Pandas DataFrames for analysis. The function also reports how many records it fetched and handles any errors.

4. Analyze New vs Returning Users

This core function classifies each session as belonging to a new or returning user and calculates key metrics for both groups. Here’s how it works:

Convert timestamps: Changes numeric timestamps into readable dates and times.
Merge data: Combines event data with each user’s first visit date.
Classify sessions: Marks sessions as 'New User' if they happen on the user’s first visit date; otherwise 'Returning User'.
Calculate metrics: Counts total users, sessions, and page views per user type.
Calculate bounce rate: Measures the percentage of single-page sessions per user type.
Estimate session duration: Finds average session time by calculating the difference between the first and last event timestamps in each session.

The function then prints a clear summary table of these metrics for new and returning users.

5. Running the Script The Main Block

This part runs when you start the Python file. It connects to your database, fetches the event and user data, analyzes new vs returning users, prints the results, and finally closes the database connection to free resources.

Overall Value of This Script

This Python script is a valuable tool for tailoring your website strategy. By understanding the different behaviors of new and returning visitors, you can create more effective content and personalized experiences. For example, you might provide clear guidance for first-time visitors and special offers or updates for returning ones. This shows your ability to do advanced audience segmentation — a key skill in web analytics and user personalization.


Next Steps

After running this Python script, you’ll have a clear comparison of how new and returning users interact with your website. This means you’ve successfully analyzed two important audience groups. The next exciting phase will be to analyze time spent and scroll depth, giving you even deeper insights into user engagement page by page. For now, save this script in your E drive under SankalanAnalytics backend folder and name it something like analyze_user_types.py.


Previous Topic==> Funnel Analysis  Next Topics==> Scroll Depth


SQL All topics Wise Interview Questions    Employee Salary Management SQL FAQ!.   C FAQ   Top 25 PL/SQL Interview Questions

Joins With Group by Having  Equi Join  Joins with Subqueries  Self Join  Outer Join 


Website Analytics Project: Phases and Action Steps

  • Home
  • 🟢 Live App: Web Analytics Simulator
  • Phase 0: Project Setup & Problem Definition
  • 0.1 Define Project Goals & Challenges
  • 0.2 Select Tools, Tech Stack & Data Sources
  • 0.3 Software Requirements & Installation
  • 0.4 Folder Structure & GitHub Repo
  • 0.5 Testing Project Locally
  • Phase 1: Planning for Analytics
  • 1.1 Website Analytics Project Overview
  • 1.2 Define KPIs, Bounce Rate, Engagement
  • 1.3 Identify Target Users & Pain Points
  • Phase 2: Data Collection
  • 2.1 Setup Google Analytics 4 (GA4)
  • 2.2 Export GA4 Data to BigQuery/CSV
  • 2.3 Design SQL Schema for Web Analytics
  • Phase 3: Data Cleaning & Feature Engineering
  • 3.1 Clean Website Data with Python & Pandas
  • 3.2 Create Custom Metrics (Session, Bounce, etc.)
  • Phase 4: Exploratory Data Analysis (EDA)
  • 4.1 Analyze Website Traffic Trends
  • 4.2 Behavior by Device, Source, Location
  • 4.3 Top Pages & High Bounce Pages
  • 4.4 Diagnose Low Traffic & User Drop
  • Phase 5: Business Insights
  • 5.1 Funnel Analysis & Drop-Off Points
  • 5.2 New vs Returning Users
  • 5.3 Time Spent & Scroll Depth
  • Phase 6: SQL for Business
  • 6.1 SQL for Business Insights
  • 6.2 Combine Web Data Using SQL
  • 6.3 Find Problematic Pages Using SQL
  • Phase 7: Machine Learning
  • 7.1 Segment Users with Clustering
  • 7.2 Predict Bounce Rate with ML
  • 7.3 Recommend Pages or Content
  • Phase 8: Dashboards & Visualization
  • 8.1 Dashboard with Streamlit
  • 8.2 Visualize KPIs with Python
  • 8.3 Page-Level Metrics & Drop Heatmaps
  • Phase 9: Final Analytics Story
  • 9.1 Summary Report & Findings
  • Phase 10: Hosting & Portfolio Presentation
  • 10.1 Host Website Project Online
  • 10.2 Add to GitHub with ReadMe
  • 10.3 Pitch Project in Interview
  • Other Topics
  • SQL Interview Questions
  • SQL Case Study: Account Management
  • Python Interview Questions
  • Why C Language

Get in touch

  • tech2dsm@gmail.com

© Sankalan Data Tech. All rights reserved.