User Behavior Analysis | Website Analytics

4.2 Analyze User Behavior by Device, Source and Location

Welcome to Phase 4 2 Understanding Your Audience
You have already looked at your overall website traffic trends which gave you a big picture view. Now it is time to understand who your visitors are and how they use your website. In this phase we will explore user behavior by different groups. We will look at device categories like mobile or desktop. We will check traffic sources like Google or direct visits. We will also look at where visitors come from such as country or city. Knowing these differences helps you understand your audience better. This makes it easier to create content and marketing that fits their needs.

This step is important to personalize your website experience and to spend your marketing budget wisely.


Why Analyze User Behavior by Segments

Breaking down your audience into groups and studying their behavior gives you valuable insights. Here is why it matters.

Tailor Content Different user groups may want different types of content or have different needs. Knowing this helps you create more relevant content.

Optimize User Experience A website that works well on a desktop may not work well on a mobile device. Looking at devices helps you fix these problems.

Improve Marketing Campaigns If some traffic sources bring more active users you can focus your marketing on those channels.

Localize Strategies Understanding where your users come from helps you create content or offers that fit specific places.

Identify Opportunities You can find user groups that are not yet well served or that bring high value. This opens new chances to grow your website.

Analyzing user behavior by segments helps you move beyond general numbers and build more focused and effective plans.


Graph showing website user behavior segmented by device, traffic source and location

Key Metrics for Segmented User Behavior Analysis

To understand how different user groups behave, we focus on these main metrics.

Total Users The count of unique visitors in each group.

Sessions The total number of visits from each group.

Page Views The total number of pages viewed by each group.

Bounce Rate The percentage of visits where users viewed only one page in each group.


Python Code to Analyze User Behavior by Segments

We will write a Python script to fetch your event and user data from SQL Server. It will then group this data by device category traffic source and geographic location. This will show you how different segments interact with your website.

Practical Python Code Example
Here is a basic example of the Python code you will write. This code will connect to your SQL Server database. It will fetch event data. It will then calculate key metrics for each segment.

# SCRIPT SETUP 1: Import necessary libraries import pandas as pd import pyodbc # SCRIPT SETUP 2: Database connection details for SQL Server # Replace with your actual server name database name and credentials DB_CONFIG = { 'driver': '{ODBC Driver 17 for SQL Server}', 'server': 'YOUR_SQL_SERVER_NAME', # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS' 'database': 'YOUR_DATABASE_NAME', 'uid': 'YOUR_USERNAME', 'pwd': 'YOUR_PASSWORD' } # FUNCTION 1: Connect to the database def connect_to_db(): """Establishes a connection to the SQL Server database.""" conn_str = ( f"DRIVER={DB_CONFIG['driver']};" f"SERVER={DB_CONFIG['server']};" f"DATABASE={DB_CONFIG['database']};" f"UID={DB_CONFIG['uid']};" f"PWD={DB_CONFIG['pwd']};" ) try: conn = pyodbc.connect(conn_str) print("Successfully connected to SQL Server.") return conn except pyodbc.Error as ex: sqlstate = ex.args[0] print(f"Database connection error: {sqlstate}") print(ex) return None # FUNCTION 2: Fetch event data def fetch_event_data(conn): """Fetches necessary event data from the SQL Server events table.""" query = """ SELECT user_pseudo_id, session_id, event_name, device_category, traffic_source, traffic_medium, geo_country, geo_city FROM events; """ try: df = pd.read_sql(query, conn) print(f"Fetched {len(df)} event records for behavior analysis.") return df except Exception as e: print(f"Error fetching event data: {e}") return None # FUNCTION 3: Analyze User Behavior by Segment def analyze_user_behavior_by_segment(df_events): """Calculates key metrics for user behavior segmented by device, source, and location.""" if df_events is None or df_events.empty: print("No event data to analyze user behavior.") return None # Ensure relevant columns are strings and fill missing values df_events['device_category'] = df_events['device_category'].fillna('unknown').astype(str) df_events['traffic_source'] = df_events['traffic_source'].fillna('unknown').astype(str) df_events['traffic_medium'] = df_events['traffic_medium'].fillna('unknown').astype(str) df_events['geo_country'] = df_events['geo_country'].fillna('unknown').astype(str) df_events['geo_city'] = df_events['geo_city'].fillna('unknown').astype(str) # STEP 3.1: Calculate metrics by Device Category print("\n--- User Behavior by Device Category ---") device_behavior = df_events.groupby('device_category').agg( total_users=('user_pseudo_id', 'nunique'), total_sessions=('session_id', 'nunique'), total_page_views=('event_name', lambda x: (x == 'page_view').sum()) ).reset_index() # Calculate bounce rate for device category # Get event count per session events_per_session = df_events.groupby(['user_pseudo_id', 'session_id']).size().reset_index(name='event_count') # Merge event count back to df_events df_events_with_counts = pd.merge(df_events, events_per_session, on=['user_pseudo_id', 'session_id'], how='left') # Identify bounced sessions (single event sessions that were page views) bounced_sessions_device = df_events_with_counts[ (df_events_with_counts['event_count'] == 1) & (df_events_with_counts['event_name'] == 'page_view') ].groupby('device_category')['session_id'].nunique().reset_index(name='bounced_sessions') device_behavior = pd.merge(device_behavior, bounced_sessions_device, on='device_category', how='left').fillna(0) device_behavior['bounce_rate'] = (device_behavior['bounced_sessions'] / device_behavior['total_sessions']) * 100 device_behavior['bounce_rate'] = device_behavior['bounce_rate'].fillna(0).round(2) print(device_behavior.sort_values('total_sessions', ascending=False)) # STEP 3.2: Calculate metrics by Traffic Source and Medium print("\n--- User Behavior by Traffic Source / Medium ---") source_medium_behavior = df_events.groupby(['traffic_source', 'traffic_medium']).agg( total_users=('user_pseudo_id', 'nunique'), total_sessions=('session_id', 'nunique'), total_page_views=('event_name', lambda x: (x == 'page_view').sum()) ).reset_index() # Calculate bounce rate for source/medium bounced_sessions_source_medium = df_events_with_counts[ (df_events_with_counts['event_count'] == 1) & (df_events_with_counts['event_name'] == 'page_view') ].groupby(['traffic_source', 'traffic_medium'])['session_id'].nunique().reset_index(name='bounced_sessions') source_medium_behavior = pd.merge(source_medium_behavior, bounced_sessions_source_medium, on=['traffic_source', 'traffic_medium'], how='left').fillna(0) source_medium_behavior['bounce_rate'] = (source_medium_behavior['bounced_sessions'] / source_medium_behavior['total_sessions']) * 100 source_medium_behavior['bounce_rate'] = source_medium_behavior['bounce_rate'].fillna(0).round(2) print(source_medium_behavior.sort_values('total_sessions', ascending=False).head(10)) # Show top 10 # STEP 3.3: Calculate metrics by Geo Country print("\n--- User Behavior by Geo Country ---") country_behavior = df_events.groupby('geo_country').agg( total_users=('user_pseudo_id', 'nunique'), total_sessions=('session_id', 'nunique'), total_page_views=('event_name', lambda x: (x == 'page_view').sum()) ).reset_index() # Calculate bounce rate for country bounced_sessions_country = df_events_with_counts[ (df_events_with_counts['event_count'] == 1) & (df_events_with_counts['event_name'] == 'page_view') ].groupby('geo_country')['session_id'].nunique().reset_index(name='bounced_sessions') country_behavior = pd.merge(country_behavior, bounced_sessions_country, on='geo_country', how='left').fillna(0) country_behavior['bounce_rate'] = (country_behavior['bounced_sessions'] / country_behavior['total_sessions']) * 100 country_behavior['bounce_rate'] = country_behavior['bounce_rate'].fillna(0).round(2) print(country_behavior.sort_values('total_sessions', ascending=False).head(10)) # Show top 10 # STEP 3.4: Calculate metrics by Geo City (optional, can be very granular) # This can be very detailed, so only show top few cities print("\n--- User Behavior by Geo City (Top 10) ---") city_behavior = df_events.groupby('geo_city').agg( total_users=('user_pseudo_id', 'nunique'), total_sessions=('session_id', 'nunique'), total_page_views=('event_name', lambda x: (x == 'page_view').sum()) ).reset_index() # Calculate bounce rate for city bounced_sessions_city = df_events_with_counts[ (df_events_with_counts['event_count'] == 1) & (df_events_with_counts['event_name'] == 'page_view') ].groupby('geo_city')['session_id'].nunique().reset_index(name='bounced_sessions') city_behavior = pd.merge(city_behavior, bounced_sessions_city, on='geo_city', how='left').fillna(0) city_behavior['bounce_rate'] = (city_behavior['bounced_sessions'] / city_behavior['total_sessions']) * 100 city_behavior['bounce_rate'] = city_behavior['bounce_rate'].fillna(0).round(2) print(city_behavior.sort_values('total_sessions', ascending=False).head(10)) return device_behavior, source_medium_behavior, country_behavior, city_behavior # MAIN EXECUTION 1: This block runs when the script starts if __name__ == "__main__": # MAIN EXECUTION 2: Connect to the database conn = connect_to_db() if conn: # MAIN EXECUTION 3: Fetch event data events_df = fetch_event_data(conn) # MAIN EXECUTION 4: Analyze user behavior if events_df is not None: device_bhv, source_medium_bhv, country_bhv, city_bhv = analyze_user_behavior_by_segment(events_df) if device_bhv is not None: print("\n--- User Behavior Analysis Completed ---") # You can now use these DataFrames for plotting or further analysis # For example, to save to new CSVs: # device_bhv.to_csv('E:/SankalanAnalytics/data/processed/device_behavior.csv', index=False) # source_medium_bhv.to_csv('E:/SankalanAnalytics/data/processed/source_medium_behavior.csv', index=False) # country_bhv.to_csv('E:/SankalanAnalytics/data/processed/country_behavior.csv', index=False) # MAIN EXECUTION 5: Close the database connection conn.close() print("Database connection closed.") else: print("Could not establish database connection. Exiting.")

Important Notes on This Code

This script connects to your SQL Server database to pull event data. It then uses Pandas to group and summarize this data by device category, traffic source, and geographic location. The bounce rate calculation assumes a bounce is a session with only one page view event, which is a common definition. The script fills missing values with "unknown" to avoid errors during calculations. City-level analysis can be very detailed, so you may want to focus on top cities or group smaller ones depending on your data size. Remember to update your actual SQL Server details in the DB CONFIG section. This includes your server name, database name, username and password.


Understanding Your Python User Behavior Analysis Script

This Python script helps you understand how different groups of users interact with your website. It pulls data from your SQL Server database and calculates key metrics for each segment. Let’s break down each part of the code.

1. Setting Up Your Tools and Connections

At the top of the script, you’ll see some import lines. These bring in the tools we need:

  • import pandas as pd

    We use Pandas to work with data in table form and perform calculations.

  • import pyodbc

    This lets Python connect to your SQL Server database.

Next is DB_CONFIG, which holds your database connection details. You need to update it with your actual server name, database name, username, and password.

2. Connecting to Your Database

The connect_to_db function tries to connect to your database using the details in DB_CONFIG. It builds a connection string and attempts to log in. It prints messages to let you know if the connection was successful or if there was an error.

3. Fetching Event Data

The fetch_event_data function runs a SQL query to pull specific columns like user ID, session ID, event name, device category, traffic source, traffic medium, country, and city from your events table. It uses pd.read_sql to load this data into a Pandas DataFrame. It prints how many records were fetched and shows an error message if something goes wrong.

4. Analyze User Behavior by Segment

This is the main part where the analysis happens. The analyze_user_behavior_by_segment function:

  • Prepares the data by filling missing values with "unknown" and ensuring columns are treated as text.
  • Calculates total users, sessions, page views, and bounce rate grouped by device category.
  • Calculates the same metrics grouped by traffic source and traffic medium.
  • Calculates these metrics by geographic country and prints the top 10 countries by sessions.
  • Optionally calculates metrics by city, showing the top 10 cities by sessions due to detail level.

The function prints separate tables with these insights for each segment.

5. Running the Script

The main block runs when you start the Python file. It:

  • Connects to the database
  • Fetches event data
  • Analyzes user behavior and prints the results
  • Closes the database connection to free resources

Why This Script Matters

This script is a powerful tool for understanding your website audience. By segmenting user behavior by device, source, and location, you gain deep insights into who your visitors are and how they interact with your site. This lets you tailor content, marketing, and design to better serve different user groups. It shows your skill in detailed audience analysis, a key part of web analytics.


Next Steps

After running this script, you’ll see how different user segments behave on your website. This means you’ve analyzed an important part of your audience. Next, you’ll look at top pages and pages with high bounce rates to find popular content and areas needing improvement.

For now, save this Python script in your E:\SankalanAnalytics\backend folder and name it something like analyze_segments.py.


Previous Topic==> Website Traffic Trends! ||  Next Topics==> Top and high Bounce Pages


SQL All topics Wise Interview Questions    Employee Salary Management SQL FAQ!.   C FAQ   Top 25 PL/SQL Interview Questions

Joins With Group by Having  Equi Join  Joins with Subqueries  Self Join  Outer Join 


Website Analytics Project: Phases and Action Steps

  • Home
  • 🟢 Live App: Web Analytics Simulator
  • Phase 0: Project Setup & Problem Definition
  • 0.1 Define Project Goals & Challenges
  • 0.2 Select Tools, Tech Stack & Data Sources
  • 0.3 Software Requirements & Installation
  • 0.4 Folder Structure & GitHub Repo
  • 0.5 Testing Project Locally
  • Phase 1: Planning for Analytics
  • 1.1 Website Analytics Project Overview
  • 1.2 Define KPIs, Bounce Rate, Engagement
  • 1.3 Identify Target Users & Pain Points
  • Phase 2: Data Collection
  • 2.1 Setup Google Analytics 4 (GA4)
  • 2.2 Export GA4 Data to BigQuery/CSV
  • 2.3 Design SQL Schema for Web Analytics
  • Phase 3: Data Cleaning & Feature Engineering
  • 3.1 Clean Website Data with Python & Pandas
  • 3.2 Create Custom Metrics (Session, Bounce, etc.)
  • Phase 4: Exploratory Data Analysis (EDA)
  • 4.1 Analyze Website Traffic Trends
  • 4.2 Behavior by Device, Source, Location
  • 4.3 Top Pages & High Bounce Pages
  • 4.4 Diagnose Low Traffic & User Drop
  • Phase 5: Business Insights
  • 5.1 Funnel Analysis & Drop-Off Points
  • 5.2 New vs Returning Users
  • 5.3 Time Spent & Scroll Depth
  • Phase 6: SQL for Business
  • 6.1 SQL for Business Insights
  • 6.2 Combine Web Data Using SQL
  • 6.3 Find Problematic Pages Using SQL
  • Phase 7: Machine Learning
  • 7.1 Segment Users with Clustering
  • 7.2 Predict Bounce Rate with ML
  • 7.3 Recommend Pages or Content
  • Phase 8: Dashboards & Visualization
  • 8.1 Dashboard with Streamlit
  • 8.2 Visualize KPIs with Python
  • 8.3 Page-Level Metrics & Drop Heatmaps
  • Phase 9: Final Analytics Story
  • 9.1 Summary Report & Findings
  • Phase 10: Hosting & Portfolio Presentation
  • 10.1 Host Website Project Online
  • 10.2 Add to GitHub with ReadMe
  • 10.3 Pitch Project in Interview
  • Other Topics
  • SQL Interview Questions
  • SQL Case Study: Account Management
  • Python Interview Questions
  • Why C Language

Get in touch

  • tech2dsm@gmail.com

© Sankalan Data Tech. All rights reserved.