4.2 Analyze User Behavior by Device, Source and Location

Welcome to Phase 4 2 Understanding Your Audience
You have already looked at your overall website traffic trends which gave you a big picture view. Now it is time to understand who your visitors are and how they use your website. In this phase we will explore user behavior by different groups. We will look at device categories like mobile or desktop. We will check traffic sources like Google or direct visits. We will also look at where visitors come from such as country or city. Knowing these differences helps you understand your audience better. This makes it easier to create content and marketing that fits their needs.

This step is important to personalize your website experience and to spend your marketing budget wisely.

Why Analyze User Behavior by Segments

Breaking down your audience into groups and studying their behavior gives you valuable insights. Here is why it matters.

Tailor Content Different user groups may want different types of content or have different needs. Knowing this helps you create more relevant content.

Optimize User Experience A website that works well on a desktop may not work well on a mobile device. Looking at devices helps you fix these problems.

Improve Marketing Campaigns If some traffic sources bring more active users you can focus your marketing on those channels.

Localize Strategies Understanding where your users come from helps you create content or offers that fit specific places.

Identify Opportunities You can find user groups that are not yet well served or that bring high value. This opens new chances to grow your website.

Analyzing user behavior by segments helps you move beyond general numbers and build more focused and effective plans.

Graph showing website user behavior segmented by device, traffic source and location

Key Metrics for Segmented User Behavior Analysis

To understand how different user groups behave, we focus on these main metrics.

Total Users The count of unique visitors in each group.

Sessions The total number of visits from each group.

Page Views The total number of pages viewed by each group.

Bounce Rate The percentage of visits where users viewed only one page in each group.

Python Code to Analyze User Behavior by Segments

We will write a Python script to fetch your event and user data from SQL Server. It will then group this data by device category traffic source and geographic location. This will show you how different segments interact with your website.

Practical Python Code Example
Here is a basic example of the Python code you will write. This code will connect to your SQL Server database. It will fetch event data. It will then calculate key metrics for each segment.


# SCRIPT SETUP 1: Import necessary libraries
import pandas as pd
import pyodbc

# SCRIPT SETUP 2: Database connection details for SQL Server
# Replace with your actual server name database name and credentials
DB_CONFIG = {
    'driver': '{ODBC Driver 17 for SQL Server}',
    'server': 'YOUR_SQL_SERVER_NAME',  # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS'
    'database': 'YOUR_DATABASE_NAME',
    'uid': 'YOUR_USERNAME',
    'pwd': 'YOUR_PASSWORD'
}

# FUNCTION 1: Connect to the database
def connect_to_db():
    """Establishes a connection to the SQL Server database."""
    conn_str = (
        f"DRIVER={DB_CONFIG['driver']};"
        f"SERVER={DB_CONFIG['server']};"
        f"DATABASE={DB_CONFIG['database']};"
        f"UID={DB_CONFIG['uid']};"
        f"PWD={DB_CONFIG['pwd']};"
    )
    try:
        conn = pyodbc.connect(conn_str)
        print("Successfully connected to SQL Server.")
        return conn
    except pyodbc.Error as ex:
        sqlstate = ex.args[0]
        print(f"Database connection error: {sqlstate}")
        print(ex)
        return None

# FUNCTION 2: Fetch event data
def fetch_event_data(conn):
    """Fetches necessary event data from the SQL Server events table."""
    query = """
    SELECT
        user_pseudo_id,
        session_id,
        event_name,
        device_category,
        traffic_source,
        traffic_medium,
        geo_country,
        geo_city
    FROM
        events;
    """
    try:
        df = pd.read_sql(query, conn)
        print(f"Fetched {len(df)} event records for behavior analysis.")
        return df
    except Exception as e:
        print(f"Error fetching event data: {e}")
        return None

# FUNCTION 3: Analyze User Behavior by Segment
def analyze_user_behavior_by_segment(df_events):
    """Calculates key metrics for user behavior segmented by device, source, and location."""
    if df_events is None or df_events.empty:
        print("No event data to analyze user behavior.")
        return None

    # Ensure relevant columns are strings and fill missing values
    df_events['device_category'] = df_events['device_category'].fillna('unknown').astype(str)
    df_events['traffic_source'] = df_events['traffic_source'].fillna('unknown').astype(str)
    df_events['traffic_medium'] = df_events['traffic_medium'].fillna('unknown').astype(str)
    df_events['geo_country'] = df_events['geo_country'].fillna('unknown').astype(str)
    df_events['geo_city'] = df_events['geo_city'].fillna('unknown').astype(str)

    # STEP 3.1: Calculate metrics by Device Category
    print("\n--- User Behavior by Device Category ---")
    device_behavior = df_events.groupby('device_category').agg(
        total_users=('user_pseudo_id', 'nunique'),
        total_sessions=('session_id', 'nunique'),
        total_page_views=('event_name', lambda x: (x == 'page_view').sum())
    ).reset_index()
    
    # Calculate bounce rate for device category
    # Get event count per session
    events_per_session = df_events.groupby(['user_pseudo_id', 'session_id']).size().reset_index(name='event_count')
    # Merge event count back to df_events
    df_events_with_counts = pd.merge(df_events, events_per_session, on=['user_pseudo_id', 'session_id'], how='left')
    # Identify bounced sessions (single event sessions that were page views)
    bounced_sessions_device = df_events_with_counts[
        (df_events_with_counts['event_count'] == 1) & (df_events_with_counts['event_name'] == 'page_view')
    ].groupby('device_category')['session_id'].nunique().reset_index(name='bounced_sessions')
    
    device_behavior = pd.merge(device_behavior, bounced_sessions_device, on='device_category', how='left').fillna(0)
    device_behavior['bounce_rate'] = (device_behavior['bounced_sessions'] / device_behavior['total_sessions']) * 100
    device_behavior['bounce_rate'] = device_behavior['bounce_rate'].fillna(0).round(2)
    print(device_behavior.sort_values('total_sessions', ascending=False))

    # STEP 3.2: Calculate metrics by Traffic Source and Medium
    print("\n--- User Behavior by Traffic Source / Medium ---")
    source_medium_behavior = df_events.groupby(['traffic_source', 'traffic_medium']).agg(
        total_users=('user_pseudo_id', 'nunique'),
        total_sessions=('session_id', 'nunique'),
        total_page_views=('event_name', lambda x: (x == 'page_view').sum())
    ).reset_index()

    # Calculate bounce rate for source/medium
    bounced_sessions_source_medium = df_events_with_counts[
        (df_events_with_counts['event_count'] == 1) & (df_events_with_counts['event_name'] == 'page_view')
    ].groupby(['traffic_source', 'traffic_medium'])['session_id'].nunique().reset_index(name='bounced_sessions')

    source_medium_behavior = pd.merge(source_medium_behavior, bounced_sessions_source_medium, on=['traffic_source', 'traffic_medium'], how='left').fillna(0)
    source_medium_behavior['bounce_rate'] = (source_medium_behavior['bounced_sessions'] / source_medium_behavior['total_sessions']) * 100
    source_medium_behavior['bounce_rate'] = source_medium_behavior['bounce_rate'].fillna(0).round(2)
    print(source_medium_behavior.sort_values('total_sessions', ascending=False).head(10)) # Show top 10

    # STEP 3.3: Calculate metrics by Geo Country
    print("\n--- User Behavior by Geo Country ---")
    country_behavior = df_events.groupby('geo_country').agg(
        total_users=('user_pseudo_id', 'nunique'),
        total_sessions=('session_id', 'nunique'),
        total_page_views=('event_name', lambda x: (x == 'page_view').sum())
    ).reset_index()

    # Calculate bounce rate for country
    bounced_sessions_country = df_events_with_counts[
        (df_events_with_counts['event_count'] == 1) & (df_events_with_counts['event_name'] == 'page_view')
    ].groupby('geo_country')['session_id'].nunique().reset_index(name='bounced_sessions')

    country_behavior = pd.merge(country_behavior, bounced_sessions_country, on='geo_country', how='left').fillna(0)
    country_behavior['bounce_rate'] = (country_behavior['bounced_sessions'] / country_behavior['total_sessions']) * 100
    country_behavior['bounce_rate'] = country_behavior['bounce_rate'].fillna(0).round(2)
    print(country_behavior.sort_values('total_sessions', ascending=False).head(10)) # Show top 10

    # STEP 3.4: Calculate metrics by Geo City (optional, can be very granular)
    # This can be very detailed, so only show top few cities
    print("\n--- User Behavior by Geo City (Top 10) ---")
    city_behavior = df_events.groupby('geo_city').agg(
        total_users=('user_pseudo_id', 'nunique'),
        total_sessions=('session_id', 'nunique'),
        total_page_views=('event_name', lambda x: (x == 'page_view').sum())
    ).reset_index()
    
    # Calculate bounce rate for city
    bounced_sessions_city = df_events_with_counts[
        (df_events_with_counts['event_count'] == 1) & (df_events_with_counts['event_name'] == 'page_view')
    ].groupby('geo_city')['session_id'].nunique().reset_index(name='bounced_sessions')

    city_behavior = pd.merge(city_behavior, bounced_sessions_city, on='geo_city', how='left').fillna(0)
    city_behavior['bounce_rate'] = (city_behavior['bounced_sessions'] / city_behavior['total_sessions']) * 100
    city_behavior['bounce_rate'] = city_behavior['bounce_rate'].fillna(0).round(2)
    print(city_behavior.sort_values('total_sessions', ascending=False).head(10))

    return device_behavior, source_medium_behavior, country_behavior, city_behavior

# MAIN EXECUTION 1: This block runs when the script starts
if __name__ == "__main__":
    # MAIN EXECUTION 2: Connect to the database
    conn = connect_to_db()
    if conn:
        # MAIN EXECUTION 3: Fetch event data
        events_df = fetch_event_data(conn)
        
        # MAIN EXECUTION 4: Analyze user behavior
        if events_df is not None:
            device_bhv, source_medium_bhv, country_bhv, city_bhv = analyze_user_behavior_by_segment(events_df)
            
            if device_bhv is not None:
                print("\n--- User Behavior Analysis Completed ---")
                # You can now use these DataFrames for plotting or further analysis
                # For example, to save to new CSVs:
                # device_bhv.to_csv('E:/SankalanAnalytics/data/processed/device_behavior.csv', index=False)
                # source_medium_bhv.to_csv('E:/SankalanAnalytics/data/processed/source_medium_behavior.csv', index=False)
                # country_bhv.to_csv('E:/SankalanAnalytics/data/processed/country_behavior.csv', index=False)
        
        # MAIN EXECUTION 5: Close the database connection
        conn.close()
        print("Database connection closed.")
    else:
        print("Could not establish database connection. Exiting.")

Important Notes on This Code

This script connects to your SQL Server database to pull event data. It then uses Pandas to group and summarize this data by device category, traffic source, and geographic location. The bounce rate calculation assumes a bounce is a session with only one page view event, which is a common definition. The script fills missing values with "unknown" to avoid errors during calculations. City-level analysis can be very detailed, so you may want to focus on top cities or group smaller ones depending on your data size. Remember to update your actual SQL Server details in the DB CONFIG section. This includes your server name, database name, username and password.

Understanding Your Python User Behavior Analysis Script

This Python script helps you understand how different groups of users interact with your website. It pulls data from your SQL Server database and calculates key metrics for each segment. Let’s break down each part of the code.

1. Setting Up Your Tools and Connections

At the top of the script, you’ll see some import lines. These bring in the tools we need:

import pandas as pd
We use Pandas to work with data in table form and perform calculations.
import pyodbc
This lets Python connect to your SQL Server database.

Next is DB_CONFIG, which holds your database connection details. You need to update it with your actual server name, database name, username, and password.

2. Connecting to Your Database

The connect_to_db function tries to connect to your database using the details in DB_CONFIG. It builds a connection string and attempts to log in. It prints messages to let you know if the connection was successful or if there was an error.

3. Fetching Event Data

The fetch_event_data function runs a SQL query to pull specific columns like user ID, session ID, event name, device category, traffic source, traffic medium, country, and city from your events table. It uses pd.read_sql to load this data into a Pandas DataFrame. It prints how many records were fetched and shows an error message if something goes wrong.

4. Analyze User Behavior by Segment

This is the main part where the analysis happens. The analyze_user_behavior_by_segment function:

Prepares the data by filling missing values with "unknown" and ensuring columns are treated as text.
Calculates total users, sessions, page views, and bounce rate grouped by device category.
Calculates the same metrics grouped by traffic source and traffic medium.
Calculates these metrics by geographic country and prints the top 10 countries by sessions.
Optionally calculates metrics by city, showing the top 10 cities by sessions due to detail level.

The function prints separate tables with these insights for each segment.

5. Running the Script

The main block runs when you start the Python file. It:

Connects to the database
Fetches event data
Analyzes user behavior and prints the results
Closes the database connection to free resources

Why This Script Matters

This script is a powerful tool for understanding your website audience. By segmenting user behavior by device, source, and location, you gain deep insights into who your visitors are and how they interact with your site. This lets you tailor content, marketing, and design to better serve different user groups. It shows your skill in detailed audience analysis, a key part of web analytics.

Next Steps

After running this script, you’ll see how different user segments behave on your website. This means you’ve analyzed an important part of your audience. Next, you’ll look at top pages and pages with high bounce rates to find popular content and areas needing improvement.

For now, save this Python script in your E:\SankalanAnalytics\backend folder and name it something like analyze_segments.py.