# SCRIPT SETUP 1: Import necessary libraries
import pandas as pd
import pyodbc
from datetime import datetime, timedelta
# SCRIPT SETUP 2: Database connection details for SQL Server
# Replace with your actual server name, database name, and credentials
DB_CONFIG = {
    'driver': '{ODBC Driver 17 for SQL Server}',
    'server': 'YOUR_SQL_SERVER_NAME',  # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS'
    'database': 'YOUR_DATABASE_NAME',
    'uid': 'YOUR_USERNAME',
    'pwd': 'YOUR_PASSWORD'
}
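# OPTIONAL (not part of the original setup): rather than hard-coding credentials, you could pull
# them from environment variables. A minimal sketch using only the standard library; the variable
# names SQL_UID and SQL_PWD are assumptions:
#   import os
#   DB_CONFIG['uid'] = os.environ.get('SQL_UID', DB_CONFIG['uid'])
#   DB_CONFIG['pwd'] = os.environ.get('SQL_PWD', DB_CONFIG['pwd'])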
# FUNCTION 1: Connect to the database
def connect_to_db():
"""Establishes a connection to the SQL Server database."""
conn_str = (
f"DRIVER={DB_CONFIG['driver']};"
f"SERVER={DB_CONFIG['server']};"
f"DATABASE={DB_CONFIG['database']};"
f"UID={DB_CONFIG['uid']};"
f"PWD={DB_CONFIG['pwd']};"
)
try:
conn = pyodbc.connect(conn_str)
print("Successfully connected to SQL Server.")
return conn
except pyodbc.Error as ex:
sqlstate = ex.args[0]
print(f"Database connection error: {sqlstate}")
print(ex)
return None
# FUNCTION 2: Fetch event data within a date range
def fetch_events_in_range(conn, start_date, end_date):
"""Fetches event data from SQL Server within a specified date range."""
# Convert dates to Unix microseconds for comparison with event_timestamp
start_timestamp_us = int(datetime.combine(start_date, datetime.min.time()).timestamp() * 1_000_000)
end_timestamp_us = int(datetime.combine(end_date + timedelta(days=1), datetime.min.time()).timestamp() * 1_000_000) - 1 # End of day
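    # NOTE (assumption about your data): .timestamp() interprets the naive datetime in the *local*
    # timezone. If event_timestamp is stored as UTC microseconds (as in GA4 exports), a UTC-safe
    # sketch would be:
    #   from datetime import timezone
    #   start_timestamp_us = int(datetime.combine(start_date, datetime.min.time(), tzinfo=timezone.utc).timestamp() * 1_000_000)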
query = f"""
SELECT
user_pseudo_id,
session_id,
event_timestamp,
event_name,
page_location,
traffic_source,
traffic_medium
FROM
events
WHERE
event_timestamp >= {start_timestamp_us} AND event_timestamp <= {end_timestamp_us}
ORDER BY
user_pseudo_id, session_id, event_timestamp;
"""
try:
df = pd.read_sql(query, conn)
print(f"Fetched {len(df)} event records for range {start_date} to {end_date}.")
return df
except Exception as e:
print(f"Error fetching event data for range {start_date} to {end_date}: {e}")
return None
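# NOTE: recent pandas versions emit a UserWarning when read_sql is handed a raw pyodbc connection
# and recommend a SQLAlchemy engine instead. A hedged sketch of one common form, assuming the
# sqlalchemy package is installed (placeholders are illustrative):
#   from sqlalchemy import create_engine
#   engine = create_engine("mssql+pyodbc://USER:PASSWORD@SERVER/DATABASE?driver=ODBC+Driver+17+for+SQL+Server")
#   df = pd.read_sql(query, engine)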
# FUNCTION 3: Calculate core metrics for a DataFrame
def calculate_core_metrics(df):
"""Calculates total users, sessions, and page views for a given DataFrame."""
if df is None or df.empty:
return 0, 0, 0
total_users = df['user_pseudo_id'].nunique()
total_sessions = df['session_id'].nunique()
total_page_views = df[df['event_name'] == 'page_view'].shape[0]
return total_users, total_sessions, total_page_views
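# Example usage (illustrative; df_events would come from fetch_events_in_range):
#   users, sessions, page_views = calculate_core_metrics(df_events)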
# FUNCTION 4: Diagnose Traffic Drop
def diagnose_traffic_drop(conn, problem_start_date, problem_end_date, compare_start_date, compare_end_date):
"""Diagnoses traffic drop by comparing metrics between two periods."""
# STEP 4.1: Fetch data for problem period
df_problem = fetch_events_in_range(conn, problem_start_date, problem_end_date)
# STEP 4.2: Fetch data for comparison period
df_compare = fetch_events_in_range(conn, compare_start_date, compare_end_date)
if df_problem is None or df_compare is None:
print("Could not fetch data for both periods. Cannot diagnose traffic drop.")
return
# STEP 4.3: Calculate core metrics for both periods
users_problem, sessions_problem, pageviews_problem = calculate_core_metrics(df_problem)
users_compare, sessions_compare, pageviews_compare = calculate_core_metrics(df_compare)
print(f"\n--- Traffic Comparison: {problem_start_date} to {problem_end_date} (Problem Period) vs. {compare_start_date} to {compare_end_date} (Comparison Period) ---")
print(f"Metric | Problem Period | Compare Period | Change (%)")
print("-" * 60)
# Calculate and print percentage changes
def print_change(metric_name, val_problem, val_compare):
change_percent = ((val_problem - val_compare) / val_compare) * 100 if val_compare > 0 else (100 if val_problem > 0 else 0)
print(f"{metric_name:<19} | {val_problem:<14} | {val_compare:<14} | {change_percent: .2f}%")
print_change("Total Users", users_problem, users_compare)
print_change("Total Sessions", sessions_problem, sessions_compare)
print_change("Total Page Views", pageviews_problem, pageviews_compare)
# STEP 4.4: Analyze page level drop-off in problem period
print("\n--- Top Pages by Page View Drop in Problem Period ---")
# Get page views for each page in problem period
page_views_problem_df = df_problem[df_problem['event_name'] == 'page_view'].groupby('page_location').size().reset_index(name='page_views_problem')
# Get page views for each page in comparison period
page_views_compare_df = df_compare[df_compare['event_name'] == 'page_view'].groupby('page_location').size().reset_index(name='page_views_compare')
# Merge and calculate change
page_comparison = pd.merge(page_views_problem_df, page_views_compare_df, on='page_location', how='outer').fillna(0)
page_comparison['page_view_change_percent'] = ((page_comparison['page_views_problem'] - page_comparison['page_views_compare']) / page_comparison['page_views_compare']) * 100
page_comparison = page_comparison.fillna(0) # Fill NaN from division by zero for pages only in one period
    # Filter for pages with significant traffic in the comparison period and show the largest drops
    MIN_VIEWS_THRESHOLD = 10  # Only consider pages with at least 10 views in the comparison period
    significant_drops = page_comparison[page_comparison['page_views_compare'] >= MIN_VIEWS_THRESHOLD].sort_values('page_view_change_percent').head(10)
    if not significant_drops.empty:
        print(significant_drops[['page_location', 'page_views_problem', 'page_views_compare', 'page_view_change_percent']])
    else:
        print("No significant page view drops found with current threshold.")
    # STEP 4.5: Analyze traffic source/medium drop-off in the problem period
    print("\n--- Top Traffic Source/Medium by Session Drop in Problem Period ---")
    # Get sessions per source/medium in the problem period
    source_sessions_problem_df = df_problem.groupby(['traffic_source', 'traffic_medium'])['session_id'].nunique().reset_index(name='sessions_problem')
    # Get sessions per source/medium in the comparison period
    source_sessions_compare_df = df_compare.groupby(['traffic_source', 'traffic_medium'])['session_id'].nunique().reset_index(name='sessions_compare')
    # Merge and calculate change
    source_comparison = pd.merge(source_sessions_problem_df, source_sessions_compare_df, on=['traffic_source', 'traffic_medium'], how='outer').fillna(0)
    source_comparison['session_change_percent'] = ((source_comparison['sessions_problem'] - source_comparison['sessions_compare']) / source_comparison['sessions_compare']) * 100
    # Sources with zero sessions in the comparison period divide by zero and yield inf; treat them as 0% change
    source_comparison['session_change_percent'] = source_comparison['session_change_percent'].replace([float('inf'), float('-inf')], 0).fillna(0)
    # Filter for sources with significant traffic in the comparison period and show the largest drops
    MIN_SESSIONS_THRESHOLD_SOURCE = 5  # Only consider sources with at least 5 sessions in the comparison period
    significant_source_drops = source_comparison[source_comparison['sessions_compare'] >= MIN_SESSIONS_THRESHOLD_SOURCE].sort_values('session_change_percent').head(10)
    if not significant_source_drops.empty:
        print(significant_source_drops[['traffic_source', 'traffic_medium', 'sessions_problem', 'sessions_compare', 'session_change_percent']])
    else:
        print("No significant traffic source/medium drops found with current threshold.")
# MAIN EXECUTION 1: This block runs only when the script is executed directly
if __name__ == "__main__":
    # Define your problem and comparison periods
    # Adjust these dates based on when you observed a traffic drop
    PROBLEM_START = datetime(2023, 10, 1).date()
    PROBLEM_END = datetime(2023, 10, 7).date()
    COMPARE_START = datetime(2023, 9, 24).date()
    COMPARE_END = datetime(2023, 9, 30).date()
    # MAIN EXECUTION 2: Connect to the database
    conn = connect_to_db()
    if conn:
        # MAIN EXECUTION 3: Diagnose the traffic drop
        diagnose_traffic_drop(conn, PROBLEM_START, PROBLEM_END, COMPARE_START, COMPARE_END)
        # MAIN EXECUTION 4: Close the database connection
        conn.close()
        print("Database connection closed.")
    else:
        print("Could not establish database connection. Exiting.")