Time Spent on Website | Website Analytics

5.3 How to Analyze Time Spent and Scroll Depth to Improve Website Content Engagement

Welcome to Phase 5.3: Measuring Content Engagement

You have successfully analyzed user journeys and compared new versus returning user behavior. That gave you valuable insights into conversion paths and audience segments. Now it is time to dig deeper into how users interact with your content on a page-by-page basis. In this phase we will analyze time spent on pages and scroll depth. Understanding these metrics tells you whether readers are truly engaged with your articles or just skimming, and it helps you assess content quality and optimize for a better user experience.
This step is crucial for evaluating content effectiveness and improving page design.


Why Analyze Time Spent and Scroll Depth

These metrics provide direct insights into how engaging your content is. Here is why they are so valuable.

Assess Content Quality: High time spent and deep scrolls often mean users find your content valuable and relevant. Low numbers might signal a need for improvement.

Optimize Page Layout: If users are not scrolling far down a page, important information might be missed. You can then adjust your layout to highlight key sections.

Improve User Experience: Pages that are hard to read or navigate might lead to short time spent. Identifying these helps you make improvements.

Inform Content Strategy: What topics keep users engaged? What format works best? This analysis helps you create more effective content in the future.

Identify Problematic Pages: Combine these metrics with bounce rate to find pages where users leave quickly without engaging.

Analyzing time spent and scroll depth helps you create a more compelling and sticky website experience.


[Image: Dashboard showing time spent and scroll depth metrics for website pages]

Key Metrics for Content Engagement Analysis

We will calculate these core metrics for each page on your website; a small sample of the summary table they feed into is sketched after the list.

Average Time on Page: The average amount of time users spend viewing a specific page.

Average Scroll Depth: The average percentage of a page that users scroll down.

Pages with Low Time Spent: Pages where users spend very little time.

Pages with Low Scroll Depth: Pages where users do not scroll far down.
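
To make these metrics concrete, here is a minimal sketch of the kind of per-page summary table the script below produces. The page URLs and numbers are made up for illustration only.

import pandas as pd

# Hypothetical sample of the per-page engagement summary the script builds.
sample_metrics = pd.DataFrame({
    'page_location': ['/blog/sql-joins', '/pricing', '/contact'],
    'avg_time_on_page_seconds': [185.4, 42.1, 9.8],
    'avg_scroll_depth_percent': [78.5, 55.0, 20.0]
})

# Pages with short visits AND shallow scrolls are candidates for content or layout fixes.
low_engagement = sample_metrics[
    (sample_metrics['avg_time_on_page_seconds'] < 15)
    & (sample_metrics['avg_scroll_depth_percent'] < 30)
]
print(low_engagement)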


Python Code to Analyze Time Spent and Scroll Depth

We will write a Python script to fetch your event data from SQL Server. It will then calculate time spent on page and scroll depth for each page.

Practical Python Code Example
Here is a basic example of the Python code you will write. It connects to your SQL Server database, fetches event data, and then computes time spent and scroll depth metrics.

# SCRIPT SETUP 1: Import necessary libraries
import pandas as pd
import pyodbc
import json
from datetime import datetime, timedelta

# SCRIPT SETUP 2: Database connection details for SQL Server
# Replace with your actual server name, database name, and credentials
DB_CONFIG = {
    'driver': '{ODBC Driver 17 for SQL Server}',
    'server': 'YOUR_SQL_SERVER_NAME',  # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS'
    'database': 'YOUR_DATABASE_NAME',
    'uid': 'YOUR_USERNAME',
    'pwd': 'YOUR_PASSWORD'
}

# FUNCTION 1: Connect to the database
def connect_to_db():
    """Establishes a connection to the SQL Server database."""
    conn_str = (
        f"DRIVER={DB_CONFIG['driver']};"
        f"SERVER={DB_CONFIG['server']};"
        f"DATABASE={DB_CONFIG['database']};"
        f"UID={DB_CONFIG['uid']};"
        f"PWD={DB_CONFIG['pwd']};"
    )
    try:
        conn = pyodbc.connect(conn_str)
        print("Successfully connected to SQL Server.")
        return conn
    except pyodbc.Error as ex:
        sqlstate = ex.args[0]
        print(f"Database connection error: {sqlstate}")
        print(ex)
        return None

# FUNCTION 2: Fetch event data
def fetch_event_data(conn):
    """Fetches necessary event data from the SQL Server events table."""
    query = """
        SELECT
            user_pseudo_id,
            session_id,
            event_timestamp,
            event_name,
            page_location,
            event_params_json
        FROM events
        WHERE page_location IS NOT NULL AND page_location != '';
    """
    try:
        df = pd.read_sql(query, conn)
        print(f"Fetched {len(df)} event records for time spent and scroll depth analysis.")
        return df
    except Exception as e:
        print(f"Error fetching event data: {e}")
        return None

# Helper function to parse event_params_json for a specific key
def get_event_param_value(json_str, key_name):
    """Extracts the value for a specific key from event_params_json."""
    if pd.isna(json_str) or not json_str:
        return None
    try:
        params = json.loads(json_str)
        for param in params:
            if param.get('key') == key_name:
                # Prioritize 'int_value', then 'string_value', then 'double_value'
                if 'int_value' in param.get('value', {}):
                    return param['value']['int_value']
                elif 'string_value' in param.get('value', {}):
                    return param['value']['string_value']
                elif 'double_value' in param.get('value', {}):
                    return param['value']['double_value']
        return None
    except json.JSONDecodeError:
        return None

# FUNCTION 3: Analyze Time Spent and Scroll Depth
def analyze_content_engagement(df_events):
    """Calculates average time spent on page and average scroll depth per page."""
    if df_events is None or df_events.empty:
        print("No event data for content engagement analysis.")
        return None

    # Ensure page_location is string and fill NaNs
    df_events['page_location'] = df_events['page_location'].fillna('').astype(str)
    df_events['event_datetime'] = pd.to_datetime(df_events['event_timestamp'], unit='us')

    # STEP 3.1: Calculate Average Time on Page
    # Group by session and page to find min/max timestamp within each page view sequence
    page_sessions = df_events[df_events['event_name'] == 'page_view'].copy()
    page_sessions = page_sessions.sort_values('event_datetime')

    # Calculate time spent on each page within a session.
    # This is an approximation: time between first and last event on a page within a session.
    # A more precise method would involve next_page_path or a session_end event.
    time_on_page_agg = page_sessions.groupby(['session_id', 'page_location'])['event_datetime'].agg(
        first_event_time='min',
        last_event_time='max'
    ).reset_index()
    time_on_page_agg['time_spent_seconds'] = (
        time_on_page_agg['last_event_time'] - time_on_page_agg['first_event_time']
    ).dt.total_seconds()

    # Calculate average time spent per page
    avg_time_on_page = time_on_page_agg.groupby('page_location')['time_spent_seconds'].mean().reset_index(name='avg_time_on_page_seconds')
    avg_time_on_page['avg_time_on_page_minutes'] = (avg_time_on_page['avg_time_on_page_seconds'] / 60).round(2)

    # STEP 3.2: Calculate Average Scroll Depth
    # Filter for 'scroll' events
    df_scroll_events = df_events[df_events['event_name'] == 'scroll'].copy()

    # Extract 'percent_scrolled' from event_params_json
    df_scroll_events['percent_scrolled'] = df_scroll_events['event_params_json'].apply(
        lambda x: get_event_param_value(x, 'percent_scrolled')
    )

    # Convert to numeric, coerce errors to NaN, then fill NaN with 0 or a reasonable default
    df_scroll_events['percent_scrolled'] = pd.to_numeric(df_scroll_events['percent_scrolled'], errors='coerce').fillna(0)

    # Calculate average scroll depth per page
    avg_scroll_depth = df_scroll_events.groupby('page_location')['percent_scrolled'].mean().reset_index(name='avg_scroll_depth_percent')
    avg_scroll_depth['avg_scroll_depth_percent'] = avg_scroll_depth['avg_scroll_depth_percent'].round(2)

    # STEP 3.3: Merge all engagement metrics
    engagement_metrics_df = pd.merge(avg_time_on_page, avg_scroll_depth, on='page_location', how='outer').fillna(0)

    print("\n--- Top 10 Pages by Average Time on Page ---")
    print(engagement_metrics_df.sort_values('avg_time_on_page_minutes', ascending=False).head(10))

    print("\n--- Top 10 Pages by Average Scroll Depth ---")
    print(engagement_metrics_df.sort_values('avg_scroll_depth_percent', ascending=False).head(10))

    print("\n--- Pages with Low Engagement (Example: < 15 seconds time on page) ---")
    low_engagement_time = engagement_metrics_df[engagement_metrics_df['avg_time_on_page_seconds'] < 15]
    # Filter for pages with at least some traffic to avoid noise
    # (assuming page_views from previous analysis would be merged or recalculated here).
    # For this example, we'll just show them directly.
    print(low_engagement_time.sort_values('avg_time_on_page_seconds').head(10))

    return engagement_metrics_df

# MAIN EXECUTION 1: This block runs when the script starts
if __name__ == "__main__":
    # MAIN EXECUTION 2: Connect to the database
    conn = connect_to_db()
    if conn:
        # MAIN EXECUTION 3: Fetch event data
        events_df = fetch_event_data(conn)

        # MAIN EXECUTION 4: Analyze content engagement
        if events_df is not None:
            engagement_metrics = analyze_content_engagement(events_df)
            if engagement_metrics is not None:
                print("\n--- Content Engagement Analysis Completed ---")
                # You can now use the 'engagement_metrics' DataFrame for plotting or further analysis.
                # For example, to save to a new CSV:
                # engagement_metrics.to_csv('E:/SankalanAnalytics/data/processed/content_engagement_metrics.csv', index=False)

        # MAIN EXECUTION 5: Close the database connection
        conn.close()
        print("Database connection closed.")
    else:
        print("Could not establish database connection. Exiting.")

Important Notes on This Code

This script connects to your SQL Server database to pull event data, including 'event_params_json'. It then uses Pandas to calculate average time spent on page and average scroll depth for each page. The time spent calculation is an estimate based on the duration between the first and last event on a specific page within a session. For single-event page views, this duration will be zero.
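If you want a tighter estimate, one common alternative is to measure the gap between consecutive page_view events in the same session, so a page's time ends when the next page starts. The function below is a minimal sketch of that idea, not part of the main script; it assumes the 'session_id', 'page_location', and 'event_datetime' columns created above.

import pandas as pd

def time_on_page_from_next_view(page_views: pd.DataFrame) -> pd.DataFrame:
    """Alternative estimate: time on a page = gap until the next page_view
    in the same session. The last page of a session has no next view,
    so its duration stays NaN and is excluded from the average."""
    pv = page_views.sort_values(['session_id', 'event_datetime']).copy()
    pv['next_view_time'] = pv.groupby('session_id')['event_datetime'].shift(-1)
    pv['time_spent_seconds'] = (pv['next_view_time'] - pv['event_datetime']).dt.total_seconds()
    return pv.groupby('page_location')['time_spent_seconds'].mean().reset_index(
        name='avg_time_on_page_seconds'
    )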

The scroll depth analysis relies on the 'percent_scrolled' parameter being present in 'event_params_json' for 'scroll' events. If your GA4 setup does not capture this parameter, or uses a different name, you will need to adjust the call to 'get_event_param_value' accordingly. Remember to fill in your actual SQL Server connection details in the DB_CONFIG section: your server name, database name, username, and password.
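For example, if your events stored the value under a hypothetical key such as 'scroll_percentage', the only change needed inside analyze_content_engagement would be the key name passed to the helper:

# Hypothetical adjustment inside analyze_content_engagement: swap the key name
# if your scroll events do not use GA4's default 'percent_scrolled' parameter.
df_scroll_events['percent_scrolled'] = df_scroll_events['event_params_json'].apply(
    lambda x: get_event_param_value(x, 'scroll_percentage')
)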


Understanding Your Python Time Spent and Scroll Depth Analysis Script

This Python script helps you understand how deeply users engage with the content on your website's pages. It pulls data from your SQL Server database and calculates key metrics like time spent on each page and scroll depth. Let’s break down each part of the code.

1. Setting Up Your Tools and Connections

At the top of the script, you’ll see import lines bringing in the necessary tools for this work. First is import pandas as pd which lets us work with data tables and perform calculations. Next is import pyodbc to connect Python with your SQL Server database. Then import json helps parse JSON strings found in your event parameters. Lastly, from datetime import datetime, timedelta supports working with dates and calculating time differences. You will also find DB_CONFIG, where you enter your actual SQL Server name, database, username, and password for the connection.
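If you prefer not to hard-code credentials in the file, one option (a sketch only, using hypothetical environment variable names of your own choosing) is to read them from environment variables:

import os

# Sketch: load connection details from environment variables instead of literals.
# The variable names below are illustrative, not a required convention.
DB_CONFIG = {
    'driver': '{ODBC Driver 17 for SQL Server}',
    'server': os.environ.get('ANALYTICS_DB_SERVER', 'localhost'),
    'database': os.environ.get('ANALYTICS_DB_NAME', 'YOUR_DATABASE_NAME'),
    'uid': os.environ.get('ANALYTICS_DB_USER', ''),
    'pwd': os.environ.get('ANALYTICS_DB_PASSWORD', ''),
}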

2. Connecting to Your Database

The connect_to_db function handles the database connection. It builds a connection string from the DB_CONFIG settings and tries to open a connection. The function will print messages letting you know if it connected successfully or if there was an error.

3. Fetching Event Data

The fetch_event_data function fetches raw event data from your database needed for the analysis. It runs a SQL query that selects user IDs, session IDs, event timestamps, event names, page locations, and JSON event parameters. Rows with missing or empty page locations are excluded. The results are loaded into a Pandas DataFrame, which becomes the foundation for content engagement analysis. The function reports the number of records fetched and handles errors if they occur.
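If your events table is large, you may want to fetch only a recent window of data. Below is a sketch of one way to do that, assuming event_timestamp is stored as microseconds since the Unix epoch, as in the GA4 export; it is a variation on fetch_event_data, not a replacement required by the script.

import pandas as pd
from datetime import datetime, timedelta

def fetch_recent_event_data(conn, days=30):
    """Sketch: fetch only the last `days` days of events."""
    cutoff_us = int((datetime.utcnow() - timedelta(days=days)).timestamp() * 1_000_000)
    query = """
        SELECT user_pseudo_id, session_id, event_timestamp, event_name,
               page_location, event_params_json
        FROM events
        WHERE page_location IS NOT NULL AND page_location != ''
          AND event_timestamp >= ?;
    """
    return pd.read_sql(query, conn, params=[cutoff_us])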

4. Helper Function get_event_param_value

This helper function extracts specific values from the JSON string in the event_params_json column. Given a JSON string and a key name, it parses the JSON and searches for the key's value, supporting integers, strings, and doubles. It also includes error handling for invalid JSON or missing keys.
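A quick way to see the helper in action (assuming the function from the script above is already defined) is to pass it a small GA4-style parameter string:

# Illustrative check of get_event_param_value with a GA4-style JSON string.
sample_json = (
    '[{"key": "percent_scrolled", "value": {"int_value": 90}},'
    ' {"key": "page_title", "value": {"string_value": "Pricing"}}]'
)
print(get_event_param_value(sample_json, 'percent_scrolled'))  # 90
print(get_event_param_value(sample_json, 'page_title'))        # 'Pricing'
print(get_event_param_value(sample_json, 'missing_key'))       # None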

5. Analyzing Time Spent and Scroll Depth: analyze_content_engagement

This core function calculates average time spent and scroll depth metrics per page.

Here’s how it works:
Initial Data Preparation: Ensures the page_location column is a string and converts event_timestamp into datetime objects.
Step 3.1 Calculate Average Time on Page: Filters for page view events, then for each session and page finds the earliest and latest event timestamps. The difference estimates time spent on that page. It averages these times across all visits to each page.
Step 3.2 Calculate Average Scroll Depth: Filters for scroll events and uses get_event_param_value to extract the percent_scrolled value from JSON parameters. It calculates the average scroll percentage per page.
Step 3.3 Merge Engagement Metrics: Combines average time on page and scroll depth into a single table for easy comparison.

The function prints the top 10 pages by average time on page, the top 10 by scroll depth, and highlights pages with low engagement based on low time spent.
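To see Step 3.1's approximation on a tiny made-up session, consider this self-contained example; the session, pages, and timestamps are invented for illustration:

import pandas as pd

# One session with two page_view events on /home and one on /pricing.
toy = pd.DataFrame({
    'session_id': ['s1', 's1', 's1'],
    'page_location': ['/home', '/home', '/pricing'],
    'event_datetime': pd.to_datetime([
        '2024-01-01 10:00:00', '2024-01-01 10:00:40', '2024-01-01 10:01:00'
    ]),
})

agg = toy.groupby(['session_id', 'page_location'])['event_datetime'].agg(
    first_event_time='min', last_event_time='max'
).reset_index()
agg['time_spent_seconds'] = (agg['last_event_time'] - agg['first_event_time']).dt.total_seconds()
print(agg)
# /home gets 40 seconds (last minus first event on that page);
# /pricing has a single event, so its estimate is 0 seconds.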

6. Running the Script: The Main Block

When you run the Python file, this block connects to your database, fetches event data, analyzes content engagement, prints the results, and then closes the database connection to free up resources.

Overall Value of This Script

This Python script is invaluable for understanding how users truly interact with your website’s content. By analyzing time spent and scroll depth, you can identify which pages or articles are engaging visitors most, and which ones might need improvement. This insight helps you refine your content strategy, optimize page layouts, and deliver a better user experience. It showcases your ability to perform deep content performance analysis, a vital skill in web analytics and optimization.


Next Steps

After running this Python script, you will have clear insights into user engagement on a page-by-page basis. This means you’ve successfully analyzed content engagement. You are now ready to move into the next phase: writing SQL queries to answer specific business questions. This will help you directly extract actionable insights from your structured database.

