Time Spent on Website | Website Analytics

5.3 How to Analyze Time Spent and Scroll Depth to Improve Website Content Engagement

Welcome to Phase 5.3: Measuring Content Engagement

You have successfully analyzed user journeys and compared new versus returning user behavior. That gave you valuable insights into conversion paths and audience segments. Now it is time to dig deeper into how users interact with your content on a page-by-page basis. In this phase we will analyze time spent on pages and scroll depth. Understanding these metrics tells you whether readers are truly engaged with your articles or just skimming, and it helps you assess content quality and optimize for a better user experience.
This step is crucial for evaluating content effectiveness and improving page design.


Why Analyze Time Spent and Scroll Depth

These metrics provide direct insights into how engaging your content is. Here is why they are so valuable.

Assess Content Quality: High time spent and deep scrolls often mean users find your content valuable and relevant. Low numbers might signal a need for improvement.

Optimize Page Layout: If users are not scrolling far down a page, important information might be missed. You can then adjust your layout to highlight key sections.

Improve User Experience: Pages that are hard to read or navigate might lead to short time spent. Identifying these helps you make improvements.

Inform Content Strategy: What topics keep users engaged? What format works best? This analysis helps you create more effective content in the future.

Identify Problematic Pages: Combine these metrics with bounce rate to find pages where users leave quickly without engaging.

Analyzing time spent and scroll depth helps you create a more compelling and sticky website experience.


[Image: Dashboard showing time spent and scroll depth metrics for website pages]

Key Metrics for Content Engagement Analysis

We will calculate these core metrics for each page on your website; a small sample of the summary table they feed into is sketched after the list.

Average Time on Page: The average amount of time users spend viewing a specific page.

Average Scroll Depth: The average percentage of a page that users scroll down.

Pages with Low Time Spent: Pages where users spend very little time.

Pages with Low Scroll Depth: Pages where users do not scroll far down.
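
To make these metrics concrete, here is a minimal sketch of the kind of per-page summary table the script below produces. The page URLs and numbers are made up for illustration only.

import pandas as pd

# Hypothetical sample of the per-page engagement summary the script builds.
sample_metrics = pd.DataFrame({
    'page_location': ['/blog/sql-joins', '/pricing', '/contact'],
    'avg_time_on_page_seconds': [185.4, 42.1, 9.8],
    'avg_scroll_depth_percent': [78.5, 55.0, 20.0]
})

# Pages with short visits AND shallow scrolls are candidates for content or layout fixes.
low_engagement = sample_metrics[
    (sample_metrics['avg_time_on_page_seconds'] < 15)
    & (sample_metrics['avg_scroll_depth_percent'] < 30)
]
print(low_engagement)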


Python Code to Analyze Time Spent and Scroll Depth

We will write a Python script to fetch your event data from SQL Server. It will then calculate time spent on page and scroll depth for each page.

Practical Python Code Example
Here is a basic example of the Python code you will write. It connects to your SQL Server database, fetches event data, and then computes time spent and scroll depth metrics.

# SCRIPT SETUP 1: Import necessary libraries
import pandas as pd
import pyodbc
import json
from datetime import datetime, timedelta

# SCRIPT SETUP 2: Database connection details for SQL Server
# Replace with your actual server name, database name, and credentials
DB_CONFIG = {
    'driver': '{ODBC Driver 17 for SQL Server}',
    'server': 'YOUR_SQL_SERVER_NAME',  # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS'
    'database': 'YOUR_DATABASE_NAME',
    'uid': 'YOUR_USERNAME',
    'pwd': 'YOUR_PASSWORD'
}

# FUNCTION 1: Connect to the database
def connect_to_db():
    """Establishes a connection to the SQL Server database."""
    conn_str = (
        f"DRIVER={DB_CONFIG['driver']};"
        f"SERVER={DB_CONFIG['server']};"
        f"DATABASE={DB_CONFIG['database']};"
        f"UID={DB_CONFIG['uid']};"
        f"PWD={DB_CONFIG['pwd']};"
    )
    try:
        conn = pyodbc.connect(conn_str)
        print("Successfully connected to SQL Server.")
        return conn
    except pyodbc.Error as ex:
        sqlstate = ex.args[0]
        print(f"Database connection error: {sqlstate}")
        print(ex)
        return None

# FUNCTION 2: Fetch event data
def fetch_event_data(conn):
    """Fetches necessary event data from the SQL Server events table."""
    query = """
        SELECT
            user_pseudo_id,
            session_id,
            event_timestamp,
            event_name,
            page_location,
            event_params_json
        FROM events
        WHERE page_location IS NOT NULL AND page_location != '';
    """
    try:
        df = pd.read_sql(query, conn)
        print(f"Fetched {len(df)} event records for time spent and scroll depth analysis.")
        return df
    except Exception as e:
        print(f"Error fetching event data: {e}")
        return None

# Helper function to parse event_params_json for a specific key
def get_event_param_value(json_str, key_name):
    """Extracts the value for a specific key from event_params_json."""
    if pd.isna(json_str) or not json_str:
        return None
    try:
        params = json.loads(json_str)
        for param in params:
            if param.get('key') == key_name:
                # Prioritize 'int_value', then 'string_value', then 'double_value'
                if 'int_value' in param.get('value', {}):
                    return param['value']['int_value']
                elif 'string_value' in param.get('value', {}):
                    return param['value']['string_value']
                elif 'double_value' in param.get('value', {}):
                    return param['value']['double_value']
        return None
    except json.JSONDecodeError:
        return None

# FUNCTION 3: Analyze Time Spent and Scroll Depth
def analyze_content_engagement(df_events):
    """Calculates average time spent on page and average scroll depth per page."""
    if df_events is None or df_events.empty:
        print("No event data for content engagement analysis.")
        return None

    # Ensure page_location is string and fill NaNs
    df_events['page_location'] = df_events['page_location'].fillna('').astype(str)
    df_events['event_datetime'] = pd.to_datetime(df_events['event_timestamp'], unit='us')

    # STEP 3.1: Calculate Average Time on Page
    # Group by session and page to find min/max timestamp within each page view sequence
    page_sessions = df_events[df_events['event_name'] == 'page_view'].copy()
    page_sessions = page_sessions.sort_values('event_datetime')

    # Calculate time spent on each page within a session.
    # This is an approximation: time between first and last event on a page within a session.
    # A more precise method would involve next_page_path or a session_end event.
    time_on_page_agg = page_sessions.groupby(['session_id', 'page_location'])['event_datetime'].agg(
        first_event_time='min',
        last_event_time='max'
    ).reset_index()
    time_on_page_agg['time_spent_seconds'] = (
        time_on_page_agg['last_event_time'] - time_on_page_agg['first_event_time']
    ).dt.total_seconds()

    # Calculate average time spent per page
    avg_time_on_page = time_on_page_agg.groupby('page_location')['time_spent_seconds'].mean().reset_index(name='avg_time_on_page_seconds')
    avg_time_on_page['avg_time_on_page_minutes'] = (avg_time_on_page['avg_time_on_page_seconds'] / 60).round(2)

    # STEP 3.2: Calculate Average Scroll Depth
    # Filter for 'scroll' events
    df_scroll_events = df_events[df_events['event_name'] == 'scroll'].copy()

    # Extract 'percent_scrolled' from event_params_json
    df_scroll_events['percent_scrolled'] = df_scroll_events['event_params_json'].apply(
        lambda x: get_event_param_value(x, 'percent_scrolled')
    )

    # Convert to numeric, coerce errors to NaN, then fill NaN with 0 or a reasonable default
    df_scroll_events['percent_scrolled'] = pd.to_numeric(df_scroll_events['percent_scrolled'], errors='coerce').fillna(0)

    # Calculate average scroll depth per page
    avg_scroll_depth = df_scroll_events.groupby('page_location')['percent_scrolled'].mean().reset_index(name='avg_scroll_depth_percent')
    avg_scroll_depth['avg_scroll_depth_percent'] = avg_scroll_depth['avg_scroll_depth_percent'].round(2)

    # STEP 3.3: Merge all engagement metrics
    engagement_metrics_df = pd.merge(avg_time_on_page, avg_scroll_depth, on='page_location', how='outer').fillna(0)

    print("\n--- Top 10 Pages by Average Time on Page ---")
    print(engagement_metrics_df.sort_values('avg_time_on_page_minutes', ascending=False).head(10))

    print("\n--- Top 10 Pages by Average Scroll Depth ---")
    print(engagement_metrics_df.sort_values('avg_scroll_depth_percent', ascending=False).head(10))

    print("\n--- Pages with Low Engagement (Example: < 15 seconds time on page) ---")
    low_engagement_time = engagement_metrics_df[engagement_metrics_df['avg_time_on_page_seconds'] < 15]
    # Filter for pages with at least some traffic to avoid noise
    # (assuming page_views from previous analysis would be merged or recalculated here).
    # For this example, we'll just show them directly.
    print(low_engagement_time.sort_values('avg_time_on_page_seconds').head(10))

    return engagement_metrics_df

# MAIN EXECUTION 1: This block runs when the script starts
if __name__ == "__main__":
    # MAIN EXECUTION 2: Connect to the database
    conn = connect_to_db()
    if conn:
        # MAIN EXECUTION 3: Fetch event data
        events_df = fetch_event_data(conn)

        # MAIN EXECUTION 4: Analyze content engagement
        if events_df is not None:
            engagement_metrics = analyze_content_engagement(events_df)
            if engagement_metrics is not None:
                print("\n--- Content Engagement Analysis Completed ---")
                # You can now use the 'engagement_metrics' DataFrame for plotting or further analysis.
                # For example, to save to a new CSV:
                # engagement_metrics.to_csv('E:/SankalanAnalytics/data/processed/content_engagement_metrics.csv', index=False)

        # MAIN EXECUTION 5: Close the database connection
        conn.close()
        print("Database connection closed.")
    else:
        print("Could not establish database connection. Exiting.")

Important Notes on This Code

This script connects to your SQL Server database to pull event data, including 'event_params_json'. It then uses Pandas to calculate average time spent on page and average scroll depth for each page. The time spent calculation is an estimate based on the duration between the first and last event on a specific page within a session. For single-event page views, this duration will be zero.
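If you want a tighter estimate, one common alternative is to measure the gap between consecutive page_view events in the same session, so a page's time ends when the next page starts. The function below is a minimal sketch of that idea, not part of the main script; it assumes the 'session_id', 'page_location', and 'event_datetime' columns created above.

import pandas as pd

def time_on_page_from_next_view(page_views: pd.DataFrame) -> pd.DataFrame:
    """Alternative estimate: time on a page = gap until the next page_view
    in the same session. The last page of a session has no next view,
    so its duration stays NaN and is excluded from the average."""
    pv = page_views.sort_values(['session_id', 'event_datetime']).copy()
    pv['next_view_time'] = pv.groupby('session_id')['event_datetime'].shift(-1)
    pv['time_spent_seconds'] = (pv['next_view_time'] - pv['event_datetime']).dt.total_seconds()
    return pv.groupby('page_location')['time_spent_seconds'].mean().reset_index(
        name='avg_time_on_page_seconds'
    )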

The scroll depth analysis relies on the 'percent_scrolled' parameter being present in 'event_params_json' for 'scroll' events. If your GA4 setup does not capture this parameter, or uses a different name, you will need to adjust the call to 'get_event_param_value' accordingly. Remember to fill in your actual SQL Server connection details in the DB_CONFIG section: your server name, database name, username, and password.
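For example, if your events stored the value under a hypothetical key such as 'scroll_percentage', the only change needed inside analyze_content_engagement would be the key name passed to the helper:

# Hypothetical adjustment inside analyze_content_engagement: swap the key name
# if your scroll events do not use GA4's default 'percent_scrolled' parameter.
df_scroll_events['percent_scrolled'] = df_scroll_events['event_params_json'].apply(
    lambda x: get_event_param_value(x, 'scroll_percentage')
)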


Understanding Your Python Time Spent and Scroll Depth Analysis Script

This Python script helps you understand how deeply users engage with the content on your website's pages. It pulls data from your SQL Server database and calculates key metrics like time spent on each page and scroll depth. Let’s break down each part of the code.

1. Setting Up Your Tools and Connections

At the top of the script, you’ll see import lines bringing in the necessary tools for this work. First is import pandas as pd which lets us work with data tables and perform calculations. Next is import pyodbc to connect Python with your SQL Server database. Then import json helps parse JSON strings found in your event parameters. Lastly, from datetime import datetime, timedelta supports working with dates and calculating time differences. You will also find DB_CONFIG, where you enter your actual SQL Server name, database, username, and password for the connection.
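If you prefer not to hard-code credentials in the file, one option (a sketch only, using hypothetical environment variable names of your own choosing) is to read them from environment variables:

import os

# Sketch: load connection details from environment variables instead of literals.
# The variable names below are illustrative, not a required convention.
DB_CONFIG = {
    'driver': '{ODBC Driver 17 for SQL Server}',
    'server': os.environ.get('ANALYTICS_DB_SERVER', 'localhost'),
    'database': os.environ.get('ANALYTICS_DB_NAME', 'YOUR_DATABASE_NAME'),
    'uid': os.environ.get('ANALYTICS_DB_USER', ''),
    'pwd': os.environ.get('ANALYTICS_DB_PASSWORD', ''),
}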

2. Connecting to Your Database

The connect_to_db function handles the database connection. It builds a connection string from the DB_CONFIG settings and tries to open a connection. The function will print messages letting you know if it connected successfully or if there was an error.

3. Fetching Event Data

The fetch_event_data function fetches raw event data from your database needed for the analysis. It runs a SQL query that selects user IDs, session IDs, event timestamps, event names, page locations, and JSON event parameters. Rows with missing or empty page locations are excluded. The results are loaded into a Pandas DataFrame, which becomes the foundation for content engagement analysis. The function reports the number of records fetched and handles errors if they occur.
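If your events table is large, you may want to fetch only a recent window of data. Below is a sketch of one way to do that, assuming event_timestamp is stored as microseconds since the Unix epoch, as in the GA4 export; it is a variation on fetch_event_data, not a replacement required by the script.

import pandas as pd
from datetime import datetime, timedelta

def fetch_recent_event_data(conn, days=30):
    """Sketch: fetch only the last `days` days of events."""
    cutoff_us = int((datetime.utcnow() - timedelta(days=days)).timestamp() * 1_000_000)
    query = """
        SELECT user_pseudo_id, session_id, event_timestamp, event_name,
               page_location, event_params_json
        FROM events
        WHERE page_location IS NOT NULL AND page_location != ''
          AND event_timestamp >= ?;
    """
    return pd.read_sql(query, conn, params=[cutoff_us])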

4. Helper Function get_event_param_value

This helper function extracts specific values from the JSON string in the event_params_json column. Given a JSON string and a key name, it parses the JSON and searches for the key's value, supporting integers, strings, and doubles. It also includes error handling for invalid JSON or missing keys.
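A quick way to see the helper in action (assuming the function from the script above is already defined) is to pass it a small GA4-style parameter string:

# Illustrative check of get_event_param_value with a GA4-style JSON string.
sample_json = (
    '[{"key": "percent_scrolled", "value": {"int_value": 90}},'
    ' {"key": "page_title", "value": {"string_value": "Pricing"}}]'
)
print(get_event_param_value(sample_json, 'percent_scrolled'))  # 90
print(get_event_param_value(sample_json, 'page_title'))        # 'Pricing'
print(get_event_param_value(sample_json, 'missing_key'))       # None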

5. Analyzing Time Spent and Scroll Depth: analyze_content_engagement

This core function calculates average time spent and scroll depth metrics per page.

Here’s how it works:
Initial Data Preparation: Ensures the page_location column is a string and converts event_timestamp into datetime objects.
Step 3.1 Calculate Average Time on Page: Filters for page view events, then for each session and page finds the earliest and latest event timestamps. The difference estimates time spent on that page. It averages these times across all visits to each page.
Step 3.2 Calculate Average Scroll Depth: Filters for scroll events and uses get_event_param_value to extract the percent_scrolled value from JSON parameters. It calculates the average scroll percentage per page.
Step 3.3 Merge Engagement Metrics: Combines average time on page and scroll depth into a single table for easy comparison.

The function prints the top 10 pages by average time on page, the top 10 by scroll depth, and highlights pages with low engagement based on low time spent.
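To see Step 3.1's approximation on a tiny made-up session, consider this self-contained example; the session, pages, and timestamps are invented for illustration:

import pandas as pd

# One session with two page_view events on /home and one on /pricing.
toy = pd.DataFrame({
    'session_id': ['s1', 's1', 's1'],
    'page_location': ['/home', '/home', '/pricing'],
    'event_datetime': pd.to_datetime([
        '2024-01-01 10:00:00', '2024-01-01 10:00:40', '2024-01-01 10:01:00'
    ]),
})

agg = toy.groupby(['session_id', 'page_location'])['event_datetime'].agg(
    first_event_time='min', last_event_time='max'
).reset_index()
agg['time_spent_seconds'] = (agg['last_event_time'] - agg['first_event_time']).dt.total_seconds()
print(agg)
# /home gets 40 seconds (last minus first event on that page);
# /pricing has a single event, so its estimate is 0 seconds.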

6. Running the Script: The Main Block

When you run the Python file, this block connects to your database, fetches event data, analyzes content engagement, prints the results, and then closes the database connection to free up resources.

Overall Value of This Script

This Python script is invaluable for understanding how users truly interact with your website’s content. By analyzing time spent and scroll depth, you can identify which pages or articles are engaging visitors most, and which ones might need improvement. This insight helps you refine your content strategy, optimize page layouts, and deliver a better user experience. It showcases your ability to perform deep content performance analysis, a vital skill in web analytics and optimization.


Next Steps

After running this Python script, you will have clear insights into user engagement on a page-by-page basis. This means you’ve successfully analyzed content engagement. You are now ready to move into the next phase: writing SQL queries to answer specific business questions. This will help you directly extract actionable insights from your structured database.

