Visualize Page Level Metrics and Drop-Offs with Python

Welcome to Phase 8.3 Deep Diving into Page Performance

You have successfully created an interactive Streamlit dashboard and generated professional static KPI visualizations. That is excellent work in presenting your overall website performance. Now it is time to zoom in even further. In this phase we will learn how to visualize detailed page level metrics and conceptual drop heatmaps using Python with Matplotlib and Seaborn. Think of this as getting a microscope for your website. It allows you to see exactly which pages are performing well. It also shows where users are encountering issues or leaving your site. This is crucial for pinpointing specific areas for content and user experience optimization.
This step is crucial for micro level optimization and improving specific user journeys.

Why Visualize Page-Level Metrics & Drop Heatmaps are Essential

While overall KPIs are helpful, page-level insights give you the clarity needed to make focused improvements. Here’s why these visualizations are so useful.

• Pinpoint Underperforming Content Identify pages like articles or product listings that have high bounce rates or low user engagement.
• Optimize User Flow Visualize how users move through your site and where they tend to drop off. This helps improve navigation and flow.
• Improve Page Design See if users are missing key information due to low scroll depth or if calls to action are being ignored.
• A/B Testing Insights Use visual comparisons to understand how changes on individual pages impact user behavior.
• Content Strategy Refinement Discover what types of content your audience connects with the most and adjust your approach accordingly.
• Conversion Funnel Optimization Find the exact steps in your funnel where users are exiting, and work on improving those points.

These types of insights help you fine-tune specific areas of your site, leading to better user experience and improved results.

Python chart showing page-level website performance using Matplotlib

Key Concepts for Page-Level Visualizations

To create effective page-level and drop-off visualizations with Python, we need to understand a few core concepts.

Page View Metrics
Page Views: Total number of times a page was viewed.
Unique Page Views: Number of sessions during which the specified page was viewed at least once.
Average Time on Page: The average amount of time users spent on a specific page.
Bounce Rate: Percentage of single-page sessions for a specific page.
Exit Rate: Percentage of page views that were the last in a session for a specific page.

Funnel Drop-off Visualization: While a true "heatmap" of user scroll behavior within a single page requires specialized tools (like Hotjar or Crazy Egg), you can visualize drop-offs between pages in a funnel using standard charting libraries. This involves showing the number of users or sessions at each step and the percentage that drop off.

Data Aggregation: You will need to aggregate your raw event data to calculate these page-specific and funnel-specific metrics.

Sorting and Filtering: Often, you will want to visualize the top N or bottom N pages based on a specific metric.

Dual-Axis Charts: Sometimes it’s useful to show two related metrics on the same chart, such as page views and average time on page.

Python Code for Visualizing Page Level Metrics & Drop Heatmaps

We will write a Python script that connects to your SQL Server database. It will fetch and calculate various page-level metrics. It will then use Matplotlib and Seaborn to create several types of visualizations. These visualizations will highlight top performing pages and identify drop-off points in a hypothetical funnel.

Practical Python Code Example (visualize_page_metrics.py)
Here is the basic Python code you will write for visualizing page-level metrics. Save this file in your 'dashboard/' folder.


# SCRIPT SETUP 1: Import necessary libraries
import pandas as pd
import pyodbc
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# SCRIPT SETUP 2: Database connection details for SQL Server
# Replace with your actual server name, database name, and credentials
DB_CONFIG = {
    'driver': '{ODBC Driver 17 for SQL Server}',
    'server': 'YOUR_SQL_SERVER_NAME',  # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS'
    'database': 'YOUR_DATABASE_NAME',
    'uid': 'YOUR_USERNAME',
    'pwd': 'YOUR_PASSWORD'
}

# FUNCTION 1: Connect to the database
def connect_to_db():
    """Establishes a connection to the SQL Server database."""
    conn_str = (
        f"DRIVER={DB_CONFIG['driver']};"
        f"SERVER={DB_CONFIG['server']};"
        f"DATABASE={DB_CONFIG['database']};"
        f"UID={DB_CONFIG['uid']};"
        f"PWD={DB_CONFIG['pwd']};"
    )
    try:
        conn = pyodbc.connect(conn_str)
        print("Successfully connected to SQL Server.")
        return conn
    except pyodbc.Error as ex:
        sqlstate = ex.args[0]
        print(f"Database connection error: {sqlstate}")
        print(ex)
        return None

# FUNCTION 2: Fetch page-level performance metrics
def fetch_page_performance_metrics(conn):
    """
    Fetches and calculates page-level performance metrics including
    page views, unique page views, average time on page, and bounce rate.
    """
    query = """
    WITH PageSessionDetails AS (
        SELECT
            session_id,
            page_location,
            MIN(event_timestamp) AS page_entry_timestamp,
            MAX(event_timestamp) AS page_exit_timestamp,
            COUNT(*) AS events_on_page_in_session,
            ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY event_timestamp) AS event_seq_num_in_session,
            COUNT(*) OVER (PARTITION BY session_id) AS total_events_in_session
        FROM
            events
        WHERE
            event_name = 'page_view'
            AND page_location IS NOT NULL
            AND page_location != ''
        GROUP BY
            session_id, page_location, event_timestamp -- Group by event_timestamp to keep individual page_view events
    ),
    PageMetrics AS (
        SELECT
            page_location,
            COUNT(*) AS total_page_views,
            COUNT(DISTINCT session_id) AS unique_page_views,
            AVG(CAST((page_exit_timestamp - page_entry_timestamp) AS DECIMAL(18,2)) / 1000000.0) AS avg_time_on_page_seconds,
            SUM(CASE WHEN total_events_in_session = 1 THEN 1 ELSE 0 END) AS bounced_sessions
        FROM
            PageSessionDetails
        GROUP BY
            page_location
    )
    SELECT
        pm.page_location,
        pm.total_page_views,
        pm.unique_page_views,
        pm.avg_time_on_page_seconds,
        CAST(pm.bounced_sessions * 100.0 / NULLIF(pm.unique_page_views, 0) AS DECIMAL(5,2)) AS bounce_rate_percent
    FROM
        PageMetrics pm
    WHERE
        pm.unique_page_views >= 10 -- Filter for pages with at least 10 unique page views for meaningful metrics
    ORDER BY
        pm.total_page_views DESC;
    """
    try:
        df = pd.read_sql(query, conn)
        print(f"Fetched {len(df)} page performance records.")
        return df
    except Exception as e:
        print(f"Error fetching page performance data: {e}")
        return None

# FUNCTION 3: Fetch data for a hypothetical funnel drop-off
def fetch_funnel_data(conn, funnel_steps):
    """
    Fetches data for a hypothetical funnel to visualize drop-offs.
    funnel_steps: A list of page_locations representing the funnel path.
    """
    funnel_data = []
    
    # Base query to get sessions that start at the first step
    base_query = f"""
    SELECT DISTINCT session_id
    FROM events
    WHERE event_name = 'page_view' AND page_location = '{funnel_steps[0]}'
    """
    
    try:
        current_sessions_df = pd.read_sql(base_query, conn)
        current_sessions_count = len(current_sessions_df)
        
        funnel_data.append({'step': funnel_steps[0], 'sessions': current_sessions_count, 'drop_off_percent': 0.0})
        
        for i in range(1, len(funnel_steps)):
            prev_step_sessions = current_sessions_count
            
            # Get sessions that visited the current step AND were in the previous step's sessions
            query = f"""
            SELECT DISTINCT e.session_id
            FROM events e
            JOIN ({base_query}) AS prev_sessions ON e.session_id = prev_sessions.session_id
            WHERE e.event_name = 'page_view' AND e.page_location = '{funnel_steps[i]}'
            """
            
            current_sessions_df = pd.read_sql(query, conn)
            current_sessions_count = len(current_sessions_df)
            
            drop_off = 0.0
            if prev_step_sessions > 0:
                drop_off = ((prev_step_sessions - current_sessions_count) / prev_step_sessions) * 100
            
            funnel_data.append({
                'step': funnel_steps[i], 
                'sessions': current_sessions_count, 
                'drop_off_percent': drop_off
            })
            
            # Update base_query to only include sessions that made it to the current step
            base_query = f"""
            SELECT DISTINCT session_id
            FROM events
            WHERE event_name = 'page_view' AND page_location = '{funnel_steps[i]}'
            AND session_id IN (
                SELECT session_id FROM ({base_query}) AS prev_step_sessions_for_next_filter
            )
            """

        print(f"Fetched funnel data for {len(funnel_steps)} steps.")
        return pd.DataFrame(funnel_data)
    except Exception as e:
        print(f"Error fetching funnel data: {e}")
        return None

# FUNCTION 4: Visualize Page-Level Metrics and Funnel
def visualize_page_level_metrics(conn):
    """
    Fetches data and creates various page-level KPI visualizations and a funnel drop-off chart.
    """
    # Set Seaborn style for better aesthetics
    sns.set_style("whitegrid")
    plt.rcParams['figure.figsize'] = (12, 7) # Default figure size

    # --- Plot 1: Top 10 Pages by Total Page Views ---
    page_metrics_df = fetch_page_performance_metrics(conn)
    if page_metrics_df is not None and not page_metrics_df.empty:
        top_pages_views = page_metrics_df.sort_values('total_page_views', ascending=False).head(10)
        plt.figure(figsize=(12, 7))
        sns.barplot(x='total_page_views', y='page_location', data=top_pages_views, palette='Blues_d')
        plt.title('Top 10 Pages by Total Page Views', fontsize=16)
        plt.xlabel('Total Page Views', fontsize=12)
        plt.ylabel('Page URL', fontsize=12)
        plt.tight_layout()
        plt.savefig('E:/SankalanAnalytics/dashboard/top_pages_by_views.png')
        plt.close()
        print("Generated top_pages_by_views.png")
    else:
        print("Skipping top pages by views plot: No data or empty DataFrame.")

    # --- Plot 2: Top 10 Pages by Bounce Rate ---
    if page_metrics_df is not None and not page_metrics_df.empty:
        # Filter for pages with sufficient traffic to make bounce rate meaningful
        high_bounce_pages = page_metrics_df[page_metrics_df['unique_page_views'] >= 50].sort_values('bounce_rate_percent', ascending=False).head(10)
        
        plt.figure(figsize=(12, 7))
        sns.barplot(x='bounce_rate_percent', y='page_location', data=high_bounce_pages, palette='Reds_d')
        plt.title('Top 10 Pages by Bounce Rate (for pages with >=50 unique views)', fontsize=16)
        plt.xlabel('Bounce Rate (%)', fontsize=12)
        plt.ylabel('Page URL', fontsize=12)
        plt.tight_layout()
        plt.savefig('E:/SankalanAnalytics/dashboard/top_pages_by_bounce_rate.png')
        plt.close()
        print("Generated top_pages_by_bounce_rate.png")
    else:
        print("Skipping top pages by bounce rate plot: No data or empty DataFrame.")

    # --- Plot 3: Top 10 Pages by Average Time on Page ---
    if page_metrics_df is not None and not page_metrics_df.empty:
        # Filter for pages with sufficient traffic and positive time on page
        high_time_on_page = page_metrics_df[
            (page_metrics_df['unique_page_views'] >= 50) & 
            (page_metrics_df['avg_time_on_page_seconds'] > 0)
        ].sort_values('avg_time_on_page_seconds', ascending=False).head(10)
        
        plt.figure(figsize=(12, 7))
        sns.barplot(x='avg_time_on_page_seconds', y='page_location', data=high_time_on_page, palette='Greens_d')
        plt.title('Top 10 Pages by Average Time on Page (for pages with >=50 unique views)', fontsize=16)
        plt.xlabel('Average Time on Page (seconds)', fontsize=12)
        plt.ylabel('Page URL', fontsize=12)
        plt.tight_layout()
        plt.savefig('E:/SankalanAnalytics/dashboard/top_pages_by_time_on_page.png')
        plt.close()
        print("Generated top_pages_by_time_on_page.png")
    else:
        print("Skipping top pages by average time on page plot: No data or empty DataFrame.")

    # --- Plot 4: Hypothetical Funnel Drop-off Visualization ---
    # Define a hypothetical funnel path (replace with actual paths from your website)
    # Example: Homepage -> Category Page -> Product Page -> Checkout
    # Make sure these page_locations actually exist in your data for meaningful results.
    funnel_steps = [
        'https://www.sankalandtech.com/',
        'https://www.sankalandtech.com/Tutorials/sql-plsql-faq-interview/index.html',
        'https://www.sankalandtech.com/Tutorials/sql-plsql-faq-interview/retrieving-data-sql-select-statement-interview-questions-faq.html'
    ]
    
    funnel_df = fetch_funnel_data(conn, funnel_steps)
    if funnel_df is not None and not funnel_df.empty:
        plt.figure(figsize=(10, 6))
        
        # Plot sessions at each step
        ax1 = sns.barplot(x='step', y='sessions', data=funnel_df, palette='viridis', label='Sessions')
        plt.title('Hypothetical Funnel Drop-off', fontsize=16)
        plt.xlabel('Funnel Step', fontsize=12)
        plt.ylabel('Number of Sessions', fontsize=12)
        plt.xticks(rotation=45, ha='right')
        
        # Add drop-off percentage as text on bars
        for index, row in funnel_df.iterrows():
            if index > 0: # Don't show drop-off for the first step
                plt.text(index, row['sessions'] + 5, f"{row['drop_off_percent']:.1f}% drop", 
                         color='red', ha='center', va='bottom', fontsize=10)
        
        plt.tight_layout()
        plt.savefig('E:/SankalanAnalytics/dashboard/funnel_drop_off.png')
        plt.close()
        print("Generated funnel_drop_off.png")
    else:
        print("Skipping funnel drop-off plot: No data or empty DataFrame. Check if funnel_steps exist in your data.")

# MAIN EXECUTION 1: This block runs when the script starts
if __name__ == "__main__":
    # MAIN EXECUTION 2: Connect to the database
    conn = connect_to_db()
    if conn:
        # MAIN EXECUTION 3: Visualize page-level metrics and funnel
        visualize_page_level_metrics(conn)
        
        # MAIN EXECUTION 4: Close the database connection
        conn.close()
        print("Database connection closed.")
    else:
        print("Could not establish database connection. Exiting.")

Important Notes on This Code:

This Python script connects to your SQL Server database to fetch and calculate detailed page-level metrics. It uses Matplotlib and Seaborn to generate static visualizations that help you analyze specific page performance.

The script creates bar charts showing the top pages based on page views, bounce rate, and average time on page. It also includes a conceptual funnel drop-off chart that illustrates how users progress (or drop off) through a defined sequence of pages on your website.

Make sure to update the DB_CONFIG section with your actual SQL Server connection details — including your server name, database name, username, and password. For the funnel visualization to work properly, replace the placeholder funnel_steps list with real page_location URLs from your website. These should reflect a meaningful user journey and have enough data for reliable analysis.

If you haven’t already installed the required libraries, use the following command:
pip install matplotlib seaborn

Understanding Your Python Page-Level Visualization Script

This Python script is designed to provide granular insights into your website's performance at the individual page level and to visualize user drop-offs within key funnels. It helps you understand which pages are performing well and where users are exiting, allowing for more focused optimization. Let us break down each part of the code.

1. Setting Up Your Tools and Connections (Refer to SCRIPT SETUP 1, 2 in code)

At the very top of the script, you’ll find several import lines. These bring in the essential tools needed for data processing, database connection, and visualizations.

import pandas as pd (SCRIPT SETUP 1) This brings in Pandas for data manipulation.

import pyodbc (SCRIPT SETUP 1) This allows Python to connect and interact with your SQL Server database.

import matplotlib.pyplot as plt (SCRIPT SETUP 1) This is the main library for creating static charts and graphs.

import seaborn as sns (SCRIPT SETUP 1) Built on top of Matplotlib, this library helps produce more visually appealing and informative statistical graphics.

from datetime import datetime (SCRIPT SETUP 1) Used to work with and format date and time data, especially helpful when handling time-based metrics.

Next you see DB_CONFIG.
• DB_CONFIG (SCRIPT SETUP 2) This holds all the details for connecting to your SQL Server. You need to update YOUR SQL SERVER NAME, YOUR DATABASE NAME, YOUR USERNAME, and YOUR PASSWORD with your actual information.

2. Connecting to Your Database connect_to_db (Refer to FUNCTION 1 in code)

This function establishes the connection to your SQL Server database.

What it does It tries to open a link to your SQL Server using the details from DB_CONFIG.

How it works It builds a connection string. This string helps pyodbc find and log in to your database. It then attempts the connection.

Safety check It prints messages to the console if the connection is successful or if there is an error.

3. Fetch Page-Level Performance Metrics fetch_page_performance_metrics (Refer to FUNCTION 2 in code)

This function retrieves detailed performance data for each page on your website.

What it does It runs a complex SQL query to calculate total_page_views, unique_page_views, avg_time_on_page_seconds, and bounce_rate_percent for each page_location.

How it works It uses Common Table Expressions (CTEs) and window functions within SQL to aggregate event data at the page and session level. This allows for precise calculation of metrics like average time on page (based on the first and last event timestamp on that page within a session) and bounce rate (sessions with only one page view event). It filters for pages with at least 10 unique page views to ensure meaningful results.

Output It returns a Pandas DataFrame containing these page-level metrics.

4. Fetch Data for a Hypothetical Funnel Drop-off fetch_funnel_data (Refer to FUNCTION 3 in code)

This function is designed to track user progression through a predefined sequence of pages and identify where users drop off.

What it does Given a list of funnel_steps (page URLs), it calculates the number of sessions that successfully reach each step and the percentage of sessions that drop off between steps.

How it works It iteratively queries the events table. For each step, it counts sessions that visited that page and also visited all preceding pages in the funnel. It then calculates the drop-off rate from the previous step. You must customize the funnel_steps list in the visualize_page_level_metrics function with actual URLs from your website that define a user journey you want to analyze.

Output It returns a Pandas DataFrame with the number of sessions and drop-off percentage for each funnel step.

5. Visualize Page-Level Metrics and Funnel visualize_page_level_metrics (Refer to FUNCTION 4 and its internal steps)

This is the main function that orchestrates the data fetching and creates the various visualizations. What it does is it sets up the plotting style, calls the data fetching functions, and then creates and saves four different types of plots.

How it works: First, sns.set_style("whitegrid") and plt.rcParams are used to set a clean aesthetic style for all plots along with a default figure size. The first plot shows the Top 10 Pages by Total Page Views. This plot fetches page performance data using fetch_page_performance_metrics, sorts the data by total_page_views and takes the top 10 pages. It then uses sns.barplot() to visualize these pages and saves the plot as top_pages_by_views.png.

The second plot displays the Top 10 Pages by Bounce Rate. It filters for pages with at least 50 unique page views to ensure meaningful bounce rates, sorts by bounce_rate_percent, and takes the top 10. This plot uses sns.barplot() to show pages with the highest bounce rates and saves the image as top_pages_by_bounce_rate.png.

The third plot is the Top 10 Pages by Average Time on Page. It filters for pages with at least 50 unique page views and a positive average time on page, sorts by avg_time_on_page_seconds, and selects the top 10 pages. Using sns.barplot(), it displays pages where users spend the most time, then saves the plot as top_pages_by_time_on_page.png.

The fourth and final plot is a Hypothetical Funnel Drop-off Visualization. This defines a funnel_steps list which you must customize with your own URLs. It calls fetch_funnel_data to get the session counts and drop-off percentages for each step. The function uses sns.barplot() to visualize the number of sessions at each funnel step, adds text annotations to show the drop-off percentages between steps, and saves the plot as funnel_drop_off.png.

As output, the function prints messages to the console indicating which plots have been generated and saved.

6. Running the Script The Main Execution Block (Refer to MAIN EXECUTION 1 to 4 in code)

This part of the script tells Python to run your visualization process when the file is executed. The line if __name__ == "__main__": is a standard Python construct that ensures the code inside this block only runs when the script is executed directly, not when it is imported as a module. The execution flow first attempts to connect to the database. If the connection is successful, it calls the visualize_page_level_metrics() function to generate and save all the plots. Finally, it closes the database connection.

Overall Value of Page-Level Metrics & Drop Heatmaps Visualizations

Visualizing page-level metrics and funnel drop-offs provides deep, actionable insights into your website's performance. It allows you to move beyond high-level trends and pinpoint specific pages or steps in a user journey that require attention. By understanding where users disengage or encounter friction, you can make targeted improvements to content, design, and user flow, directly impacting engagement and conversion rates. This demonstrates a sophisticated approach to web analytics, crucial for optimizing user experience and achieving business objectives.

Next Steps

You have successfully created detailed page-level metric visualizations and a funnel drop-off chart. This means you are now proficient in generating highly granular and actionable static charts. You have completed the Dashboards & Visualization phase.

The next exciting phase will be to compile your Final Analytics Story. We will start by summarizing your key insights and findings from the entire project. This will prepare you to present your work effectively.

For now, make sure you save this Python script in your E:\SankalanAnalytics\dashboard\ folder. Name the file: visualize_page_metrics.py.

To run your visualization script:

1. Open your Command Prompt (CMD).
2. Navigate to your E:\SankalanAnalytics\dashboard\ folder by typing:
cd E:\SankalanAnalytics\dashboard\
3. Run the Python script:
python visualize_page_metrics.py
4. After execution, you will find the generated .png image files in the same dashboard/ folder.