Python Code for Visualizing Page Level Metrics & Drop Heatmaps
We will write a Python script that connects to your SQL Server database. It will fetch and calculate various page-level metrics. It will then use Matplotlib and Seaborn to create several types of visualizations. These visualizations will highlight top performing pages and identify drop-off points in a hypothetical funnel.
Practical Python Code Example (visualize_page_metrics.py)
Here is the basic Python code you will write for visualizing page-level metrics. Save this file in your 'dashboard/' folder.
# SCRIPT SETUP 1: Import necessary libraries
import pandas as pd
import pyodbc
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
# SCRIPT SETUP 2: Database connection details for SQL Server
# Replace with your actual server name, database name, and credentials
DB_CONFIG = {
'driver': '{ODBC Driver 17 for SQL Server}',
'server': 'YOUR_SQL_SERVER_NAME', # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS'
'database': 'YOUR_DATABASE_NAME',
'uid': 'YOUR_USERNAME',
'pwd': 'YOUR_PASSWORD'
}
# FUNCTION 1: Connect to the database
def connect_to_db():
"""Establishes a connection to the SQL Server database."""
conn_str = (
f"DRIVER={DB_CONFIG['driver']};"
f"SERVER={DB_CONFIG['server']};"
f"DATABASE={DB_CONFIG['database']};"
f"UID={DB_CONFIG['uid']};"
f"PWD={DB_CONFIG['pwd']};"
)
try:
conn = pyodbc.connect(conn_str)
print("Successfully connected to SQL Server.")
return conn
except pyodbc.Error as ex:
sqlstate = ex.args[0]
print(f"Database connection error: {sqlstate}")
print(ex)
return None
# FUNCTION 2: Fetch page-level performance metrics
def fetch_page_performance_metrics(conn):
"""
Fetches and calculates page-level performance metrics including
page views, unique page views, average time on page, and bounce rate.
"""
query = """
WITH PageSessionDetails AS (
SELECT
session_id,
page_location,
MIN(event_timestamp) AS page_entry_timestamp,
MAX(event_timestamp) AS page_exit_timestamp,
COUNT(*) AS events_on_page_in_session,
ROW_NUMBER() OVER (PARTITION BY session_id ORDER BY event_timestamp) AS event_seq_num_in_session,
COUNT(*) OVER (PARTITION BY session_id) AS total_events_in_session
FROM
events
WHERE
event_name = 'page_view'
AND page_location IS NOT NULL
AND page_location != ''
GROUP BY
session_id, page_location, event_timestamp -- Group by event_timestamp to keep individual page_view events
),
PageMetrics AS (
SELECT
page_location,
COUNT(*) AS total_page_views,
COUNT(DISTINCT session_id) AS unique_page_views,
AVG(CAST((page_exit_timestamp - page_entry_timestamp) AS DECIMAL(18,2)) / 1000000.0) AS avg_time_on_page_seconds,
SUM(CASE WHEN total_events_in_session = 1 THEN 1 ELSE 0 END) AS bounced_sessions
FROM
PageSessionDetails
GROUP BY
page_location
)
SELECT
pm.page_location,
pm.total_page_views,
pm.unique_page_views,
pm.avg_time_on_page_seconds,
CAST(pm.bounced_sessions * 100.0 / NULLIF(pm.unique_page_views, 0) AS DECIMAL(5,2)) AS bounce_rate_percent
FROM
PageMetrics pm
WHERE
pm.unique_page_views >= 10 -- Filter for pages with at least 10 unique page views for meaningful metrics
ORDER BY
pm.total_page_views DESC;
"""
try:
df = pd.read_sql(query, conn)
print(f"Fetched {len(df)} page performance records.")
return df
except Exception as e:
print(f"Error fetching page performance data: {e}")
return None
# FUNCTION 3: Fetch data for a hypothetical funnel drop-off
def fetch_funnel_data(conn, funnel_steps):
"""
Fetches data for a hypothetical funnel to visualize drop-offs.
funnel_steps: A list of page_locations representing the funnel path.
"""
funnel_data = []
# Base query to get sessions that start at the first step
base_query = f"""
SELECT DISTINCT session_id
FROM events
WHERE event_name = 'page_view' AND page_location = '{funnel_steps[0]}'
"""
try:
current_sessions_df = pd.read_sql(base_query, conn)
current_sessions_count = len(current_sessions_df)
funnel_data.append({'step': funnel_steps[0], 'sessions': current_sessions_count, 'drop_off_percent': 0.0})
for i in range(1, len(funnel_steps)):
prev_step_sessions = current_sessions_count
# Get sessions that visited the current step AND were in the previous step's sessions
query = f"""
SELECT DISTINCT e.session_id
FROM events e
JOIN ({base_query}) AS prev_sessions ON e.session_id = prev_sessions.session_id
WHERE e.event_name = 'page_view' AND e.page_location = '{funnel_steps[i]}'
"""
current_sessions_df = pd.read_sql(query, conn)
current_sessions_count = len(current_sessions_df)
drop_off = 0.0
if prev_step_sessions > 0:
drop_off = ((prev_step_sessions - current_sessions_count) / prev_step_sessions) * 100
funnel_data.append({
'step': funnel_steps[i],
'sessions': current_sessions_count,
'drop_off_percent': drop_off
})
# Update base_query to only include sessions that made it to the current step
base_query = f"""
SELECT DISTINCT session_id
FROM events
WHERE event_name = 'page_view' AND page_location = '{funnel_steps[i]}'
AND session_id IN (
SELECT session_id FROM ({base_query}) AS prev_step_sessions_for_next_filter
)
"""
print(f"Fetched funnel data for {len(funnel_steps)} steps.")
return pd.DataFrame(funnel_data)
except Exception as e:
print(f"Error fetching funnel data: {e}")
return None
# FUNCTION 4: Visualize Page-Level Metrics and Funnel
def visualize_page_level_metrics(conn):
"""
Fetches data and creates various page-level KPI visualizations and a funnel drop-off chart.
"""
# Set Seaborn style for better aesthetics
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (12, 7) # Default figure size
# --- Plot 1: Top 10 Pages by Total Page Views ---
page_metrics_df = fetch_page_performance_metrics(conn)
if page_metrics_df is not None and not page_metrics_df.empty:
top_pages_views = page_metrics_df.sort_values('total_page_views', ascending=False).head(10)
plt.figure(figsize=(12, 7))
sns.barplot(x='total_page_views', y='page_location', data=top_pages_views, palette='Blues_d')
plt.title('Top 10 Pages by Total Page Views', fontsize=16)
plt.xlabel('Total Page Views', fontsize=12)
plt.ylabel('Page URL', fontsize=12)
plt.tight_layout()
plt.savefig('E:/SankalanAnalytics/dashboard/top_pages_by_views.png')
plt.close()
print("Generated top_pages_by_views.png")
else:
print("Skipping top pages by views plot: No data or empty DataFrame.")
# --- Plot 2: Top 10 Pages by Bounce Rate ---
if page_metrics_df is not None and not page_metrics_df.empty:
# Filter for pages with sufficient traffic to make bounce rate meaningful
high_bounce_pages = page_metrics_df[page_metrics_df['unique_page_views'] >= 50].sort_values('bounce_rate_percent', ascending=False).head(10)
plt.figure(figsize=(12, 7))
sns.barplot(x='bounce_rate_percent', y='page_location', data=high_bounce_pages, palette='Reds_d')
plt.title('Top 10 Pages by Bounce Rate (for pages with >=50 unique views)', fontsize=16)
plt.xlabel('Bounce Rate (%)', fontsize=12)
plt.ylabel('Page URL', fontsize=12)
plt.tight_layout()
plt.savefig('E:/SankalanAnalytics/dashboard/top_pages_by_bounce_rate.png')
plt.close()
print("Generated top_pages_by_bounce_rate.png")
else:
print("Skipping top pages by bounce rate plot: No data or empty DataFrame.")
# --- Plot 3: Top 10 Pages by Average Time on Page ---
if page_metrics_df is not None and not page_metrics_df.empty:
# Filter for pages with sufficient traffic and positive time on page
high_time_on_page = page_metrics_df[
(page_metrics_df['unique_page_views'] >= 50) &
(page_metrics_df['avg_time_on_page_seconds'] > 0)
].sort_values('avg_time_on_page_seconds', ascending=False).head(10)
plt.figure(figsize=(12, 7))
sns.barplot(x='avg_time_on_page_seconds', y='page_location', data=high_time_on_page, palette='Greens_d')
plt.title('Top 10 Pages by Average Time on Page (for pages with >=50 unique views)', fontsize=16)
plt.xlabel('Average Time on Page (seconds)', fontsize=12)
plt.ylabel('Page URL', fontsize=12)
plt.tight_layout()
plt.savefig('E:/SankalanAnalytics/dashboard/top_pages_by_time_on_page.png')
plt.close()
print("Generated top_pages_by_time_on_page.png")
else:
print("Skipping top pages by average time on page plot: No data or empty DataFrame.")
# --- Plot 4: Hypothetical Funnel Drop-off Visualization ---
# Define a hypothetical funnel path (replace with actual paths from your website)
# Example: Homepage -> Category Page -> Product Page -> Checkout
# Make sure these page_locations actually exist in your data for meaningful results.
funnel_steps = [
'https://www.sankalandtech.com/',
'https://www.sankalandtech.com/Tutorials/sql-plsql-faq-interview/index.html',
'https://www.sankalandtech.com/Tutorials/sql-plsql-faq-interview/retrieving-data-sql-select-statement-interview-questions-faq.html'
]
funnel_df = fetch_funnel_data(conn, funnel_steps)
if funnel_df is not None and not funnel_df.empty:
plt.figure(figsize=(10, 6))
# Plot sessions at each step
ax1 = sns.barplot(x='step', y='sessions', data=funnel_df, palette='viridis', label='Sessions')
plt.title('Hypothetical Funnel Drop-off', fontsize=16)
plt.xlabel('Funnel Step', fontsize=12)
plt.ylabel('Number of Sessions', fontsize=12)
plt.xticks(rotation=45, ha='right')
# Add drop-off percentage as text on bars
for index, row in funnel_df.iterrows():
if index > 0: # Don't show drop-off for the first step
plt.text(index, row['sessions'] + 5, f"{row['drop_off_percent']:.1f}% drop",
color='red', ha='center', va='bottom', fontsize=10)
plt.tight_layout()
plt.savefig('E:/SankalanAnalytics/dashboard/funnel_drop_off.png')
plt.close()
print("Generated funnel_drop_off.png")
else:
print("Skipping funnel drop-off plot: No data or empty DataFrame. Check if funnel_steps exist in your data.")
# MAIN EXECUTION 1: This block runs when the script starts
if __name__ == "__main__":
# MAIN EXECUTION 2: Connect to the database
conn = connect_to_db()
if conn:
# MAIN EXECUTION 3: Visualize page-level metrics and funnel
visualize_page_level_metrics(conn)
# MAIN EXECUTION 4: Close the database connection
conn.close()
print("Database connection closed.")
else:
print("Could not establish database connection. Exiting.")
Important Notes on This Code:
This Python script connects to your SQL Server database to fetch and calculate detailed page-level metrics. It uses Matplotlib and Seaborn to generate static visualizations that help you analyze specific page performance.
The script creates bar charts showing the top pages based on page views, bounce rate, and average time on page. It also includes a conceptual funnel drop-off chart that illustrates how users progress (or drop off) through a defined sequence of pages on your website.
Make sure to update the DB_CONFIG
section with your actual SQL Server connection details — including your server name, database name, username, and password. For the funnel visualization to work properly, replace the placeholder funnel_steps
list with real page_location
URLs from your website. These should reflect a meaningful user journey and have enough data for reliable analysis.
If you haven’t already installed the required libraries, use the following command:
pip install matplotlib seaborn
Understanding Your Python Page-Level Visualization Script
This Python script is designed to provide granular insights into your website's performance at the individual page level and to visualize user drop-offs within key funnels. It helps you understand which pages are performing well and where users are exiting, allowing for more focused optimization. Let us break down each part of the code.
1. Setting Up Your Tools and Connections (Refer to SCRIPT SETUP 1, 2 in code)
At the very top of the script, you’ll find several import lines. These bring in the essential tools needed for data processing, database connection, and visualizations.
import pandas as pd (SCRIPT SETUP 1) This brings in Pandas for data manipulation.
import pyodbc (SCRIPT SETUP 1) This allows Python to connect and interact with your SQL Server database.
import matplotlib.pyplot as plt (SCRIPT SETUP 1) This is the main library for creating static charts and graphs.
import seaborn as sns (SCRIPT SETUP 1) Built on top of Matplotlib, this library helps produce more visually appealing and informative statistical graphics.
from datetime import datetime (SCRIPT SETUP 1) Used to work with and format date and time data, especially helpful when handling time-based metrics.
Next you see DB_CONFIG.
• DB_CONFIG (SCRIPT SETUP 2) This holds all the details for connecting to your SQL Server. You need to update YOUR SQL SERVER NAME, YOUR DATABASE NAME, YOUR USERNAME, and YOUR PASSWORD with your actual information.
2. Connecting to Your Database connect_to_db (Refer to FUNCTION 1 in code)
This function establishes the connection to your SQL Server database.
What it does It tries to open a link to your SQL Server using the details from DB_CONFIG.
How it works It builds a connection string. This string helps pyodbc
find and log in to your database. It then attempts the connection.
Safety check It prints messages to the console if the connection is successful or if there is an error.
3. Fetch Page-Level Performance Metrics fetch_page_performance_metrics (Refer to FUNCTION 2 in code)
This function retrieves detailed performance data for each page on your website.
What it does It runs a complex SQL query to calculate total_page_views
, unique_page_views
, avg_time_on_page_seconds
, and bounce_rate_percent
for each page_location
.
How it works It uses Common Table Expressions (CTEs) and window functions within SQL to aggregate event data at the page and session level. This allows for precise calculation of metrics like average time on page (based on the first and last event timestamp on that page within a session) and bounce rate (sessions with only one page view event). It filters for pages with at least 10 unique page views to ensure meaningful results.
Output It returns a Pandas DataFrame containing these page-level metrics.
4. Fetch Data for a Hypothetical Funnel Drop-off fetch_funnel_data (Refer to FUNCTION 3 in code)
This function is designed to track user progression through a predefined sequence of pages and identify where users drop off.
What it does Given a list of funnel_steps
(page URLs), it calculates the number of sessions that successfully reach each step and the percentage of sessions that drop off between steps.
How it works It iteratively queries the events
table. For each step, it counts sessions that visited that page and also visited all preceding pages in the funnel. It then calculates the drop-off rate from the previous step. You must customize the funnel_steps
list in the visualize_page_level_metrics
function with actual URLs from your website that define a user journey you want to analyze.
Output It returns a Pandas DataFrame with the number of sessions and drop-off percentage for each funnel step.
5. Visualize Page-Level Metrics and Funnel visualize_page_level_metrics (Refer to FUNCTION 4 and its internal steps)
This is the main function that orchestrates the data fetching and creates the various visualizations. What it does is it sets up the plotting style, calls the data fetching functions, and then creates and saves four different types of plots.
How it works: First, sns.set_style("whitegrid")
and plt.rcParams
are used to set a clean aesthetic style for all plots along with a default figure size. The first plot shows the Top 10 Pages by Total Page Views. This plot fetches page performance data using fetch_page_performance_metrics
, sorts the data by total_page_views
and takes the top 10 pages. It then uses sns.barplot()
to visualize these pages and saves the plot as top_pages_by_views.png
.
The second plot displays the Top 10 Pages by Bounce Rate. It filters for pages with at least 50 unique page views to ensure meaningful bounce rates, sorts by bounce_rate_percent
, and takes the top 10. This plot uses sns.barplot()
to show pages with the highest bounce rates and saves the image as top_pages_by_bounce_rate.png
.
The third plot is the Top 10 Pages by Average Time on Page. It filters for pages with at least 50 unique page views and a positive average time on page, sorts by avg_time_on_page_seconds
, and selects the top 10 pages. Using sns.barplot()
, it displays pages where users spend the most time, then saves the plot as top_pages_by_time_on_page.png
.
The fourth and final plot is a Hypothetical Funnel Drop-off Visualization. This defines a funnel_steps
list which you must customize with your own URLs. It calls fetch_funnel_data
to get the session counts and drop-off percentages for each step. The function uses sns.barplot()
to visualize the number of sessions at each funnel step, adds text annotations to show the drop-off percentages between steps, and saves the plot as funnel_drop_off.png
.
As output, the function prints messages to the console indicating which plots have been generated and saved.
6. Running the Script The Main Execution Block (Refer to MAIN EXECUTION 1 to 4 in code)
This part of the script tells Python to run your visualization process when the file is executed. The line if __name__ == "__main__":
is a standard Python construct that ensures the code inside this block only runs when the script is executed directly, not when it is imported as a module. The execution flow first attempts to connect to the database. If the connection is successful, it calls the visualize_page_level_metrics()
function to generate and save all the plots. Finally, it closes the database connection.
Overall Value of Page-Level Metrics & Drop Heatmaps Visualizations
Visualizing page-level metrics and funnel drop-offs provides deep, actionable insights into your website's performance. It allows you to move beyond high-level trends and pinpoint specific pages or steps in a user journey that require attention. By understanding where users disengage or encounter friction, you can make targeted improvements to content, design, and user flow, directly impacting engagement and conversion rates. This demonstrates a sophisticated approach to web analytics, crucial for optimizing user experience and achieving business objectives.
Next Steps
You have successfully created detailed page-level metric visualizations and a funnel drop-off chart. This means you are now proficient in generating highly granular and actionable static charts. You have completed the Dashboards & Visualization phase.
The next exciting phase will be to compile your Final Analytics Story. We will start by summarizing your key insights and findings from the entire project. This will prepare you to present your work effectively.
For now, make sure you save this Python script in your E:\SankalanAnalytics\dashboard\
folder. Name the file: visualize_page_metrics.py
.
To run your visualization script:
1. Open your Command Prompt (CMD).
2. Navigate to your E:\SankalanAnalytics\dashboard\
folder by typing:
cd E:\SankalanAnalytics\dashboard\
3. Run the Python script:
python visualize_page_metrics.py
4. After execution, you will find the generated .png
image files in the same dashboard/
folder.