Visualize Website KPIs with Python Using Matplotlib and Seaborn
Welcome to Phase 8.2 Crafting Compelling Visualizations
You have successfully created an interactive web dashboard using Streamlit. That is a fantastic way to make your web analytics insights accessible. Now it is time to focus on creating high quality static visualizations for reports presentations or even for embedding in your website. In this phase we will learn how to visualize key performance indicators KPIs using Python with Matplotlib and Seaborn. Think of this as creating beautiful and informative charts that tell a clear story about your website's performance. This is crucial for formal reporting and deeper analytical dives.
This step is crucial for detailed data exploration and professional data presentation.
Why Python Visualizations (Matplotlib and Seaborn) are Essential
While Streamlit provides quick interactive dashboards Matplotlib and Seaborn offer unparalleled control and flexibility for static plots. Here is why they are so valuable.
• Publication Quality Create charts that are ready to use in academic research, business reports, or professional presentations.
• Granular Control Customize every detail of your charts—from colors and fonts to labels, scales, and annotations—for full visual control.
• Complex Visualizations Build advanced chart types and multi-layered plots that may not be possible with basic dashboard tools.
• Deep Dive Analysis Use detailed visualizations to uncover patterns, investigate outliers, and explore specific hypotheses.
• Automated Report Generation Generate and save charts automatically, making it easier to include visuals in recurring reports or pipelines.
• Industry Standard Matplotlib and Seaborn are trusted tools widely used by analysts and data scientists. Learning them boosts your data visualization skills.
Python visualization libraries empower you to tell a precise and impactful data story.
Key Concepts for Python Visualizations
To create effective visualizations with Python we need to understand a few core concepts.
Matplotlib: This is the core plotting library in Python. It offers a wide variety of plotting tools and allows full customization of your charts, including colors, fonts, layout, and more.
Seaborn: Built on top of Matplotlib, Seaborn simplifies the process of creating visually appealing and informative statistical graphics. It handles complex plots with less code and better default styling.
Types of Charts: Different chart types help highlight different aspects of your data:
Line Charts: Best for showing trends over time, such as daily users or sessions.
Bar Charts: Useful for comparing categories like traffic sources or devices.
Pie Charts / Donut Charts: Great for showing proportions of a whole, such as traffic share by medium.
Histograms / KDE Plots: Help visualize the distribution of a single numeric value, like session duration.
Scatter Plots: Show relationships between two numeric variables, such as time on site vs. bounce rate.
Data Preparation: Before plotting, make sure your data is clean and structured in a Pandas DataFrame. This often means filtering, aggregating, or transforming your raw data to fit your visual needs.
Figure and Axes: In Matplotlib, a Figure
represents the whole plotting area (like a canvas), and an Axes
is a single plot within that space. You can add multiple Axes
to one Figure
to build complex layouts.
Labels, Titles, and Legends: These elements help make your charts readable. Always include clear labels for axes, a descriptive title, and a legend if you have multiple data series.
Saving Plots: Use Matplotlib’s plt.savefig()
function to export your plots as image files (PNG, JPEG, SVG). This is useful for including visuals in reports, presentations, or sharing insights offline.
Python Code for Visualizing Web Analytics KPIs
We will write a Python script that connects to your SQL Server database. It will fetch various aggregated data points. It will then use Matplotlib and Seaborn to create several types of visualizations for your web analytics KPIs.
Practical Python Code Example (visualize_kpis.py)
Here is the basic Python code you will write for visualizing your KPIs. Save this file in your 'dashboard/' folder.
# SCRIPT SETUP 1: Import necessary libraries
import pandas as pd
import pyodbc
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime
# SCRIPT SETUP 2: Database connection details for SQL Server
# Replace with your actual server name, database name, and credentials
DB_CONFIG = {
'driver': '{ODBC Driver 17 for SQL Server}',
'server': 'YOUR_SQL_SERVER_NAME', # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS'
'database': 'YOUR_DATABASE_NAME',
'uid': 'YOUR_USERNAME',
'pwd': 'YOUR_PASSWORD'
}
# FUNCTION 1: Connect to the database
def connect_to_db():
"""Establishes a connection to the SQL Server database."""
conn_str = (
f"DRIVER={DB_CONFIG['driver']};"
f"SERVER={DB_CONFIG['server']};"
f"DATABASE={DB_CONFIG['database']};"
f"UID={DB_CONFIG['uid']};"
f"PWD={DB_CONFIG['pwd']};"
)
try:
conn = pyodbc.connect(conn_str)
print("Successfully connected to SQL Server.")
return conn
except pyodbc.Error as ex:
sqlstate = ex.args[0]
print(f"Database connection error: {sqlstate}")
print(ex)
return None
# FUNCTION 2: Fetch daily traffic trends
def fetch_daily_trends(conn):
"""Fetches daily traffic trends."""
query = """
SELECT
CAST(DATEADD(s, event_timestamp / 1000000, '1970-01-01') AS DATE) AS event_date,
COUNT(DISTINCT user_pseudo_id) AS daily_users,
COUNT(DISTINCT session_id) AS daily_sessions,
COUNT(CASE WHEN event_name = 'page_view' THEN 1 END) AS daily_page_views
FROM
events
GROUP BY
CAST(DATEADD(s, event_timestamp / 1000000, '1970-01-01') AS DATE)
ORDER BY
event_date;
"""
try:
df = pd.read_sql(query, conn)
df['event_date'] = pd.to_datetime(df['event_date'])
return df
except Exception as e:
print(f"Error fetching daily trends: {e}")
return None
# FUNCTION 3: Fetch sessions by device category
def fetch_sessions_by_device(conn):
"""Fetches sessions count by device category."""
query = """
SELECT
device_category,
COUNT(DISTINCT session_id) AS total_sessions
FROM
events
WHERE
device_category IS NOT NULL AND device_category != ''
GROUP BY
device_category
ORDER BY
total_sessions DESC;
"""
try:
df = pd.read_sql(query, conn)
return df
except Exception as e:
print(f"Error fetching sessions by device: {e}")
return None
# FUNCTION 4: Fetch sessions by traffic source
def fetch_sessions_by_source(conn):
"""Fetches sessions count by traffic source."""
query = """
SELECT
traffic_source,
COUNT(DISTINCT session_id) AS total_sessions
FROM
events
WHERE
traffic_source IS NOT NULL AND traffic_source != ''
GROUP BY
traffic_source
ORDER BY
total_sessions DESC;
"""
try:
df = pd.read_sql(query, conn)
return df
except Exception as e:
print(f"Error fetching sessions by source: {e}")
return None
# FUNCTION 5: Visualize KPIs
def visualize_kpis(conn):
"""
Fetches data and creates various KPI visualizations using Matplotlib and Seaborn.
"""
# Set Seaborn style for better aesthetics
sns.set_style("whitegrid")
plt.rcParams['figure.figsize'] = (10, 6) # Default figure size
# --- Plot 1: Daily Traffic Trends (Users, Sessions, Page Views) ---
daily_trends_df = fetch_daily_trends(conn)
if daily_trends_df is not None and not daily_trends_df.empty:
plt.figure(figsize=(12, 7))
plt.plot(daily_trends_df['event_date'], daily_trends_df['daily_users'], label='Daily Users', marker='o', markersize=4)
plt.plot(daily_trends_df['event_date'], daily_trends_df['daily_sessions'], label='Daily Sessions', marker='x', markersize=4)
plt.plot(daily_trends_df['event_date'], daily_trends_df['daily_page_views'], label='Daily Page Views', marker='s', markersize=4)
plt.title('Daily Website Traffic Trends', fontsize=16)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Count', fontsize=12)
plt.legend(fontsize=10)
plt.grid(True, linestyle='--', alpha=0.7)
plt.tight_layout()
plt.savefig('E:/SankalanAnalytics/dashboard/daily_traffic_trends.png')
plt.close() # Close plot to free memory
print("Generated daily_traffic_trends.png")
else:
print("Skipping daily traffic trends plot: No data or empty DataFrame.")
# --- Plot 2: Sessions by Device Category (Bar Chart) ---
device_df = fetch_sessions_by_device(conn)
if device_df is not None and not device_df.empty:
plt.figure(figsize=(9, 6))
sns.barplot(x='device_category', y='total_sessions', data=device_df, palette='viridis')
plt.title('Total Sessions by Device Category', fontsize=16)
plt.xlabel('Device Category', fontsize=12)
plt.ylabel('Total Sessions', fontsize=12)
plt.tight_layout()
plt.savefig('E:/SankalanAnalytics/dashboard/sessions_by_device.png')
plt.close()
print("Generated sessions_by_device.png")
else:
print("Skipping sessions by device plot: No data or empty DataFrame.")
# --- Plot 3: Sessions by Traffic Source (Bar Chart) ---
source_df = fetch_sessions_by_source(conn)
if source_df is not None and not source_df.empty:
plt.figure(figsize=(10, 7))
sns.barplot(x='total_sessions', y='traffic_source', data=source_df, palette='magma')
plt.title('Total Sessions by Traffic Source', fontsize=16)
plt.xlabel('Total Sessions', fontsize=12)
plt.ylabel('Traffic Source', fontsize=12)
plt.tight_layout()
plt.savefig('E:/SankalanAnalytics/dashboard/sessions_by_source.png')
plt.close()
print("Generated sessions_by_source.png")
else:
print("Skipping sessions by traffic source plot: No data or empty DataFrame.")
# --- Plot 4: Simple Pie Chart for Device Distribution ---
if device_df is not None and not device_df.empty:
plt.figure(figsize=(8, 8))
plt.pie(device_df['total_sessions'], labels=device_df['device_category'], autopct='%1.1f%%', startangle=90, colors=sns.color_palette('pastel'))
plt.title('Session Distribution by Device Category', fontsize=16)
plt.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.tight_layout()
plt.savefig('E:/SankalanAnalytics/dashboard/device_distribution_pie.png')
plt.close()
print("Generated device_distribution_pie.png")
else:
print("Skipping device distribution pie chart: No data or empty DataFrame.")
# MAIN EXECUTION 1: This block runs when the script starts
if __name__ == "__main__":
# MAIN EXECUTION 2: Connect to the database
conn = connect_to_db()
if conn:
# MAIN EXECUTION 3: Visualize KPIs
visualize_kpis(conn)
# MAIN EXECUTION 4: Close the database connection
conn.close()
print("Database connection closed.")
else:
print("Could not establish database connection. Exiting.")
Important Notes on This Code:
This Python script connects to your SQL Server database to fetch key web analytics metrics. It then uses Matplotlib and Seaborn to create a set of static visualizations that give clear insights into your website’s performance.
The script generates several charts, including daily traffic trends (users, sessions, and page views), sessions by device category, sessions by traffic source, and a pie chart showing device distribution.
Each chart is saved automatically as a .png
image in your E:\SankalanAnalytics\dashboard\
folder. Before running the script, be sure to update the DB_CONFIG
section with your actual SQL Server connection details — including the server name, database name, username, and password.
If you haven’t already installed the required libraries, use the following command:
pip install matplotlib seaborn
Understanding Your Python KPI Visualization Script
This Python script is designed to create professional quality static visualizations of your key web analytics metrics. It is a powerful way to prepare charts for reports presentations or even for embedding in other documents. Let us break down each part of the code.
1. Setting Up Your Tools and Connections (Refer to SCRIPT SETUP 1, 2 in code)
At the very top of the script you see some import lines. These lines bring in the necessary tools for our work.
import pandas as pd (SCRIPT SETUP 1) This brings in Pandas for data manipulation.
import pyodbc (SCRIPT SETUP 1) This lets Python talk to your SQL Server database.
import matplotlib.pyplot as plt (SCRIPT SETUP 1) This is the primary plotting library.
import seaborn as sns (SCRIPT SETUP 1) This library is built on Matplotlib and provides enhanced statistical plots and aesthetics.
from datetime import datetime (SCRIPT SETUP 1) This is used for working with dates and times.
Next you see DB_CONFIG.
• DB_CONFIG (SCRIPT SETUP 2) This holds all the details for connecting to your SQL Server. You need to update YOUR SQL SERVER NAME YOUR DATABASE NAME YOUR USERNAME and YOUR PASSWORD with your actual information.
2. Connecting to Your Database connect_to_db (Refer to FUNCTION 1 in code)
This function establishes the connection to your SQL Server database.
What it does It tries to open a link to your SQL Server using the details from DB_CONFIG
How it works It builds a connection string. This string helps 'pyodbc' find and log in to your database. It then attempts the connection.
Safety check It prints messages to the console if the connection is successful or if there is an error.
3. Data Fetching Functions (Refer to FUNCTIONS 2, 3, 4 in code)
These functions are responsible for retrieving specific datasets needed for our visualizations from the SQL Server database.
fetch_daily_trends(conn) (FUNCTION 2) This function fetches daily users, sessions, and page views. It is best suited for creating time series visualizations that show trends over time.
fetch_sessions_by_device(conn) (FUNCTION 3) This retrieves the total number of sessions grouped by device category such as mobile, desktop, or tablet. It helps you understand which devices users prefer when visiting your website.
fetch_sessions_by_source(conn) (FUNCTION 4) This gathers session counts grouped by traffic source — like organic search, direct visits, or referral traffic — giving insight into where your users are coming from.
How they work Each function runs a SQL query using pd.read_sql
to pull the data directly into a Pandas DataFrame. They also include error handling to ensure that even if there’s an issue with the query or connection, the script won’t break unexpectedly.
4. Visualize KPIs visualize_kpis (Refer to FUNCTION 5 and its internal steps)
This is the main function where the plotting happens. It orchestrates the data fetching and visualization process.
What it does It sets up the plotting style, calls the data fetching functions, and then creates and saves four different types of plots.
How it works The lines sns.set_style("whitegrid")
and plt.rcParams
set a clean aesthetic style for all plots and define a default figure size, making your charts look professional.
For Plot 1, Daily Traffic Trends, it fetches daily trend data using fetch_daily_trends
. It then uses plt.plot()
to create a line chart showing daily users, sessions, and page views over time. The plot includes a title, labels, legend, and grid for clarity. The plot is saved as daily_traffic_trends.png
in your dashboard/
folder using plt.savefig()
. Finally, plt.close()
is called to free up memory, which is important when generating multiple plots.
Plot 2, Sessions by Device Category, fetches device session data using fetch_sessions_by_device
. It uses sns.barplot()
to create a bar chart comparing sessions across different device categories. Seaborn simplifies creating aesthetically pleasing bar charts. This plot is saved as sessions_by_device.png
.
Plot 3, Sessions by Traffic Source, fetches traffic source data using fetch_sessions_by_source
. It uses sns.barplot()
again to compare sessions by traffic source and saves the plot as sessions_by_source.png
.
Plot 4 is a Simple Pie Chart for Device Distribution. It reuses the device_df
data and uses plt.pie()
to create a pie chart showing the proportion of sessions from each device category. The autopct
parameter formats the percentages directly on the slices. This plot is saved as device_distribution_pie.png
.
Output The function prints messages to the console indicating which plots have been generated and saved successfully.
5. Running the Script The Main Execution Block (Refer to MAIN EXECUTION 1 to 4 in code)
This part of the script tells Python to run your visualization process when the file is executed.
What it does The if __name__ == "__main__":
construct ensures that the code inside this block only runs when the script is executed directly.
How it works The script first attempts to connect to the database. If successful, it calls the visualize_kpis()
function to generate and save all the plots. Finally, it closes the database connection to free up resources.
Overall Value of Python KPI Visualizations
Creating visualizations with Matplotlib and Seaborn is a fundamental skill for any data analyst or data scientist. It allows you to produce highly customized and professional plots that can effectively communicate complex insights. These static plots are perfect for formal reports presentations or even for sharing as standalone images. This demonstrates your ability to not only analyze data but also to present your findings in a clear compelling and visually appealing manner. This is a vital skill for data storytelling and influencing business decisions.
Next Steps
You have successfully created various KPI visualizations using Python. This means you are now proficient in generating professional static charts. The next exciting phase will be to visualize page level metrics and drop heatmaps. This will allow you to create even more detailed and actionable visualizations focusing on individual page performance and user drop off points.
For now make sure you save this Python script in your E drive SankalanAnalytics dashboard folder. Name it: visualize_kpis.py
.
To run your visualization script:
1. Open your Command Prompt (CMD).
2. Navigate to your E:\SankalanAnalytics\dashboard\
folder:
cd E:\SankalanAnalytics\dashboard\
3. Run the Python script:
python visualize_kpis.py
4. After execution, you will find the generated .png
image files in the same dashboard/
folder.