Phase 3.2 – Create Custom Metrics for Session Duration and Bounce Rate
Welcome to Phase 3.2 – Transforming Your Data into Insights
You have successfully cleaned and loaded your raw website data into your SQL Server database. That was a big step. Now it is time to make that data truly useful. In this phase, we will use Python and Pandas again to define and calculate important custom metrics like Sessions and Bounce Rate. These metrics are not always simple to get from raw event data, but understanding them is key to knowing how people use your website and finding ways to improve it.
Think of this step as turning raw ingredients into a delicious meal. We are taking the basic data and cooking it into something that gives you real, clear answers about your website’s performance.
Why Custom Metrics are Valuable
Deeper Understanding Standard reports might show you page views, but custom metrics reveal more — like how long visitors stay or if they leave right away.
Tailored to Your Needs Every website is unique. Creating your own metrics lets you measure what truly matters for your business goals.
Better Decision Making With precise numbers for key metrics like bounce rate, you can make smarter changes to improve your site’s design, content, and user experience.
Foundation for Advanced Analysis These metrics act as building blocks for deeper analysis in future phases, including trend studies and even machine learning models.
Making Data Work Harder By creating your own metrics, you turn simple event data into powerful insights that guide real improvements.
Defining Key Web Analytics Metrics
Before we calculate anything it is important to clearly understand what each metric means in the context of our web analytics project.
1 Session
A session represents a group of user interactions with your website that happen within a certain time period. In GA4 a session usually ends after 30 minutes of inactivity. It also ends at midnight or if a user arrives via a new campaign source.
For our purposes we will rely on the session_id and session_number provided by GA4 in our events table.
A session helps us understand how many times users visit our site and how engaged they are during each visit.
2 Bounce Rate
Bounce rate is the percentage of sessions that are single page sessions. This means a user visited only one page on your website and then left without interacting further.
A high bounce rate on a specific page might mean the content is not relevant or the page is hard to use.
We will calculate this by identifying sessions that only have one event typically a page view event.
3 Time on Page and Session Duration
These metrics tell you how long users spend on a specific page or during an entire session. It can be more complex to calculate these from raw event data without specific end event timestamps but they are very valuable for understanding engagement.
We will explore ways to estimate these or use aggregated data in later phases.
Python Code to Calculate Custom Metrics
We will create a Python script to fetch data from your SQL Server database and then calculate these custom metrics using Pandas.
Practical Python Code Example
Here is a basic example of the Python code you will write. This code will connect to your SQL Server database fetch the necessary event data and then calculate sessions and bounce rate.
Important Notes on This Code
This script works only if your events table already has data from your GA4 CSV export.
The bounce rate calculation here is a common simple approach. It counts sessions with only one event. A stricter method would check if that single event was a page view.
Calculating session duration or time on page is more complex. It needs careful handling of timestamps and user inactivity. This is often done later during feature engineering or directly in SQL queries.
Make sure you replace the placeholders in the DB CONFIG section with your real SQL Server connection details. This includes the server name, database name, username, and password.
Understanding Your Python Custom Metrics Script
This Python script changes your raw event data into useful web analytics metrics. Here is what each part of the code does.
1. Setting Up Your Tools and Connections
At the start of the script you see import lines.
• import pandas as pd lets you work with data in table format
• import pyodbc lets Python connect to your SQL Server database
Next is DB_CONFIG. This has details for connecting to SQL Server. Replace the placeholders with your server name database name username and password.
2. Connecting to Your Database
The connect_to_db function makes the connection.
• It builds a connection string from DB_CONFIG and tries to connect
• It prints messages if the connection works or if there is an error
3. Fetching Data from the Events Table
The fetch_event_data function pulls event data.
• It runs a SQL query to get columns from the events table sorted by user and session
• It uses Pandas to store results in a DataFrame
• It prints how many records were fetched or an error if it fails
4. Calculating Custom Metrics
The calculate_custom_metrics function is where the main work happens.
• Step 1 find each unique session by user and session id
• Step 2 count events in each session
• Step 3 find sessions with only one event which are bounces
• Step 4 get bounce rate by dividing bounced sessions by total sessions then multiply by 100
• It prints total sessions total bounces and bounce rate
5. Running the Script
The main block runs everything in order
• Connect to the database
• Fetch event data
• Calculate custom metrics
• Print results
• Close the database connection
Overall Value
This script turns raw GA4 event data into metrics like sessions and bounce rate. It helps you see how users interact with your site. It also builds skill in data transformation which is important in analytics work.
Next Steps
When you run this script you will see sessions and bounce rate in the console. This means your custom metrics are ready. The next phase will be Exploratory Data Analysis where you use these metrics to find patterns. Save this script in your E drive SankalanAnalytics backend folder as calculate_metrics.py