Website Analytics |Session duration and Bounce Rate

Phase 3.2 – Create Custom Metrics for Session Duration and Bounce Rate

Welcome to Phase 3.2 – Transforming Your Data into Insights
You have successfully cleaned and loaded your raw website data into your SQL Server database. That was a big step. Now it is time to make that data truly useful. In this phase, we will use Python and Pandas again to define and calculate important custom metrics like Sessions and Bounce Rate. These metrics are not always simple to get from raw event data, but understanding them is key to knowing how people use your website and finding ways to improve it.

Think of this step as turning raw ingredients into a delicious meal. We are taking the basic data and cooking it into something that gives you real, clear answers about your website’s performance.


Why Custom Metrics are Valuable

Deeper Understanding Standard reports might show you page views, but custom metrics reveal more — like how long visitors stay or if they leave right away.

Tailored to Your Needs Every website is unique. Creating your own metrics lets you measure what truly matters for your business goals.

Better Decision Making With precise numbers for key metrics like bounce rate, you can make smarter changes to improve your site’s design, content, and user experience.

Foundation for Advanced Analysis These metrics act as building blocks for deeper analysis in future phases, including trend studies and even machine learning models.

Making Data Work Harder By creating your own metrics, you turn simple event data into powerful insights that guide real improvements.


Website analytics showing charts for session duration and bounce rate

Defining Key Web Analytics Metrics

Before we calculate anything it is important to clearly understand what each metric means in the context of our web analytics project.

1 Session
A session represents a group of user interactions with your website that happen within a certain time period. In GA4 a session usually ends after 30 minutes of inactivity. It also ends at midnight or if a user arrives via a new campaign source.
For our purposes we will rely on the session_id and session_number provided by GA4 in our events table.
A session helps us understand how many times users visit our site and how engaged they are during each visit.

2 Bounce Rate
Bounce rate is the percentage of sessions that are single page sessions. This means a user visited only one page on your website and then left without interacting further.
A high bounce rate on a specific page might mean the content is not relevant or the page is hard to use.
We will calculate this by identifying sessions that only have one event typically a page view event.

3 Time on Page and Session Duration
These metrics tell you how long users spend on a specific page or during an entire session. It can be more complex to calculate these from raw event data without specific end event timestamps but they are very valuable for understanding engagement.
We will explore ways to estimate these or use aggregated data in later phases.

Python Code to Calculate Custom Metrics
We will create a Python script to fetch data from your SQL Server database and then calculate these custom metrics using Pandas.


Practical Python Code Example
Here is a basic example of the Python code you will write. This code will connect to your SQL Server database fetch the necessary event data and then calculate sessions and bounce rate.

# SCRIPT SETUP 1: Import necessary libraries import pandas as pd import pyodbc # SCRIPT SETUP 2: Database connection details for SQL Server # Replace with your actual server name database name and credentials DB_CONFIG = { 'driver': '{ODBC Driver 17 for SQL Server}', 'server': 'YOUR_SQL_SERVER_NAME', # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS' 'database': 'YOUR_DATABASE_NAME', 'uid': 'YOUR_USERNAME', 'pwd': 'YOUR_PASSWORD' } # FUNCTION 1: Connect to the database def connect_to_db(): """Establishes a connection to the SQL Server database.""" conn_str = ( f"DRIVER={DB_CONFIG['driver']};" f"SERVER={DB_CONFIG['server']};" f"DATABASE={DB_CONFIG['database']};" f"UID={DB_CONFIG['uid']};" f"PWD={DB_CONFIG['pwd']};" ) try: conn = pyodbc.connect(conn_str) print("Successfully connected to SQL Server.") return conn except pyodbc.Error as ex: sqlstate = ex.args[0] print(f"Database connection error: {sqlstate}") print(ex) return None # FUNCTION 2: Fetch data from events table def fetch_event_data(conn): """Fetches all necessary event data from the SQL Server events table.""" query = """ SELECT user_pseudo_id, session_id, session_number, event_timestamp, event_name, page_location FROM events ORDER BY user_pseudo_id, session_id, event_timestamp; """ try: df = pd.read_sql(query, conn) print(f"Fetched {len(df)} event records from database.") return df except Exception as e: print(f"Error fetching event data: {e}") return None # FUNCTION 3: Calculate Custom Metrics def calculate_custom_metrics(df_events): """Calculates sessions and bounce rate from event data.""" if df_events is None or df_events.empty: print("No event data to calculate metrics.") return None, None # STEP 3.1: Identify unique sessions # A session is uniquely identified by user_pseudo_id and session_id # We can also use session_number for a user's specific session sequence sessions_df = df_events.drop_duplicates(subset=['user_pseudo_id', 'session_id']) total_sessions = len(sessions_df) print(f"Total unique sessions identified: {total_sessions}") # STEP 3.2: Calculate events per session events_per_session = df_events.groupby(['user_pseudo_id', 'session_id']).size().reset_index(name='event_count') # STEP 3.3: Identify single page sessions (bounces) # A bounce is a session with only one event. # We assume 'page_view' is the primary event for a session to start. single_event_sessions = events_per_session[events_per_session['event_count'] == 1] # Further refine bounce: check if the single event is a page_view # This requires merging back to original df_events or checking event_name during grouping # For simplicity here, we'll assume a single event session implies a bounce. # A more robust bounce calculation would verify the event_name is 'page_view' # To be more precise for bounce, let's count sessions where the first event is also the only event # This requires getting the first event for each session and then checking if event_count is 1 # Let's simplify for this example to sessions with only 1 event, as per common definition bounced_sessions = len(single_event_sessions) print(f"Total bounced sessions identified: {bounced_sessions}") # STEP 3.4: Calculate Bounce Rate bounce_rate = (bounced_sessions / total_sessions) * 100 if total_sessions > 0 else 0 print(f"Calculated Bounce Rate: {bounce_rate:.2f}%") # You can also calculate total page views total_page_views = df_events[df_events['event_name'] == 'page_view'].shape[0] print(f"Total Page Views: {total_page_views}") # For average session duration or time on page, you would need more complex logic # involving event_timestamp differences within sessions or page views. # This is often done in a separate feature engineering step or directly in SQL. return total_sessions, bounce_rate, total_page_views # MAIN EXECUTION 1: This block runs when the script starts if __name__ == "__main__": # MAIN EXECUTION 2: Connect to the database conn = connect_to_db() if conn: # MAIN EXECUTION 3: Fetch event data events_df = fetch_event_data(conn) # MAIN EXECUTION 4: Calculate custom metrics if events_df is not None: total_sessions_val, bounce_rate_val, total_page_views_val = calculate_custom_metrics(events_df) if total_sessions_val is not None: print("\n--- Summary of Key Metrics ---") print(f"Total Sessions: {total_sessions_val}") print(f"Bounce Rate: {bounce_rate_val:.2f}%") print(f"Total Page Views: {total_page_views_val}") # MAIN EXECUTION 5: Close the database connection conn.close() print("Database connection closed.") else: print("Could not establish database connection. Exiting.")

Important Notes on This Code

This script works only if your events table already has data from your GA4 CSV export.

The bounce rate calculation here is a common simple approach. It counts sessions with only one event. A stricter method would check if that single event was a page view.

Calculating session duration or time on page is more complex. It needs careful handling of timestamps and user inactivity. This is often done later during feature engineering or directly in SQL queries.

Make sure you replace the placeholders in the DB CONFIG section with your real SQL Server connection details. This includes the server name, database name, username, and password.


Understanding Your Python Custom Metrics Script

This Python script changes your raw event data into useful web analytics metrics. Here is what each part of the code does.

1. Setting Up Your Tools and Connections
At the start of the script you see import lines.
• import pandas as pd lets you work with data in table format
• import pyodbc lets Python connect to your SQL Server database
Next is DB_CONFIG. This has details for connecting to SQL Server. Replace the placeholders with your server name database name username and password.

2. Connecting to Your Database
The connect_to_db function makes the connection.
• It builds a connection string from DB_CONFIG and tries to connect
• It prints messages if the connection works or if there is an error

3. Fetching Data from the Events Table
The fetch_event_data function pulls event data.
• It runs a SQL query to get columns from the events table sorted by user and session
• It uses Pandas to store results in a DataFrame
• It prints how many records were fetched or an error if it fails

4. Calculating Custom Metrics
The calculate_custom_metrics function is where the main work happens.
• Step 1 find each unique session by user and session id
• Step 2 count events in each session
• Step 3 find sessions with only one event which are bounces
• Step 4 get bounce rate by dividing bounced sessions by total sessions then multiply by 100
• It prints total sessions total bounces and bounce rate

5. Running the Script
The main block runs everything in order
• Connect to the database
• Fetch event data
• Calculate custom metrics
• Print results
• Close the database connection

Overall Value
This script turns raw GA4 event data into metrics like sessions and bounce rate. It helps you see how users interact with your site. It also builds skill in data transformation which is important in analytics work.

Next Steps
When you run this script you will see sessions and bounce rate in the console. This means your custom metrics are ready. The next phase will be Exploratory Data Analysis where you use these metrics to find patterns. Save this script in your E drive SankalanAnalytics backend folder as calculate_metrics.py


Previous Topic==>  Data Cleaning! ||  Next Topics==> Analyze Web Traffic


SQL All topics Wise Interview Questions    Employee Salary Management SQL FAQ!.   C FAQ   Top 25 PL/SQL Interview Questions

Joins With Group by Having  Equi Join  Joins with Subqueries  Self Join  Outer Join 


Website Analytics Project: Phases and Action Steps

  • Home
  • 🟢 Live App: Web Analytics Simulator
  • Phase 0: Project Setup & Problem Definition
  • 0.1 Define Project Goals & Challenges
  • 0.2 Select Tools, Tech Stack & Data Sources
  • 0.3 Software Requirements & Installation
  • 0.4 Folder Structure & GitHub Repo
  • 0.5 Testing Project Locally
  • Phase 1: Planning for Analytics
  • 1.1 Website Analytics Project Overview
  • 1.2 Define KPIs, Bounce Rate, Engagement
  • 1.3 Identify Target Users & Pain Points
  • Phase 2: Data Collection
  • 2.1 Setup Google Analytics 4 (GA4)
  • 2.2 Export GA4 Data to BigQuery/CSV
  • 2.3 Design SQL Schema for Web Analytics
  • Phase 3: Data Cleaning & Feature Engineering
  • 3.1 Clean Website Data with Python & Pandas
  • 3.2 Create Custom Metrics (Session, Bounce, etc.)
  • Phase 4: Exploratory Data Analysis (EDA)
  • 4.1 Analyze Website Traffic Trends
  • 4.2 Behavior by Device, Source, Location
  • 4.3 Top Pages & High Bounce Pages
  • 4.4 Diagnose Low Traffic & User Drop
  • Phase 5: Business Insights
  • 5.1 Funnel Analysis & Drop-Off Points
  • 5.2 New vs Returning Users
  • 5.3 Time Spent & Scroll Depth
  • Phase 6: SQL for Business
  • 6.1 SQL for Business Insights
  • 6.2 Combine Web Data Using SQL
  • 6.3 Find Problematic Pages Using SQL
  • Phase 7: Machine Learning
  • 7.1 Segment Users with Clustering
  • 7.2 Predict Bounce Rate with ML
  • 7.3 Recommend Pages or Content
  • Phase 8: Dashboards & Visualization
  • 8.1 Dashboard with Streamlit
  • 8.2 Visualize KPIs with Python
  • 8.3 Page-Level Metrics & Drop Heatmaps
  • Phase 9: Final Analytics Story
  • 9.1 Summary Report & Findings
  • Phase 10: Hosting & Portfolio Presentation
  • 10.1 Host Website Project Online
  • 10.2 Add to GitHub with ReadMe
  • 10.3 Pitch Project in Interview
  • Other Topics
  • SQL Interview Questions
  • SQL Case Study: Account Management
  • Python Interview Questions
  • Why C Language

Get in touch

  • tech2dsm@gmail.com

© Sankalan Data Tech. All rights reserved.