Top Pages and High Bounce Rate| Website Analytics

4.3 Analyze Top Pages and High Bounce Pages for Better Website Performance

Welcome to Phase 4.3 Deep Dive into Page Performance
You have successfully analyzed your overall website traffic trends which gave you a good big picture view Now it is time to zoom in In this phase we will focus on individual page performance We will identify your websites most popular pages We will also find pages where visitors tend to leave quickly known as high bounce pages Understanding these details is like finding the stars and the trouble spots on your website map It helps you improve your content and user experience directly

This step is crucial for making targeted improvements that can boost your websites success


Why Analyze Top Pages and High Bounce Pages

Looking at individual page performance provides actionable insights Here is why it is so valuable

Optimize Popular Content Knowing your top pages helps you understand what content your audience loves You can then create more of it or improve these pages even further

Fix Problematic Pages High bounce pages signal a problem Maybe the content is not what users expected Maybe the page loads slowly Or maybe the design is confusing Identifying these pages helps you fix them

Improve User Journey By understanding where users go and where they leave you can make their path through your website smoother This can lead to more conversions or engagement

Content Strategy Insights This analysis informs your content strategy You learn what topics resonate and what areas need more attention or better presentation

Resource Allocation Focus your efforts where they matter most Spend time improving pages that are underperforming or enhancing those that are already doing well

This analysis moves you from general trends to specific actions It helps you make your website better page by page


Graph showing top website pages and bounce rates to improve content and user experience

Key Metrics for Segmented User Behavior Analysis

To understand how different user groups behave, we focus on these main metrics.

Total Users The count of unique visitors in each group.

Sessions The total number of visits from each group.

Page Views The total number of pages viewed by each group.

Bounce Rate The percentage of visits where users viewed only one page in each group.


Key Metrics for Page Performance Analysis

We will focus on these core metrics to understand how individual pages are performing.

• Page Views: The total number of times a specific page was viewed.
• Unique Page Views: The number of unique users who viewed a specific page. This counts each user only once per page.
• Bounce Rate per Page: The percentage of sessions that started on this page and were single page sessions.

Python Code to Analyze Top Pages and High Bounce Pages

We will write a Python script to fetch your event data from SQL Server. It will then calculate page level metrics.
This will help us identify top pages and high bounce pages.
Practical Python Code Example
Here is a basic example of the Python code you will write. This code will connect to your SQL Server database. It will fetch event data. It will then calculate page views unique page views and bounce rate for each page.

# SCRIPT SETUP 1: Import necessary libraries import pandas as pd import pyodbc # SCRIPT SETUP 2: Database connection details for SQL Server # Replace with your actual server name database name and credentials DB_CONFIG = { 'driver': '{ODBC Driver 17 for SQL Server}', 'server': 'YOUR_SQL_SERVER_NAME', # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS' 'database': 'YOUR_DATABASE_NAME', 'uid': 'YOUR_USERNAME', 'pwd': 'YOUR_PASSWORD' } # FUNCTION 1: Connect to the database def connect_to_db(): """Establishes a connection to the SQL Server database.""" conn_str = ( f"DRIVER={DB_CONFIG['driver']};" f"SERVER={DB_CONFIG['server']};" f"DATABASE={DB_CONFIG['database']};" f"UID={DB_CONFIG['uid']};" f"PWD={DB_CONFIG['pwd']};" ) try: conn = pyodbc.connect(conn_str) print("Successfully connected to SQL Server.") return conn except pyodbc.Error as ex: sqlstate = ex.args[0] print(f"Database connection error: {sqlstate}") print(ex) return None # FUNCTION 2: Fetch event data def fetch_event_data(conn): """Fetches necessary event data from the SQL Server events table.""" query = """ SELECT user_pseudo_id, session_id, event_name, page_location FROM events WHERE page_location IS NOT NULL AND page_location != ''; """ try: df = pd.read_sql(query, conn) print(f"Fetched {len(df)} event records for page analysis.") return df except Exception as e: print(f"Error fetching event data: {e}") return None # FUNCTION 3: Analyze Page Performance def analyze_page_performance(df_events): """Calculates page views, unique page views, and bounce rate per page.""" if df_events is None or df_events.empty: print("No event data to analyze page performance.") return None # STEP 3.1: Calculate Total Page Views per page_location # Filter for 'page_view' events as these are direct page loads df_page_views = df_events[df_events['event_name'] == 'page_view'].copy() page_views_count = df_page_views.groupby('page_location').size().reset_index(name='page_views') # STEP 3.2: Calculate Unique Page Views per page_location # Count unique users who viewed each page unique_page_views_count = df_page_views.groupby('page_location')['user_pseudo_id'].nunique().reset_index(name='unique_page_views') # STEP 3.3: Calculate Total Sessions per page_location (for bounce rate) # Count sessions that started on each page (first event of session is a page_view on that page) # This requires a more complex join or pre-calculation if 'session_start' event is not explicit # For simplicity, let's consider any session that has a page_view on that page # First, get all unique sessions and their first event's page_location # This is a simplified approach. A more robust way would be to identify the true landing page for each session. # For now, let's count total sessions that visited a page. total_sessions_per_page = df_page_views.groupby('page_location')['session_id'].nunique().reset_index(name='total_sessions_on_page') # STEP 3.4: Identify Bounced Sessions per page_location # A bounce is a session with only one event, and that event is a page_view on the page. # This requires identifying single-event sessions first. # Get event count per session events_per_session = df_events.groupby(['user_pseudo_id', 'session_id']).size().reset_index(name='event_count') # Merge event count back to df_page_views to identify single event sessions that were page views df_page_views_with_counts = pd.merge(df_page_views, events_per_session, on=['user_pseudo_id', 'session_id'], how='left') # Filter for sessions that had only one event AND that event was a page_view on that page bounced_sessions_per_page = df_page_views_with_counts[ (df_page_views_with_counts['event_count'] == 1) ].groupby('page_location')['session_id'].nunique().reset_index(name='bounced_sessions') # STEP 3.5: Merge all page metrics page_performance_df = page_views_count page_performance_df = pd.merge(page_performance_df, unique_page_views_count, on='page_location', how='outer') page_performance_df = pd.merge(page_performance_df, total_sessions_per_page, on='page_location', how='outer') page_performance_df = pd.merge(page_performance_df, bounced_sessions_per_page, on='page_location', how='outer') # Fill any missing values with 0 page_performance_df = page_performance_df.fillna(0) # STEP 3.6: Calculate Bounce Rate per Page page_performance_df['bounce_rate'] = (page_performance_df['bounced_sessions'] / page_performance_df['total_sessions_on_page']) * 100 page_performance_df['bounce_rate'] = page_performance_df['bounce_rate'].fillna(0).round(2) # Handle division by zero and round # Sort by page views to see top pages top_pages = page_performance_df.sort_values('page_views', ascending=False).head(10).reset_index(drop=True) print("\n--- Top 10 Pages by Page Views ---") print(top_pages[['page_location', 'page_views', 'unique_page_views', 'bounce_rate']]) # Sort by bounce rate to see high bounce pages (filter for pages with significant traffic) # Define a minimum session threshold to avoid pages with very few visits distorting bounce rate MIN_SESSIONS_THRESHOLD = 5 # Example threshold high_bounce_pages = page_performance_df[page_performance_df['total_sessions_on_page'] >= MIN_SESSIONS_THRESHOLD] high_bounce_pages = high_bounce_pages.sort_values('bounce_rate', ascending=False).head(10).reset_index(drop=True) print("\n--- Top 10 High Bounce Pages (with at least 5 sessions) ---") print(high_bounce_pages[['page_location', 'page_views', 'total_sessions_on_page', 'bounce_rate']]) return page_performance_df # MAIN EXECUTION 1: This block runs when the script starts if __name__ == "__main__": # MAIN EXECUTION 2: Connect to the database conn = connect_to_db() if conn: # MAIN EXECUTION 3: Fetch event data events_df = fetch_event_data(conn) # MAIN EXECUTION 4: Analyze page performance if events_df is not None: page_metrics_df = analyze_page_performance(events_df) if page_metrics_df is not None: print("\n--- Page Performance Analysis Completed ---") # You can now use 'page_metrics_df' DataFrame for plotting or further analysis # For example, to save to a new CSV: # page_metrics_df.to_csv('E:/SankalanAnalytics/data/processed/page_performance_metrics.csv', index=False) # MAIN EXECUTION 5: Close the database connection conn.close() print("Database connection closed.") else: print("Could not establish database connection. Exiting.")

Important Notes on This Code

This script connects to your SQL Server database to pull event data. It then uses Pandas to group and summarize this data by device category, traffic source, and geographic location. The bounce rate calculation assumes a bounce is a session with only one page view event, which is a common definition. The script fills missing values with "unknown" to avoid errors during calculations. City-level analysis can be very detailed, so you may want to focus on top cities or group smaller ones depending on your data size. Remember to update your actual SQL Server details in the DB CONFIG section. This includes your server name, database name, username and password.


Understanding Your Python Page Performance Analysis Script

This Python script helps you understand which pages on your website are performing well and which ones might need improvement It pulls data from your SQL Server database and calculates key metrics for each page Let us break down each part of the code

1 Setting Up Your Tools and Connections

At the very top of the script you see some import lines These bring in the necessary tools for our work The first is import pandas as pd which brings in Pandas used to work with data in a table like format and perform calculations The next is import pyodbc which lets Python talk to your SQL Server database Next you see DB_CONFIG which holds all the details for connecting to your SQL Server You need to update this with your actual server name database name username and password

2 Connecting to Your Database

The connect_to_db function is responsible for making the connection to your database It tries to open a link to your SQL Server using the details from DB_CONFIG It builds a connection string which helps pyodbc find and log in to your database It then attempts the connection and prints messages to let you know if it was successful or if there was an error

3 Fetching Event Data

The fetch_event_data function gets the necessary event data from your database for page analysis It runs a SQL query to pull specific columns like user pseudo id session id event name and page location from your events table It filters out any rows where page location is empty It uses pd.read_sql to run the query and put the results into a Pandas DataFrame It prints how many event records were fetched and shows an error message if there is a problem

4 Analyze Page Performance

This is the core function where the page level metric calculations happen It takes the event data and transforms it into insights about individual pages It calculates page views unique page views and bounce rate for each page on your website Step 3.1 filters for only page view events then groups them by page location counting how many times each page was viewed Step 3.2 groups page view events by page location then counts how many different users viewed each page with each user counted once per page Step 3.3 counts the total number of unique sessions that included a page view on a specific page used as the base for bounce rate calculation Step 3.4 finds sessions where only one event occurred and that event was a page view on that page counting them as bounced sessions Step 3.5 merges all calculated metrics including page views unique page views total sessions and bounced sessions into one easy to use table filling any empty spots with zero Step 3.6 calculates bounce rate by dividing bounced sessions by total sessions multiplying by 100 and rounding It also handles cases with no sessions It prints the top 10 pages by page views and top 10 high bounce pages applying a minimum session threshold to ensure bounce rate is meaningful

5 Running the Script

The main block runs when you start the Python file It connects to the database fetches event data analyzes page performance and prints the results It finally closes the database connection which is a good practice to free up resources

Overall Value of This Script

This Python script is a powerful tool for optimizing your website By identifying your most popular content and pinpointing pages where users quickly leave you can make informed decisions to improve user engagement and achieve your website goals This showcases your ability to perform detailed content analysis which is a vital skill in web analytics


Next Steps

Once you run this Python script you will see insights into your top performing pages and those with high bounce rates This means you have successfully analyzed individual page performance The next exciting phase will be to diagnose low traffic and user drop off to understand why some pages are not getting enough visitors or why users are not staying engaged For now make sure you save this Python script in your E drive SankalanAnalytics backend folder and name it something like analyze_pages.py


Previous Topic==> User Bhehavior Analysis! ||  Next Topics==> Diagnose Low Traffic


SQL All topics Wise Interview Questions    Employee Salary Management SQL FAQ!.   C FAQ   Top 25 PL/SQL Interview Questions

Joins With Group by Having  Equi Join  Joins with Subqueries  Self Join  Outer Join 


Website Analytics Project: Phases and Action Steps

  • Home
  • 🟢 Live App: Web Analytics Simulator
  • Phase 0: Project Setup & Problem Definition
  • 0.1 Define Project Goals & Challenges
  • 0.2 Select Tools, Tech Stack & Data Sources
  • 0.3 Software Requirements & Installation
  • 0.4 Folder Structure & GitHub Repo
  • 0.5 Testing Project Locally
  • Phase 1: Planning for Analytics
  • 1.1 Website Analytics Project Overview
  • 1.2 Define KPIs, Bounce Rate, Engagement
  • 1.3 Identify Target Users & Pain Points
  • Phase 2: Data Collection
  • 2.1 Setup Google Analytics 4 (GA4)
  • 2.2 Export GA4 Data to BigQuery/CSV
  • 2.3 Design SQL Schema for Web Analytics
  • Phase 3: Data Cleaning & Feature Engineering
  • 3.1 Clean Website Data with Python & Pandas
  • 3.2 Create Custom Metrics (Session, Bounce, etc.)
  • Phase 4: Exploratory Data Analysis (EDA)
  • 4.1 Analyze Website Traffic Trends
  • 4.2 Behavior by Device, Source, Location
  • 4.3 Top Pages & High Bounce Pages
  • 4.4 Diagnose Low Traffic & User Drop
  • Phase 5: Business Insights
  • 5.1 Funnel Analysis & Drop-Off Points
  • 5.2 New vs Returning Users
  • 5.3 Time Spent & Scroll Depth
  • Phase 6: SQL for Business
  • 6.1 SQL for Business Insights
  • 6.2 Combine Web Data Using SQL
  • 6.3 Find Problematic Pages Using SQL
  • Phase 7: Machine Learning
  • 7.1 Segment Users with Clustering
  • 7.2 Predict Bounce Rate with ML
  • 7.3 Recommend Pages or Content
  • Phase 8: Dashboards & Visualization
  • 8.1 Dashboard with Streamlit
  • 8.2 Visualize KPIs with Python
  • 8.3 Page-Level Metrics & Drop Heatmaps
  • Phase 9: Final Analytics Story
  • 9.1 Summary Report & Findings
  • Phase 10: Hosting & Portfolio Presentation
  • 10.1 Host Website Project Online
  • 10.2 Add to GitHub with ReadMe
  • 10.3 Pitch Project in Interview
  • Other Topics
  • SQL Interview Questions
  • SQL Case Study: Account Management
  • Python Interview Questions
  • Why C Language

Get in touch

  • tech2dsm@gmail.com

© Sankalan Data Tech. All rights reserved.