Find Problematic Pages | Website Analytics

6.3 Find Problematic Pages Using SQL for Better Website Performance

Welcome to Phase 6.3 Pinpointing Areas for Improvement with SQL

You have successfully learned to write basic SQL queries and combine data using SQL JOINs. That gives you a strong foundation for interacting with your web analytics database. Now it is time to apply these SQL skills to a critical task. In this phase we will learn how to use SQL to directly identify problematic pages on your website. These are pages that might have high bounce rates or very low user engagement. Think of this as using SQL to shine a spotlight on areas that need your immediate attention. It helps you prioritize your optimization efforts directly from your database.
This step is crucial for making data driven decisions to improve your website's performance.


Why Find Problematic Pages with SQL

Identifying underperforming pages is essential for website optimization. Here is why using SQL for this task is so valuable.

• Direct Database Analysis Get insights directly from your raw data without needing intermediate tools. This is fast and efficient.
• Custom Definitions Define what a problematic page means to you. You can set specific thresholds for bounce rate or time on page using SQL.
• Quick Prioritization Easily sort and filter pages to find the worst offenders. This helps you decide where to focus your efforts first.
• Actionable Insights Once identified you can investigate these pages. You can then improve their content design or technical performance.
• Foundation for Automation The SQL queries you write here can be automated later. They can power alerts or reports on problematic pages.

Using SQL to find problematic pages empowers you to be a proactive website optimizer.


Using SQL to identify website pages with high bounce rates and low engagement

Key Metrics for Identifying Problematic Pages

We will focus on these core metrics to pinpoint underperforming pages.

Bounce Rate: The percentage of sessions that started on a page and were single page sessions. High bounce rate often indicates a problem.

Average Time on Page:The average duration users spend on a specific page. Low time on page suggests lack of engagement.


Practical SQL Code Examples for Problematic Pages

Here are some SQL queries you can run directly in your SQL Server Management Studio SSMS or any SQL client connected to your database. These queries will help you identify pages that need attention.

1. Pages with the Highest Bounce Rate
This query identifies pages where a high percentage of visitors leave after viewing only that one page. This often signals that the page is not meeting user expectations or has issues.

-- Query 1: Pages with the Highest Bounce Rate WITH PageSessionDetails AS ( SELECT session_id, page_location, COUNT(*) AS events_in_session_on_page, MIN(event_timestamp) OVER (PARTITION BY session_id) AS session_start_timestamp, MAX(event_timestamp) OVER (PARTITION BY session_id) AS session_end_timestamp, COUNT(*) OVER (PARTITION BY session_id) AS total_events_in_session FROM events WHERE event_name = 'page_view' AND page_location IS NOT NULL AND page_location != '' GROUP BY session_id, page_location, event_timestamp -- Group by event_timestamp to keep individual page_view events ), BouncedPageSessions AS ( SELECT page_location, COUNT(DISTINCT session_id) AS total_sessions_on_page, SUM(CASE WHEN total_events_in_session = 1 THEN 1 ELSE 0 END) AS bounced_sessions_on_page FROM PageSessionDetails GROUP BY page_location ) SELECT TOP 20 -- Show top 20 pages b.page_location, b.total_sessions_on_page, b.bounced_sessions_on_page, CAST(b.bounced_sessions_on_page * 100.0 / NULLIF(b.total_sessions_on_page, 0) AS DECIMAL(5,2)) AS bounce_rate_percent FROM BouncedPageSessions b WHERE b.total_sessions_on_page >= 10 -- Only consider pages with at least 10 sessions to make bounce rate meaningful ORDER BY bounce_rate_percent DESC;

Explanation

• 'WITH PageSessionDetails AS (...)': This Common Table Expression (CTE) calculates details for each 'page_view' event within its session including the total number of events in that session.
• 'WITH BouncedPageSessions AS (...)': This second CTE aggregates the data by 'page_location'. It counts the total sessions that included a page view on that page ('total_sessions_on_page') and identifies how many of those were "bounced sessions" (sessions with only one event which was a page view).
• 'SELECT ...': The final 'SELECT' statement calculates the 'bounce_rate_percent' for each page.
• 'NULLIF(b.total_sessions_on_page 0)': This prevents division by zero errors if a page has 0 total sessions.
• 'WHERE b.total_sessions_on_page >= 10': This filter ensures we only look at pages with a reasonable amount of traffic preventing misleadingly high bounce rates from pages with very few visits.
• 'ORDER BY bounce_rate_percent DESC': Sorts the results to show pages with the highest bounce rates first.


2. Pages with the Lowest Average Time on Page
This query identifies pages where users spend very little time. This can indicate that the content is not engaging, not relevant or that users are struggling to find what they need.

-- Query 2: Pages with the Lowest Average Time on Page WITH PageVisitDurations AS ( SELECT session_id, page_location, MIN(event_timestamp) AS first_event_on_page_timestamp, MAX(event_timestamp) AS last_event_on_page_timestamp FROM events WHERE event_name = 'page_view' AND page_location IS NOT NULL AND page_location != '' GROUP BY session_id, page_location ) SELECT TOP 20 -- Show top 20 pages with lowest time spent pvd.page_location, COUNT(pvd.session_id) AS total_page_visits, AVG(CAST((pvd.last_event_on_page_timestamp - pvd.first_event_on_page_timestamp) AS DECIMAL(18,2)) / 1000000.0) AS avg_time_on_page_seconds FROM PageVisitDurations pvd WHERE (pvd.last_event_on_page_timestamp - pvd.first_event_on_page_timestamp) > 0 -- Exclude sessions with 0 duration (single event) AND COUNT(pvd.session_id) OVER (PARTITION BY pvd.page_location) >= 10 -- Only consider pages with at least 10 visits GROUP BY pvd.page_location ORDER BY avg_time_on_page_seconds ASC;

Explanation:
• 'WITH PageVisitDurations AS (...)': This CTE calculates the minimum and maximum 'event_timestamp' for each unique 'session_id' and 'page_location' combination where a 'page_view' event occurred. This gives an estimate of the time spent on that specific page within that session.
• 'AVG(CAST((pvd.last_event_on_page_timestamp - pvd.first_event_on_page_timestamp) AS DECIMAL(18 2)) / 1000000.0)': Calculates the duration in microseconds converts it to seconds and then averages it for all visits to that page.
• 'WHERE (pvd.last_event_on_page_timestamp - pvd.first_event_on_page_timestamp) > 0': This filters out sessions where only one event was recorded on the page as these would result in zero time spent and skew the average.
• 'AND COUNT(pvd.session_id) OVER (PARTITION BY pvd.page_location) >= 10': This window function ensures that only pages with a minimum number of visits (e.g. 10) are included in the analysis making the average more reliable.
• 'ORDER BY avg_time_on_page_seconds ASC': Sorts the results to show pages with the lowest average time spent first.


Important Notes on SQL Queries for Problematic Pages:

These queries are specifically designed for SQL Server. Remember that the 'event_timestamp' is assumed to be in Unix microseconds. Adjust the division by 1000000.0 if your timestamps are in milliseconds or seconds.
The definition of "problematic" can be customized by changing the thresholds in the 'WHERE' clauses (e.g. 'total_sessions_on_page >= 10').
Always ensure your database has appropriate indexes on columns like 'user_pseudo_id' 'session_id' 'event_timestamp' 'event_name' and 'page_location' for optimal query performance. This is especially important when working with large datasets.


Overall Value of Finding Problematic Pages with SQL
Using SQL to identify problematic pages is a direct and powerful way to drive website improvements. It allows you to quickly pinpoint areas of your site that are underperforming in terms of user engagement. By focusing your efforts on these pages you can make targeted changes that lead to a better user experience and ultimately contribute to your website's overall success. This demonstrates a practical application of your SQL skills for real world business optimization. This is a vital skill in web analytics and performance marketing.


Next Steps:

You have successfully learned to use SQL to find problematic pages on your website. This means you are now proficient in diagnosing content and engagement issues directly from your database. You have completed the SQL for Business phase. The next exciting phase will be to delve into Machine Learning. We will start by learning how to segment users with clustering. This will allow you to group similar users together for more targeted analysis and marketing.
For now make sure you save any SQL queries you write in your E drive SankalanAnalytics database folder. You can save them as '.sql' files for future reference.


Previous Topic==> Combine Web Data Using SQL  Next Topics==> Segment User | Clustering


SQL All topics Wise Interview Questions    Employee Salary Management SQL FAQ!.   C FAQ   Top 25 PL/SQL Interview Questions

Joins With Group by Having  Equi Join  Joins with Subqueries  Self Join  Outer Join 


Website Analytics Project: Phases and Action Steps

  • Home
  • 🟢 Live App: Web Analytics Simulator
  • Phase 0: Project Setup & Problem Definition
  • 0.1 Define Project Goals & Challenges
  • 0.2 Select Tools, Tech Stack & Data Sources
  • 0.3 Software Requirements & Installation
  • 0.4 Folder Structure & GitHub Repo
  • 0.5 Testing Project Locally
  • Phase 1: Planning for Analytics
  • 1.1 Website Analytics Project Overview
  • 1.2 Define KPIs, Bounce Rate, Engagement
  • 1.3 Identify Target Users & Pain Points
  • Phase 2: Data Collection
  • 2.1 Setup Google Analytics 4 (GA4)
  • 2.2 Export GA4 Data to BigQuery/CSV
  • 2.3 Design SQL Schema for Web Analytics
  • Phase 3: Data Cleaning & Feature Engineering
  • 3.1 Clean Website Data with Python & Pandas
  • 3.2 Create Custom Metrics (Session, Bounce, etc.)
  • Phase 4: Exploratory Data Analysis (EDA)
  • 4.1 Analyze Website Traffic Trends
  • 4.2 Behavior by Device, Source, Location
  • 4.3 Top Pages & High Bounce Pages
  • 4.4 Diagnose Low Traffic & User Drop
  • Phase 5: Business Insights
  • 5.1 Funnel Analysis & Drop-Off Points
  • 5.2 New vs Returning Users
  • 5.3 Time Spent & Scroll Depth
  • Phase 6: SQL for Business
  • 6.1 SQL for Business Insights
  • 6.2 Combine Web Data Using SQL
  • 6.3 Find Problematic Pages Using SQL
  • Phase 7: Machine Learning
  • 7.1 Segment Users with Clustering
  • 7.2 Predict Bounce Rate with ML
  • 7.3 Recommend Pages or Content
  • Phase 8: Dashboards & Visualization
  • 8.1 Dashboard with Streamlit
  • 8.2 Visualize KPIs with Python
  • 8.3 Page-Level Metrics & Drop Heatmaps
  • Phase 9: Final Analytics Story
  • 9.1 Summary Report & Findings
  • Phase 10: Hosting & Portfolio Presentation
  • 10.1 Host Website Project Online
  • 10.2 Add to GitHub with ReadMe
  • 10.3 Pitch Project in Interview
  • Other Topics
  • SQL Interview Questions
  • SQL Case Study: Account Management
  • Python Interview Questions
  • Why C Language

Get in touch

  • tech2dsm@gmail.com

© Sankalan Data Tech. All rights reserved.