SQL queries | Website Analytics

6.1 Unlock Business Insights with SQL: Practical Queries for Your Website Data

Welcome to Phase 6.1 Unlocking Insights with SQL

You have done a fantastic job collecting cleaning and analyzing your web analytics data using Python. You have uncovered traffic trends user behavior patterns funnel drop offs and content engagement. Now it is time to harness the power of SQL Structured Query Language. In this phase we will learn how to write powerful SQL queries directly against your SQL Server database. This will allow you to extract specific business insights quickly and efficiently. Think of SQL as the key to unlocking the full potential of your structured data. It is a fundamental skill for any data professional.
This step is crucial for direct data interaction and preparing data for advanced reporting.


Why SQL is Essential for Business Insights

While Python is great for complex transformations SQL excels at direct data retrieval and aggregation. Here is why it is so valuable.

Direct Data Access SQL allows you to query your database directly. This means you get real time insights without needing to load data into another tool first.

Powerful Aggregation SQL is built for summarizing data. You can easily count sum average and group your web events and user data in many ways.

Foundation for Reporting Most business intelligence tools like Power BI or Tableau connect directly to SQL databases. Knowing SQL helps you prepare data perfectly for these dashboards.

Understanding Data Structure Writing SQL queries forces you to understand how your data is organized in tables. This deepens your overall data literacy.

Industry Standard SQL is a universal language for databases. Mastering it makes you highly versatile in the data world.

SQL empowers you to ask precise questions of your data and get precise answers.


SQL queries on laptop screen showing business insights from web analytics data

Key Business Questions Answered by SQL

Let us explore some common business questions you can answer using SQL queries against your 'events' and 'users' tables.

Daily Website Performance How many total users sessions and page views did my website get each day?

Most Popular Content Which pages on my website are viewed the most?

Traffic by Device How many sessions come from mobile desktop or tablet devices?

Bounce Rate by Traffic Source What is the bounce rate for users coming from different marketing channels?

New vs Returning Users How many new users versus returning users visit my site?


Practical SQL Code Examples for Business Insights

Here are some SQL queries you can run directly in your SQL Server Management Studio SSMS or any SQL client connected to your database. These queries will help you answer the business questions mentioned above.

1. Total Users Sessions and Page Views per Day
This query helps you understand your website's daily performance at a glance.

-- Query 1: Daily Website Performance SELECT CAST(DATEADD(s, event_timestamp / 1000000, '1970-01-01') AS DATE) AS event_date, COUNT(DISTINCT user_pseudo_id) AS total_users, COUNT(DISTINCT session_id) AS total_sessions, COUNT(CASE WHEN event_name = 'page_view' THEN 1 END) AS total_page_views FROM events GROUP BY CAST(DATEADD(s, event_timestamp / 1000000, '1970-01-01') AS DATE) ORDER BY event_date;

Explanation

• CAST(DATEADD(s, event_timestamp / 1000000, '1970-01-01') AS DATE): This converts the GA4 Unix microsecond timestamp into a readable date. We divide by 1,000,000 to convert microseconds to seconds, then add those seconds to the Unix epoch start date of '1970-01-01'. Finally, we cast it to `DATE` to get just the date part.
• COUNT(DISTINCT user_pseudo_id): Counts the number of unique users each day.
• COUNT(DISTINCT session_id): Counts the number of unique sessions each day.
• COUNT(CASE WHEN event_name = 'page_view' THEN 1 END): Counts only events where `event_name` is 'page_view' to get total page views.
• GROUP BY event_date: Aggregates the results for each unique date.
• ORDER BY event_date: Sorts the results by date in ascending order.


2. Top 10 Most Viewed Pages
This query helps identify your most popular content or landing pages.

-- Query 2: Top 10 Most Viewed Pages SELECT TOP 10 page_location, COUNT(*) AS page_views_count, COUNT(DISTINCT user_pseudo_id) AS unique_users_count FROM events WHERE event_name = 'page_view' AND page_location IS NOT NULL AND page_location != '' GROUP BY page_location ORDER BY page_views_count DESC;

Explanation:
• 'SELECT TOP 10': Limits the results to the top 10 pages.
• 'page_location': The URL of the page.
• 'COUNT(*)': Counts all 'page_view' events for each 'page_location'.
• 'COUNT(DISTINCT user_pseudo_id)': Counts the unique users who viewed each page.
• 'WHERE event_name = 'page_view'': Filters for only page view events.
• 'AND page_location IS NOT NULL AND page_location != ''': Ensures we only count valid page URLs.
• 'GROUP BY page_location': Aggregates results for each unique page URL.
• 'ORDER BY page_views_count DESC': Sorts the results by the number of page views in descending order.


3. Sessions by Device Category
Understand how users access your website across different devices.

-- Query 3: Sessions by Device Category SELECT device_category, COUNT(DISTINCT session_id) AS total_sessions, COUNT(DISTINCT user_pseudo_id) AS total_users, CAST(COUNT(DISTINCT session_id) * 100.0 / SUM(COUNT(DISTINCT session_id)) OVER () AS DECIMAL(5,2)) AS sessions_percentage FROM events WHERE device_category IS NOT NULL AND device_category != '' GROUP BY device_category ORDER BY total_sessions DESC;

Explanation:
• 'device_category': The type of device used (e.g., 'mobile', 'desktop').
• 'COUNT(DISTINCT session_id)': Counts unique sessions for each device category.
• 'COUNT(DISTINCT user_pseudo_id)': Counts unique users for each device category.
• 'CAST(COUNT(DISTINCT session_id) * 100.0 / SUM(COUNT(DISTINCT session_id)) OVER () AS DECIMAL(5,2))': Calculates the percentage of sessions for each device category out of the total sessions. 'SUM(...) OVER ()' is a window function that gets the total sessions across all categories.
• 'WHERE device_category IS NOT NULL AND device_category != ''': Filters for valid device categories.
• 'GROUP BY device_category': Aggregates results for each device type.
• 'ORDER BY total_sessions DESC': Sorts by total sessions in descending order.


4. Bounce Rate by Traffic Source and Medium
Identify which traffic channels bring engaged users and which ones have higher bounce rates.

-- Query 4: Bounce Rate by Traffic Source and Medium WITH SessionEvents AS ( SELECT session_id, traffic_source, traffic_medium, COUNT(*) AS event_count, MAX(CASE WHEN event_name = 'page_view' THEN 1 ELSE 0 END) AS has_page_view FROM events WHERE traffic_source IS NOT NULL AND traffic_source != '' GROUP BY session_id, traffic_source, traffic_medium ) SELECT se.traffic_source, se.traffic_medium, COUNT(se.session_id) AS total_sessions, SUM(CASE WHEN se.event_count = 1 AND se.has_page_view = 1 THEN 1 ELSE 0 END) AS bounced_sessions, CAST(SUM(CASE WHEN se.event_count = 1 AND se.has_page_view = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(se.session_id) AS DECIMAL(5,2)) AS bounce_rate_percent FROM SessionEvents se GROUP BY se.traffic_source, se.traffic_medium ORDER BY total_sessions DESC;

Explanation:
• 'WITH SessionEvents AS (...)': This is a Common Table Expression CTE that first calculates the total 'event_count' for each'`session_id' and checks if the session had at least one 'page_view'.
• 'total_sessions': Counts all sessions for each source/medium.
• 'bounced_sessions': Counts sessions that had exactly one event ('event_count = 1') AND that event was a 'page_view'. This is our definition of a bounce.
• 'bounce_rate_percent': Calculates the bounce rate as a percentage.
• 'GROUP BY traffic_source, traffic_medium': Aggregates results for each unique source and medium combination.


5. New vs Returning Users (Sessions Count)
Understand the mix of new and returning visitors to your site.

-- Query 5: New vs Returning Users (Sessions Count) SELECT CASE WHEN e.event_timestamp = u.first_visit_timestamp THEN 'New User' ELSE 'Returning User' END AS user_type, COUNT(DISTINCT e.session_id) AS total_sessions FROM events e JOIN users u ON e.user_pseudo_id = u.user_pseudo_id GROUP BY CASE WHEN e.event_timestamp = u.first_visit_timestamp THEN 'New User' ELSE 'Returning User' END ORDER BY total_sessions DESC;

Explanation:
JOIN users u ON e.user_pseudo_id = u.user_pseudo_id: This part connects the events table with the users table using the common user_pseudo_id field to combine related user data with event data.

CASE WHEN e.event_timestamp = u.first_visit_timestamp THEN 'New User' ELSE 'Returning User' END: This logic classifies each session as either New User or Returning User. If the event's timestamp matches the user's first visit timestamp, it’s labeled as a New User session; otherwise, it's considered a Returning User session. Note that this is a simplified method—a more accurate classification might look at the session’s first event timestamp instead.

COUNT(DISTINCT e.session_id): This counts the number of unique sessions for each user type, helping measure how many sessions come from new versus returning users.

GROUP BY user_type: This groups the results by user type so you can see aggregated counts separately for New User and Returning User sessions.


Important Notes on SQL Queries:

These queries are written for SQL Server. Syntax might vary slightly for other database systems like PostgreSQL or MySQL. Always ensure your database has appropriate indexes on columns like 'user_pseudo_id' 'session_id' 'event_timestamp' and 'event_name' for optimal query performance. The 'event_timestamp' is assumed to be in Unix microseconds as exported from GA4. Adjust the division by 1000000 if your timestamps are in milliseconds or seconds. Understanding your data schema is crucial for writing effective SQL queries. Refer back to Phase 2.3 if needed.


Overall Value of Using SQL for Business Insights

Mastering SQL for web analytics data is a significant step in your journey. It allows you to bypass intermediate tools for many common analyses. You can directly query your structured data to answer critical business questions. This skill is highly valued in the data analytics industry. It gives you direct control over your data and insights. This prepares you for more advanced data manipulation and reporting tasks.


Next Steps
You have successfully learned to write SQL queries to extract key business insights from your web analytics data. This means you are now proficient in direct database interaction. The next exciting phase will be to learn how to use SQL Joins to combine website analytics data. This will enable you to link information across different tables like events and users to gain even richer insights.


Previous Topic==> Time spend on Website  Next Topics==> Combine Website Data


SQL All topics Wise Interview Questions    Employee Salary Management SQL FAQ!.   C FAQ   Top 25 PL/SQL Interview Questions

Joins With Group by Having  Equi Join  Joins with Subqueries  Self Join  Outer Join 


Website Analytics Project: Phases and Action Steps

  • Home
  • 🟢 Live App: Web Analytics Simulator
  • Phase 0: Project Setup & Problem Definition
  • 0.1 Define Project Goals & Challenges
  • 0.2 Select Tools, Tech Stack & Data Sources
  • 0.3 Software Requirements & Installation
  • 0.4 Folder Structure & GitHub Repo
  • 0.5 Testing Project Locally
  • Phase 1: Planning for Analytics
  • 1.1 Website Analytics Project Overview
  • 1.2 Define KPIs, Bounce Rate, Engagement
  • 1.3 Identify Target Users & Pain Points
  • Phase 2: Data Collection
  • 2.1 Setup Google Analytics 4 (GA4)
  • 2.2 Export GA4 Data to BigQuery/CSV
  • 2.3 Design SQL Schema for Web Analytics
  • Phase 3: Data Cleaning & Feature Engineering
  • 3.1 Clean Website Data with Python & Pandas
  • 3.2 Create Custom Metrics (Session, Bounce, etc.)
  • Phase 4: Exploratory Data Analysis (EDA)
  • 4.1 Analyze Website Traffic Trends
  • 4.2 Behavior by Device, Source, Location
  • 4.3 Top Pages & High Bounce Pages
  • 4.4 Diagnose Low Traffic & User Drop
  • Phase 5: Business Insights
  • 5.1 Funnel Analysis & Drop-Off Points
  • 5.2 New vs Returning Users
  • 5.3 Time Spent & Scroll Depth
  • Phase 6: SQL for Business
  • 6.1 SQL for Business Insights
  • 6.2 Combine Web Data Using SQL
  • 6.3 Find Problematic Pages Using SQL
  • Phase 7: Machine Learning
  • 7.1 Segment Users with Clustering
  • 7.2 Predict Bounce Rate with ML
  • 7.3 Recommend Pages or Content
  • Phase 8: Dashboards & Visualization
  • 8.1 Dashboard with Streamlit
  • 8.2 Visualize KPIs with Python
  • 8.3 Page-Level Metrics & Drop Heatmaps
  • Phase 9: Final Analytics Story
  • 9.1 Summary Report & Findings
  • Phase 10: Hosting & Portfolio Presentation
  • 10.1 Host Website Project Online
  • 10.2 Add to GitHub with ReadMe
  • 10.3 Pitch Project in Interview
  • Other Topics
  • SQL Interview Questions
  • SQL Case Study: Account Management
  • Python Interview Questions
  • Why C Language

Get in touch

  • tech2dsm@gmail.com

© Sankalan Data Tech. All rights reserved.