6.1 Unlock Business Insights with SQL: Practical Queries for Your Website Data
Welcome to Phase 6.1 Unlocking Insights with SQL
You have done a fantastic job collecting cleaning and analyzing your web analytics data using Python. You have uncovered traffic trends user behavior patterns funnel drop offs and content engagement. Now it is time to harness the power of SQL Structured Query Language. In this phase we will learn how to write powerful SQL queries directly against your SQL Server database. This will allow you to extract specific business insights quickly and efficiently. Think of SQL as the key to unlocking the full potential of your structured data. It is a fundamental skill for any data professional.
This step is crucial for direct data interaction and preparing data for advanced reporting.
Why SQL is Essential for Business Insights
While Python is great for complex transformations SQL excels at direct data retrieval and aggregation. Here is why it is so valuable.
Direct Data Access SQL allows you to query your database directly. This means you get real time insights without needing to load data into another tool first.
Powerful Aggregation SQL is built for summarizing data. You can easily count sum average and group your web events and user data in many ways.
Foundation for Reporting Most business intelligence tools like Power BI or Tableau connect directly to SQL databases. Knowing SQL helps you prepare data perfectly for these dashboards.
Understanding Data Structure Writing SQL queries forces you to understand how your data is organized in tables. This deepens your overall data literacy.
Industry Standard SQL is a universal language for databases. Mastering it makes you highly versatile in the data world.
SQL empowers you to ask precise questions of your data and get precise answers.
Key Business Questions Answered by SQL
Let us explore some common business questions you can answer using SQL queries against your 'events' and 'users' tables.
Daily Website Performance How many total users sessions and page views did my website get each day?
Most Popular Content Which pages on my website are viewed the most?
Traffic by Device How many sessions come from mobile desktop or tablet devices?
Bounce Rate by Traffic Source What is the bounce rate for users coming from different marketing channels?
New vs Returning Users How many new users versus returning users visit my site?
Practical SQL Code Examples for Business Insights
Here are some SQL queries you can run directly in your SQL Server Management Studio SSMS or any SQL client connected to your database. These queries will help you answer the business questions mentioned above.
1. Total Users Sessions and Page Views per Day
This query helps you understand your website's daily performance at a glance.
Explanation
• CAST(DATEADD(s, event_timestamp / 1000000, '1970-01-01') AS DATE): This converts the GA4 Unix microsecond timestamp into a readable date. We divide by 1,000,000 to convert microseconds to seconds, then add those seconds to the Unix epoch start date of '1970-01-01'. Finally, we cast it to `DATE` to get just the date part.
• COUNT(DISTINCT user_pseudo_id): Counts the number of unique users each day.
• COUNT(DISTINCT session_id): Counts the number of unique sessions each day.
• COUNT(CASE WHEN event_name = 'page_view' THEN 1 END): Counts only events where `event_name` is 'page_view' to get total page views.
• GROUP BY event_date: Aggregates the results for each unique date.
• ORDER BY event_date: Sorts the results by date in ascending order.
2. Top 10 Most Viewed Pages
This query helps identify your most popular content or landing pages.
Explanation:
• 'SELECT TOP 10': Limits the results to the top 10 pages.
• 'page_location': The URL of the page.
• 'COUNT(*)': Counts all 'page_view' events for each 'page_location'.
• 'COUNT(DISTINCT user_pseudo_id)': Counts the unique users who viewed each page.
• 'WHERE event_name = 'page_view'': Filters for only page view events.
• 'AND page_location IS NOT NULL AND page_location != ''': Ensures we only count valid page URLs.
• 'GROUP BY page_location': Aggregates results for each unique page URL.
• 'ORDER BY page_views_count DESC': Sorts the results by the number of page views in descending order.
3. Sessions by Device Category
Understand how users access your website across different devices.
Explanation:
• 'device_category': The type of device used (e.g., 'mobile', 'desktop').
• 'COUNT(DISTINCT session_id)': Counts unique sessions for each device category.
• 'COUNT(DISTINCT user_pseudo_id)': Counts unique users for each device category.
• 'CAST(COUNT(DISTINCT session_id) * 100.0 / SUM(COUNT(DISTINCT session_id)) OVER () AS DECIMAL(5,2))': Calculates the percentage of sessions for each device category out of the total sessions. 'SUM(...) OVER ()' is a window function that gets the total sessions across all categories.
• 'WHERE device_category IS NOT NULL AND device_category != ''': Filters for valid device categories.
• 'GROUP BY device_category': Aggregates results for each device type.
• 'ORDER BY total_sessions DESC': Sorts by total sessions in descending order.
4. Bounce Rate by Traffic Source and Medium
Identify which traffic channels bring engaged users and which ones have higher bounce rates.
Explanation:
• 'WITH SessionEvents AS (...)': This is a Common Table Expression CTE that first calculates the total 'event_count' for each'`session_id' and checks if the session had at least one 'page_view'.
• 'total_sessions': Counts all sessions for each source/medium.
• 'bounced_sessions': Counts sessions that had exactly one event ('event_count = 1') AND that event was a 'page_view'. This is our definition of a bounce.
• 'bounce_rate_percent': Calculates the bounce rate as a percentage.
• 'GROUP BY traffic_source, traffic_medium': Aggregates results for each unique source and medium combination.
5. New vs Returning Users (Sessions Count)
Understand the mix of new and returning visitors to your site.
Explanation:
JOIN users u ON e.user_pseudo_id = u.user_pseudo_id: This part connects the events
table with the users
table using the common user_pseudo_id
field to combine related user data with event data.
CASE WHEN e.event_timestamp = u.first_visit_timestamp THEN 'New User' ELSE 'Returning User' END: This logic classifies each session as either New User or Returning User. If the event's timestamp matches the user's first visit timestamp, it’s labeled as a New User session; otherwise, it's considered a Returning User session. Note that this is a simplified method—a more accurate classification might look at the session’s first event timestamp instead.
COUNT(DISTINCT e.session_id): This counts the number of unique sessions for each user type, helping measure how many sessions come from new versus returning users.
GROUP BY user_type: This groups the results by user type so you can see aggregated counts separately for New User and Returning User sessions.
Important Notes on SQL Queries:
These queries are written for SQL Server. Syntax might vary slightly for other database systems like PostgreSQL or MySQL. Always ensure your database has appropriate indexes on columns like 'user_pseudo_id' 'session_id' 'event_timestamp' and 'event_name' for optimal query performance. The 'event_timestamp' is assumed to be in Unix microseconds as exported from GA4. Adjust the division by 1000000 if your timestamps are in milliseconds or seconds. Understanding your data schema is crucial for writing effective SQL queries. Refer back to Phase 2.3 if needed.
Overall Value of Using SQL for Business Insights
Mastering SQL for web analytics data is a significant step in your journey. It allows you to bypass intermediate tools for many common analyses. You can directly query your structured data to answer critical business questions. This skill is highly valued in the data analytics industry. It gives you direct control over your data and insights. This prepares you for more advanced data manipulation and reporting tasks.
Next Steps
You have successfully learned to write SQL queries to extract key business insights from your web analytics data. This means you are now proficient in direct database interaction. The next exciting phase will be to learn how to use SQL Joins to combine website analytics data. This will enable you to link information across different tables like events and users to gain even richer insights.