Website Analytics |SQL Schema Design

2.3 Design Your SQL Schema for Web Analytics Data

Welcome to Phase 2.3 Structuring Your Data
You now know how to get real data from Google Analytics 4 GA4 into CSV files. The next important step is to prepare a home for that data on your computer. This phase is all about designing a strong and flexible SQL database plan. This plan will set up the tables and columns where your website analytics data will live. This makes it easy to ask questions about the data analyze it and build dashboards later.

Think of this as designing the shelves and compartments in your data warehouse. A good design ensures everything has its place. It is easy to find. It can be combined well when you need to answer business questions.


Why a Well Designed SQL Schema Matters

Structured Storage It organizes your raw CSV data. It puts it into a clean related format. This makes it much easier to work with than flat files.

Efficient Querying A logical plan lets you write powerful SQL questions. These questions quickly pull specific insights.

Data Integrity It helps make rules. These rules are for things like data types and main keys. This ensures your analytics data is steady and trustworthy.

Foundation for Analysis A solid database structure is the base. It supports all your future work. This includes data cleaning making new features exploring data and machine learning.

Dashboard Ready Your Streamlit dashboard will connect straight to this database. A well organized plan makes building dashboards smoother.


Diagram of a simple SQL database schema for storing GA4 website analytics data with tables for events and users

Key Considerations for Your Schema Design

GA4 is event based. This means every user action like a page view a click or a scroll is recorded as an event. Your schema design should reflect this. It should also allow you to capture all the important details that come with each event.

Below are the two main tables we will design. These tables will help you organize your GA4 data in a clear and useful way.

1 Events Table
This is your main table. It stores a record for every single event that happens on your website. This is the most detailed level of your data.

What it stores
It holds information about each event including who did it when it happened what type of event it was and where it happened.

Key Columns
event_id — Unique number for each event
user_pseudo_id — Anonymous user number from GA4
event_timestamp — When the event happened in microseconds or milliseconds
event_name — For example page_view scroll or click
page_location — Full web address of the page where the event happened
page_title — The title of the page
session_id — A unique number for the user session
session_number — The order number of the session for that user
traffic_source — For example google or direct
traffic_medium — For example organic cpc or none
device_category — Device type for example mobile desktop or tablet
geo_country — Country of the user
geo_city — City of the user
event_params_json — A flexible column to store extra event details as JSON text

2 Users Table
This table stores information about unique users. These details do not change often or are tied to their first visit.

What it stores
It stores user identifiers and their general info like location and device type.

Key Columns
user_pseudo_id — Main number that links to the events table
first_visit_timestamp — When the user first visited
first_traffic_source — Original source of the user
first_traffic_medium — Original way the user came
device_category — First device type used
geo_country — First country of visit
geo_city — First city of visit
user_age_bracket — For example 25 34 if available
user_gender — For example male or female if available


Example SQL Schema for SQL Server

Here is a basic SQL script to create your analytics tables in a SQL Server database. Save this script as schema.sql in your project’s database folder.

-- schema.sql for SQL Server -- Table to store individual web events CREATE TABLE events ( event_id NVARCHAR(255) PRIMARY KEY, -- Unique identifier for each event user_pseudo_id NVARCHAR(255) NOT NULL, -- Anonymous user ID event_timestamp BIGINT NOT NULL, -- Timestamp in Unix microseconds (from GA4) event_name NVARCHAR(255) NOT NULL, -- For example page_view scroll click page_location NVARCHAR(MAX), -- Full URL of the page page_title NVARCHAR(MAX), -- Title of the page session_id NVARCHAR(255), -- Unique ID for the session session_number INT, -- Session count for the user traffic_source NVARCHAR(255), -- Source for example google direct traffic_medium NVARCHAR(255), -- Medium for example organic cpc none device_category NVARCHAR(255), -- Device type for example mobile desktop tablet geo_country NVARCHAR(255), -- Country of the user geo_city NVARCHAR(255), -- City of the user event_params_json NVARCHAR(MAX) -- Store additional event parameters as JSON text ); -- Table to store unique user information CREATE TABLE users ( user_pseudo_id NVARCHAR(255) PRIMARY KEY, -- Unique anonymous user ID first_visit_timestamp BIGINT NOT NULL, -- Timestamp of user's first visit first_traffic_source NVARCHAR(255), -- Source of user's first visit first_traffic_medium NVARCHAR(255), -- Medium of user's first visit device_category NVARCHAR(255), -- Device category of first visit geo_country NVARCHAR(255), -- Country of first visit geo_city NVARCHAR(255), -- City of first visit user_age_bracket NVARCHAR(50), -- For example 25 34 if available user_gender NVARCHAR(50) -- For example male female if available ); -- Index for faster querying on user_pseudo_id in events table CREATE INDEX idx_events_user_pseudo_id ON events (user_pseudo_id); -- Index for faster querying on event_timestamp in events table CREATE INDEX idx_events_timestamp ON events (event_timestamp); -- Index for faster querying on event_name in events table CREATE INDEX idx_events_name ON events (event_name);

A Note on Data Types

In SQL Server:

NVARCHAR is used for storing text.
NVARCHAR(MAX)b> is used for longer text like URLs or JSON.
BIGINT is ideal for large numbers like timestamps.
INT works well for smaller numbers like session counts.
We use NVARCHAR(255) for primary keys to allow enough room for unique identifiers. This structure helps keep your data clean, searchable, and efficient for analysis.


Previous Topic==> Export Data! ||  Next Topics==> Clean Data.


SQL All topics Wise Interview Questions    Employee Salary Management SQL FAQ!.   C FAQ   Top 25 PL/SQL Interview Questions

Joins With Group by Having  Equi Join  Joins with Subqueries  Self Join  Outer Join 


Website Analytics Project: Phases and Action Steps

  • Home
  • 🟢 Live App: Web Analytics Simulator
  • Phase 0: Project Setup & Problem Definition
  • 0.1 Define Project Goals & Challenges
  • 0.2 Select Tools, Tech Stack & Data Sources
  • 0.3 Software Requirements & Installation
  • 0.4 Folder Structure & GitHub Repo
  • 0.5 Testing Project Locally
  • Phase 1: Planning for Analytics
  • 1.1 Website Analytics Project Overview
  • 1.2 Define KPIs, Bounce Rate, Engagement
  • 1.3 Identify Target Users & Pain Points
  • Phase 2: Data Collection
  • 2.1 Setup Google Analytics 4 (GA4)
  • 2.2 Export GA4 Data to BigQuery/CSV
  • 2.3 Design SQL Schema for Web Analytics
  • Phase 3: Data Cleaning & Feature Engineering
  • 3.1 Clean Website Data with Python & Pandas
  • 3.2 Create Custom Metrics (Session, Bounce, etc.)
  • Phase 4: Exploratory Data Analysis (EDA)
  • 4.1 Analyze Website Traffic Trends
  • 4.2 Behavior by Device, Source, Location
  • 4.3 Top Pages & High Bounce Pages
  • 4.4 Diagnose Low Traffic & User Drop
  • Phase 5: Business Insights
  • 5.1 Funnel Analysis & Drop-Off Points
  • 5.2 New vs Returning Users
  • 5.3 Time Spent & Scroll Depth
  • Phase 6: SQL for Business
  • 6.1 SQL for Business Insights
  • 6.2 Combine Web Data Using SQL
  • 6.3 Find Problematic Pages Using SQL
  • Phase 7: Machine Learning
  • 7.1 Segment Users with Clustering
  • 7.2 Predict Bounce Rate with ML
  • 7.3 Recommend Pages or Content
  • Phase 8: Dashboards & Visualization
  • 8.1 Dashboard with Streamlit
  • 8.2 Visualize KPIs with Python
  • 8.3 Page-Level Metrics & Drop Heatmaps
  • Phase 9: Final Analytics Story
  • 9.1 Summary Report & Findings
  • Phase 10: Hosting & Portfolio Presentation
  • 10.1 Host Website Project Online
  • 10.2 Add to GitHub with ReadMe
  • 10.3 Pitch Project in Interview
  • Other Topics
  • SQL Interview Questions
  • SQL Case Study: Account Management
  • Python Interview Questions
  • Why C Language

Get in touch

  • tech2dsm@gmail.com

© Sankalan Data Tech. All rights reserved.