2.3 Design Your SQL Schema for Web Analytics Data
Welcome to Phase 2.3 Structuring Your Data
You now know how to get real data from Google Analytics 4 GA4 into CSV files. The next important step is to prepare a home for that data on your computer. This phase is all about designing a strong and flexible SQL database plan. This plan will set up the tables and columns where your website analytics data will live. This makes it easy to ask questions about the data analyze it and build dashboards later.
Think of this as designing the shelves and compartments in your data warehouse. A good design ensures everything has its place. It is easy to find. It can be combined well when you need to answer business questions.
Why a Well Designed SQL Schema Matters
Structured Storage It organizes your raw CSV data. It puts it into a clean related format. This makes it much easier to work with than flat files.
Efficient Querying A logical plan lets you write powerful SQL questions. These questions quickly pull specific insights.
Data Integrity It helps make rules. These rules are for things like data types and main keys. This ensures your analytics data is steady and trustworthy.
Foundation for Analysis A solid database structure is the base. It supports all your future work. This includes data cleaning making new features exploring data and machine learning.
Dashboard Ready Your Streamlit dashboard will connect straight to this database. A well organized plan makes building dashboards smoother.
Key Considerations for Your Schema Design
GA4 is event based. This means every user action like a page view a click or a scroll is recorded as an event. Your schema design should reflect this. It should also allow you to capture all the important details that come with each event.
Below are the two main tables we will design. These tables will help you organize your GA4 data in a clear and useful way.
1 Events Table
This is your main table. It stores a record for every single event that happens on your website. This is the most detailed level of your data.
What it stores
It holds information about each event including who did it when it happened what type of event it was and where it happened.
Key Columns
event_id — Unique number for each event
user_pseudo_id — Anonymous user number from GA4
event_timestamp — When the event happened in microseconds or milliseconds
event_name — For example page_view scroll or click
page_location — Full web address of the page where the event happened
page_title — The title of the page
session_id — A unique number for the user session
session_number — The order number of the session for that user
traffic_source — For example google or direct
traffic_medium — For example organic cpc or none
device_category — Device type for example mobile desktop or tablet
geo_country — Country of the user
geo_city — City of the user
event_params_json — A flexible column to store extra event details as JSON text
2 Users Table
This table stores information about unique users. These details do not change often or are tied to their first visit.
What it stores
It stores user identifiers and their general info like location and device type.
Key Columns
user_pseudo_id — Main number that links to the events table
first_visit_timestamp — When the user first visited
first_traffic_source — Original source of the user
first_traffic_medium — Original way the user came
device_category — First device type used
geo_country — First country of visit
geo_city — First city of visit
user_age_bracket — For example 25 34 if available
user_gender — For example male or female if available
Example SQL Schema for SQL Server
Here is a basic SQL script to create your analytics tables in a SQL Server database. Save this script as schema.sql in your project’s database folder.
A Note on Data Types
In SQL Server:
NVARCHAR is used for storing text.
NVARCHAR(MAX)b> is used for longer text like URLs or JSON.
BIGINT is ideal for large numbers like timestamps.
INT works well for smaller numbers like session counts.
We use NVARCHAR(255) for primary keys to allow enough room for unique identifiers. This structure helps keep your data clean, searchable, and efficient for analysis.