Page Content Recommendation | Website Analytics

Build a Content Recommendation Engine for Your Website

Welcome to Phase 7.3: Enhancing User Experience with Recommendations

You have successfully built a machine learning model to predict bounce rate, which is a powerful step towards proactive website optimization. Now it is time to enhance user engagement even further. In this phase we will learn how to build a recommendation engine using machine learning. Think of it as a smart assistant on your website that suggests other pages or content a user might find interesting, based on their past behavior or the behavior of similar users. This helps users discover more of your valuable content and keeps them engaged for longer.
This step is crucial for improving content discoverability, increasing time on site, and driving conversions.


Why a Recommendation Engine is Essential

Recommendation engines are powerful tools for personalizing the user experience. Here is why they are so valuable.

• Enhanced User Experience: Help users quickly find relevant content, making their visit more enjoyable and productive.
• Increased Engagement: Keep users on your site longer and encourage them to explore more pages and interact with more content.
• Higher Conversions: Guide users towards content or products that align with their interests, increasing the likelihood of desired actions.
• Content Discoverability: Surface content that users might not find otherwise, which is especially useful for large websites with many articles or products.
• Competitive Advantage: Provide a personalized experience that sets your website apart from competitors.

A recommendation engine transforms your website from a static repository of information into a dynamic, personalized experience.


[Diagram: Content recommendation system overview]

Key Concepts for Building a Recommendation Engine

Collaborative Filtering: This common recommendation technique suggests items to a user based on the preferences or behavior of other, similar users. For example: users who viewed one page also viewed another.

Content-Based Filtering: This approach recommends items similar to those a user has already interacted with. It uses attributes of the content, such as topic or type, to suggest relevant items.

User-Item Interaction Matrix: This is a table where each row represents a user, each column represents an item (such as a page), and the values show how strongly the user interacted with that item, such as the number of views or time spent.
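As a minimal sketch (with hypothetical user IDs and page paths, not data from this project), here is how a handful of interaction records can be pivoted into a user-item matrix with Pandas:

```python
import pandas as pd

# Toy interaction records: one row per (user, page) pair with a view count
interactions = pd.DataFrame({
    'user_pseudo_id': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'page_location': ['/home', '/pricing', '/home', '/blog', '/pricing'],
    'page_view_count': [3, 1, 2, 5, 4],
})

# Pivot into a user-item matrix: rows = users, columns = pages,
# cells = interaction strength (0 where the user never viewed the page)
matrix = interactions.pivot_table(
    index='user_pseudo_id',
    columns='page_location',
    values='page_view_count',
).fillna(0)

print(matrix)
```

Each row of the resulting matrix is that user's "viewing vector", which is exactly what the similarity metrics below compare.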

Similarity Metrics: These measure how similar two users or two items are. Common methods include:

Cosine Similarity: Measures the angle between two vectors. It works well with sparse data like user-item matrices.

Euclidean Distance: Measures the straight-line distance between two points in a multidimensional space. It is useful for understanding how far apart users or items are.
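A quick illustration of the difference between the two metrics, using two hypothetical users whose view-count vectors point in the same direction but differ in magnitude:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Two hypothetical users' view counts over the same three pages
user_a = np.array([[3, 0, 1]])
user_b = np.array([[6, 0, 2]])  # same taste as user_a, just twice as active

cos = cosine_similarity(user_a, user_b)[0, 0]
dist = euclidean_distances(user_a, user_b)[0, 0]

print(f"cosine similarity: {cos:.3f}")    # 1.000 -> identical direction
print(f"euclidean distance: {dist:.3f}")  # 3.162 -> magnitudes differ
```

Cosine similarity treats these users as identical in taste, while Euclidean distance separates them because one is simply more active. That is why cosine similarity is the usual choice for user-item matrices.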

Cold Start Problem: This challenge occurs when the system has new users or new items with no interaction history. Since there is no data to learn from, it is difficult to make recommendations. Solutions include showing popular items or using content-based methods.
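One common fallback for the cold start problem, sketched below with hypothetical data, is simply to recommend the most-viewed pages overall to users with no history:

```python
import pandas as pd

# Interaction records in the same shape as the main script uses (hypothetical data)
interactions = pd.DataFrame({
    'user_pseudo_id': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'page_location': ['/home', '/blog', '/home', '/pricing', '/home'],
    'page_view_count': [2, 1, 3, 4, 1],
})

def most_popular_pages(df, n=3):
    """Fallback for brand-new users: rank pages by total views across all users."""
    totals = df.groupby('page_location')['page_view_count'].sum()
    return totals.sort_values(ascending=False).head(n).index.tolist()

print(most_popular_pages(interactions))  # ['/home', '/pricing', '/blog']
```

Once a new user accumulates a few page views, the system can switch from this popularity fallback to personalized collaborative filtering.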


Python Code for a Simple Recommendation Engine (Collaborative Filtering)

We will build a basic collaborative filtering recommendation engine that suggests pages based on what other users with similar browsing habits have viewed. We will fetch user page-view data from SQL Server, create a user-item matrix, calculate similarity, and finally generate recommendations for a sample user.

Practical Python Code Example
Here is a basic example of the Python code you will write. It connects to your SQL Server database, fetches page-view data, builds a user-item matrix, calculates cosine similarity between users, and then generates and prints recommendations for a specific user.

# SCRIPT SETUP 1: Import necessary libraries
import pandas as pd
import pyodbc
from sklearn.metrics.pairwise import cosine_similarity

# SCRIPT SETUP 2: Database connection details for SQL Server
# Replace with your actual server name, database name, and credentials
DB_CONFIG = {
    'driver': '{ODBC Driver 17 for SQL Server}',
    'server': 'YOUR_SQL_SERVER_NAME',  # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS'
    'database': 'YOUR_DATABASE_NAME',
    'uid': 'YOUR_USERNAME',
    'pwd': 'YOUR_PASSWORD'
}

# FUNCTION 1: Connect to the database
def connect_to_db():
    """Establishes a connection to the SQL Server database."""
    conn_str = (
        f"DRIVER={DB_CONFIG['driver']};"
        f"SERVER={DB_CONFIG['server']};"
        f"DATABASE={DB_CONFIG['database']};"
        f"UID={DB_CONFIG['uid']};"
        f"PWD={DB_CONFIG['pwd']};"
    )
    try:
        conn = pyodbc.connect(conn_str)
        print("Successfully connected to SQL Server.")
        return conn
    except pyodbc.Error as ex:
        sqlstate = ex.args[0]
        print(f"Database connection error: {sqlstate}")
        print(ex)
        return None

# FUNCTION 2: Fetch user-page interaction data
def fetch_user_page_views(conn):
    """
    Fetches aggregated user-page view data from SQL Server.
    Counts the number of page views for each user on each unique page.
    """
    query = """
        SELECT
            user_pseudo_id,
            page_location,
            COUNT(*) AS page_view_count
        FROM events
        WHERE event_name = 'page_view'
          AND page_location IS NOT NULL
          AND page_location != ''
        GROUP BY user_pseudo_id, page_location
        HAVING COUNT(*) > 0;  -- Ensure there's at least one page view
    """
    try:
        df = pd.read_sql(query, conn)
        print(f"Fetched {len(df)} user-page interaction records.")
        return df
    except Exception as e:
        print(f"Error fetching user-page view data: {e}")
        return None

# FUNCTION 3: Build User-Item Matrix
def build_user_item_matrix(df_interactions):
    """
    Builds a user-item matrix (users as rows, pages as columns)
    with page_view_count as interaction strength.
    """
    if df_interactions is None or df_interactions.empty:
        print("No interaction data to build matrix.")
        return None, None
    # Create the pivot table: rows=users, columns=pages, values=page_view_count
    user_item_matrix = df_interactions.pivot_table(
        index='user_pseudo_id',
        columns='page_location',
        values='page_view_count'
    ).fillna(0)  # Fill NaN with 0 for pages not viewed by a user
    print(f"User-item matrix built with {user_item_matrix.shape[0]} users "
          f"and {user_item_matrix.shape[1]} pages.")
    return user_item_matrix, user_item_matrix.index.name  # Return index name for later use

# FUNCTION 4: Generate Recommendations
def generate_recommendations(user_item_matrix, target_user_id, num_recommendations=5):
    """
    Generates content recommendations for a target user
    using collaborative filtering.
    """
    if user_item_matrix is None or target_user_id not in user_item_matrix.index:
        print(f"Target user '{target_user_id}' not found or matrix is empty.")
        return []
    # STEP 4.1: Get the target user's interaction vector
    target_user_vector = user_item_matrix.loc[target_user_id].values.reshape(1, -1)
    # STEP 4.2: Calculate cosine similarity between target user and all other users
    # Exclude the target user themselves from similarity calculation
    other_users_matrix = user_item_matrix.drop(target_user_id)
    similarities = cosine_similarity(target_user_vector, other_users_matrix)
    # Create a Series of similarities with user_pseudo_id as index
    similar_users_series = pd.Series(similarities.flatten(), index=other_users_matrix.index)
    # Sort by similarity in descending order
    similar_users_series = similar_users_series.sort_values(ascending=False)
    # STEP 4.3: Find pages viewed by similar users but not by the target user
    target_user_viewed_pages = user_item_matrix.loc[target_user_id][
        user_item_matrix.loc[target_user_id] > 0
    ].index.tolist()
    recommendations = {}
    for similar_user_id, similarity_score in similar_users_series.items():
        if similarity_score <= 0:  # Only consider positively correlated users
            continue
        # Get pages viewed by the similar user
        similar_user_viewed_pages = user_item_matrix.loc[similar_user_id][
            user_item_matrix.loc[similar_user_id] > 0
        ].index.tolist()
        # Identify pages that the similar user viewed but the target user has not
        potential_recommendations = [
            page for page in similar_user_viewed_pages
            if page not in target_user_viewed_pages
        ]
        for page in potential_recommendations:
            # Aggregate recommendation score: similarity weighted by interaction strength
            if page not in recommendations:
                recommendations[page] = 0
            recommendations[page] += similarity_score * user_item_matrix.loc[similar_user_id, page]
        if len(recommendations) >= num_recommendations * 2:
            break  # Collected more candidates than needed; enough to filter and sort
    # STEP 4.4: Sort recommendations by score and return top N
    sorted_recommendations = sorted(recommendations.items(), key=lambda item: item[1], reverse=True)
    final_recommendations = [
        page for page, score in sorted_recommendations
        if page not in target_user_viewed_pages
    ][:num_recommendations]
    return final_recommendations

# MAIN EXECUTION 1: This block runs when the script starts
if __name__ == "__main__":
    # MAIN EXECUTION 2: Connect to the database
    conn = connect_to_db()
    if conn:
        # MAIN EXECUTION 3: Fetch user-page interaction data
        interactions_df = fetch_user_page_views(conn)
        # MAIN EXECUTION 4: Build user-item matrix
        user_page_matrix, _ = build_user_item_matrix(interactions_df)
        # MAIN EXECUTION 5: Generate recommendations for a sample user
        if user_page_matrix is not None and not user_page_matrix.empty:
            # Select a sample user_pseudo_id from your data for testing.
            # Replace with an actual user_pseudo_id from your 'users' or 'events' table.
            # Example: target_user = user_page_matrix.index[0] if not user_page_matrix.empty else None
            sample_user_id = 'some_user_pseudo_id_from_your_data'  # <<< IMPORTANT: REPLACE THIS!
            if sample_user_id in user_page_matrix.index:
                print(f"\n--- Generating recommendations for user: {sample_user_id} ---")
                recommended_pages = generate_recommendations(
                    user_page_matrix, sample_user_id, num_recommendations=5
                )
                if recommended_pages:
                    print("\nRecommended Pages:")
                    for i, page in enumerate(recommended_pages):
                        print(f"{i+1}. {page}")
                else:
                    print("No recommendations could be generated for this user. "
                          "This might be due to the cold start problem (new user) "
                          "or a lack of similar users/content.")
            else:
                print(f"Sample user ID '{sample_user_id}' not found in the user-item matrix. "
                      "Please update 'sample_user_id' with a valid user_pseudo_id from your data.")
        else:
            print("Could not build user-item matrix. Cannot generate recommendations.")
        # MAIN EXECUTION 6: Close the database connection
        conn.close()
        print("Database connection closed.")
    else:
        print("Could not establish database connection. Exiting.")


Important Notes on This Code:

SQL Server Integration and Data Preparation for Recommendations

This script connects to your SQL Server database to retrieve user page-view data. It constructs a user-item matrix where each cell shows how many times a user viewed a specific page, then calculates cosine similarity to find users with similar browsing behavior. Based on this analysis, it recommends pages that similar users have visited but the target user has not. For the recommendations to work correctly, you must replace the placeholder sample_user_id in the main block with an actual user_pseudo_id from your events or users table.

Collaborative Filtering Approach and Configuration Details

The current approach uses collaborative filtering in a simplified form. It works well for basic recommendation systems, but for more robust and scalable applications you might consider advanced methods such as matrix factorization or hybrid models. Remember to fill in your actual SQL Server connection details in the DB_CONFIG section: your server name, database name, username, and password. Accurate configuration is essential to connect and fetch data successfully from your database.
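As a rough sketch of the matrix-factorization idea mentioned above (this is not part of the main script, and the matrix and page names are hypothetical), scikit-learn's TruncatedSVD can compress the user-item matrix into a few latent dimensions and use the reconstruction as predicted interest scores:

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD

# Hypothetical user-item matrix (rows = users, columns = pages)
matrix = pd.DataFrame(
    [[3, 0, 1, 0],
     [2, 0, 0, 4],
     [0, 5, 0, 2],
     [0, 4, 1, 0]],
    index=['u1', 'u2', 'u3', 'u4'],
    columns=['/home', '/blog', '/pricing', '/docs'],
)

# Factorize into 2 latent dimensions, then reconstruct to get predicted scores
svd = TruncatedSVD(n_components=2, random_state=42)
user_factors = svd.fit_transform(matrix)    # shape (4, 2): users in latent space
predicted = user_factors @ svd.components_  # shape (4, 4): reconstructed scores

# Recommend, for u1, the unseen page with the highest predicted score
scores = pd.Series(predicted[0], index=matrix.columns)
unseen = scores[matrix.loc['u1'] == 0]
print("Top prediction for u1:", unseen.idxmax())
```

Libraries such as Surprise or implicit offer production-grade implementations of this family of models; the sketch above is only meant to show the shape of the computation.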


Understanding Your Python Recommendation Engine Script

Introduction to Building a Basic Recommendation Engine: This Python script guides you through building a simple content recommendation engine using a collaborative filtering approach. It suggests pages to users based on the overall behavior of other visitors to your website. Let us break down each part of the code to understand how it works.

1. Setting Up Your Tools and Connections: At the beginning of the script, you will find several import statements that bring in the tools needed for working with data, connecting to your database, and calculating user similarity.

import pandas as pd (SCRIPT SETUP 1): This brings in Pandas, which helps you work with data in a table-like format and perform calculations.
import pyodbc (SCRIPT SETUP 1): This allows Python to connect to your SQL Server database.
from sklearn.metrics.pairwise import cosine_similarity (SCRIPT SETUP 1): This brings in a scikit-learn function that calculates cosine similarity between user vectors, which helps find users with similar browsing behavior.

DB_CONFIG (SCRIPT SETUP 2): This section holds the details needed to connect to your SQL Server. You will need to update YOUR_SQL_SERVER_NAME, YOUR_DATABASE_NAME, YOUR_USERNAME, and YOUR_PASSWORD with your actual database information.


2. Connecting to Your Database using connect_to_db: This section refers to FUNCTION 1 in the code. The connect_to_db function is responsible for establishing the connection to your database.

What it does: It tries to open a connection to your SQL Server using the information provided in DB_CONFIG.
How it works: It builds a connection string, which helps pyodbc locate and access your database, and then attempts to establish the connection.
Safety check: It prints a message to inform you whether the connection was successful or if there was an error.


3. Fetch User Page Interaction Data using fetch_user_page_views: This section refers to FUNCTION 2 in the code. The fetch_user_page_views function gathers the raw data needed to understand which users viewed which pages.

What it does: It runs a SQL query to fetch user_pseudo_id, page_location, and the count of how many times each user viewed a specific page. It focuses only on page_view events.
How it works: It groups the events table by user and page location, then counts the number of page_view events for each pair. This count represents the interaction strength. It uses pd.read_sql to pull the result into a Pandas DataFrame.
Safety check: It prints how many interaction records were fetched. If an error occurs, it prints an error message.
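To see the shape of the query's output without a SQL Server instance, here is the same aggregation run against an in-memory SQLite stand-in for the events table (the rows are hypothetical; the real script runs this query through pyodbc against SQL Server):

```python
import sqlite3
import pandas as pd

# In-memory stand-in for the events table with a few hypothetical events
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE events (user_pseudo_id TEXT, event_name TEXT, page_location TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [('u1', 'page_view', '/home'), ('u1', 'page_view', '/home'),
     ('u1', 'scroll', '/home'), ('u2', 'page_view', '/blog')],
)

# Same aggregation the script sends to SQL Server
query = """
    SELECT user_pseudo_id, page_location, COUNT(*) AS page_view_count
    FROM events
    WHERE event_name = 'page_view'
      AND page_location IS NOT NULL AND page_location != ''
    GROUP BY user_pseudo_id, page_location
"""
df = pd.read_sql(query, conn)
print(df)  # u1 /home -> 2, u2 /blog -> 1 (the scroll event is excluded)
```

Note how the scroll event is filtered out: only page_view events count towards interaction strength.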


4. Build User-Item Matrix using build_user_item_matrix: This section refers to FUNCTION 3 in the code. The build_user_item_matrix function transforms the interaction data into a format suitable for similarity calculations.

What it does: It creates a user-item matrix where each row represents a unique user, each column represents a unique page, and the cell values are the page_view_count, which indicates interaction strength.
How it works: It uses the Pandas pivot_table function to reshape the DataFrame. Users become the index, pages become the columns, and the page_view_count values are placed in the cells. fillna(0) is used to insert 0 where a user has not viewed a specific page.
Output: It returns the created user-item matrix, which can be used for further similarity and recommendation calculations.


5. Generate Recommendations using generate_recommendations: This refers to FUNCTION 4 and its internal steps 4.1 to 4.4 in the code. This is the core function where recommendations are generated using collaborative filtering.

What it does: For a given target user, it finds other users who have similar page-viewing habits, then suggests pages that these similar users have viewed but the target user has not.

How it works:

Step 4.1: Get the target user's interaction vector. It extracts the row corresponding to the target user from the user-item matrix. This row represents the target user's viewing history.

Step 4.2: Calculate cosine similarity between the target user and all other users. It computes the cosine similarity between the target user's viewing vector and the viewing vectors of all other users. Cosine similarity measures the angle between two vectors; a smaller angle means higher similarity.

Step 4.3: Find pages viewed by similar users but not by the target user. It iterates through the most similar users, identifies the pages each has viewed, filters these to only those the target user has not yet viewed, and aggregates a recommendation score for each potential page.

Step 4.4: Sort recommendations by score and return the top N. Finally, it sorts the potential recommendations by their aggregated score in descending order and returns the top num_recommendations, ensuring that pages already viewed by the target user are not recommended.

Output: It returns a list of recommended page URLs.
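The four steps above can be condensed into a toy run (the user names, page paths, and counts are hypothetical):

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Tiny hypothetical user-item matrix (view counts)
matrix = pd.DataFrame(
    [[3, 1, 0],
     [2, 1, 4],
     [0, 0, 5]],
    index=['target', 'similar', 'different'],
    columns=['/home', '/pricing', '/blog'],
)

# Steps 4.1-4.2: similarity of 'target' to every other user
target_vec = matrix.loc[['target']]
others = matrix.drop('target')
sims = pd.Series(cosine_similarity(target_vec, others).flatten(), index=others.index)

# Steps 4.3-4.4: score pages the target has not seen, weighted by similarity
seen = set(matrix.columns[matrix.loc['target'] > 0])
scores = {}
for user, sim in sims.sort_values(ascending=False).items():
    if sim <= 0:
        continue  # skip users with no positive similarity
    for page in matrix.columns:
        if page not in seen and matrix.loc[user, page] > 0:
            scores[page] = scores.get(page, 0) + sim * matrix.loc[user, page]

print(sorted(scores, key=scores.get, reverse=True))  # ['/blog']
```

The 'different' user contributes nothing (zero similarity with the target), so only the pages viewed by the 'similar' user are recommended, which is exactly the behavior of the full generate_recommendations function.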


6. Running the Script (The Main Block): This refers to MAIN EXECUTION 1 to 6 in the code. This part of the script puts everything into action when you run the Python file.

MAIN EXECUTION 1: This line ensures the code inside this block only runs when you start this Python file directly.

MAIN EXECUTION 2: Connect to the database. It calls the connect_to_db function to establish your database connection. If it fails, the script stops.

MAIN EXECUTION 3: Fetch user-page interaction data. If the connection is successful, it calls fetch_user_page_views to get the necessary interaction data.

MAIN EXECUTION 4: Build the user-item matrix. If interaction data is fetched successfully, it calls build_user_item_matrix to create the pivot table.

MAIN EXECUTION 5: Generate recommendations for a sample user. If the matrix is built, it calls generate_recommendations for a specified sample_user_id. Remember to replace 'some_user_pseudo_id_from_your_data' with an actual user ID from your database for meaningful results.

MAIN EXECUTION 6: Close the database connection. Finally, it closes the connection, an important practice that frees up resources.



Overall Value of a Recommendation Engine

Building a recommendation engine is a sophisticated application of machine learning in web analytics. It moves your website beyond static content delivery to a dynamic, personalized experience. By intelligently suggesting relevant pages or content, you can significantly increase user engagement, time on site, and ultimately conversion rates. This demonstrates your ability to apply advanced machine learning techniques to directly impact business goals, a vital skill in data science and product development.


Next Steps

You have successfully built a basic recommendation engine, which means you are now proficient in applying machine learning for personalized content delivery. This completes the Machine Learning phase. The next phase delves into Dashboards & Visualization, starting with how to create an interactive web dashboard using Streamlit. This will let you visually present all the insights you have gathered and the models you have built.
For now, make sure you save this Python script in the SankalanAnalytics backend folder on your E drive, and name it something like 'recommendation_engine.py'.








© Sankalan Data Tech. All rights reserved.