Page Content Recommendation | Website Analytics

Build a Content Recommendation Engine for Your Website

Welcome to Phase 7.3: Enhancing User Experience with Recommendations

You have successfully built a machine learning model to predict bounce rate, which is a powerful step towards proactive website optimization. Now it is time to enhance user engagement even further. In this phase we will learn how to build a recommendation engine using machine learning. Think of it as a smart assistant on your website that suggests other pages or content a user might find interesting, based on their past behavior or the behavior of similar users. This helps users discover more of your valuable content and keeps them engaged for longer.
This step is crucial for improving content discoverability, increasing time on site, and driving conversions.


Why a Recommendation Engine is Essential

Recommendation engines are powerful tools for personalizing the user experience. Here is why they are so valuable.

• Enhanced User Experience: Help users quickly find relevant content, making their visit more enjoyable and productive.
• Increased Engagement: Keep users on your site longer and encourage them to explore more pages and interact with more content.
• Higher Conversions: Guide users towards content or products that align with their interests, increasing the likelihood of desired actions.
• Content Discoverability: Surface content that users might not find otherwise, which is especially useful for large websites with many articles or products.
• Competitive Advantage: Provide a personalized experience that sets your website apart from competitors.

A recommendation engine transforms your website from a static repository of information into a dynamic, personalized experience.


[Diagram: Content recommendation system overview]

Key Concepts for Building a Recommendation Engine

Collaborative Filtering: This common recommendation technique suggests items to a user based on the preferences or behavior of other, similar users. For example: users who viewed one page also viewed another.

Content-Based Filtering: This approach recommends items similar to those a user has already interacted with. It uses attributes of the content, such as topic or type, to suggest relevant items.

User-Item Interaction Matrix: This is a table where each row represents a user, each column represents an item (such as a page), and the values show how strongly the user interacted with that item, such as the number of views or time spent.
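As a minimal sketch (with hypothetical user IDs and page paths, not data from this project), here is how a handful of interaction records can be pivoted into a user-item matrix with Pandas:

```python
import pandas as pd

# Toy interaction records: one row per (user, page) pair with a view count
interactions = pd.DataFrame({
    'user_pseudo_id': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'page_location': ['/home', '/pricing', '/home', '/blog', '/pricing'],
    'page_view_count': [3, 1, 2, 5, 4],
})

# Pivot into a user-item matrix: rows = users, columns = pages,
# cells = interaction strength (0 where the user never viewed the page)
matrix = interactions.pivot_table(
    index='user_pseudo_id',
    columns='page_location',
    values='page_view_count',
).fillna(0)

print(matrix)
```

Each row of the resulting matrix is that user's "viewing vector", which is exactly what the similarity metrics below compare.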

Similarity Metrics: These measure how similar two users or two items are. Common methods include:

Cosine Similarity: Measures the angle between two vectors. It works well with sparse data like user-item matrices.

Euclidean Distance: Measures the straight-line distance between two points in a multidimensional space. It is useful for understanding how far apart users or items are.
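A quick illustration of the difference between the two metrics, using two hypothetical users whose view-count vectors point in the same direction but differ in magnitude:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity, euclidean_distances

# Two hypothetical users' view counts over the same three pages
user_a = np.array([[3, 0, 1]])
user_b = np.array([[6, 0, 2]])  # same taste as user_a, just twice as active

cos = cosine_similarity(user_a, user_b)[0, 0]
dist = euclidean_distances(user_a, user_b)[0, 0]

print(f"cosine similarity: {cos:.3f}")    # 1.000 -> identical direction
print(f"euclidean distance: {dist:.3f}")  # 3.162 -> magnitudes differ
```

Cosine similarity treats these users as identical in taste, while Euclidean distance separates them because one is simply more active. That is why cosine similarity is the usual choice for user-item matrices.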

Cold Start Problem: This challenge occurs when the system has new users or new items with no interaction history. Since there is no data to learn from, it is difficult to make recommendations. Solutions include showing popular items or using content-based methods.
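One common fallback for the cold start problem, sketched below with hypothetical data, is simply to recommend the most-viewed pages overall to users with no history:

```python
import pandas as pd

# Interaction records in the same shape as the main script uses (hypothetical data)
interactions = pd.DataFrame({
    'user_pseudo_id': ['u1', 'u1', 'u2', 'u2', 'u3'],
    'page_location': ['/home', '/blog', '/home', '/pricing', '/home'],
    'page_view_count': [2, 1, 3, 4, 1],
})

def most_popular_pages(df, n=3):
    """Fallback for brand-new users: rank pages by total views across all users."""
    totals = df.groupby('page_location')['page_view_count'].sum()
    return totals.sort_values(ascending=False).head(n).index.tolist()

print(most_popular_pages(interactions))  # ['/home', '/pricing', '/blog']
```

Once a new user accumulates a few page views, the system can switch from this popularity fallback to personalized collaborative filtering.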


Python Code for a Simple Recommendation Engine (Collaborative Filtering)

We will build a basic collaborative filtering recommendation engine that suggests pages based on what other users with similar browsing habits have viewed. We will fetch user page-view data from SQL Server, create a user-item matrix, calculate similarity, and finally generate recommendations for a sample user.

Practical Python Code Example
Here is a basic example of the Python code you will write. It connects to your SQL Server database, fetches page-view data, builds a user-item matrix, calculates cosine similarity between users, and then generates and prints recommendations for a specific user.

# SCRIPT SETUP 1: Import necessary libraries
import pandas as pd
import pyodbc
from sklearn.metrics.pairwise import cosine_similarity

# SCRIPT SETUP 2: Database connection details for SQL Server
# Replace with your actual server name, database name, and credentials
DB_CONFIG = {
    'driver': '{ODBC Driver 17 for SQL Server}',
    'server': 'YOUR_SQL_SERVER_NAME',  # e.g., 'localhost' or 'SERVER_NAME\SQLEXPRESS'
    'database': 'YOUR_DATABASE_NAME',
    'uid': 'YOUR_USERNAME',
    'pwd': 'YOUR_PASSWORD'
}

# FUNCTION 1: Connect to the database
def connect_to_db():
    """Establishes a connection to the SQL Server database."""
    conn_str = (
        f"DRIVER={DB_CONFIG['driver']};"
        f"SERVER={DB_CONFIG['server']};"
        f"DATABASE={DB_CONFIG['database']};"
        f"UID={DB_CONFIG['uid']};"
        f"PWD={DB_CONFIG['pwd']};"
    )
    try:
        conn = pyodbc.connect(conn_str)
        print("Successfully connected to SQL Server.")
        return conn
    except pyodbc.Error as ex:
        sqlstate = ex.args[0]
        print(f"Database connection error: {sqlstate}")
        print(ex)
        return None

# FUNCTION 2: Fetch user-page interaction data
def fetch_user_page_views(conn):
    """
    Fetches aggregated user-page view data from SQL Server.
    Counts the number of page views for each user on each unique page.
    """
    query = """
        SELECT
            user_pseudo_id,
            page_location,
            COUNT(*) AS page_view_count
        FROM events
        WHERE event_name = 'page_view'
          AND page_location IS NOT NULL
          AND page_location != ''
        GROUP BY user_pseudo_id, page_location
        HAVING COUNT(*) > 0;  -- Ensure there's at least one page view
    """
    try:
        df = pd.read_sql(query, conn)
        print(f"Fetched {len(df)} user-page interaction records.")
        return df
    except Exception as e:
        print(f"Error fetching user-page view data: {e}")
        return None

# FUNCTION 3: Build User-Item Matrix
def build_user_item_matrix(df_interactions):
    """
    Builds a user-item matrix (users as rows, pages as columns)
    with page_view_count as interaction strength.
    """
    if df_interactions is None or df_interactions.empty:
        print("No interaction data to build matrix.")
        return None, None
    # Create the pivot table: rows=users, columns=pages, values=page_view_count
    user_item_matrix = df_interactions.pivot_table(
        index='user_pseudo_id',
        columns='page_location',
        values='page_view_count'
    ).fillna(0)  # Fill NaN with 0 for pages not viewed by a user
    print(f"User-item matrix built with {user_item_matrix.shape[0]} users "
          f"and {user_item_matrix.shape[1]} pages.")
    return user_item_matrix, user_item_matrix.index.name  # Return index name for later use

# FUNCTION 4: Generate Recommendations
def generate_recommendations(user_item_matrix, target_user_id, num_recommendations=5):
    """
    Generates content recommendations for a target user
    using collaborative filtering.
    """
    if user_item_matrix is None or target_user_id not in user_item_matrix.index:
        print(f"Target user '{target_user_id}' not found or matrix is empty.")
        return []
    # STEP 4.1: Get the target user's interaction vector
    target_user_vector = user_item_matrix.loc[target_user_id].values.reshape(1, -1)
    # STEP 4.2: Calculate cosine similarity between target user and all other users
    # Exclude the target user themselves from similarity calculation
    other_users_matrix = user_item_matrix.drop(target_user_id)
    similarities = cosine_similarity(target_user_vector, other_users_matrix)
    # Create a Series of similarities with user_pseudo_id as index
    similar_users_series = pd.Series(similarities.flatten(), index=other_users_matrix.index)
    # Sort by similarity in descending order
    similar_users_series = similar_users_series.sort_values(ascending=False)
    # STEP 4.3: Find pages viewed by similar users but not by the target user
    target_user_viewed_pages = user_item_matrix.loc[target_user_id][
        user_item_matrix.loc[target_user_id] > 0
    ].index.tolist()
    recommendations = {}
    for similar_user_id, similarity_score in similar_users_series.items():
        if similarity_score <= 0:  # Only consider positively correlated users
            continue
        # Get pages viewed by the similar user
        similar_user_viewed_pages = user_item_matrix.loc[similar_user_id][
            user_item_matrix.loc[similar_user_id] > 0
        ].index.tolist()
        # Identify pages that the similar user viewed but the target user has not
        potential_recommendations = [
            page for page in similar_user_viewed_pages
            if page not in target_user_viewed_pages
        ]
        for page in potential_recommendations:
            # Aggregate recommendation score: similarity weighted by interaction strength
            if page not in recommendations:
                recommendations[page] = 0
            recommendations[page] += similarity_score * user_item_matrix.loc[similar_user_id, page]
        if len(recommendations) >= num_recommendations * 2:
            break  # Collected more candidates than needed; enough to filter and sort
    # STEP 4.4: Sort recommendations by score and return top N
    sorted_recommendations = sorted(recommendations.items(), key=lambda item: item[1], reverse=True)
    final_recommendations = [
        page for page, score in sorted_recommendations
        if page not in target_user_viewed_pages
    ][:num_recommendations]
    return final_recommendations

# MAIN EXECUTION 1: This block runs when the script starts
if __name__ == "__main__":
    # MAIN EXECUTION 2: Connect to the database
    conn = connect_to_db()
    if conn:
        # MAIN EXECUTION 3: Fetch user-page interaction data
        interactions_df = fetch_user_page_views(conn)
        # MAIN EXECUTION 4: Build user-item matrix
        user_page_matrix, _ = build_user_item_matrix(interactions_df)
        # MAIN EXECUTION 5: Generate recommendations for a sample user
        if user_page_matrix is not None and not user_page_matrix.empty:
            # Select a sample user_pseudo_id from your data for testing.
            # Replace with an actual user_pseudo_id from your 'users' or 'events' table.
            # Example: target_user = user_page_matrix.index[0] if not user_page_matrix.empty else None
            sample_user_id = 'some_user_pseudo_id_from_your_data'  # <<< IMPORTANT: REPLACE THIS!
            if sample_user_id in user_page_matrix.index:
                print(f"\n--- Generating recommendations for user: {sample_user_id} ---")
                recommended_pages = generate_recommendations(
                    user_page_matrix, sample_user_id, num_recommendations=5
                )
                if recommended_pages:
                    print("\nRecommended Pages:")
                    for i, page in enumerate(recommended_pages):
                        print(f"{i+1}. {page}")
                else:
                    print("No recommendations could be generated for this user. "
                          "This might be due to the cold start problem (new user) "
                          "or a lack of similar users/content.")
            else:
                print(f"Sample user ID '{sample_user_id}' not found in the user-item matrix. "
                      "Please update 'sample_user_id' with a valid user_pseudo_id from your data.")
        else:
            print("Could not build user-item matrix. Cannot generate recommendations.")
        # MAIN EXECUTION 6: Close the database connection
        conn.close()
        print("Database connection closed.")
    else:
        print("Could not establish database connection. Exiting.")


Important Notes on This Code:

SQL Server Integration and Data Preparation for Recommendations

This script connects to your SQL Server database to retrieve user page-view data. It constructs a user-item matrix where each cell shows how many times a user viewed a specific page, then calculates cosine similarity to find users with similar browsing behavior. Based on this analysis, it recommends pages that similar users have visited but the target user has not. For the recommendations to work correctly, you must replace the placeholder sample_user_id in the main block with an actual user_pseudo_id from your events or users table.

Collaborative Filtering Approach and Configuration Details

The current approach uses collaborative filtering in a simplified form. It works well for basic recommendation systems, but for more robust and scalable applications you might consider advanced methods such as matrix factorization or hybrid models. Remember to fill in your actual SQL Server connection details in the DB_CONFIG section: your server name, database name, username, and password. Accurate configuration is essential to connect and fetch data successfully from your database.
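As a rough sketch of the matrix-factorization idea mentioned above (this is not part of the main script, and the matrix and page names are hypothetical), scikit-learn's TruncatedSVD can compress the user-item matrix into a few latent dimensions and use the reconstruction as predicted interest scores:

```python
import pandas as pd
from sklearn.decomposition import TruncatedSVD

# Hypothetical user-item matrix (rows = users, columns = pages)
matrix = pd.DataFrame(
    [[3, 0, 1, 0],
     [2, 0, 0, 4],
     [0, 5, 0, 2],
     [0, 4, 1, 0]],
    index=['u1', 'u2', 'u3', 'u4'],
    columns=['/home', '/blog', '/pricing', '/docs'],
)

# Factorize into 2 latent dimensions, then reconstruct to get predicted scores
svd = TruncatedSVD(n_components=2, random_state=42)
user_factors = svd.fit_transform(matrix)    # shape (4, 2): users in latent space
predicted = user_factors @ svd.components_  # shape (4, 4): reconstructed scores

# Recommend, for u1, the unseen page with the highest predicted score
scores = pd.Series(predicted[0], index=matrix.columns)
unseen = scores[matrix.loc['u1'] == 0]
print("Top prediction for u1:", unseen.idxmax())
```

Libraries such as Surprise or implicit offer production-grade implementations of this family of models; the sketch above is only meant to show the shape of the computation.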


Understanding Your Python Recommendation Engine Script

Introduction to Building a Basic Recommendation Engine: This Python script guides you through building a simple content recommendation engine using a collaborative filtering approach. It suggests pages to users based on the overall behavior of other visitors to your website. Let us break down each part of the code to understand how it works.

1. Setting Up Your Tools and Connections: At the beginning of the script, you will find several import statements that bring in the tools needed for working with data, connecting to your database, and calculating user similarity.

import pandas as pd (SCRIPT SETUP 1): This brings in Pandas, which helps you work with data in a table-like format and perform calculations.
import pyodbc (SCRIPT SETUP 1): This allows Python to connect to your SQL Server database.
from sklearn.metrics.pairwise import cosine_similarity (SCRIPT SETUP 1): This brings in a scikit-learn function that calculates cosine similarity between user vectors, which helps find users with similar browsing behavior.

DB_CONFIG (SCRIPT SETUP 2): This section holds the details needed to connect to your SQL Server. You will need to update YOUR_SQL_SERVER_NAME, YOUR_DATABASE_NAME, YOUR_USERNAME, and YOUR_PASSWORD with your actual database information.


2. Connecting to Your Database using connect_to_db: This section refers to FUNCTION 1 in the code. The connect_to_db function is responsible for establishing the connection to your database.

What it does: It tries to open a connection to your SQL Server using the information provided in DB_CONFIG.
How it works: It builds a connection string, which helps pyodbc locate and access your database, and then attempts to establish the connection.
Safety check: It prints a message to inform you whether the connection was successful or if there was an error.


3. Fetch User Page Interaction Data using fetch_user_page_views: This section refers to FUNCTION 2 in the code. The fetch_user_page_views function gathers the raw data needed to understand which users viewed which pages.

What it does: It runs a SQL query to fetch user_pseudo_id, page_location, and the count of how many times each user viewed a specific page. It focuses only on page_view events.
How it works: It groups the events table by user and page location, then counts the number of page_view events for each pair. This count represents the interaction strength. It uses pd.read_sql to pull the result into a Pandas DataFrame.
Safety check: It prints how many interaction records were fetched. If an error occurs, it prints an error message.
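To see the shape of the query's output without a SQL Server instance, here is the same aggregation run against an in-memory SQLite stand-in for the events table (the rows are hypothetical; the real script runs this query through pyodbc against SQL Server):

```python
import sqlite3
import pandas as pd

# In-memory stand-in for the events table with a few hypothetical events
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE events (user_pseudo_id TEXT, event_name TEXT, page_location TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [('u1', 'page_view', '/home'), ('u1', 'page_view', '/home'),
     ('u1', 'scroll', '/home'), ('u2', 'page_view', '/blog')],
)

# Same aggregation the script sends to SQL Server
query = """
    SELECT user_pseudo_id, page_location, COUNT(*) AS page_view_count
    FROM events
    WHERE event_name = 'page_view'
      AND page_location IS NOT NULL AND page_location != ''
    GROUP BY user_pseudo_id, page_location
"""
df = pd.read_sql(query, conn)
print(df)  # u1 /home -> 2, u2 /blog -> 1 (the scroll event is excluded)
```

Note how the scroll event is filtered out: only page_view events count towards interaction strength.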


4. Build User-Item Matrix using build_user_item_matrix: This section refers to FUNCTION 3 in the code. The build_user_item_matrix function transforms the interaction data into a format suitable for similarity calculations.

What it does: It creates a user-item matrix where each row represents a unique user, each column represents a unique page, and the cell values are the page_view_count, which indicates interaction strength.
How it works: It uses the Pandas pivot_table function to reshape the DataFrame. Users become the index, pages become the columns, and the page_view_count values are placed in the cells. fillna(0) is used to insert 0 where a user has not viewed a specific page.
Output: It returns the created user-item matrix, which can be used for further similarity and recommendation calculations.


5. Generate Recommendations using generate_recommendations: This refers to FUNCTION 4 and its internal steps 4.1 to 4.4 in the code. This is the core function where recommendations are generated using collaborative filtering.

What it does: For a given target user, it finds other users who have similar page-viewing habits, then suggests pages that these similar users have viewed but the target user has not.

How it works:

Step 4.1: Get the target user's interaction vector. It extracts the row corresponding to the target user from the user-item matrix. This row represents the target user's viewing history.

Step 4.2: Calculate cosine similarity between the target user and all other users. It computes the cosine similarity between the target user's viewing vector and the viewing vectors of all other users. Cosine similarity measures the angle between two vectors; a smaller angle means higher similarity.

Step 4.3: Find pages viewed by similar users but not by the target user. It iterates through the most similar users, identifies the pages each has viewed, filters these to only those the target user has not yet viewed, and aggregates a recommendation score for each potential page.

Step 4.4: Sort recommendations by score and return the top N. Finally, it sorts the potential recommendations by their aggregated score in descending order and returns the top num_recommendations, ensuring that pages already viewed by the target user are not recommended.

Output: It returns a list of recommended page URLs.
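The four steps above can be condensed into a toy run (the user names, page paths, and counts are hypothetical):

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Tiny hypothetical user-item matrix (view counts)
matrix = pd.DataFrame(
    [[3, 1, 0],
     [2, 1, 4],
     [0, 0, 5]],
    index=['target', 'similar', 'different'],
    columns=['/home', '/pricing', '/blog'],
)

# Steps 4.1-4.2: similarity of 'target' to every other user
target_vec = matrix.loc[['target']]
others = matrix.drop('target')
sims = pd.Series(cosine_similarity(target_vec, others).flatten(), index=others.index)

# Steps 4.3-4.4: score pages the target has not seen, weighted by similarity
seen = set(matrix.columns[matrix.loc['target'] > 0])
scores = {}
for user, sim in sims.sort_values(ascending=False).items():
    if sim <= 0:
        continue  # skip users with no positive similarity
    for page in matrix.columns:
        if page not in seen and matrix.loc[user, page] > 0:
            scores[page] = scores.get(page, 0) + sim * matrix.loc[user, page]

print(sorted(scores, key=scores.get, reverse=True))  # ['/blog']
```

The 'different' user contributes nothing (zero similarity with the target), so only the pages viewed by the 'similar' user are recommended, which is exactly the behavior of the full generate_recommendations function.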


6. Running the Script (The Main Block): This refers to MAIN EXECUTION 1 to 6 in the code. This part of the script puts everything into action when you run the Python file.

MAIN EXECUTION 1: This line ensures the code inside this block only runs when you start this Python file directly.

MAIN EXECUTION 2: Connect to the database. It calls the connect_to_db function to establish your database connection. If it fails, the script stops.

MAIN EXECUTION 3: Fetch user-page interaction data. If the connection is successful, it calls fetch_user_page_views to get the necessary interaction data.

MAIN EXECUTION 4: Build the user-item matrix. If interaction data is fetched successfully, it calls build_user_item_matrix to create the pivot table.

MAIN EXECUTION 5: Generate recommendations for a sample user. If the matrix is built, it calls generate_recommendations for a specified sample_user_id. Remember to replace 'some_user_pseudo_id_from_your_data' with an actual user ID from your database for meaningful results.

MAIN EXECUTION 6: Close the database connection. Finally, it closes the connection, an important practice that frees up resources.



Overall Value of a Recommendation Engine

Building a recommendation engine is a sophisticated application of machine learning in web analytics. It moves your website beyond static content delivery to a dynamic, personalized experience. By intelligently suggesting relevant pages or content, you can significantly increase user engagement, time on site, and ultimately conversion rates. This demonstrates your ability to apply advanced machine learning techniques to directly impact business goals, a vital skill in data science and product development.


Next Steps

You have successfully built a basic recommendation engine, which means you are now proficient in applying machine learning for personalized content delivery. This completes the Machine Learning phase. The next phase delves into Dashboards & Visualization, starting with how to create an interactive web dashboard using Streamlit. This will let you visually present all the insights you have gathered and the models you have built.
For now, make sure you save this Python script in the SankalanAnalytics backend folder on your E drive, and name it something like 'recommendation_engine.py'.








© Sankalan Data Tech. All rights reserved.