Understanding Self Join in SQL for Banking Data Analysis

Note: All SQL queries used in this case study are tested on Oracle, SQL Server, MySQL, and PostgreSQL.

1. Find customers who share the same phone number.
SELECT c1.CustomerID, c1.Name, c1.Phone
FROM Customers c1
JOIN Customers c2 ON c1.Phone = c2.Phone
WHERE c1.CustomerID <> c2.CustomerID;
Explanation
This query uses a self join to find customers who share a phone number. The Customers table is joined to itself, and c1 and c2 are aliases for the two references to that table. The join condition c1.Phone = c2.Phone matches rows with the same phone number, and the WHERE condition c1.CustomerID <> c2.CustomerID ensures that a customer is never matched with their own row. The result set contains the CustomerID, Name, and Phone of every customer whose phone number also belongs to at least one other customer. The query uses only standard join syntax, so it runs on Oracle, MySQL, PostgreSQL, and SQL Server.
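When three or more customers share one number, the join above returns the same customer once per matching partner. A minimal sketch of one way to list each customer only once, assuming the same Customers table and columns used in the query above:

```sql
-- Sketch: DISTINCT collapses the repeated rows that appear when a
-- customer matches more than one other customer on Phone.
SELECT DISTINCT c1.CustomerID, c1.Name, c1.Phone
FROM Customers c1
JOIN Customers c2
  ON c1.Phone = c2.Phone
 AND c1.CustomerID <> c2.CustomerID
ORDER BY c1.Phone, c1.CustomerID;  -- keeps customers with the same number together
```

Sorting by Phone is optional; it simply groups the customers who share a number next to each other in the output.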

2. List customers who live at the same address.
SELECT c1.CustomerID, c1.Name, c1.Address
FROM Customers c1
JOIN Customers c2 ON c1.Address = c2.Address
WHERE c1.CustomerID <> c2.CustomerID;
Explanation
This query uses a self join to find customers who live at the same address. The Customers table is referenced twice, under the aliases c1 and c2, and the join condition c1.Address = c2.Address matches rows whose addresses are identical. The WHERE condition c1.CustomerID <> c2.CustomerID excludes rows in which a customer would be paired with their own record. The result set contains the CustomerID, Name, and Address of each customer who shares an address with at least one other customer. This is handy when analyzing customer data for potential duplicates or when planning marketing aimed at specific geographic regions. The query uses only standard join syntax, so it runs unchanged on Oracle, MySQL, PostgreSQL, and SQL Server.
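An exact comparison treats addresses that differ only in letter case or surrounding spaces as different. The sketch below shows one possible way to loosen the match; it assumes Address is a plain text column, and note that TRIM is available in SQL Server only from version 2017 onward.

```sql
-- Sketch: compare addresses after removing leading/trailing spaces
-- and ignoring letter case. Assumes Address is a character column.
SELECT DISTINCT c1.CustomerID, c1.Name, c1.Address
FROM Customers c1
JOIN Customers c2
  ON UPPER(TRIM(c1.Address)) = UPPER(TRIM(c2.Address))
WHERE c1.CustomerID <> c2.CustomerID;
```

Wrapping the column in functions usually prevents the database from using an ordinary index on Address, so on large tables this variant may be noticeably slower than the exact match.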

3. Show pairs of customers who share the same email domain (e.g., gmail.com).
Oracle
SELECT c1.CustomerID AS Customer1ID, c1.Name AS Customer1Name,
c2.CustomerID AS Customer2ID, c2.Name AS Customer2Name,
SUBSTR(c1.Email, INSTR(c1.Email, '@') + 1) AS EmailDomain
FROM Customers c1
JOIN Customers c2
ON
SUBSTR(c1.Email, INSTR(c1.Email, '@') + 1) = SUBSTR(c2.Email, INSTR(c2.Email, '@') + 1)
WHERE c1.CustomerID <> c2.CustomerID;
Explanation
This query returns pairs of customers whose email addresses share the same domain. It joins the Customers table to itself under the aliases c1 and c2. The SELECT list returns the CustomerID and Name of each customer in the pair, labeled Customer1ID and Customer1Name for c1 and Customer2ID and Customer2Name for c2. It also returns the shared domain as EmailDomain: INSTR finds the position of the '@' character in c1.Email, and SUBSTR returns everything after that position. The join condition applies the same expression to both aliases, so only customers with equal email domains are matched. The WHERE clause removes rows in which a record would be paired with itself. Because the condition is <>, each pair appears twice, once in each order. The result is useful for identifying potential duplicates or for analyzing relationships between customers based on shared email characteristics.
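To list each pair only once instead of in both orders, a common variation replaces <> with <. A sketch in the same Oracle syntax, assuming CustomerID values are unique and comparable:

```sql
-- Sketch (Oracle syntax): c1.CustomerID < c2.CustomerID keeps only one
-- ordering of every pair and still excludes self-matches.
SELECT c1.CustomerID AS Customer1ID, c1.Name AS Customer1Name,
       c2.CustomerID AS Customer2ID, c2.Name AS Customer2Name,
       SUBSTR(c1.Email, INSTR(c1.Email, '@') + 1) AS EmailDomain
FROM Customers c1
JOIN Customers c2
  ON SUBSTR(c1.Email, INSTR(c1.Email, '@') + 1) = SUBSTR(c2.Email, INSTR(c2.Email, '@') + 1)
WHERE c1.CustomerID < c2.CustomerID;
```

The same one-character change works for the MySQL, PostgreSQL, and SQL Server versions of this query, since it does not touch the domain-extraction expression.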

______________
MySQL
SELECT c1.CustomerID AS Customer1ID, c1.Name AS Customer1Name,
c2.CustomerID AS Customer2ID, c2.Name AS Customer2Name,
SUBSTRING_INDEX(c1.Email, '@', -1) AS EmailDomain
FROM Customers c1
JOIN Customers c2
ON
SUBSTRING_INDEX(c1.Email, '@', -1) = SUBSTRING_INDEX(c2.Email, '@', -1)
WHERE c1.CustomerID <> c2.CustomerID;
Explanation
This query finds pairs of customers in the Customers table whose email addresses have the same domain. It is a self join: the Customers table is referenced twice, under the aliases c1 and c2. The SELECT list returns the CustomerID and Name from both aliases, labeled Customer1ID, Customer1Name, Customer2ID, and Customer2Name. It also returns the domain of c1.Email as EmailDomain, using SUBSTRING_INDEX(c1.Email, '@', -1), which returns the part of the string after the last '@' character. The join condition compares the extracted domains from both aliases and keeps only the pairs that match. The WHERE clause excludes rows in which a customer would be paired with their own record. The resulting pairs can be used to investigate potential duplicates or to support marketing that targets customers with a common email domain.
___________________

PostgreSQL
SELECT c1.CustomerID AS Customer1ID, c1.Name AS Customer1Name,
c2.CustomerID AS Customer2ID, c2.Name AS Customer2Name,
SPLIT_PART(c1.Email, '@', 2) AS EmailDomain
FROM Customers c1
JOIN Customers c2
ON
SPLIT_PART(c1.Email, '@', 2) = SPLIT_PART(c2.Email, '@', 2)
WHERE c1.CustomerID <> c2.CustomerID;
_____________
SQL Server
SELECT c1.CustomerID AS Customer1ID, c1.Name AS Customer1Name,
c2.CustomerID AS Customer2ID, c2.Name AS Customer2Name,
RIGHT(c1.Email, LEN(c1.Email) - CHARINDEX('@', c1.Email)) AS EmailDomain
FROM Customers c1
JOIN Customers c2
ON
RIGHT(c1.Email, LEN(c1.Email) - CHARINDEX('@', c1.Email)) = RIGHT(c2.Email, LEN(c2.Email) - CHARINDEX('@', c2.Email))
WHERE c1.CustomerID <> c2.CustomerID;

Example of Self Join in SQL used for analyzing relationships within banking records

4. Identify accounts that have the same balance across different customers.
SELECT a1.AccountID AS Account1ID, a1.CustomerID AS Customer1ID, a1.Balance,
a2.AccountID AS Account2ID, a2.CustomerID AS Customer2ID
FROM Accounts a1
JOIN Accounts a2
ON
a1.Balance = a2.Balance
WHERE a1.CustomerID <> a2.CustomerID;
Explanation
This query uses a self join on the Accounts table to find accounts that hold the same balance but belong to different customers. The table is referenced twice, under the aliases a1 and a2. The join condition a1.Balance = a2.Balance matches rows with equal balances. The WHERE condition a1.CustomerID <> a2.CustomerID keeps only pairs owned by different customers, which also prevents an account from being paired with itself. Comparing AccountID values instead would keep pairs of accounts that belong to the same customer. The SELECT list returns the AccountID and CustomerID of both accounts along with the shared Balance. The resulting pairs are useful for spotting potential duplicates or for analyzing the behavior of customers with a particular account balance. The query uses only standard syntax and runs on Oracle, MySQL, PostgreSQL, and SQL Server.
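To try the idea on a small data set, the sketch below builds a throwaway table and runs a same-balance query against it. The table name AccountsDemo, the column types, and the sample rows are all assumptions made for illustration; they are not part of the case study's schema.

```sql
-- Hypothetical demo table; name, column types, and rows are assumptions.
CREATE TABLE AccountsDemo (
    AccountID  INT PRIMARY KEY,
    CustomerID INT NOT NULL,
    Balance    DECIMAL(12, 2) NOT NULL
);

-- Single-row inserts are used because they work on all four databases.
INSERT INTO AccountsDemo (AccountID, CustomerID, Balance) VALUES (101, 1, 5000.00);
INSERT INTO AccountsDemo (AccountID, CustomerID, Balance) VALUES (102, 2, 5000.00);
INSERT INTO AccountsDemo (AccountID, CustomerID, Balance) VALUES (103, 1, 5000.00);
INSERT INTO AccountsDemo (AccountID, CustomerID, Balance) VALUES (104, 3, 7500.00);

SELECT a1.AccountID AS Account1ID, a1.CustomerID AS Customer1ID, a1.Balance,
       a2.AccountID AS Account2ID, a2.CustomerID AS Customer2ID
FROM AccountsDemo a1
JOIN AccountsDemo a2
  ON a1.Balance = a2.Balance
WHERE a1.CustomerID <> a2.CustomerID   -- pairs owned by different customers only
  AND a1.AccountID < a2.AccountID      -- report each pair once
ORDER BY a1.AccountID, a2.AccountID;
-- Returns the pairs (101, 102) and (102, 103).
```

Accounts 101 and 103 have equal balances but both belong to customer 1, so the CustomerID filter leaves that pair out. Account 104 has no match at all.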

5. Find accounts created on the same date by different customers.
SELECT a1.AccountID AS Account1ID, a1.CustomerID AS Customer1ID, a1.AccountCreationDate,
a2.AccountID AS Account2ID, a2.CustomerID AS Customer2ID
FROM Accounts a1
JOIN Accounts a2
ON
a1.AccountCreationDate = a2.AccountCreationDate
WHERE a1.CustomerID <> a2.CustomerID;
Explanation
This query finds accounts in the Accounts table that were created on the same date by different customers. It joins the Accounts table to itself under the aliases a1 and a2. The join condition a1.AccountCreationDate = a2.AccountCreationDate matches rows with equal creation dates. The WHERE condition a1.CustomerID <> a2.CustomerID keeps only pairs owned by different customers, which also prevents an account from being paired with itself. Comparing AccountID values instead would keep pairs of accounts opened by the same customer. The SELECT list returns the AccountID and CustomerID of both accounts along with the shared AccountCreationDate. The resulting pairs can reveal patterns in customer behavior and account-opening trends. The query uses only standard syntax and runs on Oracle, MySQL, PostgreSQL, and SQL Server.
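The equality test on AccountCreationDate matches only identical values. If the column also stores a time of day (an assumption; the source does not state the column type), two accounts opened on the same day at different times will not match. A sketch of one way to compare only the calendar date on PostgreSQL, MySQL, and SQL Server:

```sql
-- Sketch for PostgreSQL, MySQL, and SQL Server: CAST(... AS DATE)
-- drops the time part before comparing. Assumes AccountCreationDate
-- is a timestamp/datetime column.
SELECT a1.AccountID AS Account1ID, a1.CustomerID AS Customer1ID,
       CAST(a1.AccountCreationDate AS DATE) AS CreationDate,
       a2.AccountID AS Account2ID, a2.CustomerID AS Customer2ID
FROM Accounts a1
JOIN Accounts a2
  ON CAST(a1.AccountCreationDate AS DATE) = CAST(a2.AccountCreationDate AS DATE)
WHERE a1.CustomerID <> a2.CustomerID;  -- pairs owned by different customers
-- Oracle: a DATE value always carries a time part, so use
-- TRUNC(a1.AccountCreationDate) = TRUNC(a2.AccountCreationDate) instead.
```

If the column already holds pure dates, the plain equality comparison is sufficient and is easier for the database to index.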

Advantages of Self JOIN in SQL

Compare Data In the Same Table
A self join lets you compare rows within the same table. For example, you can compare the balances of different accounts held by the same customer in a bank's Accounts table, so individual behavior or trends can be analyzed without involving a second table.
Handle Hierarchical Data
Self joins are well suited to hierarchical data, where one row in a table refers to another row in the same table. Examples include employees who report to managers and parent-child accounts in a bank. A self join lets you organize and query such a hierarchy.
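As an illustration of the employee and manager case, here is a minimal sketch. The Employees table and its EmployeeID, Name, and ManagerID columns are hypothetical and do not appear in the case study's schema.

```sql
-- Hypothetical table: Employees(EmployeeID, Name, ManagerID),
-- where ManagerID holds the EmployeeID of the person's manager.
SELECT e.EmployeeID,
       e.Name AS EmployeeName,
       m.Name AS ManagerName          -- NULL for employees with no manager
FROM Employees e
LEFT JOIN Employees m
  ON e.ManagerID = m.EmployeeID;
```

A LEFT JOIN is used so that the person at the top of the hierarchy, who has no manager, still appears in the result.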
Finding Duplicates and Similar Records
Self joins let you find duplicate or near-duplicate rows within the same table. In a banking context, you can detect duplicate customer records by self joining the Customers table and comparing names or contact information.
Find Related Records
A self join can also reveal relationships between records in the same table, such as two customers who are co-owners of a bank account. This is helpful in customer relationship management or in tracking a referral program in which customers refer other customers.
Time-Based Comparisons
You can use a self join to analyze how the same entity changes over time. In banking, for instance, you can compare a customer's transactions from one period to the next to spot unusual activity or trends.
Makes Complex Queries About a Single Table Easier
A self join simplifies complex queries by removing the need for additional tables. You can read the same table under several aliases in one query, which is useful when generating a report or conducting an in-depth analysis of a customer's account or transaction data.

When to Use Self JOIN?

When You Need to Compare Rows of the Same Table
Use a self join whenever you need to compare two different rows of the same table, for instance two transactions made from the same account or two account balances for the same customer.
Working with Hierarchical Data
If your table carries hierarchical data, such as employee reporting structures or parent-child accounts, a self join lets you query and analyze those relationships within that single table.
Find Duplicate Or Similar Entries
When you are cleaning a database or checking it for inconsistencies, a self join can point out duplicate rows, or rows that are very similar on certain criteria, such as a customer who has a duplicate profile.
Analyzing Relationships in the Same Table
Use a self join when you want to capture relationships between entities stored in the same table. For example, you might find customers who hold joint accounts, or customers who were referred by another customer.
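The referral case could be sketched as follows. The ReferredByCustomerID column is hypothetical: it is assumed here to store the CustomerID of the referring customer and is not part of the Customers table shown in the queries above.

```sql
-- Sketch: ReferredByCustomerID is an assumed column holding the
-- CustomerID of the customer who made the referral.
SELECT c.CustomerID,
       c.Name       AS CustomerName,
       r.CustomerID AS ReferrerID,
       r.Name       AS ReferrerName
FROM Customers c
JOIN Customers r
  ON c.ReferredByCustomerID = r.CustomerID;
```

The inner join returns only customers who were referred by someone; switching to a LEFT JOIN would also list customers with no referrer.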
Time Based Data Comparison
Self joins are useful for comparing two states of the same record over time, such as an account balance in two time periods or entries in a customer's transaction history.
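A period-over-period comparison might look like the sketch below. The AccountBalanceHistory table and its AccountID, PeriodNo, and Balance columns are hypothetical; PeriodNo is assumed to be an integer that increases by one for each reporting period.

```sql
-- Hypothetical table: AccountBalanceHistory(AccountID, PeriodNo, Balance).
-- Each row of "cur" is matched with the same account's previous period.
SELECT cur.AccountID,
       prev.PeriodNo AS PreviousPeriod,
       cur.PeriodNo  AS CurrentPeriod,
       prev.Balance  AS PreviousBalance,
       cur.Balance   AS CurrentBalance,
       cur.Balance - prev.Balance AS BalanceChange
FROM AccountBalanceHistory cur
JOIN AccountBalanceHistory prev
  ON cur.AccountID = prev.AccountID
 AND cur.PeriodNo  = prev.PeriodNo + 1;
```

The first period of each account has no predecessor, so it does not appear in the output of this inner join.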
