SQL Window Functions and Analytic Functions: Your FAQ Guide
1.What are window functions in SQL and how do they differ from regular aggregate functions?
Window functions compute across a set of table rows that are related to the current row. In contrast to aggregate functions which return a single value for a group of rows, window functions allow you to preserve the individual row details while performing calculations like running totals or ranking. A window function operates within a defined "window" of rows.
Example Query:
SELECT employee_id, salary,
SUM(salary) OVER (ORDER BY salary) AS running_total
FROM employees;
This statement calculates a running total of salaries in ascending order not collapsing the rows like an aggregate function such as SUM() would do and group all the rows.
2.What does the OVER clause of a window function do?
The OVER clause decides which window of rows on which a window function actually operates. The OVER clause can contain an ORDER BY that defines how rows will be sorted and a PARTITION BY that partitions rows into groups. Without an OVER clause, applying any window function to a particular set of rows is meaningless because all the row-specific data is lost.
Query to Show this Concept:
SELECT employee_id, salary,
RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank FROM employees;
Here, RANK() ranks all employees within departments according to salary. Again, with the use of the PARTITION BY clause, we are able to separate each ranking per department.
3.What is ROW_NUMBER()? How do you apply this function in SQL, and what are its applications?
ROW_NUMBER() returns a unique number for each row in the result set, starting with 1. It can be applied to row number results to find duplicate records or provide ranks for rows in a sorted result set.
Example Query
SELECT employee_id, salary,
ROW_NUMBER() OVER (ORDER BY salary DESC) AS row_num
FROM employees;
This assigns a sequential number to every employee based on their salary in descending order. It can be helpful when you require a unique identifier for every row or if you limit the results to a particular row number.
4.What is the RANK() function and how is it different from DENSE_RANK() in SQL?
Both RANK() and DENSE_RANK() provide rankings to rows based on an ordering specified. However, in case of ties, RANK() leaves gaps in the ranking, while DENSE_RANK() does not leave gaps and provides the same rank to tied rows.
Sample Query:
SELECT employee_id, salary,
RANK() OVER (ORDER BY salary DESC) AS rank,
DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rank
FROM employees;
If two employees have the same salary, RANK() will assign them the same rank but leave a gap in the next rank, while DENSE_RANK() will assign the next rank without gaps.
5.What are the LEAD() and LAG() functions in SQL and how do they work?
The LEAD() and LAG() functions enable you to access data from a subsequent or preceding row, respectively without having to use a self-join. These functions are helpful for comparing values between rows such as calculating differences between consecutive rows.
Example Query:
SELECT employee_id, salary,
LEAD(salary) OVER (ORDER BY salary) AS next_salary
FROM employees;
This query returns the salary of the employee and the salary of the next employee in the ordered list. LAG() would return the previous row's salary.