SQL interview questions about aggregate data and group functions with examples

6. How do you extract the minimum value in a column using MIN()?

The MIN() function in SQL helps you find the smallest (minimum) value in a column. It's often used when you're working with numeric, date, or even text data and want to identify the lowest value based on the column type.

For example, to find the lowest salary in a company, you can use:

SELECT MIN(salary) FROM employees;

This query returns the smallest salary from the employees table.

The MIN() function is very useful in situations where you're looking for:
1.The lowest price of a product.
2.The earliest order or registration date.
3.The smallest quantity sold.
4.The alphabetically first name in a list.

It skips any NULL values and only works on the available (non-null) data.

You can also combine MIN() with GROUP BY to find the minimum value within groups. For instance, the lowest salary in each department:

SELECT department, MIN(salary) FROM employees GROUP BY department;

This function is especially useful in generating business reports, comparisons and performance summaries when identifying the least(minimum) or earliest value matters.


7. How do you use GROUP BY to aggregate the data by specific columns?

The GROUP BY clause in SQL groups rows that share the same values in one or more columns. This allows you to use aggregate functions like COUNT(), SUM(), AVG(), MAX() and MIN() to calculate values for each group separately.

For example, to find the average salary for each department, you can write:

SELECT department_id, AVG(salary) FROM employees GROUP BY department_id;

This query groups employees by their department and calculates the average salary for each group.

The GROUP BY clause is very useful when you want to create reports or summaries that show data broken down by categories, like sales by region or the total number of orders by customer.


8. How would you use HAVING to filter aggregated data?

The HAVING clause in SQL is used to filter the results of aggregate functions after the data has been grouped using GROUP BY. Unlike the WHERE clause, which filters rows before grouping, HAVING lets you set conditions on aggregated data.

For example, to find departments where the average salary is greater than 50,000, you can use:

SELECT department, AVG(salary) FROM employees GROUP BY department HAVING AVG(salary) > 50000;

This query groups employees by department, calculates the average salary for each and only returns those departments where the average salary exceeds 50,000.

The HAVING clause is especially helpful in reports or data analysis when you want to filter grouped or summarized data based on aggregate values.


9. How do you calculate the total of each group using GROUP BY?

To calculate totals for each group, you use aggregate functions like SUM(), COUNT(), or AVG() along with the GROUP BY clause. First, GROUP BY organizes rows based on the column you choose, then the aggregate functions calculate values for each group.

For example, to find the total salary for each department, you can write:

SELECT department_id, SUM(salary) FROM employees GROUP BY department_id;

This query adds up the salaries of all employees in each department and shows the total salary for every department.

In this example, GROUP BY groups the rows by department_id and SUM() calculates the total salary within each group.

Using GROUP BY with aggregate functions is very helpful in generating summarized reports and understanding data trends across different categories.
You can also combine it with ORDER BY to sort the results based on the total or any other aggregated value.


10. How do you count the number of unique values in a column using COUNT(DISTINCT column_name)?

The COUNT(DISTINCT column_name) function in SQL counts the number of unique (different) values in a column, ignoring any duplicates. This is useful when you want to know how many distinct entries exist.

For example, if you want to find out how many unique countries are in the customers table, you can use:

SELECT COUNT(DISTINCT country) FROM customers;

This query returns the number of different countries listed in the table.

COUNT(DISTINCT) is helpful in data analysis to understand diversity or variety in your data, such as counting unique product categories, customer IDs or order types.
Keep in mind that NULL values are ignored and won’t be counted as unique.


11. How do you combine aggregate functions with JOIN?

You can use aggregate functions together with JOIN in SQL to summarize data from multiple tables. This is helpful when you want to combine related data and then perform calculations like totals or averages.

For example, if you want to find the total sales made by each employee, you can join the employees table with the sales table and sum the sales amounts:

SELECT e.name, SUM(s.amount) FROM employees e JOIN sales s ON e.id = s.employee_id GROUP BY e.name;

This query lists each employee’s name along with their total sales amount.

Using aggregate functions with JOIN is powerful for creating reports that combine data from different sources to give meaningful insights.


12. How do you group data by multiple columns using GROUP BY?

You can group data by more than one column by listing multiple columns in the GROUP BY clause. This lets you get aggregated results for each unique combination of values in those columns.

For example, to find the average salary for each combination of department and job title, you can write:

SELECT department, job_title, AVG(salary) FROM employees GROUP BY department, job_title;

This query calculates the average salary for every department and job title pair.

Grouping by multiple columns is helpful when you want a more detailed analysis, like total sales by both region and salesperson.


13. How do you roll aggregates like a cumulative sum or moving averages?

Rolling aggregates calculate values over a moving window of rows, such as a cumulative sum or a moving average. These are useful for analyzing trends over time, like running totals or averages that update as you move through the data.

In SQL, you can use window functions like SUM() OVER() or AVG() OVER() to perform these calculations without grouping the data into fixed groups.

For example, to get the running total of sales by date, you can use:

SELECT date, SUM(sales) OVER (ORDER BY date) AS running_total FROM sales;

This query calculates a cumulative sum of sales ordered by date, showing how total sales grow day by day.

Similarly, moving averages smooth out short-term fluctuations and can be calculated by specifying a window frame in the OVER() clause.

Rolling aggregates are powerful for financial reports, time series analysis, and tracking performance trends.


14. How do you use GROUP BY with date columns to group by year, month, etc.?

When working with dates, you can group your data by parts of the date like year, month or day. The exact functions to use depend on your database system.

Oracle: Use the EXTRACT() function to get parts of the date. For example, to get total sales by year:
SELECT EXTRACT(YEAR FROM order_date) AS order_year, SUM(total_amount) FROM orders GROUP BY EXTRACT(YEAR FROM order_date);

To group by year and month:
SELECT EXTRACT(YEAR FROM order_date) AS order_year, EXTRACT(MONTH FROM order_date) AS order_month, SUM(total_amount) FROM orders GROUP BY EXTRACT(YEAR FROM order_date), EXTRACT(MONTH FROM order_date);

SQL Server: Use the YEAR() and MONTH() functions. For example, total sales by year:
SELECT YEAR(order_date) AS order_year, SUM(total_amount) FROM orders GROUP BY YEAR(order_date);

And by year and month:
SELECT YEAR(order_date) AS order_year, MONTH(order_date) AS order_month, SUM(total_amount) FROM orders GROUP BY YEAR(order_date), MONTH(order_date);

Grouping by date parts lets you analyze your data over specific time frames, such as monthly sales trends or yearly growth.


15. How do I calculate percentages of group sums using aggregation functions?

To calculate the percentage contribution of each group to the total, you can use aggregate functions along with simple math. For example, to find what percentage of total sales each department contributes, you can write a query like this:

SELECT department_id,
SUM(sales) / (SELECT SUM(sales) FROM sales) * 100 AS percentage_sales
FROM sales
GROUP BY department_id;

This query sums sales by department, then divides each department’s total by the overall sales total and multiplies by 100 to get a percentage. It helps you understand how much each department contributes to total sales.

This method is useful for reports and dashboards where you want to see the share or weight of each category within the whole dataset.


16. How can you aggregate data without using GROUP BY with DISTINCT?

The DISTINCT keyword lets you select unique values from a column without grouping rows. For example:

SELECT DISTINCT department FROM employees;

This query returns a list of unique departments from the employees table.

However, DISTINCT is not an aggregate function—it simply filters out duplicates. If you want to perform calculations like totals, averages or counts over groups, you generally need to use GROUP BY.

So, while you can get unique values with DISTINCT to aggregate data meaningfully, it’s best to use aggregate functions combined with GROUP BY.


17. How do you use COUNT() with DISTINCT to count unique values?

The COUNT(DISTINCT column_name) function helps you find the number of unique values in a specific column by ignoring duplicates.

For example:
SELECT COUNT(DISTINCT country) FROM customers;

This query counts how many different countries appear in the customers table.

This is especially useful when you want to understand the diversity or variety in your data — like counting how many unique product categories you have or how many distinct customers made purchases.

Using COUNT(DISTINCT) gives you a quick and easy way to get insights about uniqueness within your dataset.


18. How to deal with NULL values?

In SQL, NULL represents missing or unknown data and can affect the results of aggregate functions like COUNT(), SUM(), and AVG().

By default, most aggregate functions ignore NULL values. For example, SUM() and AVG() only consider non-NULL values in their calculations. However, sometimes you may want to replace NULL with a specific value to avoid skewed results.

You can do this using functions like COALESCE() or IFNULL(). For example:

SELECT COALESCE(salary, 0) FROM employees;

This query replaces any NULL values in the salary column with 0. This way, when you use aggregate functions like SUM() or AVG() the missing values won’t affect your calculations.

Handling NULL values properly ensures your data analysis and reports are accurate and meaningful.


19. How do you use GROUP BY with conditional expressions to aggregate data?

You can use conditional expressions like CASE inside aggregate functions to apply specific conditions while grouping data.

For example, if you want to calculate the total sales per region but only include sales amounts greater than 500, you can write:

SELECT region,
SUM(CASE WHEN sales_amount > 500 THEN sales_amount ELSE 0 END) AS total_sales
FROM sales
GROUP BY region;

This query adds up sales amounts over 500 for each region. The CASE statement allows you to include only the sales that meet your condition while treating others as zero.

Using conditional expressions like this gives you more control when summarizing data, making your reports and analysis more precise.


20. How do you deal with NULL values when using COUNT() or SUM()?

The COUNT() function counts only non-NULL values by default. If you want to count the number of NULL values, you can use a CASE statement.

For example:
SELECT COUNT(CASE WHEN salary IS NULL THEN 1 END) FROM employees;

This counts how many rows have a NULL value in the salary column.

Similarly, SUM() ignores NULL values when adding up numbers. If you want to treat NULL as zero in your sum, use the COALESCE() function.

For example:
SELECT SUM(COALESCE(salary, 0)) FROM employees;

This query adds up all salary values and treats any NULL as 0.

Handling NULL values correctly ensures your counts and sums reflect the data accurately.


21. How can you use GROUP BY with the CASE statement?

You can combine GROUP BY with a CASE statement to group data based on specific conditions.

For example, if you want to categorize sales as "High" or "Low" based on the sales amount, you could write:

SELECT
CASE WHEN sales_amount > 1000 THEN 'High' ELSE 'Low' END AS sales_category,
COUNT(*)
FROM sales
GROUP BY sales_category;

This query groups sales into two categories, "High" and "Low," and counts how many sales fall into each category. The CASE statement lets you create custom groupings to better analyze your data.

Using CASE with GROUP BY is a powerful way to segment and summarize data in meaningful ways.


22. How do you group data by specific time intervals such as weeks or months?

To group data by time intervals like weeks or months, you extract parts of the date using date functions such as YEAR(), MONTH(), WEEK(), or DATE_FORMAT() depending on your database.

For example, to see total sales grouped by year and month, you can use:

SELECT YEAR(order_date) AS year, MONTH(order_date) AS month, SUM(sales_amount)
FROM orders
GROUP BY YEAR(order_date), MONTH(order_date);

This groups your data by year and month, then sums the sales for each group.

You can apply similar logic to group by weeks or other time periods, depending on what date functions your database supports.


23. How to calculate the median using SQL aggregate functions?

Unlike functions like MIN(), MAX(), or AVG(), SQL doesn’t have a built-in MEDIAN() function in many databases. To calculate the median, you often need to use window functions like ROW_NUMBER() or RANK() to order your data and find the middle value.

For example, here’s how you can calculate the median salary:

WITH ordered_salaries AS (
  SELECT salary,
    ROW_NUMBER() OVER (ORDER BY salary) AS row_num,
    COUNT(*) OVER () AS total_rows
  FROM employees
)
SELECT AVG(salary) AS median_salary
FROM ordered_salaries
WHERE row_num IN (FLOOR((total_rows + 1) / 2), CEIL((total_rows + 1) / 2));

This query orders the salaries, counts total rows, and then calculates the median by averaging the middle one or two values depending on whether the total count is odd or even.

Using window functions like these is a common way to get median values in SQL since it’s not directly supported by many database systems.


24. How would you aggregate and summarize data for a hierarchical relationship (like parent/child)?

Hierarchical data often appears in parent-child relationships, such as employees and their managers. To summarize this kind of data, you usually use JOIN operations or recursive queries to walk through the hierarchy.

For example, if you have an employees table with a manager_id column showing who each employee reports to, you can group employees by their managers:

SELECT manager_id, COUNT(*) AS num_employees
FROM employees
GROUP BY manager_id;

This query counts how many employees report to each manager.

For more complex hierarchies, some databases support recursive queries using WITH RECURSIVE (like PostgreSQL and newer versions of MySQL and SQL Server). These queries allow you to walk through multiple levels of the hierarchy to aggregate data across the entire tree.

Understanding how to work with hierarchical data is important for reporting structures, bill of materials, or any dataset where items have parent-child relationships.


25. What happens in case of empty groups while summarizing data?

Sometimes, when summarizing data by groups, you might have groups with no matching records. For example, if you want to count employees in each department, some departments might have zero employees.

To make sure those empty groups still appear in your results (instead of being left out), you can use a LEFT JOIN and replace NULL counts with zero using COALESCE() or IFNULL().

For example:
SELECT departments.department_name, COALESCE(COUNT(employees.employee_id), 0) AS employee_count
FROM departments
LEFT JOIN employees ON departments.department_id = employees.department_id
GROUP BY departments.department_name;

This query shows all departments, including those with no employees, and displays their employee count as zero.

Handling empty groups like this ensures your reports accurately reflect all categories, even those without data.