THE BIG SIX

As a child, I remember the painful process of learning to construct sentences in English. I first had to grasp a handful of the many components possible in sentences. From those components, I then started to form simple sentences. After learning more components, I was able to write sentences of varying structures and complexities.

Replace my story on learning to write sentences with my story on learning SQL and they’ll read much the same. First, I set out to understand the fundamental clauses involved in simple queries. These included the SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses. I often refer to these clauses as the big six. After I acquired a firm understanding of these clauses, I began learning clauses and concepts that are often necessary and involved when solving more complex problems (e.g., joins, common table expressions, subqueries, etc.) This page is dedicated to providing a high-level overview of each of the big six clauses.

Depending on your background, this page’s contents may be very straightforward and the usage of the individual clauses, intuitive; however, don’t be discouraged if that’s not the case. I recommend reading the subpages dedicated to each of the big six clauses for a more complete understanding. Within these subpages, I’ll explore the ins and outs of each clause in greater depth and provide additional coding examples to illustrate common use cases. The links to the subpages can be found in the menu under THE BIG SIX. These links are also available at the end of this page.

SELECT

Which expressions are necessary?
Examples:
– Pre-existing columns/fields/attributes
– Derived attributes (e.g., attributes resulting from built-in functions, concatenation of first and last name)
– Scalar values (e.g., 1.10, 100, -10, etc.)
– An expression that is any combination of those examples above

Tip: Specify only those expressions required to solve your problem. Avoid using SELECT * to retrieve all pre-existing attributes in any matters other than exploratory. Less is more when it comes to query processing.

Example Task, Code, and Output

Using the employees table within the human resource (hr) schema, write a query that generates an output containing, for each employee, the following:
– Employee ID
– Employee full name (i.e., concatenation of first and last name)
– The current date and time for which the result set was generated
– Employee salary
– The value representing a 10% increase in salary

SELECT
	hr.employees.employee_id, -- pre-existing attribute
    (
    	hr.employees.first_name
        || ' '
        || hr.employees.last_name
    ) AS full_name, -- derived attribute
    SYSDATE AS date_value, -- built-in function
    hr.employees.salary, -- pre-existing attribute
    hr.employees.salary * 1.10 AS new_salary
FROM
	hr.employees;

employee_id	full_name	date_value	salary	new_salary
100	Steven King	18-NOV-23	24000	26400
101	Neena Kochhar	18-NOV-23	17000	18700
102	Lex De Haan	18-NOV-23	17000	18700
103	Alexander Hunold	18-NOV-23	9000	9900
104	Bruce Ernst	18-NOV-23	6000	6600

FROM

What is the data source?
Examples:
– Pre-existing table, view, or temporary table
– Derived table/inline-view (i.e., subquery result set)
– Common table expression

Tip: Fully-qualify pre-existing tables in the FROM clause and their attributes in the SELECT clause by specifying the schema for which they belong. This is important, not only for readability, but also to avoid ambiguity errors because there will be instances where multiple tables and attributes across different schemas share the same name.

Example Task, Code, and Output

Using the orders table within the order entry (oe) schema, write a query that generates an output containing, for each order, the following:
– Order ID
– Customer ID
– Order date
– Order total

SELECT
	oe.orders.order_id,
    oe.orders.customer_id,
    oe.orders.order_date,
	oe.orders.order_total
FROM -- pre-existing table in oe schema
	oe.orders;

order_id	customer_id	order_date	order_total
2458	101	16-AUG-07 03.34.12.234359 PM	78279.6
2397	102	19-NOV-07 02.41.54.696211 PM	42283.2
2454	103	02-OCT-07 05.49.34.678340 PM	6653.4
2354	104	14-JUL-08 06.18.23.234567 PM	46257
2358	105	08-JAN-08 05.03.12.654278 PM	7826

WHERE

Do any filters need to be applied to the data?
Examples:
– Applying filters based on a pre-existing attribute’s values
– Applying filters based on a derived value
– Applying filters based on the result of a subquery
– Applying filters that are any combination of those examples above

Tip: Use this clause assist you in retrieving the minimum amount of data required to solve your problem.

Example Task, Code, and Output

Using the orders table within the order entry (oe) schema, write a query that generates an output containing, for each order, the following:
– Order ID
– Customer ID
– Order date
– Order total

Additionally, only include those orders with an order total value between $50,000 and $100,000, inclusive.

SELECT
	oe.orders.order_id,
    oe.orders.customer_id,
    oe.orders.order_date,
	oe.orders.order_total
FROM
	oe.orders
WHERE
    oe.orders.order_total >= 50000
    AND oe.orders.order_total <= 100000;

order_id	customer_id	order_date	order_total
2458	101	16-AUG-07 03.34.12.234359 PM	78279.6
2440	107	31-AUG-07 09.53.06.008765 PM	70576.9
2357	108	08-JAN-06 08.19.44.123456 PM	59872.4
2435	144	02-SEP-07 11.22.53.134567 PM	62303
2355	104	26-JAN-06 09.22.51.962632 AM	94513.5

GROUP BY

Does the data need to be grouped?
Rows are grouped based on one or more shared attribute values and a calculation is usually performed across each set of grouped rows. Each set of grouped rows are collapsed into a single row.
Examples:
– Group rows based on their customer ID values and compute the sum of sales across each set of grouped rows, resulting in sales by customer ID
– Group rows based on employee job title and location values and compute the average salary across each set of grouped rows, resulting in the average salary for each job title and location combination

Tip: You will come across code where the GROUP BY clause is not accompanied by an aggregation (e.g., SUM, MIN, MAX, etc.) in the SELECT clause. This is usually because the author has opted to use the GROUP BY clause to address duplicates in the data based on the specified combinations. In this very specific case, I suggest you use the DISTINCT keyword to make the intent of your code clear. The query should be processed the same using either method.

Example Task, Code, and Output

Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer. Include the customer ID attribute in the output.

SELECT
    oe.orders.customer_id,
	SUM(oe.orders.order_total) AS total_sales
FROM
	oe.orders
GROUP BY
	oe.orders.customer_id;

customer_id	total_sales
107	155613.2
108	213399.7
158	25270.3
161	600
166	309

HAVING

Do any filters need to be applied to the resulting grouped rows?
Typically, the GROUP BY function is accompanied by an aggregate function. The HAVING clause can be used to apply filters based on the results of aggregate functions.
Examples:
– Applying filters based on the aggregation of pre-existing attribute values
– Applying filters based on the aggregation of derived values
– Applying filters based on the result of a subquery
– Applying filters that are any combination of those examples above

Tip: Use this clause assist to you in retrieving the minimum amount of data required to solve your problem. Remember, the HAVING clause should be used to apply filter conditions on GROUPED rows while the WHERE clause should be used apply filter conditions on UNGROUPED rows.

Example Task, Code, and Output

Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer. Include the customer ID attribute in the output. Additionally, only include those customers with total sales greater than $100,000.

SELECT
    oe.orders.customer_id,
	SUM(oe.orders.order_total) AS total_sales
FROM
	oe.orders
GROUP BY
	oe.orders.customer_id
HAVING
    SUM(oe.orders.order_total) > 100000;

customer_id	total_sales
107	155613.2
108	213399.7
109	265255.6
104	146605.5
118	100991.8

ORDER BY

Does the output of the query need to be sorted?
Examples:
– Sorting the output of the query by total sales, descending (i.e., greatest to least)
– Sorting the output of the query by total number of orders placed

Tip: Sorting the output of a query can be a costly operation. In today’s business intelligence environment, your queries will likely originate from a data visualization tool such as Tableau or Power BI. Avoid including an ORDER BY clause in these queries because it is likely you and the consumers of your reports will be using the tool to sort the data displayed dynamically. In the case of Tableau, an ORDER BY clause is not allowed when using the tool’s Custom SQL feature.

Example Task, Code, and Output

Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer. Include the customer ID attribute in the output. Additionally, only include those customers with total sales greater than $100,000 and sort the output by the total sales values in descending order.

SELECT
    oe.orders.customer_id,
	SUM(oe.orders.order_total) AS total_sales
FROM
	oe.orders
GROUP BY
	oe.orders.customer_id
HAVING
    SUM(oe.orders.order_total) > 100000
ORDER BY
    total_sales DESC;

customer_id	total_sales
149	403119.7
147	371278.2
150	282694.3
109	265255.6
108	213399.7

Putting It All Together
Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer and month. The final output should include only those orders with an order mode value of ‘direct’ and those instances where the customer spent a total of more than $5000 in the corresponding month. Include the customer ID, order month, and total sales values in the output and sort the output by the total sales values in descending order.

SELECT
    oe.orders.customer_id,
    TRUNC(oe.orders.order_date, 'MONTH') AS order_month,
	SUM(oe.orders.order_total) AS total_sales
FROM
	oe.orders
WHERE
    oe.orders.order_mode = 'direct'
GROUP BY
	oe.orders.customer_id,
    TRUNC(oe.orders.order_date, 'MONTH')
HAVING
    SUM(oe.orders.order_total) > 5000
ORDER BY
    total_sales DESC;

customer_id	order_month	total_sales
149	01-SEP-07	268651.8
117	01-JUL-07	103679.3
101	01-AUG-07	78279.6
109	01-JUL-07	77727.2
109	01-MAR-07	72836

Additional Content
For a deeper dive into each of the big six clauses, check out the pages dedicated to each by using the links below. I suggest reading the pages in the order in which they are listed.

SELECT CLAUSE
FROM CLAUSE
WHERE CLAUSE
GROUP BY CLAUSE
HAVING CLAUSE
ORDER BY CLAUSE