As a child, I remember the painful process of learning to construct sentences in English. I first had to grasp a handful of the many components possible in sentences. From those components, I then started to form simple sentences. After learning more components, I was able to write sentences of varying structures and complexities.
Replace my story on learning to write sentences with my story on learning SQL and they’ll read much the same. First, I set out to understand the fundamental clauses involved in simple queries. These included the SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses. I often refer to these clauses as the big six. After I acquired a firm understanding of these clauses, I began learning clauses and concepts that are often necessary and involved when solving more complex problems (e.g., joins, common table expressions, subqueries, etc.) This page is dedicated to providing a high-level overview of each of the big six clauses.
Depending on your background, this page’s contents may be very straightforward and the usage of the individual clauses, intuitive; however, don’t be discouraged if that’s not the case. I recommend reading the subpages dedicated to each of the big six clauses for a more complete understanding. Within these subpages, I’ll explore the ins and outs of each clause in greater depth and provide additional coding examples to illustrate common use cases. The links to the subpages can be found in the menu under THE BIG SIX. These links are also available at the end of this page.
SELECT
Which expressions are necessary?
Examples:
– Pre-existing columns/fields/attributes
– Derived attributes (e.g., attributes resulting from built-in functions, concatenation of first and last name)
– Scalar values (e.g., 1.10, 100, -10, etc.)
– An expression that is any combination of those examples above
Tip: Specify only those expressions required to solve your problem. Avoid using SELECT * to retrieve all pre-existing attributes in any matters other than exploratory. Less is more when it comes to query processing.
Example Task, Code, and Output
Using the employees table within the human resource (hr) schema, write a query that generates an output containing, for each employee, the following:
– Employee ID
– Employee full name (i.e., concatenation of first and last name)
– The current date and time for which the result set was generated
– Employee salary
– The value representing a 10% increase in salary
SELECT
hr.employees.employee_id, -- pre-existing attribute
(
hr.employees.first_name
|| ' '
|| hr.employees.last_name
) AS full_name, -- derived attribute
SYSDATE AS date_value, -- built-in function
hr.employees.salary, -- pre-existing attribute
hr.employees.salary * 1.10 AS new_salary
FROM
hr.employees;
| employee_id | full_name | date_value | salary | new_salary |
|---|---|---|---|---|
| 100 | Steven King | 18-NOV-23 | 24000 | 26400 |
| 101 | Neena Kochhar | 18-NOV-23 | 17000 | 18700 |
| 102 | Lex De Haan | 18-NOV-23 | 17000 | 18700 |
| 103 | Alexander Hunold | 18-NOV-23 | 9000 | 9900 |
| 104 | Bruce Ernst | 18-NOV-23 | 6000 | 6600 |
FROM
What is the data source?
Examples:
– Pre-existing table, view, or temporary table
– Derived table/inline-view (i.e., subquery result set)
– Common table expression
Tip: Fully-qualify pre-existing tables in the FROM clause and their attributes in the SELECT clause by specifying the schema for which they belong. This is important, not only for readability, but also to avoid ambiguity errors because there will be instances where multiple tables and attributes across different schemas share the same name.
Example Task, Code, and Output
Using the orders table within the order entry (oe) schema, write a query that generates an output containing, for each order, the following:
– Order ID
– Customer ID
– Order date
– Order total
SELECT
oe.orders.order_id,
oe.orders.customer_id,
oe.orders.order_date,
oe.orders.order_total
FROM -- pre-existing table in oe schema
oe.orders;
| order_id | customer_id | order_date | order_total |
|---|---|---|---|
| 2458 | 101 | 16-AUG-07 03.34.12.234359 PM | 78279.6 |
| 2397 | 102 | 19-NOV-07 02.41.54.696211 PM | 42283.2 |
| 2454 | 103 | 02-OCT-07 05.49.34.678340 PM | 6653.4 |
| 2354 | 104 | 14-JUL-08 06.18.23.234567 PM | 46257 |
| 2358 | 105 | 08-JAN-08 05.03.12.654278 PM | 7826 |
WHERE
Do any filters need to be applied to the data?
Examples:
– Applying filters based on a pre-existing attribute’s values
– Applying filters based on a derived value
– Applying filters based on the result of a subquery
– Applying filters that are any combination of those examples above
Tip: Use this clause assist you in retrieving the minimum amount of data required to solve your problem.
Example Task, Code, and Output
Using the orders table within the order entry (oe) schema, write a query that generates an output containing, for each order, the following:
– Order ID
– Customer ID
– Order date
– Order total
Additionally, only include those orders with an order total value between $50,000 and $100,000, inclusive.
SELECT
oe.orders.order_id,
oe.orders.customer_id,
oe.orders.order_date,
oe.orders.order_total
FROM
oe.orders
WHERE
oe.orders.order_total >= 50000
AND oe.orders.order_total <= 100000;
| order_id | customer_id | order_date | order_total |
|---|---|---|---|
| 2458 | 101 | 16-AUG-07 03.34.12.234359 PM | 78279.6 |
| 2440 | 107 | 31-AUG-07 09.53.06.008765 PM | 70576.9 |
| 2357 | 108 | 08-JAN-06 08.19.44.123456 PM | 59872.4 |
| 2435 | 144 | 02-SEP-07 11.22.53.134567 PM | 62303 |
| 2355 | 104 | 26-JAN-06 09.22.51.962632 AM | 94513.5 |
GROUP BY
Does the data need to be grouped?
Rows are grouped based on one or more shared attribute values and a calculation is usually performed across each set of grouped rows. Each set of grouped rows are collapsed into a single row.
Examples:
– Group rows based on their customer ID values and compute the sum of sales across each set of grouped rows, resulting in sales by customer ID
– Group rows based on employee job title and location values and compute the average salary across each set of grouped rows, resulting in the average salary for each job title and location combination
Tip: You will come across code where the GROUP BY clause is not accompanied by an aggregation (e.g., SUM, MIN, MAX, etc.) in the SELECT clause. This is usually because the author has opted to use the GROUP BY clause to address duplicates in the data based on the specified combinations. In this very specific case, I suggest you use the DISTINCT keyword to make the intent of your code clear. The query should be processed the same using either method.
Example Task, Code, and Output
Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer. Include the customer ID attribute in the output.
SELECT
oe.orders.customer_id,
SUM(oe.orders.order_total) AS total_sales
FROM
oe.orders
GROUP BY
oe.orders.customer_id;
| customer_id | total_sales |
|---|---|
| 107 | 155613.2 |
| 108 | 213399.7 |
| 158 | 25270.3 |
| 161 | 600 |
| 166 | 309 |
HAVING
Do any filters need to be applied to the resulting grouped rows?
Typically, the GROUP BY function is accompanied by an aggregate function. The HAVING clause can be used to apply filters based on the results of aggregate functions.
Examples:
– Applying filters based on the aggregation of pre-existing attribute values
– Applying filters based on the aggregation of derived values
– Applying filters based on the result of a subquery
– Applying filters that are any combination of those examples above
Tip: Use this clause assist to you in retrieving the minimum amount of data required to solve your problem. Remember, the HAVING clause should be used to apply filter conditions on GROUPED rows while the WHERE clause should be used apply filter conditions on UNGROUPED rows.
Example Task, Code, and Output
Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer. Include the customer ID attribute in the output. Additionally, only include those customers with total sales greater than $100,000.
SELECT
oe.orders.customer_id,
SUM(oe.orders.order_total) AS total_sales
FROM
oe.orders
GROUP BY
oe.orders.customer_id
HAVING
SUM(oe.orders.order_total) > 100000;
| customer_id | total_sales |
|---|---|
| 107 | 155613.2 |
| 108 | 213399.7 |
| 109 | 265255.6 |
| 104 | 146605.5 |
| 118 | 100991.8 |
ORDER BY
Does the output of the query need to be sorted?
Examples:
– Sorting the output of the query by total sales, descending (i.e., greatest to least)
– Sorting the output of the query by total number of orders placed
Tip: Sorting the output of a query can be a costly operation. In today’s business intelligence environment, your queries will likely originate from a data visualization tool such as Tableau or Power BI. Avoid including an ORDER BY clause in these queries because it is likely you and the consumers of your reports will be using the tool to sort the data displayed dynamically. In the case of Tableau, an ORDER BY clause is not allowed when using the tool’s Custom SQL feature.
Example Task, Code, and Output
Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer. Include the customer ID attribute in the output. Additionally, only include those customers with total sales greater than $100,000 and sort the output by the total sales values in descending order.
SELECT
oe.orders.customer_id,
SUM(oe.orders.order_total) AS total_sales
FROM
oe.orders
GROUP BY
oe.orders.customer_id
HAVING
SUM(oe.orders.order_total) > 100000
ORDER BY
total_sales DESC;
| customer_id | total_sales |
|---|---|
| 149 | 403119.7 |
| 147 | 371278.2 |
| 150 | 282694.3 |
| 109 | 265255.6 |
| 108 | 213399.7 |
Putting It All Together
Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer and month. The final output should include only those orders with an order mode value of ‘direct’ and those instances where the customer spent a total of more than $5000 in the corresponding month. Include the customer ID, order month, and total sales values in the output and sort the output by the total sales values in descending order.
SELECT
oe.orders.customer_id,
TRUNC(oe.orders.order_date, 'MONTH') AS order_month,
SUM(oe.orders.order_total) AS total_sales
FROM
oe.orders
WHERE
oe.orders.order_mode = 'direct'
GROUP BY
oe.orders.customer_id,
TRUNC(oe.orders.order_date, 'MONTH')
HAVING
SUM(oe.orders.order_total) > 5000
ORDER BY
total_sales DESC;
| customer_id | order_month | total_sales |
|---|---|---|
| 149 | 01-SEP-07 | 268651.8 |
| 117 | 01-JUL-07 | 103679.3 |
| 101 | 01-AUG-07 | 78279.6 |
| 109 | 01-JUL-07 | 77727.2 |
| 109 | 01-MAR-07 | 72836 |
Additional Content
For a deeper dive into each of the big six clauses, check out the pages dedicated to each by using the links below. I suggest reading the pages in the order in which they are listed.
SELECT CLAUSE
FROM CLAUSE
WHERE CLAUSE
GROUP BY CLAUSE
HAVING CLAUSE
ORDER BY CLAUSE