THE BIG SIX

As a child, I remember the painful process of learning to construct sentences in English. I first had to grasp a handful of the many components possible in sentences. From those components, I then started to form simple sentences. After learning more components, I was able to write sentences of varying structures and complexities.

Replace my story on learning to write sentences with my story on learning SQL and they’ll read much the same. First, I set out to understand the fundamental clauses involved in simple queries. These included the SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses. I often refer to these clauses as the big six. After I acquired a firm understanding of these clauses, I began learning clauses and concepts that are often necessary and involved when solving more complex problems (e.g., joins, common table expressions, subqueries, etc.) This page is dedicated to providing a high-level overview of each of the big six clauses.

Depending on your background, this page’s contents may be very straightforward and the usage of the individual clauses, intuitive; however, don’t be discouraged if that’s not the case. I recommend reading the subpages dedicated to each of the big six clauses for a more complete understanding. Within these subpages, I’ll explore the ins and outs of each clause in greater depth and provide additional coding examples to illustrate common use cases. The links to the subpages can be found in the menu under THE BIG SIX. These links are also available at the end of this page.

SELECT

Which expressions are necessary?
Examples:
– Pre-existing columns/fields/attributes
– Derived attributes (e.g., attributes resulting from built-in functions, concatenation of first and last name)
– Scalar values (e.g., 1.10, 100, -10, etc.)
– An expression that is any combination of those examples above

Tip: Specify only those expressions required to solve your problem. Avoid using SELECT * to retrieve all pre-existing attributes in any matters other than exploratory. Less is more when it comes to query processing.

Example Task, Code, and Output

Using the employees table within the human resource (hr) schema, write a query that generates an output containing, for each employee, the following:
– Employee ID
– Employee full name (i.e., concatenation of first and last name)
– The current date and time for which the result set was generated
– Employee salary
– The value representing a 10% increase in salary

SELECT
	hr.employees.employee_id, -- pre-existing attribute
    (
    	hr.employees.first_name
        || ' '
        || hr.employees.last_name
    ) AS full_name, -- derived attribute
    SYSDATE AS date_value, -- built-in function
    hr.employees.salary, -- pre-existing attribute
    hr.employees.salary * 1.10 AS new_salary
FROM
	hr.employees;
employee_idfull_namedate_valuesalarynew_salary
100Steven King18-NOV-232400026400
101Neena Kochhar18-NOV-231700018700
102Lex De Haan18-NOV-231700018700
103Alexander Hunold18-NOV-2390009900
104Bruce Ernst18-NOV-2360006600
FROM

What is the data source?
Examples:
– Pre-existing table, view, or temporary table
– Derived table/inline-view (i.e., subquery result set)
Common table expression

Tip: Fully-qualify pre-existing tables in the FROM clause and their attributes in the SELECT clause by specifying the schema for which they belong. This is important, not only for readability, but also to avoid ambiguity errors because there will be instances where multiple tables and attributes across different schemas share the same name.

Example Task, Code, and Output

Using the orders table within the order entry (oe) schema, write a query that generates an output containing, for each order, the following:
– Order ID
– Customer ID
– Order date
– Order total

SELECT
	oe.orders.order_id,
    oe.orders.customer_id,
    oe.orders.order_date,
	oe.orders.order_total
FROM -- pre-existing table in oe schema
	oe.orders; 
order_idcustomer_idorder_dateorder_total
245810116-AUG-07 03.34.12.234359 PM78279.6
239710219-NOV-07 02.41.54.696211 PM42283.2
245410302-OCT-07 05.49.34.678340 PM6653.4
235410414-JUL-08 06.18.23.234567 PM46257
235810508-JAN-08 05.03.12.654278 PM7826
WHERE

Do any filters need to be applied to the data?
Examples:
– Applying filters based on a pre-existing attribute’s values
– Applying filters based on a derived value
– Applying filters based on the result of a subquery
– Applying filters that are any combination of those examples above

Tip: Use this clause assist you in retrieving the minimum amount of data required to solve your problem.

Example Task, Code, and Output

Using the orders table within the order entry (oe) schema, write a query that generates an output containing, for each order, the following:
– Order ID
– Customer ID
– Order date
– Order total

Additionally, only include those orders with an order total value between $50,000 and $100,000, inclusive.

SELECT
	oe.orders.order_id,
    oe.orders.customer_id,
    oe.orders.order_date,
	oe.orders.order_total
FROM
	oe.orders
WHERE
    oe.orders.order_total >= 50000
    AND oe.orders.order_total <= 100000;
order_idcustomer_idorder_dateorder_total
245810116-AUG-07 03.34.12.234359 PM78279.6
244010731-AUG-07 09.53.06.008765 PM70576.9
235710808-JAN-06 08.19.44.123456 PM59872.4
243514402-SEP-07 11.22.53.134567 PM62303
235510426-JAN-06 09.22.51.962632 AM94513.5
GROUP BY

Does the data need to be grouped?
Rows are grouped based on one or more shared attribute values and a calculation is usually performed across each set of grouped rows. Each set of grouped rows are collapsed into a single row.
Examples:
– Group rows based on their customer ID values and compute the sum of sales across each set of grouped rows, resulting in sales by customer ID
– Group rows based on employee job title and location values and compute the average salary across each set of grouped rows, resulting in the average salary for each job title and location combination

Tip: You will come across code where the GROUP BY clause is not accompanied by an aggregation (e.g., SUM, MIN, MAX, etc.) in the SELECT clause. This is usually because the author has opted to use the GROUP BY clause to address duplicates in the data based on the specified combinations. In this very specific case, I suggest you use the DISTINCT keyword to make the intent of your code clear. The query should be processed the same using either method.

Example Task, Code, and Output

Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer. Include the customer ID attribute in the output.

SELECT
    oe.orders.customer_id,
	SUM(oe.orders.order_total) AS total_sales
FROM
	oe.orders
GROUP BY
	oe.orders.customer_id;
customer_idtotal_sales
107155613.2
108213399.7
15825270.3
161600
166309
HAVING

Do any filters need to be applied to the resulting grouped rows?
Typically, the GROUP BY function is accompanied by an aggregate function. The HAVING clause can be used to apply filters based on the results of aggregate functions.
Examples:
– Applying filters based on the aggregation of pre-existing attribute values
– Applying filters based on the aggregation of derived values
– Applying filters based on the result of a subquery
– Applying filters that are any combination of those examples above

Tip: Use this clause assist to you in retrieving the minimum amount of data required to solve your problem. Remember, the HAVING clause should be used to apply filter conditions on GROUPED rows while the WHERE clause should be used apply filter conditions on UNGROUPED rows.

Example Task, Code, and Output

Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer. Include the customer ID attribute in the output. Additionally, only include those customers with total sales greater than $100,000.

SELECT
    oe.orders.customer_id,
	SUM(oe.orders.order_total) AS total_sales
FROM
	oe.orders
GROUP BY
	oe.orders.customer_id
HAVING
    SUM(oe.orders.order_total) > 100000;
customer_idtotal_sales
107155613.2
108213399.7
109265255.6
104146605.5
118100991.8
ORDER BY

Does the output of the query need to be sorted?
Examples:
– Sorting the output of the query by total sales, descending (i.e., greatest to least)
– Sorting the output of the query by total number of orders placed

Tip: Sorting the output of a query can be a costly operation. In today’s business intelligence environment, your queries will likely originate from a data visualization tool such as Tableau or Power BI. Avoid including an ORDER BY clause in these queries because it is likely you and the consumers of your reports will be using the tool to sort the data displayed dynamically. In the case of Tableau, an ORDER BY clause is not allowed when using the tool’s Custom SQL feature.

Example Task, Code, and Output

Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer. Include the customer ID attribute in the output. Additionally, only include those customers with total sales greater than $100,000 and sort the output by the total sales values in descending order.

SELECT
    oe.orders.customer_id,
	SUM(oe.orders.order_total) AS total_sales
FROM
	oe.orders
GROUP BY
	oe.orders.customer_id
HAVING
    SUM(oe.orders.order_total) > 100000
ORDER BY
    total_sales DESC;
customer_idtotal_sales
149403119.7
147371278.2
150282694.3
109265255.6
108213399.7

Putting It All Together
Using the orders table within the order entry (oe) schema, write a query that generates an output containing the total sales by customer and month. The final output should include only those orders with an order mode value of ‘direct’ and those instances where the customer spent a total of more than $5000 in the corresponding month. Include the customer ID, order month, and total sales values in the output and sort the output by the total sales values in descending order.

SELECT
    oe.orders.customer_id,
    TRUNC(oe.orders.order_date, 'MONTH') AS order_month,
	SUM(oe.orders.order_total) AS total_sales
FROM
	oe.orders
WHERE
    oe.orders.order_mode = 'direct'
GROUP BY
	oe.orders.customer_id,
    TRUNC(oe.orders.order_date, 'MONTH')
HAVING
    SUM(oe.orders.order_total) > 5000
ORDER BY
    total_sales DESC;
customer_idorder_monthtotal_sales
14901-SEP-07268651.8
11701-JUL-07103679.3
10101-AUG-0778279.6
10901-JUL-0777727.2
10901-MAR-0772836

Additional Content
For a deeper dive into each of the big six clauses, check out the pages dedicated to each by using the links below. I suggest reading the pages in the order in which they are listed.

SELECT CLAUSE
FROM CLAUSE
WHERE CLAUSE
GROUP BY CLAUSE
HAVING CLAUSE
ORDER BY CLAUSE