Time to Read:
Time is perhaps the most valuable resource. It is the one thing no power on this earth can give us back. Believing that, it seems desirable to spend as little time as possible performing painful activities – right? In this post, I will walk through a handful of measures that you can take to ensure the time spent by others reading your code is as painless as possible. As a disclaimer, for SQL, there is not a style guide or standard equivalent to that available in other programming languages like python (PEP8) that is both well-known and widely adopted. That said, I do believe the measures in the forms of rules and rough guidelines that I propose are sensible and easy to implement.
Line Length
The number of characters used on each line can heavily impact the amount of time and effort required to read a SQL statement. Avoid forcing the readers of your code to scroll in both directions when traversing each line by setting soft limits on the number of characters used per line. Scrolling right to read the contents of a line and subsequently having to scroll back in order to continue with the next is a painful exercise that wastes the reader’s time and will likely make that person less enthusiastic about reviewing your code in the future.
For the remainder of this discussion, consider the SELECT statement below that is both poorly written and largely void of formatting. Some of you will say it is contrived having never been faced with poor code while others will say, “I have seen worse!” With each subsequent topic, I will continue to transform this code by addressing the elements, present or absent, reducing its readability.
WITH cte AS (
SELECT c.customer_id, c.cust_last_name || ', ' || c.cust_first_name AS customer_name, SUM(o.order_total) AS order_total, DENSE_RANK() OVER (ORDER BY SUM(o.order_total) DESC) AS order_total_sequence
FROM oe.customers c INNER JOIN oe.orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= '01-JAN-08' AND o.order_date < '01-JAN-09'
GROUP BY c.customer_id, c.cust_last_name || ', ' || c.cust_first_name
)
SELECT cte.customer_id, cte.customer_name, cte.order_total, cte.order_total_sequence
FROM cte
WHERE cte.order_total_sequence <= 5
ORDER BY 3 DESC;
Separating Clauses with Indentation and New Lines
The use of indentation and new lines in SQL code are perhaps the easiest ways to increase code readability and also address the scrolling problem resulting from excessive line length. While there are no hard-and-fast rules on where to indent and add new lines for SQL, I find it beneficial add a new line after the SELECT, FROM, WHERE, GROUP BY, HAVING, and ORDER BY clauses and to indent the contents or specifications of those clauses. A new line followed by indentation signifies to the reader, “this is the start of the clause’s specification.”
For example, the SELECT clause’s specification is a comma-separated list of the required pre-existing and/or derived attributes. Placing each attribute on a new line makes it easier to read those instructions and should feel natural to the writer (i.e., you) and reader given the concept of a list in everyday writing. The reader simply needs to scan the vertically aligned attributes from top to bottom. This also applies to the predicates in the FROM, WHERE, and HAVING clauses, and for the enumerated attributes found in GROUP BY and ORDER BY clauses.
The code below is identical to that found above; however, I have separated clause specifications with indentation and incorporated new lines for readability. Notice, even in the small amount of space allocated to the code block, scrolling in only one direction is required. Additionally, it is much easier to understand each clause’s specification given that the contents are indented, on separate lines, and vertically aligned.
WITH cte AS (
SELECT
c.customer_id,
(
c.cust_last_name
|| ', '
|| c.cust_first_name
) AS customer_name,
SUM(o.order_total) AS order_total,
DENSE_RANK() OVER (
ORDER BY SUM(o.order_total) DESC
) AS order_total_sequence
FROM
oe.customers c
INNER JOIN
oe.orders o
ON c.customer_id = o.customer_id
WHERE
o.order_date >= '01-JAN-08'
AND o.order_date < '01-JAN-09'
GROUP BY
c.customer_id,
(
c.cust_last_name
|| ', '
|| c.cust_first_name
)
)
SELECT
cte.customer_id,
cte.customer_name,
cte.order_total,
cte.order_total_sequence
FROM
cte
WHERE
cte.order_total_sequence <= 5
ORDER BY
3 DESC;
Table and Column Aliases
All relational database management systems maintain documentation to some extent on the use of table and column aliases. You will find that they are meant to increase readability; however, based on the code reviews I have performed in a professional capacity and while perusing posts on popular social media platforms like LinkedIn or video tutorials on YouTube, often the opposite occurs.
Table names are shortened to the point where the the reader must maintain a mapping of single-letter characters or acronyms and the contents of a table (e.g., a for sales force accounts, o for sales force opportunities, sfosnpsht for sales force opportunity snapshot, etc.). For simple queries containing one or few tables, sure, the additional cognitive effort is minimal, but this quickly gets out of control for many real-world queries merging data from a large amount of tables. This is especially true for code that extends beyond a single frame, resulting in the reader needing to scroll back to the initial reference of each table to remember which table the single character maps to. That is both time-consuming and painful.
Column aliases should be used to more effectively communicate its contents. This is often necessary when there is a table that is used by several different teams for reporting purposes. One team refers to the companies they do business with as “clients” while the other refers to those companies as “customers.” Another common use case is derived attributes. The concatenation of customer first and last name in the code above is used to return each customer’s full name. Aliasing for the sake of clarity based on context and/or the result of deriving something is aligned with the goal of maintaining or increasing readability. What is not? Arbitrary removal of characters for the sake of simply making the column name shorter. For example, consider an original column name of “shipping_address” and an alias of “shp_adr.” The removal of characters forces the reader to mentally fill in the gaps based on context and experience.
In the code below, I have completely removed the superfluous table aliases as the original table names, customers and orders, are clear and accurately capture their contents.
WITH cte AS (
SELECT
oe.customers.customer_id,
(
oe.customers.cust_last_name
|| ', '
|| oe.customers.cust_first_name
) AS customer_name,
SUM(oe.orders.order_total) AS order_total,
DENSE_RANK() OVER (
ORDER BY SUM(oe.orders.order_total) DESC
) AS order_total_sequence
FROM
oe.customers
INNER JOIN
oe.orders
ON oe.customers.customer_id = oe.orders.customer_id
WHERE
oe.orders.order_date >= '01-JAN-08'
AND oe.orders.order_date < '01-JAN-09'
GROUP BY
oe.customers.customer_id,
(
oe.customers.cust_last_name
|| ', '
|| oe.customers.cust_first_name
)
)
SELECT
cte.customer_id,
cte.customer_name,
cte.order_total,
cte.order_total_sequence
FROM
cte
WHERE
cte.order_total_sequence <= 5
ORDER BY
3 DESC;
Common Table Expressions
Common table expressions (CTE) allow you to structure SQL code in a way that yields a transparent problem-solving strategy. They allow you to break complex problems down into manageable chunks or subproblems. A common mistake made by SQL users is to underestimate or be unaware of the benefits of providing meaningful names to CTEs. If you perform a quick search for SQL content on any popular social media platform, then you will, without a doubt, come across code containing CTEs named “cte,” “cte_1,” “cte_2,” and other less than useful names.
Avoid squandering the opportunity to inform the reader of your code on the intent or purpose of the CTE. In the code above, the purpose of the code defining “cte” is to compute the amount spent by each customer in year 2008. By providing the CTE with a meaningful name, we can prepare the reader for what is to come and even trigger intuition based on their past experiences in writing or reading code that sought to do the same.
Suppose the CTE is renamed to “customer_spending.” When reading the definition of the CTE, the question goes from, “What is this CTEs purpose?” to “How is this CTE defining and computing customer spend?” Providing meaningful names becomes increasingly important to readability when there are multiple CTEs or they are chained together where the output of one is used as input to another.
Ordinals in GROUP BY and ORDER BY Clauses
If you have read my blog post on the use of ordinals in the GROUP BY and ORDER BY clauses within production code, then you know I find it to be antipattern. For the purposes of this discussion, imagine the code above is lengthier, extending to several scroll frames, and also contains ordinals in GROUP BY clauses. For each GROUP BY and ORDER BY clause, you must be fortunate enough for the columns referenced in those clauses to be within the same frame or to have subconsciously noted the position of each column in the corresponding SELECT clause. Given that both those things are unlikely to occur, as the reader you must scroll back up to determine the attribute being referred to by the ordinal value (e.g., 1, 2, 3, etc.). After placing that information in your working memory, you must then scroll back down to where you left off and continue reading. That does not sound like a fun exercise, does it? Let us not impose that on the readers of our code.
The updated code containing the renamed common table expression and the removal of the ordinal in the ORDER BY clause can be found below.
WITH customer_spending AS (
SELECT
oe.customers.customer_id,
(
oe.customers.cust_last_name
|| ', '
|| oe.customers.cust_first_name
) AS customer_name,
SUM(oe.orders.order_total) AS order_total,
DENSE_RANK() OVER (
ORDER BY SUM(oe.orders.order_total) DESC
) AS order_total_sequence
FROM
oe.customers
INNER JOIN
oe.orders
ON oe.customers.customer_id = oe.orders.customer_id
WHERE
oe.orders.order_date >= '01-JAN-08'
AND oe.orders.order_date < '01-JAN-09'
GROUP BY
oe.customers.customer_id,
(
oe.customers.cust_last_name
|| ', '
|| oe.customers.cust_first_name
)
)
SELECT
customer_spending.customer_id,
customer_spending.customer_name,
customer_spending.order_total,
customer_spending.order_total_sequence
FROM
customer_spending
WHERE
customer_spending.order_total_sequence <= 5
ORDER BY
customer_spending.order_total DESC;
Key Takeaways
To recap, the use of indentation and new lines allows the reader to scan your code in one direction and understand where clauses begin and end. Providing tables with succinct but complete names allow the reader to avoid constructing a mapping for original table names and specified aliases. Similarly, columns with meaningful and context relevant names convey the contents and purpose of each. When using common table expressions, ensure the names provided offer a glimpse into the code defining each. The keystrokes you save by using ordinals are not worth the additional time and effort required of the reader of your code whether that be a peer or your future self. Finally, remember, code is often written once (at least initially) and read many times.
Thank you for visiting SQL Short Reads.