SQL Antipatterns: Using Ordinals in GROUP BY and ORDER BY Clauses

Time to Read:

5–7 minutes

The SQL features available to users across relational database management system products continue to evolve over time. In the past 30 years, features have been developed allowing us to concatenate values existing across multiple rows into a single row using LISTAGG, implement highly customizable ranking systems through window functions, and traverse hierarchal data and perform data scaffolding through recursive common table expressions, among other useful features. The evolving SQL standard along with the continuous improvements made by vendors in the form of proprietary features (e.g., functions) continue to deliver for SQL users; however, these new additions are not without their misses.

MySQL permits users to specify GROUP BY and ORDER BY clauses consisting of expressions representing the ordinal positions of attributes within the corresponding SELECT clause. For example, the query below computes the total sales for each product category within each reporting month and sorts the result set by reporting month and product category. The expressions 1 and 2 within the GROUP BY and ORDER BY clauses state, “GROUP/ORDER the result set by the first and second attributes specified in the SELECT clause.”

SELECT
	CAST(
		DATE_FORMAT(report_date, '%Y-%m-01')
        AS DATE
	) AS reporting_month,
    north_america_sale.product_category,
    SUM(total_sale) AS total_sale
FROM
	north_america_sale
GROUP BY
	1, 2
ORDER BY
	1, 2;
reporting_monthproduct_categorytotal_sale
2023-01-01Hardware10532049
2023-01-01Office Supplies9527596
2023-01-01Software10032955
2023-02-01Hardware7713650
2023-02-01Office Supplies8363496
2023-02-01Software8072274
2023-03-01Hardware10107952
2023-03-01Office Supplies9308036
2023-03-01Software9566038
2023-04-01Hardware9981157
2023-04-01Office Supplies9101319
2023-04-01Software8094593
2023-05-01Hardware8224588
2023-05-01Office Supplies9854939
2023-05-01Software8719227
2023-06-01Hardware8547870
2023-06-01Office Supplies8569213
2023-06-01Software9424211
2023-07-01Hardware8956408
2023-07-01Office Supplies8742601
2023-07-01Software8850498
2023-08-01Hardware9718295
2023-08-01Office Supplies8566646
2023-08-01Software9848671
2023-09-01Hardware9198295
2023-09-01Office Supplies8742664
2023-09-01Software9532093
2023-10-01Hardware9138984
2023-10-01Office Supplies8658959
2023-10-01Software8430418
2023-11-01Hardware9559386
2023-11-01Office Supplies11124204
2023-11-01Software9945806
2023-12-01Hardware9680977
2023-12-01Office Supplies11923226
2023-12-01Software11490082

Had this post a different title and opening, you would likely be rejoicing at the thought of being able to type fewer characters to achieve the same result. Instead, you are skeptical of this perceived benefit and rightly so! Let us explore are few reasons as to why it is an overall bad idea to incorporate this feature into production code.

Explicit is Better than Implicit
If you are also a python user, then you will likely be familiar with the aphorism from the Zen of Python stating explicit is better than implicit. Not all programming best practices should be applied universally across languages; however, this one is particularly relevant to SQL in general and this specific discussion.

Given the example code above, imagine you are not the author of the code, or you are reviewing it again after not setting eyes on it for a long time. You traverse the query line by line and arrive at the GROUP BY clause. Unfortunately, your top-down approach to reading the query is interrupted due to having to retrace your steps back to the SELECT clause to clarify the expressions specified in the GROUP BY clause as indicated by the ordinal values. After clarifying which expressions are specified in the GROUP BY clause, back down to the GROUP BY clause you go. Hopefully, you remember the code you previously read lying between the SELECT clause and the GROUP BY clause. If not, then I guess you will need to read those lines again, too. With any luck, you can complete that mental exercise and can move onto the ORDER BY clause. The fun continues. Do it again. Imagine if the example query above was more complicated and lengthier, requiring you to scroll up and down. Pure joy.

When the ordinal position of expressions in the SELECT clause is relied upon in the GROUP BY or ORDER BY clauses, the code’s readability is decreased. Do yourself and the consumers of your code a favor by explicitly specifying the expressions. Readability counts.

Jenga as a Query
Data pipelines and the processes and scripts that support them are bound to change over time and for good reason. Your data customers are constantly trying to explore their data in different ways in hopes of acquiring a better understanding of it and the processes that generate it. As a result, they may request attributes to be added or removed from tables and views to take them one step closer to meeting that objective. When the ordinal position of expressions in the SELECT clause is relied upon in the GROUP BY or ORDER BY clauses, the code is reduced to an inauspicious game of jenga. It takes but one addition or removal of an attribute to cause the code to fail entirely, or worse, the code is processed successfully and produces a result set that does not meet the customer’s requirements.

For example, suppose your customers have decided they no longer need monthly sales to be broken out by product category. To account for this change, you update the code above by removing the product category attribute from the SELECT clause. As a result of not remembering to also account for this change in the GROUP BY clause, the query comes tumbling down. It fails because the second column within the SELECT clause is now the total sales associated with the reporting month instead of the product category. Additionally, the result set is also inadvertently sorted by the total sales values. This produces no visual net-change; however, the overhead associated with the now unnecessary sort remains.

-- Code fails due to invalid GROUP BY.
SELECT
	CAST(
		DATE_FORMAT(report_date, '%Y-%m-01')
        AS DATE
	) AS reporting_month,
    SUM(total_sale) AS total_sale
FROM
	north_america_sale
GROUP BY
	1, 2
ORDER BY
	1, 2;

If caution is not taken when adding or rearranging attributes within the SELECT clause of queries utilizing ordinals in the ORDER BY clause, then the query can silently fail in terms of meeting requirements. For example, suppose the original query near the top of this page is changed ever so slightly to that shown below.

SELECT
	north_america_sale.product_category,
	CAST(
		DATE_FORMAT(report_date, '%Y-%m-01')
        AS DATE
	) AS reporting_month,
    SUM(total_sale) AS total_sale
FROM
	north_america_sale
GROUP BY
	1, 2
ORDER BY
	1, 2;
product_categoryreporting_monthtotal_sale
Hardware2023-01-0110532049
Hardware2023-02-017713650
Hardware2023-03-0110107952
Hardware2023-04-019981157
Hardware2023-05-018224588
Hardware2023-06-018547870
Hardware2023-07-018956408
Hardware2023-08-019718295
Hardware2023-09-019198295
Hardware2023-10-019138984
Hardware2023-11-019559386
Hardware2023-12-019680977
Office Supplies2023-01-019527596
Office Supplies2023-02-018363496
Office Supplies2023-03-019308036
Office Supplies2023-04-019101319
Office Supplies2023-05-019854939
Office Supplies2023-06-018569213
Office Supplies2023-07-018742601
Office Supplies2023-08-018566646
Office Supplies2023-09-018742664
Office Supplies2023-10-018658959
Office Supplies2023-11-0111124204
Office Supplies2023-12-0111923226
Software2023-01-0110032955
Software2023-02-018072274
Software2023-03-019566038
Software2023-04-018094593
Software2023-05-018719227
Software2023-06-019424211
Software2023-07-018850498
Software2023-08-019848671
Software2023-09-019532093
Software2023-10-018430418
Software2023-11-019945806
Software2023-12-0111490082

Notice the re-ordering of attributes in the SELECT clause. The query produces the same data; however, the result set is not sorted in the same way because the ORDER BY clause remained the same. The original sorting specification allows consumers to easily compare product category sales within each month while the new sorting specification allows consumers to more easily compare a single product category’s performance across the entire year. Ease of use is an important factor for reporting, so customers may deem the sorting to be a key requirement.

Appropriate Use Cases
Taking advantage of ability to use ordinals in the GROUP BY and ORDER BY clause is perfectly acceptable in ad-hoc queries. Exploring a database schema and its corresponding tables? Validating the integrity of the data within a table? Save those keystrokes if you so desire!

Did this post make you laugh because you would never think of using ordinals in your GROUP BY or ORDER BY clauses, cry because you are currently suffering from you or a co-worker having taken advantage of these features, both, or neither? Let me know if the comments section.

Thank you for visiting SQL Short Reads.

Published by

Chris Perry

Author and creator of SQL Short Reads.

Leave a Reply