WINDOW/ANALYTIC FUNCTIONS

Window functions were first made available in structured query language (SQL) in the early 2000s and have since garnered a lot of attention and hype. I’d argue the hype behind window functions is well-deserved given the power they afford the user and the ease in which they can be implemented.

With minimal code, sometimes even a one-liner, aggregates can be computed over groups of rows (e.g., 3-day moving average, average salary for each department), ranking systems can be implemented based on the user’s specification (e.g., rank each student’s performance based on GPA), and calculations referencing those values in rows before or after the current row can be made. Let’s begin looking at the syntax allowing window functions to be implemented so easily.

To use a window function, simply provide the window function name, the OVER clause, and the clauses applicable to your scenario within the OVER clause. The OVER clause and the clauses embedded within it are used to specify how the function will operate on the result set. The clauses commonly found within the OVER clause are the PARTITION BY, ORDER BY, and WINDOW clauses. The PARTITION BY, ORDER BY, and WINDOW clauses are optional; however, depending on the analytic function being used, a default WINDOW clause of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is implied and used by default. Regarding defaults, consult your specific RDBMS documentation to be certain. The number of clauses requiring understanding may seem overwhelming at first, but I promise each of them are quite intuitive. Let’s walk through each of them in turn.

PARTITION BY Clause
The PARTITION BY clause specifies how the individual rows within a result set should be grouped, if applicable. The window function specified operates on each of these groups, separately. To illustrate the idea of partitioning, consider the result set below based on the employees table within the human resource (HR) schema at livesql.oracle.com. The data has been partitioned by the job title attribute with the individual rows containing the same job title value belonging to the same partition.

employee_id	job_title	salary	partition
100	President	24000	1
101	Administration Vice President	17000	2
102	Administration Vice President	17000	2
145	Sales Manager	14000	3
146	Sales Manager	13500	3
147	Sales Manager	12000	3
148	Sales Manager	11000	3
149	Sales Manager	10500	3
120	Stock Manager	8000	4
121	Stock Manager	8200	4
122	Stock Manager	7900	4
123	Stock Manager	6500	4
124	Stock Manager	5800	4

As previously mentioned, the specified window function will operate on each of these groups. Using the same data above, suppose there is a need to compute the average salary across those employees’ salaries sharing the same job and compare that value to the employee’s salary. The AVG function can be used as a window function and the result set partitioned by the job title attribute to achieve this. See the code and corresponding result set below.

SELECT
    hr.employees.employee_id,
    hr.jobs.job_title,
    hr.employees.salary,
    AVG(hr.employees.salary) OVER(
        PARTITION BY hr.jobs.job_title
    ) AS average_job_salary,
    hr.employees.salary
    - AVG(hr.employees.salary) OVER(
    	PARTITION BY hr.jobs.job_title
    ) AS difference
FROM
	hr.employees
	INNER JOIN
	hr.jobs
		ON hr.employees.job_id = hr.jobs.job_id
WHERE
	hr.employees.job_id IN (
    	'ST_MAN', 'SA_MAN',
    	'AD_VP', 'AD_PRES'
    );

employee_id	job_title	salary	average_job_salary	difference
102	Administration Vice President	17000	17000	0
101	Administration Vice President	17000	17000	0
100	President	24000	24000	0
147	Sales Manager	12000	12200	-200
146	Sales Manager	13500	12200	1300
145	Sales Manager	14000	12200	1800
148	Sales Manager	11000	12200	-1200
149	Sales Manager	10500	12200	-1700
120	Stock Manager	8000	7280	720
121	Stock Manager	8200	7280	920
122	Stock Manager	7900	7280	620
123	Stock Manager	6500	7280	-780
124	Stock Manager	5800	7280	-1480

Notice in the result set above, each row contains an average_job_salary value. This value is the average of those employees’ salaries sharing the same job with respect to the current row’s job title value. For example, the average salary for the five sales managers is $12200. This is the result of incorporating a PARTITION BY clause using the job title attribute. Employees are separated into partitions and the average salary is computed within each partition. This value is then reflected within each row belonging to the corresponding partition. If it helps, when reading PARTITION BY clauses, read them as “for each…do this.” In this case, “for each group of employees sharing the same job title value, compute the average salary of those employees.” The derived attribute, difference, goes the extra mile to make it easier for the consumers of the data by computing the difference between each employee’s salary and the average salary value for those employees with the same job.

To reiterate why this functionality is useful, the window function allowed a row-level difference to be computed using values originating from two different levels of detail, the pre-existing employee salary within each row and the average job salary for those employees with the same job title value as the current row’s job title value. It is also worth mentioning, prior to the existence of window functions, a JOIN to a derived table would likely be the go-to strategy for achieving the same thing.

In the previous example, the PARTITION BY clause contains a single expression (i.e., job_title); however, based on the specific task, the partitions may need to be based on multiple expressions or partitioning may not be required at all.

PARTITION BY Clause Not Specified
In the absence of a PARTITION BY clause, it is implied the window function will be computed using all rows (i.e., a single group). To further reinforce the idea of a window encompassing all rows, consider the task of comparing each employee’s salary to the average salary of all employees. This can be achieved with the single line of code shown in the query below.

SELECT
	hr.employees.employee_id,
	hr.employees.salary,
    AVG(hr.employees.salary) OVER() AS average_salary,
    hr.employees.salary - AVG(hr.employees.salary) OVER() AS difference
FROM
	hr.employees;

Line 3 computes the average salary across all employees (i.e., rows) and is represented within each row. Line 4 computes the difference between each employee’s salary and the average employee salary returned by the window function. For clarity, see the result set below. Note the average salary and the difference is rounded to two decimal places.

employee_id	salary	average_salary	difference
100	24000	6461.83	17538.17
101	17000	6461.83	10538.17
102	17000	6461.83	10538.17
103	9000	6461.83	2538.17
104	6000	6461.83	-461.83
105	4800	6461.83	-1661.83
106	4800	6461.83	-1661.83
107	4200	6461.83	-2261.83
108	12008	6461.83	5546.17
109	9000	6461.83	2538.17
110	8200	6461.83	1738.17
111	7700	6461.83	1238.17
112	7800	6461.83	1338.17
113	6900	6461.83	438.17
114	11000	6461.83	4538.17
115	3100	6461.83	-3361.83
116	2900	6461.83	-3561.83
117	2800	6461.83	-3661.83
118	2600	6461.83	-3861.83
119	2500	6461.83	-3961.83
120	8000	6461.83	1538.17
121	8200	6461.83	1738.17
122	7900	6461.83	1438.17
123	6500	6461.83	38.17
124	5800	6461.83	-661.83
125	3200	6461.83	-3261.83
126	2700	6461.83	-3761.83
127	2400	6461.83	-4061.83
128	2200	6461.83	-4261.83
129	3300	6461.83	-3161.83
130	2800	6461.83	-3661.83
131	2500	6461.83	-3961.83
132	2100	6461.83	-4361.83
133	3300	6461.83	-3161.83
134	2900	6461.83	-3561.83
135	2400	6461.83	-4061.83
136	2200	6461.83	-4261.83
137	3600	6461.83	-2861.83
138	3200	6461.83	-3261.83
139	2700	6461.83	-3761.83
140	2500	6461.83	-3961.83
141	3500	6461.83	-2961.83
142	3100	6461.83	-3361.83
143	2600	6461.83	-3861.83
144	2500	6461.83	-3961.83
145	14000	6461.83	7538.17
146	13500	6461.83	7038.17
147	12000	6461.83	5538.17
148	11000	6461.83	4538.17
149	10500	6461.83	4038.17
150	10000	6461.83	3538.17
151	9500	6461.83	3038.17
152	9000	6461.83	2538.17
153	8000	6461.83	1538.17
154	7500	6461.83	1038.17
155	7000	6461.83	538.17
156	10000	6461.83	3538.17
157	9500	6461.83	3038.17
158	9000	6461.83	2538.17
159	8000	6461.83	1538.17
160	7500	6461.83	1038.17
161	7000	6461.83	538.17
162	10500	6461.83	4038.17
163	9500	6461.83	3038.17
164	7200	6461.83	738.17
165	6800	6461.83	338.17
166	6400	6461.83	-61.83
167	6200	6461.83	-261.83
168	11500	6461.83	5038.17
169	10000	6461.83	3538.17
170	9600	6461.83	3138.17
171	7400	6461.83	938.17
172	7300	6461.83	838.17
173	6100	6461.83	-361.83
174	11000	6461.83	4538.17
175	8800	6461.83	2338.17
176	8600	6461.83	2138.17
177	8400	6461.83	1938.17
178	7000	6461.83	538.17
179	6200	6461.83	-261.83
180	3200	6461.83	-3261.83
181	3100	6461.83	-3361.83
182	2500	6461.83	-3961.83
183	2800	6461.83	-3661.83
184	4200	6461.83	-2261.83
185	4100	6461.83	-2361.83
186	3400	6461.83	-3061.83
187	3000	6461.83	-3461.83
188	3800	6461.83	-2661.83
189	3600	6461.83	-2861.83
190	2900	6461.83	-3561.83
191	2500	6461.83	-3961.83
192	4000	6461.83	-2461.83
193	3900	6461.83	-2561.83
194	3200	6461.83	-3261.83
195	2800	6461.83	-3661.83
196	3100	6461.83	-3361.83
197	3000	6461.83	-3461.83
198	2600	6461.83	-3861.83
199	2600	6461.83	-3861.83
200	4400	6461.83	-2061.83
201	13000	6461.83	6538.17
202	6000	6461.83	-461.83
203	6500	6461.83	38.17
204	10000	6461.83	3538.17
205	12008	6461.83	5546.17
206	8300	6461.83	1838.17

Like the previous example, there is another way to achieve the same result. A CROSS JOIN could be used to cascade the average employee salary down the entire column, allowing the row-level comparison to be made and difference computed. That said, I find the use of the window function more intuitive when reading code I’ve written in the past and when reviewing the code of others.

PARTITION BY Clause With Multiple Expressions
In many instances, the instructions for partitioning or grouping individual rows will require more than a single expression. Building onto the example above, suppose there is a desire to compute the average salary for each job ID and manager ID combination and have those values represented in the corresponding rows. Achieving this is as easy as specifying so in the PARTITION BY clause by separating the two expressions with a comma. See the code and corresponding result set below.

SELECT
    hr.employees.employee_id,
    hr.employees.manager_id,
    hr.employees.job_id,
    hr.employees.salary,
    AVG(hr.employees.salary) OVER(
        PARTITION BY
    		hr.employees.manager_id,
    		hr.employees.job_id
    ) AS average_job_salary,
    hr.employees.salary
    - AVG(hr.employees.salary) OVER(
    	PARTITION BY
    		hr.employees.manager_id,
    		hr.employees.job_id
    ) AS difference
FROM
    hr.employees
ORDER BY
    hr.employees.manager_id,
    hr.employees.job_id,
    difference DESC;

employee_id	manager_id	job_id	salary	average_job_salary	difference
102	100	AD_VP	17000	17000	0
101	100	AD_VP	17000	17000	0
201	100	MK_MAN	13000	13000	0
114	100	PU_MAN	11000	11000	0
145	100	SA_MAN	14000	12200	1800
146	100	SA_MAN	13500	12200	1300
147	100	SA_MAN	12000	12200	-200
148	100	SA_MAN	11000	12200	-1200
149	100	SA_MAN	10500	12200	-1700
121	100	ST_MAN	8200	7280	920
120	100	ST_MAN	8000	7280	720
122	100	ST_MAN	7900	7280	620
123	100	ST_MAN	6500	7280	-780
124	100	ST_MAN	5800	7280	-1480
205	101	AC_MGR	12008	12008	0
200	101	AD_ASST	4400	4400	0
108	101	FI_MGR	12008	12008	0
203	101	HR_REP	6500	6500	0
204	101	PR_REP	10000	10000	0
103	102	IT_PROG	9000	9000	0
104	103	IT_PROG	6000	4950	1050
106	103	IT_PROG	4800	4950	-150
105	103	IT_PROG	4800	4950	-150
107	103	IT_PROG	4200	4950	-750
109	108	FI_ACCOUNT	9000	7920	1080
110	108	FI_ACCOUNT	8200	7920	280
112	108	FI_ACCOUNT	7800	7920	-120
111	108	FI_ACCOUNT	7700	7920	-220
113	108	FI_ACCOUNT	6900	7920	-1020
115	114	PU_CLERK	3100	2780	320
116	114	PU_CLERK	2900	2780	120
117	114	PU_CLERK	2800	2780	20
118	114	PU_CLERK	2600	2780	-180
119	114	PU_CLERK	2500	2780	-280
180	120	SH_CLERK	3200	2900	300
181	120	SH_CLERK	3100	2900	200
183	120	SH_CLERK	2800	2900	-100
182	120	SH_CLERK	2500	2900	-400
125	120	ST_CLERK	3200	2625	575
126	120	ST_CLERK	2700	2625	75
127	120	ST_CLERK	2400	2625	-225
128	120	ST_CLERK	2200	2625	-425
184	121	SH_CLERK	4200	3675	525
185	121	SH_CLERK	4100	3675	425
186	121	SH_CLERK	3400	3675	-275
187	121	SH_CLERK	3000	3675	-675
129	121	ST_CLERK	3300	2675	625
130	121	ST_CLERK	2800	2675	125
131	121	ST_CLERK	2500	2675	-175
132	121	ST_CLERK	2100	2675	-575
188	122	SH_CLERK	3800	3200	600
189	122	SH_CLERK	3600	3200	400
190	122	SH_CLERK	2900	3200	-300
191	122	SH_CLERK	2500	3200	-700
133	122	ST_CLERK	3300	2700	600
134	122	ST_CLERK	2900	2700	200
135	122	ST_CLERK	2400	2700	-300
136	122	ST_CLERK	2200	2700	-500
192	123	SH_CLERK	4000	3475	525
193	123	SH_CLERK	3900	3475	425
194	123	SH_CLERK	3200	3475	-275
195	123	SH_CLERK	2800	3475	-675
137	123	ST_CLERK	3600	3000	600
138	123	ST_CLERK	3200	3000	200
139	123	ST_CLERK	2700	3000	-300
140	123	ST_CLERK	2500	3000	-500
196	124	SH_CLERK	3100	2825	275
197	124	SH_CLERK	3000	2825	175
198	124	SH_CLERK	2600	2825	-225
199	124	SH_CLERK	2600	2825	-225
141	124	ST_CLERK	3500	2925	575
142	124	ST_CLERK	3100	2925	175
143	124	ST_CLERK	2600	2925	-325
144	124	ST_CLERK	2500	2925	-425
150	145	SA_REP	10000	8500	1500
151	145	SA_REP	9500	8500	1000
152	145	SA_REP	9000	8500	500
153	145	SA_REP	8000	8500	-500
154	145	SA_REP	7500	8500	-1000
155	145	SA_REP	7000	8500	-1500
156	146	SA_REP	10000	8500	1500
157	146	SA_REP	9500	8500	1000
158	146	SA_REP	9000	8500	500
159	146	SA_REP	8000	8500	-500
160	146	SA_REP	7500	8500	-1000
161	146	SA_REP	7000	8500	-1500
162	147	SA_REP	10500	7766.666666666666666666666666666666666667	2733.333333333333333333333333333333333333
163	147	SA_REP	9500	7766.666666666666666666666666666666666667	1733.333333333333333333333333333333333333
164	147	SA_REP	7200	7766.666666666666666666666666666666666667	-566.666666666666666666666666666666666667
165	147	SA_REP	6800	7766.666666666666666666666666666666666667	-966.666666666666666666666666666666666667
166	147	SA_REP	6400	7766.666666666666666666666666666666666667	-1366.666666666666666666666666666666666667
167	147	SA_REP	6200	7766.666666666666666666666666666666666667	-1566.666666666666666666666666666666666667
168	148	SA_REP	11500	8650	2850
169	148	SA_REP	10000	8650	1350
170	148	SA_REP	9600	8650	950
171	148	SA_REP	7400	8650	-1250
172	148	SA_REP	7300	8650	-1350
173	148	SA_REP	6100	8650	-2550
174	149	SA_REP	11000	8333.333333333333333333333333333333333333	2666.666666666666666666666666666666666667
175	149	SA_REP	8800	8333.333333333333333333333333333333333333	466.666666666666666666666666666666666667
176	149	SA_REP	8600	8333.333333333333333333333333333333333333	266.666666666666666666666666666666666667
177	149	SA_REP	8400	8333.333333333333333333333333333333333333	66.666666666666666666666666666666666667
178	149	SA_REP	7000	8333.333333333333333333333333333333333333	-1333.333333333333333333333333333333333333
179	149	SA_REP	6200	8333.333333333333333333333333333333333333	-2133.333333333333333333333333333333333333
202	201	MK_REP	6000	6000	0
206	205	AC_ACCOUNT	8300	8300	0
100	–	AD_PRES	24000	24000	0

Why might it be useful to perform such a computation? To answer this question, recall the first two examples above. In the second example, the average salary was computed across all employees and returned within each row to permit a row-level difference and a comparison to be made between the employee’s salary and the average salary of all employees. Such a comparison is not likely to be very meaningful given the large variances in salary values across jobs. In the first example, the average salary was computed using those employee salary values aligned to each job title (i.e., partition specified). This is likely to yield more meaningful information given a comparison that is closer to an apples-to-apples comparison. It allows consumers of the data to understand how the average salaries look across job titles and where each employee’s salary is relative to the average salary of employees having the same job. The latest query above returns a result set enabling data consumers to easily understand how salary values differ among employees holding the same job and reporting to the same manager. With little effort, data consumers can also explore how salaries for a particular job vary depending on the manager the employee reports to.

Keep in mind, while an expression was added to the PARTITION BY clause in this example, the complexity or number of expressions specified in the clause should be driven by the task at hand. Additional or fewer expressions will in most cases change the output of the window function.

ORDER BY Clause
When used within the OVER clause, the ORDER BY clause is used to specify the sorting of the rows within each partition. In the previous examples, the sort order of the individual rows within each partition was not specified and is unnecessary for those specific tasks; however, sorting is particularly useful for implementing the concept of RANK and computing moving/rolling aggregates. Before exploring a couple of examples, I feel it is important to point out the nuance of the ORDER BY clause when used within the OVER clause. Using ORDER BY within the OVER clause doesn’t guarantee the result set output by the query will be in the order specified. It is simply the explicit instruction on how to sort the rows within each partition. If it is desirable to order the result set, use the ORDER BY clause outside of the window function as seen in the examples above.

With that nuance mentioned, let’s now explore the example tasks.

Ranking
For the first example, consider the employee data below. There is one row per employee and captured within each row is the corresponding employee’s name and salary.

employee_id	employee_name	salary
100	Steven King	24000
101	Neena Kochhar	17000
102	Lex De Haan	17000
103	Alexander Hunold	9000
104	Bruce Ernst	6000
105	David Austin	4800
106	Valli Pataballa	4800
107	Diana Lorentz	4200
108	Nancy Greenberg	12008
109	Daniel Faviet	9000
110	John Chen	8200
111	Ismael Sciarra	7700
112	Jose Manuel Urman	7800
113	Luis Popp	6900
114	Den Raphaely	11000
115	Alexander Khoo	3100
116	Shelli Baida	2900
117	Sigal Tobias	2800
118	Guy Himuro	2600
119	Karen Colmenares	2500
120	Matthew Weiss	8000
121	Adam Fripp	8200
122	Payam Kaufling	7900
123	Shanta Vollman	6500
124	Kevin Mourgos	5800
125	Julia Nayer	3200
126	Irene Mikkilineni	2700
127	James Landry	2400
128	Steven Markle	2200
129	Laura Bissot	3300
130	Mozhe Atkinson	2800
131	James Marlow	2500
132	TJ Olson	2100
133	Jason Mallin	3300
134	Michael Rogers	2900
135	Ki Gee	2400
136	Hazel Philtanker	2200
137	Renske Ladwig	3600
138	Stephen Stiles	3200
139	John Seo	2700
140	Joshua Patel	2500
141	Trenna Rajs	3500
142	Curtis Davies	3100
143	Randall Matos	2600
144	Peter Vargas	2500
145	John Russell	14000
146	Karen Partners	13500
147	Alberto Errazuriz	12000
148	Gerald Cambrault	11000
149	Eleni Zlotkey	10500
150	Peter Tucker	10000
151	David Bernstein	9500
152	Peter Hall	9000
153	Christopher Olsen	8000
154	Nanette Cambrault	7500
155	Oliver Tuvault	7000
156	Janette King	10000
157	Patrick Sully	9500
158	Allan McEwen	9000
159	Lindsey Smith	8000
160	Louise Doran	7500
161	Sarath Sewall	7000
162	Clara Vishney	10500
163	Danielle Greene	9500
164	Mattea Marvins	7200
165	David Lee	6800
166	Sundar Ande	6400
167	Amit Banda	6200
168	Lisa Ozer	11500
169	Harrison Bloom	10000
170	Tayler Fox	9600
171	William Smith	7400
172	Elizabeth Bates	7300
173	Sundita Kumar	6100
174	Ellen Abel	11000
175	Alyssa Hutton	8800
176	Jonathon Taylor	8600
177	Jack Livingston	8400
178	Kimberely Grant	7000
179	Charles Johnson	6200
180	Winston Taylor	3200
181	Jean Fleaur	3100
182	Martha Sullivan	2500
183	Girard Geoni	2800
184	Nandita Sarchand	4200
185	Alexis Bull	4100
186	Julia Dellinger	3400
187	Anthony Cabrio	3000
188	Kelly Chung	3800
189	Jennifer Dilly	3600
190	Timothy Gates	2900
191	Randall Perkins	2500
192	Sarah Bell	4000
193	Britney Everett	3900
194	Samuel McCain	3200
195	Vance Jones	2800
196	Alana Walsh	3100
197	Kevin Feeney	3000
198	Donald OConnell	2600
199	Douglas Grant	2600
200	Jennifer Whalen	4400
201	Michael Hartstein	13000
202	Pat Fay	6000
203	Susan Mavris	6500
204	Hermann Baer	10000
205	Shelley Higgins	12008
206	William Gietz	8300

Additionally, imagine there is a need to rank the employees above by their individual salary values in descending order (i.e., greatest to least). A subset of window functions often referred to as ranking functions can be used to fulfill this task. See the code and corresponding result set below.

SELECT
	hr.employees.employee_id,
    (
    	hr.employees.first_name 
    	|| ' '
        || hr.employees.last_name
    ) AS employee_name,
    hr.employees.salary,
    DENSE_RANK() OVER (
        ORDER BY hr.employees.salary DESC
    ) AS salary_rank
FROM
	hr.employees
ORDER BY
	hr.employees.salary DESC;

employee_id	employee_name	salary	salary_rank
100	Steven King	24000	1
101	Neena Kochhar	17000	2
102	Lex De Haan	17000	2
145	John Russell	14000	3
146	Karen Partners	13500	4
201	Michael Hartstein	13000	5
108	Nancy Greenberg	12008	6
205	Shelley Higgins	12008	6
147	Alberto Errazuriz	12000	7
168	Lisa Ozer	11500	8
114	Den Raphaely	11000	9
148	Gerald Cambrault	11000	9
174	Ellen Abel	11000	9
149	Eleni Zlotkey	10500	10
162	Clara Vishney	10500	10
156	Janette King	10000	11
150	Peter Tucker	10000	11
204	Hermann Baer	10000	11
169	Harrison Bloom	10000	11
170	Tayler Fox	9600	12
163	Danielle Greene	9500	13
151	David Bernstein	9500	13
157	Patrick Sully	9500	13
152	Peter Hall	9000	14
103	Alexander Hunold	9000	14
158	Allan McEwen	9000	14
109	Daniel Faviet	9000	14
175	Alyssa Hutton	8800	15
176	Jonathon Taylor	8600	16
177	Jack Livingston	8400	17
206	William Gietz	8300	18
121	Adam Fripp	8200	19
110	John Chen	8200	19
120	Matthew Weiss	8000	20
159	Lindsey Smith	8000	20
153	Christopher Olsen	8000	20
122	Payam Kaufling	7900	21
112	Jose Manuel Urman	7800	22
111	Ismael Sciarra	7700	23
160	Louise Doran	7500	24
154	Nanette Cambrault	7500	24
171	William Smith	7400	25
172	Elizabeth Bates	7300	26
164	Mattea Marvins	7200	27
155	Oliver Tuvault	7000	28
178	Kimberely Grant	7000	28
161	Sarath Sewall	7000	28
113	Luis Popp	6900	29
165	David Lee	6800	30
203	Susan Mavris	6500	31
123	Shanta Vollman	6500	31
166	Sundar Ande	6400	32
179	Charles Johnson	6200	33
167	Amit Banda	6200	33
173	Sundita Kumar	6100	34
202	Pat Fay	6000	35
104	Bruce Ernst	6000	35
124	Kevin Mourgos	5800	36
106	Valli Pataballa	4800	37
105	David Austin	4800	37
200	Jennifer Whalen	4400	38
184	Nandita Sarchand	4200	39
107	Diana Lorentz	4200	39
185	Alexis Bull	4100	40
192	Sarah Bell	4000	41
193	Britney Everett	3900	42
188	Kelly Chung	3800	43
189	Jennifer Dilly	3600	44
137	Renske Ladwig	3600	44
141	Trenna Rajs	3500	45
186	Julia Dellinger	3400	46
133	Jason Mallin	3300	47
129	Laura Bissot	3300	47
125	Julia Nayer	3200	48
194	Samuel McCain	3200	48
180	Winston Taylor	3200	48
138	Stephen Stiles	3200	48
115	Alexander Khoo	3100	49
196	Alana Walsh	3100	49
181	Jean Fleaur	3100	49
142	Curtis Davies	3100	49
197	Kevin Feeney	3000	50
187	Anthony Cabrio	3000	50
116	Shelli Baida	2900	51
190	Timothy Gates	2900	51
134	Michael Rogers	2900	51
117	Sigal Tobias	2800	52
195	Vance Jones	2800	52
183	Girard Geoni	2800	52
130	Mozhe Atkinson	2800	52
139	John Seo	2700	53
126	Irene Mikkilineni	2700	53
118	Guy Himuro	2600	54
199	Douglas Grant	2600	54
198	Donald OConnell	2600	54
143	Randall Matos	2600	54
119	Karen Colmenares	2500	55
131	James Marlow	2500	55
191	Randall Perkins	2500	55
144	Peter Vargas	2500	55
182	Martha Sullivan	2500	55
140	Joshua Patel	2500	55
135	Ki Gee	2400	56
127	James Landry	2400	56
136	Hazel Philtanker	2200	57
128	Steven Markle	2200	57
132	TJ Olson	2100	58

When using ranking functions, the ORDER BY clause within the OVER clause is required. As previously mentioned, this clause is used to specify how to order the rows within each partition. In this example, the PARTITION BY clause is absent, indicating the ranking function will be applied across all employees. The highest paid employee will be assigned a salary rank value of 1, the second highest paid employee a salary rank value of 2, and so on. Because the DENSE_RANK function is being used, ties in salary values are resolved by assigning the same integer value to those rows involved in each specific tie and the row with the next highest salary value being assigned the next consecutive integer value. This example can be taken a step further by slightly changing the ranking requirement. Instead of ranking employees by their individual salary values when compared to all other employees, each employee’s rank can be computed by comparing the employee’s salary against the salaries of those employees holding the same job. This can be achieved by simply including the job title attribute within a PARTITION BY clause. As a result of doing so, each partition consists of employees with the same job and the ranking function is applied to each partition, individually. For clarity, see the code and corresponding result set below.

SELECT
	hr.employees.employee_id,
    (
    	hr.employees.first_name 
    	|| ' '
        || hr.employees.last_name
    ) AS employee_name,
    hr.jobs.job_title,
    hr.employees.salary,
    DENSE_RANK() OVER (
    	PARTITION BY hr.jobs.job_title
        ORDER BY hr.employees.salary DESC
    ) AS salary_rank
FROM
	hr.employees
	INNER JOIN
	hr.jobs
		ON hr.employees.job_id = hr.jobs.job_id
ORDER BY
	hr.jobs.job_title,
	hr.employees.salary DESC;

employee_id	employee_name	job_title	salary	salary_rank
109	Daniel Faviet	Accountant	9000	1
110	John Chen	Accountant	8200	2
112	Jose Manuel Urman	Accountant	7800	3
111	Ismael Sciarra	Accountant	7700	4
113	Luis Popp	Accountant	6900	5
205	Shelley Higgins	Accounting Manager	12008	1
200	Jennifer Whalen	Administration Assistant	4400	1
102	Lex De Haan	Administration Vice President	17000	1
101	Neena Kochhar	Administration Vice President	17000	1
108	Nancy Greenberg	Finance Manager	12008	1
203	Susan Mavris	Human Resources Representative	6500	1
201	Michael Hartstein	Marketing Manager	13000	1
202	Pat Fay	Marketing Representative	6000	1
100	Steven King	President	24000	1
103	Alexander Hunold	Programmer	9000	1
104	Bruce Ernst	Programmer	6000	2
106	Valli Pataballa	Programmer	4800	3
105	David Austin	Programmer	4800	3
107	Diana Lorentz	Programmer	4200	4
206	William Gietz	Public Accountant	8300	1
204	Hermann Baer	Public Relations Representative	10000	1
115	Alexander Khoo	Purchasing Clerk	3100	1
116	Shelli Baida	Purchasing Clerk	2900	2
117	Sigal Tobias	Purchasing Clerk	2800	3
118	Guy Himuro	Purchasing Clerk	2600	4
119	Karen Colmenares	Purchasing Clerk	2500	5
114	Den Raphaely	Purchasing Manager	11000	1
145	John Russell	Sales Manager	14000	1
146	Karen Partners	Sales Manager	13500	2
147	Alberto Errazuriz	Sales Manager	12000	3
148	Gerald Cambrault	Sales Manager	11000	4
149	Eleni Zlotkey	Sales Manager	10500	5
168	Lisa Ozer	Sales Representative	11500	1
174	Ellen Abel	Sales Representative	11000	2
162	Clara Vishney	Sales Representative	10500	3
156	Janette King	Sales Representative	10000	4
150	Peter Tucker	Sales Representative	10000	4
169	Harrison Bloom	Sales Representative	10000	4
170	Tayler Fox	Sales Representative	9600	5
151	David Bernstein	Sales Representative	9500	6
157	Patrick Sully	Sales Representative	9500	6
163	Danielle Greene	Sales Representative	9500	6
158	Allan McEwen	Sales Representative	9000	7
152	Peter Hall	Sales Representative	9000	7
175	Alyssa Hutton	Sales Representative	8800	8
176	Jonathon Taylor	Sales Representative	8600	9
177	Jack Livingston	Sales Representative	8400	10
159	Lindsey Smith	Sales Representative	8000	11
153	Christopher Olsen	Sales Representative	8000	11
154	Nanette Cambrault	Sales Representative	7500	12
160	Louise Doran	Sales Representative	7500	12
171	William Smith	Sales Representative	7400	13
172	Elizabeth Bates	Sales Representative	7300	14
164	Mattea Marvins	Sales Representative	7200	15
161	Sarath Sewall	Sales Representative	7000	16
155	Oliver Tuvault	Sales Representative	7000	16
178	Kimberely Grant	Sales Representative	7000	16
165	David Lee	Sales Representative	6800	17
166	Sundar Ande	Sales Representative	6400	18
167	Amit Banda	Sales Representative	6200	19
179	Charles Johnson	Sales Representative	6200	19
173	Sundita Kumar	Sales Representative	6100	20
184	Nandita Sarchand	Shipping Clerk	4200	1
185	Alexis Bull	Shipping Clerk	4100	2
192	Sarah Bell	Shipping Clerk	4000	3
193	Britney Everett	Shipping Clerk	3900	4
188	Kelly Chung	Shipping Clerk	3800	5
189	Jennifer Dilly	Shipping Clerk	3600	6
186	Julia Dellinger	Shipping Clerk	3400	7
180	Winston Taylor	Shipping Clerk	3200	8
194	Samuel McCain	Shipping Clerk	3200	8
181	Jean Fleaur	Shipping Clerk	3100	9
196	Alana Walsh	Shipping Clerk	3100	9
187	Anthony Cabrio	Shipping Clerk	3000	10
197	Kevin Feeney	Shipping Clerk	3000	10
190	Timothy Gates	Shipping Clerk	2900	11
183	Girard Geoni	Shipping Clerk	2800	12
195	Vance Jones	Shipping Clerk	2800	12
199	Douglas Grant	Shipping Clerk	2600	13
198	Donald OConnell	Shipping Clerk	2600	13
182	Martha Sullivan	Shipping Clerk	2500	14
191	Randall Perkins	Shipping Clerk	2500	14
137	Renske Ladwig	Stock Clerk	3600	1
141	Trenna Rajs	Stock Clerk	3500	2
129	Laura Bissot	Stock Clerk	3300	3
133	Jason Mallin	Stock Clerk	3300	3
125	Julia Nayer	Stock Clerk	3200	4
138	Stephen Stiles	Stock Clerk	3200	4
142	Curtis Davies	Stock Clerk	3100	5
134	Michael Rogers	Stock Clerk	2900	6
130	Mozhe Atkinson	Stock Clerk	2800	7
126	Irene Mikkilineni	Stock Clerk	2700	8
139	John Seo	Stock Clerk	2700	8
143	Randall Matos	Stock Clerk	2600	9
144	Peter Vargas	Stock Clerk	2500	10
140	Joshua Patel	Stock Clerk	2500	10
131	James Marlow	Stock Clerk	2500	10
135	Ki Gee	Stock Clerk	2400	11
127	James Landry	Stock Clerk	2400	11
136	Hazel Philtanker	Stock Clerk	2200	12
128	Steven Markle	Stock Clerk	2200	12
132	TJ Olson	Stock Clerk	2100	13
121	Adam Fripp	Stock Manager	8200	1
120	Matthew Weiss	Stock Manager	8000	2
122	Payam Kaufling	Stock Manager	7900	3
123	Shanta Vollman	Stock Manager	6500	4
124	Kevin Mourgos	Stock Manager	5800	5

For an in-depth discussion on ranking functions and more involved, real-world examples, see my page dedicated to the topic here.

Aggregates with Window Clauses
The next example explores the concepts of computing moving aggregates and user-defined window clauses. Imagine you’ve been asked to compute a 3-day moving average for sales using the data in the table below. The data contains one row per day and the corresponding daily sales value.

report_date	sales
01-OCT-23	1467
02-OCT-23	1192
03-OCT-23	4663
04-OCT-23	4018
05-OCT-23	4500
06-OCT-23	4711
07-OCT-23	3728
08-OCT-23	3598
09-OCT-23	1980
10-OCT-23	2471
11-OCT-23	2441
12-OCT-23	3855
13-OCT-23	4634
14-OCT-23	3083
15-OCT-23	4561
16-OCT-23	4829
17-OCT-23	4413
18-OCT-23	1166
19-OCT-23	4991
20-OCT-23	1063
21-OCT-23	3821
22-OCT-23	4631
23-OCT-23	2914
24-OCT-23	2358
25-OCT-23	3501
26-OCT-23	3594
27-OCT-23	3260
28-OCT-23	1309

The code used to generate the fictional data set above, compute and the 3-day moving average values, and the corresponding result set can be found below. Following the result set is a detailed explanation of the solution.

EXEC DBMS_RANDOM.SEED(42);

WITH daily_report (report_date, daily_order_total) AS (
    SELECT
		TRUNC(SYSDATE) AS report_date,
		FLOOR(DBMS_RANDOM.VALUE(1000, 5001)) AS daily_order_total
    FROM
		dual
	UNION ALL
    SELECT
		daily_report.report_date - 1,
		FLOOR(DBMS_RANDOM.VALUE(1000, 5001))
    FROM
		daily_report
    WHERE
		daily_report.report_date > TO_DATE('2023-10-01', 'YYYY-MM-DD')
)
SELECT
	daily_report.report_date,
	daily_report.daily_order_total,
	ROUND(
    	AVG(daily_report.daily_order_total) OVER (
			ORDER BY daily_report.report_date
			ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
		),
    	2
    ) AS three_day_moving_average
FROM
	daily_report
ORDER BY
	daily_report.report_date;

report_date	sales	three_day_moving_average
01-OCT-23	1467	1467
02-OCT-23	1192	1329.5
03-OCT-23	4663	2440.67
04-OCT-23	4018	3291
05-OCT-23	4500	4393.67
06-OCT-23	4711	4409.67
07-OCT-23	3728	4313
08-OCT-23	3598	4012.33
09-OCT-23	1980	3102
10-OCT-23	2471	2683
11-OCT-23	2441	2297.33
12-OCT-23	3855	2922.33
13-OCT-23	4634	3643.33
14-OCT-23	3083	3857.33
15-OCT-23	4561	4092.67
16-OCT-23	4829	4157.67
17-OCT-23	4413	4601
18-OCT-23	1166	3469.33
19-OCT-23	4991	3523.33
20-OCT-23	1063	2406.67
21-OCT-23	3821	3291.67
22-OCT-23	4631	3171.67
23-OCT-23	2914	3788.67
24-OCT-23	2358	3301
25-OCT-23	3501	2924.33
26-OCT-23	3594	3151
27-OCT-23	3260	3451.67
28-OCT-23	1309	2721

In the code above, a new clause is introduced within the OVER clause often referred to as the window or window clause. This clause allows a user to specify the row(s) or value(s) to be used when calculating the current row’s value. Using the code and corresponding result set above, the 3-day moving average for the first row is the row’s sales value (1467), the value for the second row is the average of the first and second rows’ values ((1467 + 1192)/2), and the value for the third row is the average of the two rows prior and the current row ((1467 + 1192 + 4663)/3). To compute the 3-day moving average value for the fourth row (04-OCT-23), the window must slide down one row. The computation will then consist of computing the average sales value of the second, third, and fourth rows’ sales values. For each subsequent row, the window slides down one row and uses the sales value in the current row and those values within the two rows prior to the current row for the computation. Perhaps not immediately obvious is the importance of the ORDER BY clause within the OVER clause in this example. It plays a crucial role in ensuring the rows are in chronological order and consequently fall into the correct window or 3-day period when performing each row’s computation.

Window Clause Options
Similar to the real-world task incorporated into the example above where each window is comprised of the current row and the two rows preceding the current row, the task of computing a 5-day moving average requires the window to be defined as ROWS BETWEEN 5 PRECEDING AND CURRENT ROW. What is more, the window may even be comprised of rows beyond the current row. In such cases, the FOLLOWING keyword can be used to specify the number of rows following the current row to use in the current row’s computation. The window specification is task dependent! Below is a list of keywords and a brief explanation of how they can be used.

ROWS
Is used to specify a fixed or unbounded number of rows used in each window relative to the current row.

RANGE
Is used to specify the values used in each window relative to the current row’s value. For example, suppose you have data on the price of a stock over time and wish to compute the average stock price for the current row’s stock date value and those stock price values in those rows with stock date values within a range of one day preceding and one day following. In such a case, the window specification could be RANGE BETWEEN INTERVAL ‘1’ DAY PRECEDING AND INTERVAL ‘1’ DAY FOLLOWING.

BETWEEN
Is used to specify the first and last row to be used in each window, if applicable. Some tasks may not require two endpoints, or the endpoint may be implied when not expressed explicitly.

PRECEDING
Is used to specify the number of rows prior to the current row to be used in each row’s computation. In some cases, a task will require each window to be comprised of the current row and all rows prior to the current row. For example, a rolling/cumulative sum will have a window specification of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

FOLLOWING
Is used to specify the number of rows after the current row to be used in each row’s computation. Like the PRECEDING keyword, some tasks will require each window to be comprised of the current row and all rows after the current row. A task requiring the current row, the row preceding the current row, and the row following the current row will have a window specification of ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING. The PARTITION BY Clause Not Specified section above walked through the task of capturing the average salary of all employees within each row. In that example, the OVER clause is not populated and the window of ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is implied.

For additional discussion and more real-world examples covering aggregates with window clause specifications, see my page dedicated to the topic here.

Closing Thoughts
Evidenced by the handful of real-world problems covered, it’s hard to understate the power of window functions. Minimal effort in terms of lines of code and overall SQL knowledge is required to solve each problem. Keep in mind, these are real, everyday problems and you will likely have to solve some flavor or variation of these problems at some point. When solving problems requiring window functions, remember to let the task drive the choice of window function and its corresponding arguments by first understanding what you are trying to accomplish. Additionally, do not fall into the trap of believing every problem is best solved by window functions.

Additional Information
In the final example, a recursive common table expression (CTE) was used to generate report dates and their corresponding daily order total values. If you’re unfamiliar with the concepts of recursion and/or common table expressions, consider reading my recursion and common table expression pages to learn how to incorporate these concepts into real-world coding solutions.

Knowledge Check
If you’d like to confirm your understanding on the use of window functions and other fundamental concepts, consider solving a handful of my practice problems here.