WINDOW/ANALYTIC FUNCTIONS

Window functions were first made available in structured query language (SQL) in the early 2000s and have since garnered a lot of attention and hype. I’d argue the hype behind window functions is well-deserved given the power they afford the user and the ease in which they can be implemented.

With minimal code, sometimes even a one-liner, aggregates can be computed over groups of rows (e.g., 3-day moving average, average salary for each department), ranking systems can be implemented based on the user’s specification (e.g., rank each student’s performance based on GPA), and calculations referencing those values in rows before or after the current row can be made. Let’s begin looking at the syntax allowing window functions to be implemented so easily.

To use a window function, simply provide the window function name, the OVER clause, and the clauses applicable to your scenario within the OVER clause. The OVER clause and the clauses embedded within it are used to specify how the function will operate on the result set. The clauses commonly found within the OVER clause are the PARTITION BY, ORDER BY, and WINDOW clauses. The PARTITION BY, ORDER BY, and WINDOW clauses are optional; however, depending on the analytic function being used, a default WINDOW clause of RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW is implied and used by default. Regarding defaults, consult your specific RDBMS documentation to be certain. The number of clauses requiring understanding may seem overwhelming at first, but I promise each of them are quite intuitive. Let’s walk through each of them in turn.

PARTITION BY Clause
The PARTITION BY clause specifies how the individual rows within a result set should be grouped, if applicable. The window function specified operates on each of these groups, separately. To illustrate the idea of partitioning, consider the result set below based on the employees table within the human resource (HR) schema at livesql.oracle.com. The data has been partitioned by the job title attribute with the individual rows containing the same job title value belonging to the same partition.

employee_idjob_titlesalarypartition
100President240001
101Administration Vice President170002
102Administration Vice President170002
145Sales Manager140003
146Sales Manager135003
147Sales Manager120003
148Sales Manager110003
149Sales Manager105003
120Stock Manager80004
121Stock Manager82004
122Stock Manager79004
123Stock Manager65004
124Stock Manager58004

As previously mentioned, the specified window function will operate on each of these groups. Using the same data above, suppose there is a need to compute the average salary across those employees’ salaries sharing the same job and compare that value to the employee’s salary. The AVG function can be used as a window function and the result set partitioned by the job title attribute to achieve this. See the code and corresponding result set below.

SELECT
    hr.employees.employee_id,
    hr.jobs.job_title,
    hr.employees.salary,
    AVG(hr.employees.salary) OVER(
        PARTITION BY hr.jobs.job_title
    ) AS average_job_salary,
    hr.employees.salary
    - AVG(hr.employees.salary) OVER(
    	PARTITION BY hr.jobs.job_title
    ) AS difference
FROM
	hr.employees
	INNER JOIN
	hr.jobs
		ON hr.employees.job_id = hr.jobs.job_id
WHERE
	hr.employees.job_id IN (
    	'ST_MAN', 'SA_MAN',
    	'AD_VP', 'AD_PRES'
    );
employee_idjob_titlesalaryaverage_job_salarydifference
102Administration Vice President17000170000
101Administration Vice President17000170000
100President24000240000
147Sales Manager1200012200-200
146Sales Manager13500122001300
145Sales Manager14000122001800
148Sales Manager1100012200-1200
149Sales Manager1050012200-1700
120Stock Manager80007280720
121Stock Manager82007280920
122Stock Manager79007280620
123Stock Manager65007280-780
124Stock Manager58007280-1480

Notice in the result set above, each row contains an average_job_salary value. This value is the average of those employees’ salaries sharing the same job with respect to the current row’s job title value. For example, the average salary for the five sales managers is $12200. This is the result of incorporating a PARTITION BY clause using the job title attribute. Employees are separated into partitions and the average salary is computed within each partition. This value is then reflected within each row belonging to the corresponding partition. If it helps, when reading PARTITION BY clauses, read them as “for each…do this.” In this case, “for each group of employees sharing the same job title value, compute the average salary of those employees.” The derived attribute, difference, goes the extra mile to make it easier for the consumers of the data by computing the difference between each employee’s salary and the average salary value for those employees with the same job.

To reiterate why this functionality is useful, the window function allowed a row-level difference to be computed using values originating from two different levels of detail, the pre-existing employee salary within each row and the average job salary for those employees with the same job title value as the current row’s job title value. It is also worth mentioning, prior to the existence of window functions, a JOIN to a derived table would likely be the go-to strategy for achieving the same thing.

In the previous example, the PARTITION BY clause contains a single expression (i.e., job_title); however, based on the specific task, the partitions may need to be based on multiple expressions or partitioning may not be required at all.

PARTITION BY Clause Not Specified
In the absence of a PARTITION BY clause, it is implied the window function will be computed using all rows (i.e., a single group). To further reinforce the idea of a window encompassing all rows, consider the task of comparing each employee’s salary to the average salary of all employees. This can be achieved with the single line of code shown in the query below.

SELECT
	hr.employees.employee_id,
	hr.employees.salary,
    AVG(hr.employees.salary) OVER() AS average_salary,
    hr.employees.salary - AVG(hr.employees.salary) OVER() AS difference
FROM
	hr.employees;

Line 3 computes the average salary across all employees (i.e., rows) and is represented within each row. Line 4 computes the difference between each employee’s salary and the average employee salary returned by the window function. For clarity, see the result set below. Note the average salary and the difference is rounded to two decimal places.

employee_idsalaryaverage_salarydifference
100240006461.8317538.17
101170006461.8310538.17
102170006461.8310538.17
10390006461.832538.17
10460006461.83-461.83
10548006461.83-1661.83
10648006461.83-1661.83
10742006461.83-2261.83
108120086461.835546.17
10990006461.832538.17
11082006461.831738.17
11177006461.831238.17
11278006461.831338.17
11369006461.83438.17
114110006461.834538.17
11531006461.83-3361.83
11629006461.83-3561.83
11728006461.83-3661.83
11826006461.83-3861.83
11925006461.83-3961.83
12080006461.831538.17
12182006461.831738.17
12279006461.831438.17
12365006461.8338.17
12458006461.83-661.83
12532006461.83-3261.83
12627006461.83-3761.83
12724006461.83-4061.83
12822006461.83-4261.83
12933006461.83-3161.83
13028006461.83-3661.83
13125006461.83-3961.83
13221006461.83-4361.83
13333006461.83-3161.83
13429006461.83-3561.83
13524006461.83-4061.83
13622006461.83-4261.83
13736006461.83-2861.83
13832006461.83-3261.83
13927006461.83-3761.83
14025006461.83-3961.83
14135006461.83-2961.83
14231006461.83-3361.83
14326006461.83-3861.83
14425006461.83-3961.83
145140006461.837538.17
146135006461.837038.17
147120006461.835538.17
148110006461.834538.17
149105006461.834038.17
150100006461.833538.17
15195006461.833038.17
15290006461.832538.17
15380006461.831538.17
15475006461.831038.17
15570006461.83538.17
156100006461.833538.17
15795006461.833038.17
15890006461.832538.17
15980006461.831538.17
16075006461.831038.17
16170006461.83538.17
162105006461.834038.17
16395006461.833038.17
16472006461.83738.17
16568006461.83338.17
16664006461.83-61.83
16762006461.83-261.83
168115006461.835038.17
169100006461.833538.17
17096006461.833138.17
17174006461.83938.17
17273006461.83838.17
17361006461.83-361.83
174110006461.834538.17
17588006461.832338.17
17686006461.832138.17
17784006461.831938.17
17870006461.83538.17
17962006461.83-261.83
18032006461.83-3261.83
18131006461.83-3361.83
18225006461.83-3961.83
18328006461.83-3661.83
18442006461.83-2261.83
18541006461.83-2361.83
18634006461.83-3061.83
18730006461.83-3461.83
18838006461.83-2661.83
18936006461.83-2861.83
19029006461.83-3561.83
19125006461.83-3961.83
19240006461.83-2461.83
19339006461.83-2561.83
19432006461.83-3261.83
19528006461.83-3661.83
19631006461.83-3361.83
19730006461.83-3461.83
19826006461.83-3861.83
19926006461.83-3861.83
20044006461.83-2061.83
201130006461.836538.17
20260006461.83-461.83
20365006461.8338.17
204100006461.833538.17
205120086461.835546.17
20683006461.831838.17

Like the previous example, there is another way to achieve the same result. A CROSS JOIN could be used to cascade the average employee salary down the entire column, allowing the row-level comparison to be made and difference computed. That said, I find the use of the window function more intuitive when reading code I’ve written in the past and when reviewing the code of others.

PARTITION BY Clause With Multiple Expressions
In many instances, the instructions for partitioning or grouping individual rows will require more than a single expression. Building onto the example above, suppose there is a desire to compute the average salary for each job ID and manager ID combination and have those values represented in the corresponding rows. Achieving this is as easy as specifying so in the PARTITION BY clause by separating the two expressions with a comma. See the code and corresponding result set below.

SELECT
    hr.employees.employee_id,
    hr.employees.manager_id,
    hr.employees.job_id,
    hr.employees.salary,
    AVG(hr.employees.salary) OVER(
        PARTITION BY
    		hr.employees.manager_id,
    		hr.employees.job_id
    ) AS average_job_salary,
    hr.employees.salary
    - AVG(hr.employees.salary) OVER(
    	PARTITION BY
    		hr.employees.manager_id,
    		hr.employees.job_id
    ) AS difference
FROM
    hr.employees
ORDER BY
    hr.employees.manager_id,
    hr.employees.job_id,
    difference DESC;
employee_idmanager_idjob_idsalaryaverage_job_salarydifference
102100AD_VP17000170000
101100AD_VP17000170000
201100MK_MAN13000130000
114100PU_MAN11000110000
145100SA_MAN14000122001800
146100SA_MAN13500122001300
147100SA_MAN1200012200-200
148100SA_MAN1100012200-1200
149100SA_MAN1050012200-1700
121100ST_MAN82007280920
120100ST_MAN80007280720
122100ST_MAN79007280620
123100ST_MAN65007280-780
124100ST_MAN58007280-1480
205101AC_MGR12008120080
200101AD_ASST440044000
108101FI_MGR12008120080
203101HR_REP650065000
204101PR_REP10000100000
103102IT_PROG900090000
104103IT_PROG600049501050
106103IT_PROG48004950-150
105103IT_PROG48004950-150
107103IT_PROG42004950-750
109108FI_ACCOUNT900079201080
110108FI_ACCOUNT82007920280
112108FI_ACCOUNT78007920-120
111108FI_ACCOUNT77007920-220
113108FI_ACCOUNT69007920-1020
115114PU_CLERK31002780320
116114PU_CLERK29002780120
117114PU_CLERK2800278020
118114PU_CLERK26002780-180
119114PU_CLERK25002780-280
180120SH_CLERK32002900300
181120SH_CLERK31002900200
183120SH_CLERK28002900-100
182120SH_CLERK25002900-400
125120ST_CLERK32002625575
126120ST_CLERK2700262575
127120ST_CLERK24002625-225
128120ST_CLERK22002625-425
184121SH_CLERK42003675525
185121SH_CLERK41003675425
186121SH_CLERK34003675-275
187121SH_CLERK30003675-675
129121ST_CLERK33002675625
130121ST_CLERK28002675125
131121ST_CLERK25002675-175
132121ST_CLERK21002675-575
188122SH_CLERK38003200600
189122SH_CLERK36003200400
190122SH_CLERK29003200-300
191122SH_CLERK25003200-700
133122ST_CLERK33002700600
134122ST_CLERK29002700200
135122ST_CLERK24002700-300
136122ST_CLERK22002700-500
192123SH_CLERK40003475525
193123SH_CLERK39003475425
194123SH_CLERK32003475-275
195123SH_CLERK28003475-675
137123ST_CLERK36003000600
138123ST_CLERK32003000200
139123ST_CLERK27003000-300
140123ST_CLERK25003000-500
196124SH_CLERK31002825275
197124SH_CLERK30002825175
198124SH_CLERK26002825-225
199124SH_CLERK26002825-225
141124ST_CLERK35002925575
142124ST_CLERK31002925175
143124ST_CLERK26002925-325
144124ST_CLERK25002925-425
150145SA_REP1000085001500
151145SA_REP950085001000
152145SA_REP90008500500
153145SA_REP80008500-500
154145SA_REP75008500-1000
155145SA_REP70008500-1500
156146SA_REP1000085001500
157146SA_REP950085001000
158146SA_REP90008500500
159146SA_REP80008500-500
160146SA_REP75008500-1000
161146SA_REP70008500-1500
162147SA_REP105007766.6666666666666666666666666666666666672733.333333333333333333333333333333333333
163147SA_REP95007766.6666666666666666666666666666666666671733.333333333333333333333333333333333333
164147SA_REP72007766.666666666666666666666666666666666667-566.666666666666666666666666666666666667
165147SA_REP68007766.666666666666666666666666666666666667-966.666666666666666666666666666666666667
166147SA_REP64007766.666666666666666666666666666666666667-1366.666666666666666666666666666666666667
167147SA_REP62007766.666666666666666666666666666666666667-1566.666666666666666666666666666666666667
168148SA_REP1150086502850
169148SA_REP1000086501350
170148SA_REP96008650950
171148SA_REP74008650-1250
172148SA_REP73008650-1350
173148SA_REP61008650-2550
174149SA_REP110008333.3333333333333333333333333333333333332666.666666666666666666666666666666666667
175149SA_REP88008333.333333333333333333333333333333333333466.666666666666666666666666666666666667
176149SA_REP86008333.333333333333333333333333333333333333266.666666666666666666666666666666666667
177149SA_REP84008333.33333333333333333333333333333333333366.666666666666666666666666666666666667
178149SA_REP70008333.333333333333333333333333333333333333-1333.333333333333333333333333333333333333
179149SA_REP62008333.333333333333333333333333333333333333-2133.333333333333333333333333333333333333
202201MK_REP600060000
206205AC_ACCOUNT830083000
100AD_PRES24000240000

Why might it be useful to perform such a computation? To answer this question, recall the first two examples above. In the second example, the average salary was computed across all employees and returned within each row to permit a row-level difference and a comparison to be made between the employee’s salary and the average salary of all employees. Such a comparison is not likely to be very meaningful given the large variances in salary values across jobs. In the first example, the average salary was computed using those employee salary values aligned to each job title (i.e., partition specified). This is likely to yield more meaningful information given a comparison that is closer to an apples-to-apples comparison. It allows consumers of the data to understand how the average salaries look across job titles and where each employee’s salary is relative to the average salary of employees having the same job. The latest query above returns a result set enabling data consumers to easily understand how salary values differ among employees holding the same job and reporting to the same manager. With little effort, data consumers can also explore how salaries for a particular job vary depending on the manager the employee reports to.

Keep in mind, while an expression was added to the PARTITION BY clause in this example, the complexity or number of expressions specified in the clause should be driven by the task at hand. Additional or fewer expressions will in most cases change the output of the window function.

ORDER BY Clause
When used within the OVER clause, the ORDER BY clause is used to specify the sorting of the rows within each partition. In the previous examples, the sort order of the individual rows within each partition was not specified and is unnecessary for those specific tasks; however, sorting is particularly useful for implementing the concept of RANK and computing moving/rolling aggregates. Before exploring a couple of examples, I feel it is important to point out the nuance of the ORDER BY clause when used within the OVER clause. Using ORDER BY within the OVER clause doesn’t guarantee the result set output by the query will be in the order specified. It is simply the explicit instruction on how to sort the rows within each partition. If it is desirable to order the result set, use the ORDER BY clause outside of the window function as seen in the examples above.

With that nuance mentioned, let’s now explore the example tasks.

Ranking
For the first example, consider the employee data below. There is one row per employee and captured within each row is the corresponding employee’s name and salary.

employee_idemployee_namesalary
100Steven King24000
101Neena Kochhar17000
102Lex De Haan17000
103Alexander Hunold9000
104Bruce Ernst6000
105David Austin4800
106Valli Pataballa4800
107Diana Lorentz4200
108Nancy Greenberg12008
109Daniel Faviet9000
110John Chen8200
111Ismael Sciarra7700
112Jose Manuel Urman7800
113Luis Popp6900
114Den Raphaely11000
115Alexander Khoo3100
116Shelli Baida2900
117Sigal Tobias2800
118Guy Himuro2600
119Karen Colmenares2500
120Matthew Weiss8000
121Adam Fripp8200
122Payam Kaufling7900
123Shanta Vollman6500
124Kevin Mourgos5800
125Julia Nayer3200
126Irene Mikkilineni2700
127James Landry2400
128Steven Markle2200
129Laura Bissot3300
130Mozhe Atkinson2800
131James Marlow2500
132TJ Olson2100
133Jason Mallin3300
134Michael Rogers2900
135Ki Gee2400
136Hazel Philtanker2200
137Renske Ladwig3600
138Stephen Stiles3200
139John Seo2700
140Joshua Patel2500
141Trenna Rajs3500
142Curtis Davies3100
143Randall Matos2600
144Peter Vargas2500
145John Russell14000
146Karen Partners13500
147Alberto Errazuriz12000
148Gerald Cambrault11000
149Eleni Zlotkey10500
150Peter Tucker10000
151David Bernstein9500
152Peter Hall9000
153Christopher Olsen8000
154Nanette Cambrault7500
155Oliver Tuvault7000
156Janette King10000
157Patrick Sully9500
158Allan McEwen9000
159Lindsey Smith8000
160Louise Doran7500
161Sarath Sewall7000
162Clara Vishney10500
163Danielle Greene9500
164Mattea Marvins7200
165David Lee6800
166Sundar Ande6400
167Amit Banda6200
168Lisa Ozer11500
169Harrison Bloom10000
170Tayler Fox9600
171William Smith7400
172Elizabeth Bates7300
173Sundita Kumar6100
174Ellen Abel11000
175Alyssa Hutton8800
176Jonathon Taylor8600
177Jack Livingston8400
178Kimberely Grant7000
179Charles Johnson6200
180Winston Taylor3200
181Jean Fleaur3100
182Martha Sullivan2500
183Girard Geoni2800
184Nandita Sarchand4200
185Alexis Bull4100
186Julia Dellinger3400
187Anthony Cabrio3000
188Kelly Chung3800
189Jennifer Dilly3600
190Timothy Gates2900
191Randall Perkins2500
192Sarah Bell4000
193Britney Everett3900
194Samuel McCain3200
195Vance Jones2800
196Alana Walsh3100
197Kevin Feeney3000
198Donald OConnell2600
199Douglas Grant2600
200Jennifer Whalen4400
201Michael Hartstein13000
202Pat Fay6000
203Susan Mavris6500
204Hermann Baer10000
205Shelley Higgins12008
206William Gietz8300

Additionally, imagine there is a need to rank the employees above by their individual salary values in descending order (i.e., greatest to least). A subset of window functions often referred to as ranking functions can be used to fulfill this task. See the code and corresponding result set below.

SELECT
	hr.employees.employee_id,
    (
    	hr.employees.first_name 
    	|| ' '
        || hr.employees.last_name
    ) AS employee_name,
    hr.employees.salary,
    DENSE_RANK() OVER (
        ORDER BY hr.employees.salary DESC
    ) AS salary_rank
FROM
	hr.employees
ORDER BY
	hr.employees.salary DESC;
employee_idemployee_namesalarysalary_rank
100Steven King240001
101Neena Kochhar170002
102Lex De Haan170002
145John Russell140003
146Karen Partners135004
201Michael Hartstein130005
108Nancy Greenberg120086
205Shelley Higgins120086
147Alberto Errazuriz120007
168Lisa Ozer115008
114Den Raphaely110009
148Gerald Cambrault110009
174Ellen Abel110009
149Eleni Zlotkey1050010
162Clara Vishney1050010
156Janette King1000011
150Peter Tucker1000011
204Hermann Baer1000011
169Harrison Bloom1000011
170Tayler Fox960012
163Danielle Greene950013
151David Bernstein950013
157Patrick Sully950013
152Peter Hall900014
103Alexander Hunold900014
158Allan McEwen900014
109Daniel Faviet900014
175Alyssa Hutton880015
176Jonathon Taylor860016
177Jack Livingston840017
206William Gietz830018
121Adam Fripp820019
110John Chen820019
120Matthew Weiss800020
159Lindsey Smith800020
153Christopher Olsen800020
122Payam Kaufling790021
112Jose Manuel Urman780022
111Ismael Sciarra770023
160Louise Doran750024
154Nanette Cambrault750024
171William Smith740025
172Elizabeth Bates730026
164Mattea Marvins720027
155Oliver Tuvault700028
178Kimberely Grant700028
161Sarath Sewall700028
113Luis Popp690029
165David Lee680030
203Susan Mavris650031
123Shanta Vollman650031
166Sundar Ande640032
179Charles Johnson620033
167Amit Banda620033
173Sundita Kumar610034
202Pat Fay600035
104Bruce Ernst600035
124Kevin Mourgos580036
106Valli Pataballa480037
105David Austin480037
200Jennifer Whalen440038
184Nandita Sarchand420039
107Diana Lorentz420039
185Alexis Bull410040
192Sarah Bell400041
193Britney Everett390042
188Kelly Chung380043
189Jennifer Dilly360044
137Renske Ladwig360044
141Trenna Rajs350045
186Julia Dellinger340046
133Jason Mallin330047
129Laura Bissot330047
125Julia Nayer320048
194Samuel McCain320048
180Winston Taylor320048
138Stephen Stiles320048
115Alexander Khoo310049
196Alana Walsh310049
181Jean Fleaur310049
142Curtis Davies310049
197Kevin Feeney300050
187Anthony Cabrio300050
116Shelli Baida290051
190Timothy Gates290051
134Michael Rogers290051
117Sigal Tobias280052
195Vance Jones280052
183Girard Geoni280052
130Mozhe Atkinson280052
139John Seo270053
126Irene Mikkilineni270053
118Guy Himuro260054
199Douglas Grant260054
198Donald OConnell260054
143Randall Matos260054
119Karen Colmenares250055
131James Marlow250055
191Randall Perkins250055
144Peter Vargas250055
182Martha Sullivan250055
140Joshua Patel250055
135Ki Gee240056
127James Landry240056
136Hazel Philtanker220057
128Steven Markle220057
132TJ Olson210058

When using ranking functions, the ORDER BY clause within the OVER clause is required. As previously mentioned, this clause is used to specify how to order the rows within each partition. In this example, the PARTITION BY clause is absent, indicating the ranking function will be applied across all employees. The highest paid employee will be assigned a salary rank value of 1, the second highest paid employee a salary rank value of 2, and so on. Because the DENSE_RANK function is being used, ties in salary values are resolved by assigning the same integer value to those rows involved in each specific tie and the row with the next highest salary value being assigned the next consecutive integer value. This example can be taken a step further by slightly changing the ranking requirement. Instead of ranking employees by their individual salary values when compared to all other employees, each employee’s rank can be computed by comparing the employee’s salary against the salaries of those employees holding the same job. This can be achieved by simply including the job title attribute within a PARTITION BY clause. As a result of doing so, each partition consists of employees with the same job and the ranking function is applied to each partition, individually. For clarity, see the code and corresponding result set below.

SELECT
	hr.employees.employee_id,
    (
    	hr.employees.first_name 
    	|| ' '
        || hr.employees.last_name
    ) AS employee_name,
    hr.jobs.job_title,
    hr.employees.salary,
    DENSE_RANK() OVER (
    	PARTITION BY hr.jobs.job_title
        ORDER BY hr.employees.salary DESC
    ) AS salary_rank
FROM
	hr.employees
	INNER JOIN
	hr.jobs
		ON hr.employees.job_id = hr.jobs.job_id
ORDER BY
	hr.jobs.job_title,
	hr.employees.salary DESC;
employee_idemployee_namejob_titlesalarysalary_rank
109Daniel FavietAccountant90001
110John ChenAccountant82002
112Jose Manuel UrmanAccountant78003
111Ismael SciarraAccountant77004
113Luis PoppAccountant69005
205Shelley HigginsAccounting Manager120081
200Jennifer WhalenAdministration Assistant44001
102Lex De HaanAdministration Vice President170001
101Neena KochharAdministration Vice President170001
108Nancy GreenbergFinance Manager120081
203Susan MavrisHuman Resources Representative65001
201Michael HartsteinMarketing Manager130001
202Pat FayMarketing Representative60001
100Steven KingPresident240001
103Alexander HunoldProgrammer90001
104Bruce ErnstProgrammer60002
106Valli PataballaProgrammer48003
105David AustinProgrammer48003
107Diana LorentzProgrammer42004
206William GietzPublic Accountant83001
204Hermann BaerPublic Relations Representative100001
115Alexander KhooPurchasing Clerk31001
116Shelli BaidaPurchasing Clerk29002
117Sigal TobiasPurchasing Clerk28003
118Guy HimuroPurchasing Clerk26004
119Karen ColmenaresPurchasing Clerk25005
114Den RaphaelyPurchasing Manager110001
145John RussellSales Manager140001
146Karen PartnersSales Manager135002
147Alberto ErrazurizSales Manager120003
148Gerald CambraultSales Manager110004
149Eleni ZlotkeySales Manager105005
168Lisa OzerSales Representative115001
174Ellen AbelSales Representative110002
162Clara VishneySales Representative105003
156Janette KingSales Representative100004
150Peter TuckerSales Representative100004
169Harrison BloomSales Representative100004
170Tayler FoxSales Representative96005
151David BernsteinSales Representative95006
157Patrick SullySales Representative95006
163Danielle GreeneSales Representative95006
158Allan McEwenSales Representative90007
152Peter HallSales Representative90007
175Alyssa HuttonSales Representative88008
176Jonathon TaylorSales Representative86009
177Jack LivingstonSales Representative840010
159Lindsey SmithSales Representative800011
153Christopher OlsenSales Representative800011
154Nanette CambraultSales Representative750012
160Louise DoranSales Representative750012
171William SmithSales Representative740013
172Elizabeth BatesSales Representative730014
164Mattea MarvinsSales Representative720015
161Sarath SewallSales Representative700016
155Oliver TuvaultSales Representative700016
178Kimberely GrantSales Representative700016
165David LeeSales Representative680017
166Sundar AndeSales Representative640018
167Amit BandaSales Representative620019
179Charles JohnsonSales Representative620019
173Sundita KumarSales Representative610020
184Nandita SarchandShipping Clerk42001
185Alexis BullShipping Clerk41002
192Sarah BellShipping Clerk40003
193Britney EverettShipping Clerk39004
188Kelly ChungShipping Clerk38005
189Jennifer DillyShipping Clerk36006
186Julia DellingerShipping Clerk34007
180Winston TaylorShipping Clerk32008
194Samuel McCainShipping Clerk32008
181Jean FleaurShipping Clerk31009
196Alana WalshShipping Clerk31009
187Anthony CabrioShipping Clerk300010
197Kevin FeeneyShipping Clerk300010
190Timothy GatesShipping Clerk290011
183Girard GeoniShipping Clerk280012
195Vance JonesShipping Clerk280012
199Douglas GrantShipping Clerk260013
198Donald OConnellShipping Clerk260013
182Martha SullivanShipping Clerk250014
191Randall PerkinsShipping Clerk250014
137Renske LadwigStock Clerk36001
141Trenna RajsStock Clerk35002
129Laura BissotStock Clerk33003
133Jason MallinStock Clerk33003
125Julia NayerStock Clerk32004
138Stephen StilesStock Clerk32004
142Curtis DaviesStock Clerk31005
134Michael RogersStock Clerk29006
130Mozhe AtkinsonStock Clerk28007
126Irene MikkilineniStock Clerk27008
139John SeoStock Clerk27008
143Randall MatosStock Clerk26009
144Peter VargasStock Clerk250010
140Joshua PatelStock Clerk250010
131James MarlowStock Clerk250010
135Ki GeeStock Clerk240011
127James LandryStock Clerk240011
136Hazel PhiltankerStock Clerk220012
128Steven MarkleStock Clerk220012
132TJ OlsonStock Clerk210013
121Adam FrippStock Manager82001
120Matthew WeissStock Manager80002
122Payam KauflingStock Manager79003
123Shanta VollmanStock Manager65004
124Kevin MourgosStock Manager58005

For an in-depth discussion on ranking functions and more involved, real-world examples, see my page dedicated to the topic here.

Aggregates with Window Clauses
The next example explores the concepts of computing moving aggregates and user-defined window clauses. Imagine you’ve been asked to compute a 3-day moving average for sales using the data in the table below. The data contains one row per day and the corresponding daily sales value.

report_datesales
01-OCT-231467
02-OCT-231192
03-OCT-234663
04-OCT-234018
05-OCT-234500
06-OCT-234711
07-OCT-233728
08-OCT-233598
09-OCT-231980
10-OCT-232471
11-OCT-232441
12-OCT-233855
13-OCT-234634
14-OCT-233083
15-OCT-234561
16-OCT-234829
17-OCT-234413
18-OCT-231166
19-OCT-234991
20-OCT-231063
21-OCT-233821
22-OCT-234631
23-OCT-232914
24-OCT-232358
25-OCT-233501
26-OCT-233594
27-OCT-233260
28-OCT-231309

The code used to generate the fictional data set above, compute and the 3-day moving average values, and the corresponding result set can be found below. Following the result set is a detailed explanation of the solution.

EXEC DBMS_RANDOM.SEED(42);

WITH daily_report (report_date, daily_order_total) AS (
    SELECT
		TRUNC(SYSDATE) AS report_date,
		FLOOR(DBMS_RANDOM.VALUE(1000, 5001)) AS daily_order_total
    FROM
		dual
	UNION ALL
    SELECT
		daily_report.report_date - 1,
		FLOOR(DBMS_RANDOM.VALUE(1000, 5001))
    FROM
		daily_report
    WHERE
		daily_report.report_date > TO_DATE('2023-10-01', 'YYYY-MM-DD')
)
SELECT
	daily_report.report_date,
	daily_report.daily_order_total,
	ROUND(
    	AVG(daily_report.daily_order_total) OVER (
			ORDER BY daily_report.report_date
			ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
		),
    	2
    ) AS three_day_moving_average
FROM
	daily_report
ORDER BY
	daily_report.report_date;
report_datesalesthree_day_moving_average
01-OCT-2314671467
02-OCT-2311921329.5
03-OCT-2346632440.67
04-OCT-2340183291
05-OCT-2345004393.67
06-OCT-2347114409.67
07-OCT-2337284313
08-OCT-2335984012.33
09-OCT-2319803102
10-OCT-2324712683
11-OCT-2324412297.33
12-OCT-2338552922.33
13-OCT-2346343643.33
14-OCT-2330833857.33
15-OCT-2345614092.67
16-OCT-2348294157.67
17-OCT-2344134601
18-OCT-2311663469.33
19-OCT-2349913523.33
20-OCT-2310632406.67
21-OCT-2338213291.67
22-OCT-2346313171.67
23-OCT-2329143788.67
24-OCT-2323583301
25-OCT-2335012924.33
26-OCT-2335943151
27-OCT-2332603451.67
28-OCT-2313092721

In the code above, a new clause is introduced within the OVER clause often referred to as the window or window clause. This clause allows a user to specify the row(s) or value(s) to be used when calculating the current row’s value. Using the code and corresponding result set above, the 3-day moving average for the first row is the row’s sales value (1467), the value for the second row is the average of the first and second rows’ values ((1467 + 1192)/2), and the value for the third row is the average of the two rows prior and the current row ((1467 + 1192 + 4663)/3). To compute the 3-day moving average value for the fourth row (04-OCT-23), the window must slide down one row. The computation will then consist of computing the average sales value of the second, third, and fourth rows’ sales values. For each subsequent row, the window slides down one row and uses the sales value in the current row and those values within the two rows prior to the current row for the computation. Perhaps not immediately obvious is the importance of the ORDER BY clause within the OVER clause in this example. It plays a crucial role in ensuring the rows are in chronological order and consequently fall into the correct window or 3-day period when performing each row’s computation.

Window Clause Options
Similar to the real-world task incorporated into the example above where each window is comprised of the current row and the two rows preceding the current row, the task of computing a 5-day moving average requires the window to be defined as ROWS BETWEEN 5 PRECEDING AND CURRENT ROW. What is more, the window may even be comprised of rows beyond the current row. In such cases, the FOLLOWING keyword can be used to specify the number of rows following the current row to use in the current row’s computation. The window specification is task dependent! Below is a list of keywords and a brief explanation of how they can be used.

ROWS
Is used to specify a fixed or unbounded number of rows used in each window relative to the current row.

RANGE
Is used to specify the values used in each window relative to the current row’s value. For example, suppose you have data on the price of a stock over time and wish to compute the average stock price for the current row’s stock date value and those stock price values in those rows with stock date values within a range of one day preceding and one day following. In such a case, the window specification could be RANGE BETWEEN INTERVAL ‘1’ DAY PRECEDING AND INTERVAL ‘1’ DAY FOLLOWING.

BETWEEN
Is used to specify the first and last row to be used in each window, if applicable. Some tasks may not require two endpoints, or the endpoint may be implied when not expressed explicitly.

PRECEDING
Is used to specify the number of rows prior to the current row to be used in each row’s computation. In some cases, a task will require each window to be comprised of the current row and all rows prior to the current row. For example, a rolling/cumulative sum will have a window specification of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.

FOLLOWING
Is used to specify the number of rows after the current row to be used in each row’s computation. Like the PRECEDING keyword, some tasks will require each window to be comprised of the current row and all rows after the current row. A task requiring the current row, the row preceding the current row, and the row following the current row will have a window specification of ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING. The PARTITION BY Clause Not Specified section above walked through the task of capturing the average salary of all employees within each row. In that example, the OVER clause is not populated and the window of ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is implied.

For additional discussion and more real-world examples covering aggregates with window clause specifications, see my page dedicated to the topic here.

Closing Thoughts
Evidenced by the handful of real-world problems covered, it’s hard to understate the power of window functions. Minimal effort in terms of lines of code and overall SQL knowledge is required to solve each problem. Keep in mind, these are real, everyday problems and you will likely have to solve some flavor or variation of these problems at some point. When solving problems requiring window functions, remember to let the task drive the choice of window function and its corresponding arguments by first understanding what you are trying to accomplish. Additionally, do not fall into the trap of believing every problem is best solved by window functions.

Additional Information
In the final example, a recursive common table expression (CTE) was used to generate report dates and their corresponding daily order total values. If you’re unfamiliar with the concepts of recursion and/or common table expressions, consider reading my recursion and common table expression pages to learn how to incorporate these concepts into real-world coding solutions.

Knowledge Check
If you’d like to confirm your understanding on the use of window functions and other fundamental concepts, consider solving a handful of my practice problems here.