GROUP BY CLAUSE

The GROUP BY clause groups rows based on one or more shared attribute values. It allows us to represent rows with identical values as a single row and is the clause largely relied upon for aggregation. Within a SELECT statement (i.e., query) the GROUP BY clause is typically accompanied by an aggregate function in the SELECT clause. Using a GROUP BY clause in conjunction with an aggregate function allows us to compute sums, minimums, maximums, and averages among other things across groups of rows. It is a cornerstone of generating data for reporting purposes.

Review the example data shown below and then let us explore the use of the GROUP BY clause to solve real-world problems.

Example Data
The schemas, corresponding tables, and data used in the example problems can be found at livesql.oracle.com. To more easily follow the examples covered on this page, consider first reading the table descriptions below and reviewing each table’s contents. To review the data maintained in each table, click on the icon next to the table name. To hide the data, click the icon again.

hr.employees
Contains information related to each employee and a hierarchal employee-manager relationship. There is one row per employee.
employee_idfirst_namelast_nameemailphone_numberhire_datejob_idsalarycommission_pctmanager_iddepartment_id
100StevenKingSKING515.123.456717-Jun-03AD_PRES2400090
101NeenaKochharNKOCHHAR515.123.456821-Sep-05AD_VP1700010090
102LexDe HaanLDEHAAN515.123.456913-Jan-01AD_VP1700010090
103AlexanderHunoldAHUNOLD590.423.45673-Jan-06IT_PROG900010260
104BruceErnstBERNST590.423.456821-May-07IT_PROG600010360
105DavidAustinDAUSTIN590.423.456925-Jun-05IT_PROG480010360
106ValliPataballaVPATABAL590.423.45605-Feb-06IT_PROG480010360
107DianaLorentzDLORENTZ590.423.55677-Feb-07IT_PROG420010360
108NancyGreenbergNGREENBE515.124.456917-Aug-02FI_MGR12008101100
109DanielFavietDFAVIET515.124.416916-Aug-02FI_ACCOUNT9000108100
110JohnChenJCHEN515.124.426928-Sep-05FI_ACCOUNT8200108100
111IsmaelSciarraISCIARRA515.124.436930-Sep-05FI_ACCOUNT7700108100
112Jose ManuelUrmanJMURMAN515.124.44697-Mar-06FI_ACCOUNT7800108100
113LuisPoppLPOPP515.124.45677-Dec-07FI_ACCOUNT6900108100
114DenRaphaelyDRAPHEAL515.127.45617-Dec-02PU_MAN1100010030
115AlexanderKhooAKHOO515.127.456218-May-03PU_CLERK310011430
116ShelliBaidaSBAIDA515.127.456324-Dec-05PU_CLERK290011430
117SigalTobiasSTOBIAS515.127.456424-Jul-05PU_CLERK280011430
118GuyHimuroGHIMURO515.127.456515-Nov-06PU_CLERK260011430
119KarenColmenaresKCOLMENA515.127.456610-Aug-07PU_CLERK250011430
120MatthewWeissMWEISS650.123.123418-Jul-04ST_MAN800010050
121AdamFrippAFRIPP650.123.223410-Apr-05ST_MAN820010050
122PayamKauflingPKAUFLIN650.123.32341-May-03ST_MAN790010050
123ShantaVollmanSVOLLMAN650.123.423410-Oct-05ST_MAN650010050
124KevinMourgosKMOURGOS650.123.523416-Nov-07ST_MAN580010050
125JuliaNayerJNAYER650.124.121416-Jul-05ST_CLERK320012050
126IreneMikkilineniIMIKKILI650.124.122428-Sep-06ST_CLERK270012050
127JamesLandryJLANDRY650.124.133414-Jan-07ST_CLERK240012050
128StevenMarkleSMARKLE650.124.14348-Mar-08ST_CLERK220012050
129LauraBissotLBISSOT650.124.523420-Aug-05ST_CLERK330012150
130MozheAtkinsonMATKINSO650.124.623430-Oct-05ST_CLERK280012150
131JamesMarlowJAMRLOW650.124.723416-Feb-05ST_CLERK250012150
132TJOlsonTJOLSON650.124.823410-Apr-07ST_CLERK210012150
133JasonMallinJMALLIN650.127.193414-Jun-04ST_CLERK330012250
134MichaelRogersMROGERS650.127.183426-Aug-06ST_CLERK290012250
135KiGeeKGEE650.127.173412-Dec-07ST_CLERK240012250
136HazelPhiltankerHPHILTAN650.127.16346-Feb-08ST_CLERK220012250
137RenskeLadwigRLADWIG650.121.123414-Jul-03ST_CLERK360012350
138StephenStilesSSTILES650.121.203426-Oct-05ST_CLERK320012350
139JohnSeoJSEO650.121.201912-Feb-06ST_CLERK270012350
140JoshuaPatelJPATEL650.121.18346-Apr-06ST_CLERK250012350
141TrennaRajsTRAJS650.121.800917-Oct-03ST_CLERK350012450
142CurtisDaviesCDAVIES650.121.299429-Jan-05ST_CLERK310012450
143RandallMatosRMATOS650.121.287415-Mar-06ST_CLERK260012450
144PeterVargasPVARGAS650.121.20049-Jul-06ST_CLERK250012450
145JohnRussellJRUSSEL011.44.1344.4292681-Oct-04SA_MAN140000.410080
146KarenPartnersKPARTNER011.44.1344.4672685-Jan-05SA_MAN135000.310080
147AlbertoErrazurizAERRAZUR011.44.1344.42927810-Mar-05SA_MAN120000.310080
148GeraldCambraultGCAMBRAU011.44.1344.61926815-Oct-07SA_MAN110000.310080
149EleniZlotkeyEZLOTKEY011.44.1344.42901829-Jan-08SA_MAN105000.210080
150PeterTuckerPTUCKER011.44.1344.12926830-Jan-05SA_REP100000.314580
151DavidBernsteinDBERNSTE011.44.1344.34526824-Mar-05SA_REP95000.2514580
152PeterHallPHALL011.44.1344.47896820-Aug-05SA_REP90000.2514580
153ChristopherOlsenCOLSEN011.44.1344.49871830-Mar-06SA_REP80000.214580
154NanetteCambraultNCAMBRAU011.44.1344.9876689-Dec-06SA_REP75000.214580
155OliverTuvaultOTUVAULT011.44.1344.48650823-Nov-07SA_REP70000.1514580
156JanetteKingJKING011.44.1345.42926830-Jan-04SA_REP100000.3514680
157PatrickSullyPSULLY011.44.1345.9292684-Mar-04SA_REP95000.3514680
158AllanMcEwenAMCEWEN011.44.1345.8292681-Aug-04SA_REP90000.3514680
159LindseySmithLSMITH011.44.1345.72926810-Mar-05SA_REP80000.314680
160LouiseDoranLDORAN011.44.1345.62926815-Dec-05SA_REP75000.314680
161SarathSewallSSEWALL011.44.1345.5292683-Nov-06SA_REP70000.2514680
162ClaraVishneyCVISHNEY011.44.1346.12926811-Nov-05SA_REP105000.2514780
163DanielleGreeneDGREENE011.44.1346.22926819-Mar-07SA_REP95000.1514780
164MatteaMarvinsMMARVINS011.44.1346.32926824-Jan-08SA_REP72000.114780
165DavidLeeDLEE011.44.1346.52926823-Feb-08SA_REP68000.114780
166SundarAndeSANDE011.44.1346.62926824-Mar-08SA_REP64000.114780
167AmitBandaABANDA011.44.1346.72926821-Apr-08SA_REP62000.114780
168LisaOzerLOZER011.44.1343.92926811-Mar-05SA_REP115000.2514880
169HarrisonBloomHBLOOM011.44.1343.82926823-Mar-06SA_REP100000.214880
170TaylerFoxTFOX011.44.1343.72926824-Jan-06SA_REP96000.214880
171WilliamSmithWSMITH011.44.1343.62926823-Feb-07SA_REP74000.1514880
172ElizabethBatesEBATES011.44.1343.52926824-Mar-07SA_REP73000.1514880
173SunditaKumarSKUMAR011.44.1343.32926821-Apr-08SA_REP61000.114880
174EllenAbelEABEL011.44.1644.42926711-May-04SA_REP110000.314980
175AlyssaHuttonAHUTTON011.44.1644.42926619-Mar-05SA_REP88000.2514980
176JonathonTaylorJTAYLOR011.44.1644.42926524-Mar-06SA_REP86000.214980
177JackLivingstonJLIVINGS011.44.1644.42926423-Apr-06SA_REP84000.214980
178KimberelyGrantKGRANT011.44.1644.42926324-May-07SA_REP70000.15149
179CharlesJohnsonCJOHNSON011.44.1644.4292624-Jan-08SA_REP62000.114980
180WinstonTaylorWTAYLOR650.507.987624-Jan-06SH_CLERK320012050
181JeanFleaurJFLEAUR650.507.987723-Feb-06SH_CLERK310012050
182MarthaSullivanMSULLIVA650.507.987821-Jun-07SH_CLERK250012050
183GirardGeoniGGEONI650.507.98793-Feb-08SH_CLERK280012050
184NanditaSarchandNSARCHAN650.509.187627-Jan-04SH_CLERK420012150
185AlexisBullABULL650.509.287620-Feb-05SH_CLERK410012150
186JuliaDellingerJDELLING650.509.387624-Jun-06SH_CLERK340012150
187AnthonyCabrioACABRIO650.509.48767-Feb-07SH_CLERK300012150
188KellyChungKCHUNG650.505.187614-Jun-05SH_CLERK380012250
189JenniferDillyJDILLY650.505.287613-Aug-05SH_CLERK360012250
190TimothyGatesTGATES650.505.387611-Jul-06SH_CLERK290012250
191RandallPerkinsRPERKINS650.505.487619-Dec-07SH_CLERK250012250
192SarahBellSBELL650.501.18764-Feb-04SH_CLERK400012350
193BritneyEverettBEVERETT650.501.28763-Mar-05SH_CLERK390012350
194SamuelMcCainSMCCAIN650.501.38761-Jul-06SH_CLERK320012350
195VanceJonesVJONES650.501.487617-Mar-07SH_CLERK280012350
196AlanaWalshAWALSH650.507.981124-Apr-06SH_CLERK310012450
197KevinFeeneyKFEENEY650.507.982223-May-06SH_CLERK300012450
198DonaldOConnellDOCONNEL650.507.983321-Jun-07SH_CLERK260012450
199DouglasGrantDGRANT650.507.984413-Jan-08SH_CLERK260012450
200JenniferWhalenJWHALEN515.123.444417-Sep-03AD_ASST440010110
201MichaelHartsteinMHARTSTE515.123.555517-Feb-04MK_MAN1300010020
202PatFayPFAY603.123.666617-Aug-05MK_REP600020120
203SusanMavrisSMAVRIS515.123.77777-Jun-02HR_REP650010140
204HermannBaerHBAER515.123.88887-Jun-02PR_REP1000010170
205ShelleyHigginsSHIGGINS515.123.80807-Jun-02AC_MGR12008101110
206WilliamGietzWGIETZ515.123.81817-Jun-02AC_ACCOUNT8300205110
oe.orders
For each order placed by a customer, various attributes are captured. There is one row per order.
order_idorder_dateorder_modecustomer_idorder_statusorder_totalsales_rep_idpromotion_id
245816-AUG-07 03.34.12.234359 PMdirect101078279.6153
239719-NOV-07 02.41.54.696211 PMdirect102142283.2154
245402-OCT-07 05.49.34.678340 PMdirect10316653.4154
235414-JUL-08 06.18.23.234567 PMdirect104046257155
235808-JAN-08 05.03.12.654278 PMdirect10527826155
238114-MAY-08 08.59.08.843679 PMdirect106323034.6156
244031-AUG-07 09.53.06.008765 PMdirect107370576.9156
235708-JAN-06 08.19.44.123456 PMdirect108559872.4158
239410-FEB-08 09.22.35.564789 PMdirect109521863158
243502-SEP-07 11.22.53.134567 PMdirect144662303159
245520-SEP-07 11.34.11.456789 AMdirect145714087.5160
237916-MAY-07 02.22.24.234567 AMdirect146817848.2161
239602-FEB-06 01.34.56.345678 AMdirect147834930161
240629-JUN-07 04.41.20.098765 AMdirect14882854.2161
243413-SEP-07 05.49.30.647893 AMdirect1498268651.8161
243602-SEP-07 06.18.04.378034 AMdirect11686394.8161
244627-JUL-07 07.03.08.302945 AMdirect1178103679.3161
244727-JUL-08 08.59.10.223344 AMdirect101833893.6161
243214-SEP-07 09.53.40.223345 AMdirect1021010523163
243313-SEP-07 10.19.00.654279 AMdirect1031078163
235526-JAN-06 09.22.51.962632 AMonline104894513.5
235626-JAN-08 09.22.41.934562 AMonline105529473.8
235908-JAN-06 09.34.13.112233 PMonline10695543.1
236014-NOV-07 12.22.31.223344 PMonline1074990.4
236113-NOV-07 01.34.21.986210 PMonline1088120131.3
236213-NOV-07 02.41.10.619477 PMonline109492829.4
236323-OCT-07 05.49.56.346122 PMonline144010082.3
236428-AUG-07 06.18.45.942399 PMonline14549500
236528-AUG-07 07.03.34.003399 PMonline146927455.3
236628-AUG-07 08.59.23.144778 PMonline147537319.4
236727-JUN-08 09.53.32.335522 PMonline14810144054.8
236826-JUN-08 10.19.43.190089 PMonline1491060065
236926-JUN-07 11.22.54.009932 PMonline116011097.4
237027-JUN-08 12.22.11.647398 AMonline1174126
237116-MAY-07 01.34.56.113356 AMonline118679405.6
237227-FEB-07 12.22.33.356789 AMonline119916447.2
237327-FEB-08 01.34.51.220065 AMonline1204416
237427-FEB-08 02.41.45.109654 AMonline12104797
237526-FEB-07 03.49.50.459233 AMonline1222103834.4
237607-JUN-07 06.18.08.883310 AMonline123611006.2
237707-JUN-07 07.03.01.001100 AMonline141538017.8
237824-MAY-07 08.59.10.010101 AMonline142525691.3
238016-MAY-07 09.53.02.909090 AMonline143327132.6
238214-MAY-08 10.19.03.828321 AMonline144871173
238312-MAY-08 11.22.30.545103 AMonline145836374.7
238412-MAY-08 12.22.34.525972 PMonline146329249.1
238508-DEC-07 11.34.11.331392 AMonline1474295892
238606-DEC-07 12.22.34.225609 PMonline1481021116.9
238711-MAR-07 03.34.56.536966 PMonline149552758.9
238804-JUN-07 04.41.12.554435 PMonline1504282694.3
238904-JUN-08 05.49.43.546954 PMonline151417620
239018-NOV-07 04.18.50.546851 PMonline15297616.8
239127-FEB-06 05.03.03.828330 PMdirect153248070.6156
239221-JUL-07 08.59.57.571057 PMdirect154926632161
239310-FEB-08 07.53.19.528202 PMdirect155423431.9161
239502-FEB-06 08.19.11.227550 PMdirect156368501163
239819-NOV-07 09.22.53.224175 PMdirect15797110.3163
239919-NOV-07 10.22.38.340990 PMdirect158025270.3161
240010-JUL-07 01.34.29.559387 AMdirect159269286.4161
240110-JUL-07 02.22.53.554822 AMdirect1603969.2163
240202-JUL-07 03.34.44.665170 AMdirect1618600154
240301-JUL-07 04.49.13.615512 PMdirect1620220154
240401-JUL-07 04.49.13.664085 PMdirect1636510158
240501-JUL-07 04.49.13.678123 PMdirect16451233159
240729-JUN-07 07.03.21.526005 AMdirect16592519155
240829-JUN-07 08.59.31.333617 AMdirect1661309158
240929-JUN-07 09.53.41.984501 AMdirect167248154
241024-MAY-08 10.19.51.985501 AMdirect168645175156
241124-MAY-07 11.22.10.548639 AMdirect169815760.5156
241229-MAR-06 10.22.09.509801 AMdirect170966816158
241329-MAR-08 01.34.04.525934 PMdirect101548552161
241429-MAR-07 02.22.40.536996 PMdirect102810794.6153
241529-MAR-06 01.34.50.545196 PMdirect1036310161
241629-MAR-07 04.41.20.945676 PMdirect1046384160
241720-MAR-07 05.49.10.974352 PMdirect10551926.6163
241820-MAR-04 04.18.21.862632 PMdirect10645546.6163
241920-MAR-07 07.03.32.764632 PMdirect107331574160
242013-MAR-07 08.59.43.666320 PMdirect108229750160
242112-MAR-07 09.53.54.562432 PMdirect109172836
242216-DEC-07 08.19.55.462332 PMdirect144211188.5153
242321-NOV-07 10.22.33.362632 AMdirect145310367.7160
242421-NOV-07 10.22.33.263332 AMdirect146413824153
242516-NOV-06 11.34.22.162552 PMdirect14751500.8163
242617-NOV-06 12.22.11.262552 AMdirect14867200
242710-NOV-07 01.34.22.362124 AMdirect14979055163
242810-NOV-07 02.41.34.463567 AMdirect116814685.8
242910-NOV-07 03.49.25.526321 AMdirect117950125154
243002-OCT-07 06.18.36.663332 AMdirect101829669.9159
243114-SEP-06 07.03.04.763452 AMdirect10215610.6163
243701-SEP-06 08.59.15.826132 AMdirect103413550163
243801-SEP-07 09.53.26.934626 AMdirect10405451154
243931-AUG-07 10.19.37.811132 AMdirect105122150.1159
244101-AUG-08 11.22.48.734526 AMdirect10652075.2160
244227-JUL-06 12.22.59.662632 PMdirect107952471.9154
244327-JUL-06 01.34.16.562632 PMdirect10803646154
244427-JUL-07 02.22.27.462632 PMdirect109177727.2155
244527-JUL-06 03.34.38.362632 PMdirect14485537.8158
244818-JUN-07 04.41.49.262632 PMdirect14551388158
244913-JUN-07 05.49.07.162632 PMdirect146686155
245011-APR-07 06.18.10.362632 PMdirect14731636159
245117-DEC-07 05.03.52.562632 PMdirect148710474.6154
245206-OCT-07 08.59.43.462632 PMdirect149512589159
245304-OCT-07 09.53.34.362632 PMdirect1160129153
245607-NOV-06 07.53.25.989889 PMdirect11703878.4163
245731-OCT-07 11.22.16.162632 PMdirect118521586.2159

Solving Real-world Problems
Using the employees and orders tables above, the GROUP BY clause can be used to answer important questions, such as the following: For each job, what do employee salaries look like? Do employee salaries vary within jobs but across managers, and if so, then how much? What do our customers look like when their order histories are summarized? What do the performances of each sales representative look like? Let us explore each of these questions in turn.

For each job, what do employee salaries look like?
To answer this question, we need to be able to compute various measures for the groups of employees holding each job. The total, average, minimum, and maximum salary paid out for each job is likely a good start. Additionally, the number of employees holding a specific job and the job’s salary range is useful in instances where the average salary is skewed or may be misleading without the associated number of employee salaries it was computed across. The code below can be used to generate an output containing these measures for each job. Take a moment to review the code and corresponding output. An explanation on how the code works follows the output.

SELECT
    hr.employees.job_id,
    SUM(hr.employees.salary) AS total_salary,
    COUNT(hr.employees.employee_id) AS employee_count,
    AVG(hr.employees.salary) AS average_salary,
    MAX(hr.employees.salary) AS maximum_salary,
    MIN(hr.employees.salary) AS minimum_salary,
    MAX(hr.employees.salary) - MIN(hr.employees.salary) AS salary_range
FROM
	hr.employees
GROUP BY
    hr.employees.job_id;
job_idtotal_salaryemployee_countaverage_salarymaximum_salaryminimum_salarysalary_range
AD_VP3400021700017000170000
FI_ACCOUNT3960057920900069002100
PU_CLERK139005278031002500600
SH_CLERK64300203215420025001700
HR_REP650016500650065000
PU_MAN1100011100011000110000
AC_MGR1200811200812008120080
ST_CLERK55700202785360021001500
AD_ASST440014400440044000
IT_PROG2880055760900042004800
SA_MAN6100051220014000105003500
AC_ACCOUNT830018300830083000
FI_MGR1200811200812008120080
ST_MAN3640057280820058002400
AD_PRES2400012400024000240000
MK_MAN1300011300013000130000
SA_REP2505003083501150061005400
MK_REP600016000600060000
PR_REP1000011000010000100000

Let us break down the steps involved in generating the output above by starting with a few facts regarding the employees table. The employees table contains 107 rows with each row representing a single employee. Across the 107 employees, nineteen unique jobs are held. The query above groups the employees based on their respective job ID values. The measures are subsequently computed within each group. The values returned for each group are represented within a single row along with the job ID value defining the group. To reinforce this idea, let us walk through the process manually for the group of IT programmers.

Based on the employees table shown at the beginning of this page, there are five employees with a job ID value of “IT PROG.” The ID values of those employees are 103, 104, 105, 106, and 107 and the salary values are $9,000, $6,000, $4,800, $4,800, and $4,200, respectively. Computing the total salary amount paid out within this group of employees results in a total salary value of $28,880. As expected, this value matches the value observed in the output for the total salary measure. If we computed each measure for each group of employees holding the same job, then we would arrive at the same results shown in the output above.

Before moving onto the next example, consider jotting down a few comments addressing the question, “What do employee salaries look like?” A few of my own can be found in the comments section below and will be included in each of the example problems that follow. Click the icon to see my comments.

Comments

– Ten of the nineteen jobs are occupied by only one employee, resulting in less than enlightening measure values.

– The largest salary range exists among sales representatives while the smallest exists among purchasing clerks.

– The highest paid employee, the president, earns greater than five times the salary of the lowest paid employee working as an administrative assistant.

Do employee salaries vary within jobs but across managers, and if so, then how much?
To tackle this question, a similar approach to that seen above can be taken. The same measures require computing; however, a different level of detail is required. We need to compute the values for each job ID and manager ID combination. Each of those combinations are the groups and those groups are made of the employees holding a specific job and reporting to a specific manager. For example, the financial accountants with employee ID values of 109, 110, 111, 112, and 113 reporting to the manager with an ID value of 108 are a single group. Each of the measures are computed using these five employees’ salaries. Their values are subsequently represented within a single row along with the job ID value and manager ID value defining the group (FI_ACCOUNT and 108, respectively).

SELECT
    hr.employees.job_id,
    hr.employees.manager_id,
    SUM(hr.employees.salary) AS total_salary,
    COUNT(hr.employees.employee_id) AS employee_count,
    AVG(hr.employees.salary) AS average_salary,
    MAX(hr.employees.salary) AS maximum_salary,
    MIN(hr.employees.salary) AS minimum_salary,
    MAX(hr.employees.salary) - MIN(hr.employees.salary) AS salary_range
FROM
	hr.employees
GROUP BY
    hr.employees.job_id,
    hr.employees.manager_id;
job_idmanager_idtotal_salaryemployee_countaverage_salarymaximum_salaryminimum_salarysalary_range
IT_PROG1031980044950600042001800
IT_PROG102900019000900090000
SA_REP1495000068333.3333333333333333333333333333333333331100062004800
SA_REP1474660067766.6666666666666666666666666666666666671050062004300
SA_REP14851900686501150061005400
SA_REP14551000685001000070003000
SA_REP14651000685001000070003000
SH_CLERK1211470043675420030001200
SH_CLERK1221280043200380025001300
SH_CLERK120116004290032002500700
SH_CLERK1231390043475400028001200
SH_CLERK124113004282531002600500
ST_CLERK1231200043000360025001100
ST_CLERK1241170042925350025001000
ST_CLERK1201050042625320022001000
ST_CLERK1221080042700330022001100
ST_CLERK1211070042675330021001200
AC_ACCOUNT205830018300830083000
AC_MGR1011200811200812008120080
AD_ASST101440014400440044000
AD_PRES2400012400024000240000
AD_VP1003400021700017000170000
FI_ACCOUNT1083960057920900069002100
FI_MGR1011200811200812008120080
HR_REP101650016500650065000
MK_MAN1001300011300013000130000
MK_REP201600016000600060000
PR_REP1011000011000010000100000
PU_CLERK114139005278031002500600
PU_MAN1001100011100011000110000
SA_MAN1006100051220014000105003500
ST_MAN1003640057280820058002400
Comments

– For most jobs, all employees holding that job report to the same manager.

– Of the four jobs where all employees holding the specific job do not report to the same manager, perhaps the most notable variation exists for programmers. The single programmer working for the manager with an ID value of 103 earns a salary close to two times the average programmer working for the manager with an ID value of 102 and 50 percent ($3000) more than the next highest paid programmer.

What do our customers look like when their order histories are summarized?
Understanding past customer purchasing behavior is beneficial to being able to potentially predict the purchasing behavior of customers in the future. Describing each customer’s behavior, at a high-level, can be a good start. The code below allows us to do that by answering the following key questions:

– How much has a customer spent in total across all their orders?
– How many orders has the customer placed?
– On average, how much did a customer spend per order?
– What was the customer’s largest and smallest amount spent in a single order?
– What is the difference, in terms of amount spent, between the customer’s largest and smallest order?
– When did the customer place their first order?
– When did the customer place their most recent order?

SELECT
    oe.orders.customer_id,
    SUM(oe.orders.order_total) AS total_order_total,
    COUNT(oe.orders.order_id) AS order_count,
    AVG(oe.orders.order_total) AS average_order_total,
    MAX(oe.orders.order_total) AS maximum_order_total,
    MIN(oe.orders.order_total) AS minimum_order_total,
    MAX(oe.orders.order_total) - MIN(oe.orders.order_total) AS order_total_range,
    MAX(oe.orders.order_date) AS latest_order_date,
    MIN(oe.orders.order_date) AS first_order_date
FROM
	oe.orders
GROUP BY
    oe.orders.customer_id;
customer_idtotal_order_totalorder_countaverage_order_totalmaximum_order_totalminimum_order_totalorder_total_rangefirst_order_datelatest_order_date
107155613.2438903.370576.9990.469586.527-JUL-06 12.22.59.662632 PM14-NOV-07 12.22.31.223344 PM
108213399.7453349.925120131.33646116485.308-JAN-06 08.19.44.123456 PM13-NOV-07 01.34.21.986210 PM
15825270.3125270.325270.325270.3019-NOV-07 10.22.38.340990 PM19-NOV-07 10.22.38.340990 PM
1616001600600600002-JUL-07 03.34.44.665170 AM02-JUL-07 03.34.44.665170 AM
1663091309309309029-JUN-07 08.59.31.333617 AM29-JUN-07 08.59.31.333617 AM
10561376.5415344.12529473.81926.627547.220-MAR-07 05.49.10.974352 PM26-JAN-08 09.22.41.934562 AM
109265255.6466313.992829.42186370966.412-MAR-07 09.53.54.562432 PM10-FEB-08 09.22.35.564789 PM
14327132.6127132.627132.627132.6016-MAY-07 09.53.02.909090 AM16-MAY-07 09.53.02.909090 AM
15969286.4169286.469286.469286.4010-JUL-07 01.34.29.559387 AM10-JUL-07 01.34.29.559387 AM
1622201220220220001-JUL-07 04.49.13.615512 PM01-JUL-07 04.49.13.615512 PM
1635101510510510001-JUL-07 04.49.13.664085 PM01-JUL-07 04.49.13.664085 PM
168451751451754517545175024-MAY-08 10.19.51.985501 AM24-MAY-08 10.19.51.985501 AM
104146605.5436651.37594513.538494129.526-JAN-06 09.22.51.962632 AM14-JUL-08 06.18.23.234567 PM
118100991.8250495.979405.621586.257819.416-MAY-07 01.34.56.113356 AM31-OCT-07 11.22.16.162632 PM
14571717.9514343.5836374.7138834986.718-JUN-07 04.41.49.262632 PM12-MAY-08 11.22.30.545103 AM
12147971479747974797027-FEB-08 02.41.45.109654 AM27-FEB-08 02.41.45.109654 AM
14138017.8138017.838017.838017.8007-JUN-07 07.03.01.001100 AM07-JUN-07 07.03.01.001100 AM
15523431.9123431.923431.923431.9010-FEB-08 07.53.19.528202 PM10-FEB-08 07.53.19.528202 PM
101190395.1447598.77578279.629669.948609.716-AUG-07 03.34.12.234359 PM27-JUL-08 08.59.10.223344 AM
10320591.445147.8513550781347229-MAR-06 01.34.50.545196 PM02-OCT-07 05.49.34.678340 PM
1163230748076.7514685.812914556.826-JUN-07 11.22.54.009932 PM10-NOV-07 02.41.34.463567 AM
1204161416416416027-FEB-08 01.34.51.220065 AM27-FEB-08 01.34.51.220065 AM
14225691.3125691.325691.325691.3024-MAY-07 08.59.10.010101 AM24-MAY-07 08.59.10.010101 AM
151176201176201762017620004-JUN-08 05.49.43.546954 PM04-JUN-08 05.49.43.546954 PM
156685011685016850168501002-FEB-06 08.19.11.227550 PM02-FEB-06 08.19.11.227550 PM
1577110.317110.37110.37110.3019-NOV-07 09.22.53.224175 PM19-NOV-07 09.22.53.224175 PM
16915760.5115760.515760.515760.5024-MAY-07 11.22.10.548639 AM24-MAY-07 11.22.10.548639 AM
14688462.6517692.5229249.18629163.116-MAY-07 02.22.24.234567 AM12-MAY-08 12.22.34.525972 PM
148185700.5537140.1144054.82854.2141200.617-NOV-06 12.22.11.262552 AM27-JUN-08 09.53.32.335522 PM
149403119.7580623.94268651.89055259596.811-MAR-07 03.34.56.536966 PM26-JUN-08 10.19.43.190089 PM
122103834.41103834.4103834.4103834.4026-FEB-07 03.49.50.459233 AM26-FEB-07 03.49.50.459233 AM
1527616.817616.87616.87616.8018-NOV-07 04.18.50.546851 PM18-NOV-07 04.18.50.546851 PM
170668161668166681666816029-MAR-06 10.22.09.509801 AM29-MAR-06 10.22.09.509801 AM
144160284.6532056.92711735537.865635.227-JUL-06 03.34.38.362632 PM14-MAY-08 10.19.03.828321 AM
11916447.2116447.216447.216447.2027-FEB-07 12.22.33.356789 AM27-FEB-07 12.22.33.356789 AM
15348070.6148070.648070.648070.6027-FEB-06 05.03.03.828330 PM27-FEB-06 05.03.03.828330 PM
16412331123312331233001-JUL-07 04.49.13.678123 PM01-JUL-07 04.49.13.678123 PM
167481484848029-JUN-07 09.53.41.984501 AM29-JUN-07 09.53.41.984501 AM
10269211.4417302.8542283.25610.636672.614-SEP-06 07.03.04.763452 AM19-NOV-07 02.41.54.696211 PM
10636199.549049.87523034.62075.220959.420-MAR-04 04.18.21.862632 PM01-AUG-08 11.22.48.734526 AM
147371278.2574255.642958921500.8294391.202-FEB-06 01.34.56.345678 AM08-DEC-07 11.34.11.331392 AM
117157808.7439452.175103679.3126103553.307-NOV-06 07.53.25.989889 PM27-JUN-08 12.22.11.647398 AM
12311006.2111006.211006.211006.2007-JUN-07 06.18.08.883310 AM07-JUN-07 06.18.08.883310 AM
150282694.31282694.3282694.3282694.3004-JUN-07 04.41.12.554435 PM04-JUN-07 04.41.12.554435 PM
154266321266322663226632021-JUL-07 08.59.57.571057 PM21-JUL-07 08.59.57.571057 PM
160969.21969.2969.2969.2010-JUL-07 02.22.53.554822 AM10-JUL-07 02.22.53.554822 AM
16525191251925192519029-JUN-07 07.03.21.526005 AM29-JUN-07 07.03.21.526005 AM

In the output above, there is one row per customer. In each of those rows, the results of aggregating the customer’s orders is populated. How did this come to be? In the same manner as previous examples, the GROUP BY clause grouped the rows based on the specified attribute(s). In this case, the clause grouped rows based on the customer ID value. Subsequently, several aggregate functions were computed within each group (i.e., customer) and across the respective group’s orders.

Comments

– About sixty-two percent (29/47) of customers have only placed a single order. The other eighteen customers’ order counts range from two to five, with the majority having placed at least 4 orders. If this data was not fictional, it may be beneficial to survey those customers having placed only one order to better understand their experiences and reasons behind not placing a subsequent order.

– The range of total spend among customers is quite large ($403,071.70). The least amount spent by a customer having placed at least one order is $48. The largest amount spent by a single customer is $403,119.70.

– For those customers having placed more than one order, the range of order total values for those customers’ orders exceeds $10,000. For example, the smallest and largest dollar amounts associated with the customer with an ID value of 103 are $13,550 and $78, respectively.

What do the performances of each sales representative look like?
For sales representatives and those managing sales representatives, job performance is often a key factor in determining total compensation. Typically, a sales commission is awarded based on the number of sales a representative has made. Other factors may also contribute to the amount awarded. Using the code below, we can quickly summarize the performances of each sales representative. Notice, the code is largely the same as that used in the previous example. There are two differences. The first difference is the sales representative ID attribute has replaced the customer ID attribute in the SELECT and GROUP BY clauses. This is required to summarize each sales representative’s data. The second difference is the use of a WHERE clause. The WHERE clause ensures only those orders associated with a sales representative are included. This filtering operation takes place prior to the grouping operation. For clarity, see the code’s corresponding output.

SELECT
    oe.orders.sales_rep_id,
    SUM(oe.orders.order_total) AS total_order_total,
    COUNT(oe.orders.order_id) AS order_count,
    AVG(oe.orders.order_total) AS average_order_total,
    MAX(oe.orders.order_total) AS maximum_order_total,
    MIN(oe.orders.order_total) AS minimum_order_total,
    MAX(oe.orders.order_total) - MIN(oe.orders.order_total) AS order_total_range,
    MAX(oe.orders.order_date) AS latest_order_date,
    MIN(oe.orders.order_date) AS first_order_date
FROM
	oe.orders
WHERE
    oe.orders.sales_rep_id IS NOT NULL
GROUP BY
    oe.orders.sales_rep_id;
sales_rep_idtotal_order_totalorder_countaverage_order_totalmaximum_order_totalminimum_order_totalorder_total_rangefirst_order_datelatest_order_date
158156296.2722328.0285714285714285714285714285714286668163096650710-FEB-08 09.22.35.564789 PM08-JAN-06 08.19.44.123456 PM
161661734.51350902.6538461538461538461538461538461538268651.8310268341.827-JUL-08 08.59.10.223344 AM02-FEB-06 01.34.56.345678 AM
159151167.2721595.31428571428571428571428571428571436230312336107031-OCT-07 11.22.16.162632 PM11-APR-07 06.18.10.362632 PM
163128249.51210687.458333333333333333333333333333333368501786842319-NOV-07 09.22.53.224175 PM20-MAR-04 04.18.21.862632 PM
155134415.2526883.0477727.28677641.214-JUL-08 06.18.23.234567 PM13-JUN-07 05.49.07.162632 PM
156202617.6540523.5270576.915760.554816.424-MAY-08 10.19.51.985501 AM27-FEB-06 05.03.03.828330 PM
153114215.7522843.1478279.612978150.616-DEC-07 08.19.55.462332 PM29-MAR-07 02.22.40.536996 PM
154171973.11017197.3152471.94852423.917-DEC-07 05.03.52.562632 PM27-JUL-06 12.22.59.662632 PM
16088238.4614706.4315743843119001-AUG-08 11.22.48.734526 AM13-MAR-07 08.59.43.666320 PM

I won’t reiterate how the GROUP BY clause works again; however, I do want to touch on how a few of the measures computed and displayed in the output above can be useful. As previously mentioned, the sales commission awarded to each sales representative will likely be based on the number of sales made and perhaps even the total dollar amount associated with the orders. What may not be immediately apparent is the usefulness of the latest order date and first order date values. With a little extra work, the values derived can be used to compute the number of days elapsed between the beginning of a period (e.g., sales year, sales month) and the first sale made by each sales representative. Short durations could indicate a strong start while long durations could indicate a slow start. Likewise, using the latest order date values can tell us whether or not the sales representative has recently made a sale or is struggling to do so. Keep in mind, when using a non-fictional data set a WHERE clause should be used here to filter to the date period of interest unless it is desirable to look at all-time sales performance.

Missing Values
Aggregate functions, with the exception of COUNT(*), ignore NULL values. For example, a SUM computed across the values, 100, 200, 200, NULL, and 300 returns 800. Likewise, an average computed across the same same values will return 200. If rows containing NULLs for the attribute being aggregated are required to be accounted for when computing an average, then consider using the COALESCE function within the average function to replace the missing values with zero prior to the average being computed. For more information on handling missing values, see my page dedicated to this topic, here.

COUNT(*) vs. COUNT(attribute)
When using the COUNT function, it is important to understand the difference in the function’s behavior when the argument passed to the function is an attribute instead of the “*” character. When an attribute is passed to the function, the count returned represents the number of rows with non-null values for the specified attribute while COUNT(*) simply represents the number of rows returned. For clarity, see the example code and corresponding output below.

SELECT
	COUNT(*) AS order_count,
	COUNT(oe.orders.sales_rep_id) AS assisted_order_count
FROM
	oe.orders;
order_countassisted_order_count
10570

In the output above, COUNT(*) returns the number of rows (i.e., orders) within the orders table. When the sales representative ID attribute is passed into the COUNT function, the number of rows with a sales representative ID value populated is returned (70). This value tells us seventy orders were placed with the assistance of a sales representative.

Common Errors
For users new to SQL, the GROUP BY clause can cause a large amount of frustration. Most of this frustration is rooted in how the SELECT and GROUP BY clauses work together to produce a desired output. When the GROUP BY clause is used, any attributes not treated as constants or are non-aggregates in the SELECT clause must also be specified in the GROUP BY clause. When this is not the case, Oracle database will return the error, ORA-00979: not a GROUP BY expression. To help reinforce why Oracle and other relational database management system products return an error when the SELECT and GROUP BY clauses are not in agreement, let us discuss the code below.

-- Returns ORA-00979: not a GROUP BY expression.
SELECT
    hr.employees.job_id,
    hr.employees.manager_id,
    COUNT(hr.employees.employee_id) AS employee_count
FROM
    hr.employees
GROUP BY
    hr.employees.job_id;

The GROUP BY clause specified states to group the rows based on the unique job ID values. To reiterate, this specification states to return one row per job ID value. This is problematic because two attributes, job ID and manager ID, are specified in the SELECT clause. The SELECT clause is processed after the GROUP BY clause, so by the time the SELECT clause is being processed for this query, the rows have already been grouped by job ID values. As a result, the level of detail in SELECT clause is not aligned with the GROUP BY clause. This is easy to see for any job ID value associated with more than one manager. With only one row returned for each job ID value, there are not enough rows to also account for each manager ID and job ID combination. The employee count computation also presents an issue in that it is computed at the job ID level. This does not make sense given the attributes in the SELECT clause.

If you are learning the clauses in the typical order and as displayed in the menus, then your next stop is the HAVING clause page. This clause allows users to perform filtering operations on grouped rows. For more information, see the HAVING clause page.

Knowledge Check
If you’d like to confirm your understanding on the GROUP BY clause and other fundamental concepts, consider solving a handful of my practice problems here.