NUMERIC FUNCTIONS

Numerical data is ubiquitous as is our reliance upon it. It can be found in earnings reports used to convey past and future company performance, IT support tickets where the time to respond and successfully close the ticket is captured or can be derived, and even in the tracking of our own personal goals over time. Luckily, the pervasiveness of data and its ever-increasing nature has been embraced and systems have been created to effectively manage and make use of it. Relational database management systems (RDBMS) come equipped with built-in functions meant to handle numeric data and assist us in answering questions based on, in full or in part, the data itself.

Let’s begin exploring numeric functions by tackling a classic programming problem where the task is to classify integer values as even or odd. The schemas, corresponding tables, and data used in the examples can be found at livesql.oracle.com.

Classifying Integers as Even or Odd

SELECT
	oe.orders.order_id,
	CASE MOD(oe.orders.order_id, 2)
		WHEN 0
			THEN 'Even'
		WHEN 1
    		THEN 'Odd'
    	ELSE NULL
	END AS order_id_type
FROM
	oe.orders
FETCH FIRST 5 ROWS ONLY;

order_id	order_id_type
2354	Even
2355	Odd
2356	Even
2357	Odd
2358	Even

In the example above, the MOD function has been used to classify whether the pre-existing order ID values are even or odd. The function has two parameters, the numerator and denominator, and returns the remainder of the resulting division. This function is particularly useful because it provides a straightforward way to complete the task. If the denominator is two and the remainder is zero, then the order ID or first argument passed must be even. Otherwise, the value must be odd when the denominator is two. The example below is a bit more involved. It includes a recursive common table expression (CTE) and the use ROUND and DBMS_RANDOM.VALUE functions to generate one hundred random integer values. Subsequently, the MOD function is used to classify each integer as even or odd. For an in-depth explanation on this example, see my page on recursion.

/*
DBMS_RANDOM.VALUE() is used to generate 100 random integers. Subsequently, MOD is used to classify each integer as even or odd.
*/
WITH random_number_generator (iteration, random_number) AS (
    SELECT
    	1,
    	ROUND(DBMS_RANDOM.VALUE(0, 100))
    FROM
    	dual
    UNION ALL
    SELECT
    	iteration + 1,
    	ROUND(DBMS_RANDOM.VALUE(0, 100))
    FROM
        random_number_generator
    WHERE
    	random_number_generator.iteration < 100  
)
SELECT
	random_number_generator.iteration,
    random_number_generator.random_number,
    CASE MOD(random_number_generator.random_number, 2)
		WHEN 0
			THEN 'Even'
		WHEN 1
    		THEN 'Odd'
    	ELSE NULL
	END AS integer_type
FROM
	random_number_generator;

iteration	random_number	integer_type
1	98	Even
2	13	Odd
3	16	Even
4	28	Even
5	45	Odd
6	74	Even
7	91	Odd
8	73	Odd
9	81	Odd
10	48	Even
11	6	Even
12	4	Even
13	2	Even
14	5	Odd
15	74	Even
16	13	Odd
17	18	Even
18	54	Even
19	6	Even
20	66	Even
21	35	Odd
22	78	Even
23	31	Odd
24	83	Odd
25	75	Odd
26	67	Odd
27	55	Odd
28	50	Even
29	44	Even
30	27	Odd
31	60	Even
32	69	Odd
33	43	Odd
34	75	Odd
35	82	Even
36	89	Odd
37	55	Odd
38	48	Even
39	67	Odd
40	20	Even
41	75	Odd
42	6	Even
43	1	Odd
44	38	Even
45	54	Even
46	96	Even
47	47	Odd
48	22	Even
49	34	Even
50	42	Even
51	18	Even
52	34	Even
53	64	Even
54	61	Odd
55	4	Even
56	73	Odd
57	80	Even
58	71	Odd
59	78	Even
60	89	Odd
61	0	Even
62	51	Odd
63	41	Odd
64	24	Even
65	23	Odd
66	88	Even
67	93	Odd
68	58	Even
69	52	Even
70	41	Odd
71	33	Odd
72	93	Odd
73	61	Odd
74	7	Odd
75	3	Odd
76	90	Even
77	74	Even
78	78	Even
79	6	Even
80	9	Odd
81	9	Odd
82	73	Odd
83	84	Even
84	8	Even
85	88	Even
86	64	Even
87	42	Even
88	23	Odd
89	46	Even
90	60	Even
91	78	Even
92	56	Even
93	98	Even
94	8	Even
95	44	Even
96	98	Even
97	94	Even
98	94	Even
99	95	Odd
100	7	Odd

Rounding Numeric Values
One of the most basic transformations conceptually and in its implementation for numeric data is rounding. It’s highly likely that you’ve had the need to do this type of operation and have done so in other programming languages or software (e.g., Tableau, Microsoft Excel). Using SQL, the task is no more difficult. Built-in functions, such as ROUND, CEIL (i.e., ceiling), and FLOOR are typically available to RDBMS users in some form. ROUND is used to round numeric values in the most common sense and accepts two arguments, the value to be rounded and the integer value associated with the number of places to the right of the decimal point. It allows us to round a numeric value to the nearest integer value, tenth, hundredth, thousandth, etc. CEIL and FLOOR functions behave a bit differently. CEIL will round any non-whole number upward to the next integer value while FLOOR will round any non-whole number downward to the previous integer value. The following example code and corresponding result set illustrates the functionality afforded by ROUND, CEIL, and FLOOR functions given the original total sales value corresponding to each sales representative.

SELECT
    oe.orders.sales_rep_id,
    SUM(oe.orders.order_total) AS total_sales,
	ROUND(SUM(oe.orders.order_total), 0) AS round,
    CEIL(SUM(oe.orders.order_total)) AS ceiling,
    FLOOR(SUM(oe.orders.order_total)) AS floor
FROM
	oe.orders
WHERE
    oe.orders.sales_rep_id IS NOT NULL
GROUP BY
    oe.orders.sales_rep_id;

sales_rep_id	total_sales	round	ceiling	floor
158	156296.2	156296	156297	156296
161	661734.5	661735	661735	661734
159	151167.2	151167	151168	151167
163	128249.5	128250	128250	128249
155	134415.2	134415	134416	134415
156	202617.6	202618	202618	202617
153	114215.7	114216	114216	114215
154	171973.1	171973	171974	171973
160	88238.4	88238	88239	88238

Calculating Mean Absolute Deviation (MAD)
In statistics, the mean absolution deviation is the average distance between each data point and the mean of the data. In the example below, the common table expression (CTE) by the name of statistic is responsible for computing the average order total across all orders. Subsequently, the summation of the absolute differences between the average order total value and each order total value is computed. This row-level difference of the mean and each order total value is made possible by the CROSS JOIN. If you’re not familiar with the CROSS JOIN, then please consider checking out my page dedicated to the topic. Moving on, the result of that sum is then divided by the number of rows (i.e., observations or order total values) and finally rounded to the nearest hundredth using Oracle’s built-in function, ROUND.

WITH statistic AS (
    SELECT
    	AVG(oe.orders.order_total) AS average_order_total
    FROM
        oe.orders
)
SELECT
    ROUND(
        SUM(
        	ABS(oe.orders.order_total - statistic.average_order_total)
        )
        	/ COUNT(*),
    	2
    ) AS mad
FROM
	oe.orders
	CROSS JOIN
	statistic;

mad
32812.29

Calculating Sample Variance
Like mean absolution deviation, we can calculate the sample variance using SQL. Sample variance is a measure of variability and can be calculated by taking the average of the squared deviations from the mean. It tells us how far each data point is from the mean of the data. In the following example, I will first calculate the sample variance manually, without a built-in function. Where possible, it’s important to not allow the lack of built-in functions provided by a specific RDBMS to limit analyses. Following the manual calculation, I will use the built-in function provided by Oracle. These functions are convenient in that the heavy lifting in terms of implementing the functionality is transferred from the developer or code writer to the RDBMS.

In the example below, there exists two CTEs. The first CTE, job_statistic, calculates the average salary observed for those employees corresponding to each job ID value. The second CTE, employee_salary, simply retrieves the job ID and salary for each employee. The outputs from both CTEs are necessary to compute the difference between each employee’s salary and the average salary of those employees with the same job. This is observed in the SELECT statement following the initial CTEs. In the SELECT statement, the JOIN and corresponding ON clause ensure the correct differences are computed. Correct meaning the average salary for the job corresponding to the specific employee’s job is used. Additionally, a CASE expression is used to determine the number of employees corresponding to each job ID. When the number of employees holding a specific job is one, the sample variance value returned is NULL. Otherwise, the average of the squared deviations from the mean is returned.

-- Manually calculating sample variance of employee salaries within each department.
WITH job_statistic AS (
    SELECT
        hr.employees.job_id,
        AVG(hr.employees.salary) AS average_salary
    FROM
        hr.employees
    GROUP BY
        hr.employees.job_id
),
employee_salary AS (
    SELECT
        hr.employees.job_id,
        hr.employees.salary
    FROM
        hr.employees
)
SELECT
    job_statistic.job_id,
    CASE
        WHEN COUNT(employee_salary.job_id) = 1
            THEN NULL
        ELSE SUM(
            (employee_salary.salary - job_statistic.average_salary) * (employee_salary.salary - job_statistic.average_salary)
        )
            / (COUNT(employee_salary.job_id) - 1)
    END AS sample_variance
FROM
    job_statistic
    INNER JOIN
    employee_salary
        ON job_statistic.job_id = employee_salary.job_id
GROUP BY
    job_statistic.job_id;

job_id	sample_variance
AD_VP	0
FI_ACCOUNT	587000
PU_CLERK	57000
SH_CLERK	300289.4737
HR_REP	–
AC_MGR	–
PU_MAN	–
ST_CLERK	205552.6316
AD_ASST	–
IT_PROG	3708000
AC_ACCOUNT	–
SA_MAN	2325000
FI_MGR	–
ST_MAN	1137000
AD_PRES	–
MK_MAN	–
SA_REP	2323275.862
MK_REP	–
PR_REP	–

In the example code below, Oracle’s built-in function, VAR_SAMP, is used to calculate the sample variance of employee salaries within each department. Underneath the hood, the implementation is similar to that seen above in the manual calculation. Both methods return the same result.

-- Using the built-in function, VAR_SAMP to calculate the sample variance of employee salaries within each department.
SELECT
    hr.employees.job_id,
    VAR_SAMP(hr.employees.salary) AS sample_variance
FROM
    hr.employees
GROUP BY
    hr.employees.job_id;

job_id	sample_variance
AD_VP	0
FI_ACCOUNT	587000
PU_CLERK	57000
SH_CLERK	300289.4737
HR_REP	–
AC_MGR	–
PU_MAN	–
ST_CLERK	205552.6316
AD_ASST	–
IT_PROG	3708000
AC_ACCOUNT	–
SA_MAN	2325000
FI_MGR	–
ST_MAN	1137000
AD_PRES	–
MK_MAN	–
SA_REP	2323275.862
MK_REP	–
PR_REP	–

Calculating Sample Standard Deviation
When learning statistics, calculating variance is often tackled first because standard deviation is simply the square root of the variance number. I’ll use this fact to adjust the proposed manual solution above to calculate the standard deviation of employee salaries for each job ID value. I do cheat a little here by taking advantage of Oracle’s built-in square root function, SQRT. It’s worth noting that the units of the resulting standard deviation values are of the same units as the values used in the computation (e.g., $). Following the manual solution is a solution utilizing Oracle’s built-in function, STDDEV_SAMP.

-- Manually calculating sample standard deviation of employee salaries within each department.
WITH job_statistic AS (
    SELECT
        hr.employees.job_id,
        AVG(hr.employees.salary) AS average_salary
    FROM
        hr.employees
    GROUP BY
        hr.employees.job_id
),
employee_salary AS (
    SELECT
        hr.employees.job_id,
        hr.employees.salary
    FROM
        hr.employees
)
SELECT
    job_statistic.job_id,
    CASE
        WHEN COUNT(employee_salary.job_id) = 1
            THEN NULL
        ELSE SQRT(
    			SUM(
            	(employee_salary.salary - job_statistic.average_salary) * (employee_salary.salary - job_statistic.average_salary)
        		)
                / (COUNT(employee_salary.job_id) - 1)
    		)
    END AS sample_standard_deviation
FROM
    job_statistic
    INNER JOIN
    employee_salary
        ON job_statistic.job_id = employee_salary.job_id
GROUP BY
    job_statistic.job_id;

job_id	sample_standard_deviation
AD_VP	0
FI_ACCOUNT	766.1592524
PU_CLERK	238.7467277
SH_CLERK	547.9867459
HR_REP	–
AC_MGR	–
PU_MAN	–
ST_CLERK	453.3791257
AD_ASST	–
IT_PROG	1925.616784
AC_ACCOUNT	–
SA_MAN	1524.795068
FI_MGR	–
ST_MAN	1066.302021
AD_PRES	–
MK_MAN	–
SA_REP	1524.229596
MK_REP	–
PR_REP	–

As seen previously when calculating sample variance, using Oracle’s built-in functions can significantly reduce the amount of effort involved in completing a task.

-- Using the built-in function, STDDEV_SAMP to calculate the sample standard deviation of employee salaries within each department.
SELECT
    hr.employees.job_id,
    STDDEV_SAMP(hr.employees.salary) AS sample_standard_deviation
FROM
    hr.employees
GROUP BY
    hr.employees.job_id;

job_id	sample_standard_deviation
AD_VP	0
FI_ACCOUNT	766.1592524
PU_CLERK	238.7467277
SH_CLERK	547.9867459
HR_REP	–
AC_MGR	–
PU_MAN	–
ST_CLERK	453.3791257
AD_ASST	–
IT_PROG	1925.616784
AC_ACCOUNT	–
SA_MAN	1524.795068
FI_MGR	–
ST_MAN	1066.302021
AD_PRES	–
MK_MAN	–
SA_REP	1524.229596
MK_REP	–
PR_REP	–

While the numeric functions used above should not be considered an exhaustive list, I hope exploring those functions leads to a general understanding on the usefulness of built-in functions and the importance of being able to derive or compute something manually when a function to do so is not offered by a specific RDBMS.