NUMERIC FUNCTIONS

Numerical data is ubiquitous as is our reliance upon it. It can be found in earnings reports used to convey past and future company performance, IT support tickets where the time to respond and successfully close the ticket is captured or can be derived, and even in the tracking of our own personal goals over time. Luckily, the pervasiveness of data and its ever-increasing nature has been embraced and systems have been created to effectively manage and make use of it. Relational database management systems (RDBMS) come equipped with built-in functions meant to handle numeric data and assist us in answering questions based on, in full or in part, the data itself.

Let’s begin exploring numeric functions by tackling a classic programming problem where the task is to classify integer values as even or odd. The schemas, corresponding tables, and data used in the examples can be found at livesql.oracle.com.

Classifying Integers as Even or Odd

SELECT
	oe.orders.order_id,
	CASE MOD(oe.orders.order_id, 2)
		WHEN 0
			THEN 'Even'
		WHEN 1
    		THEN 'Odd'
    	ELSE NULL
	END AS order_id_type
FROM
	oe.orders
FETCH FIRST 5 ROWS ONLY;
order_idorder_id_type
2354Even
2355Odd
2356Even
2357Odd
2358Even

In the example above, the MOD function has been used to classify whether the pre-existing order ID values are even or odd. The function has two parameters, the numerator and denominator, and returns the remainder of the resulting division. This function is particularly useful because it provides a straightforward way to complete the task. If the denominator is two and the remainder is zero, then the order ID or first argument passed must be even. Otherwise, the value must be odd when the denominator is two. The example below is a bit more involved. It includes a recursive common table expression (CTE) and the use ROUND and DBMS_RANDOM.VALUE functions to generate one hundred random integer values. Subsequently, the MOD function is used to classify each integer as even or odd. For an in-depth explanation on this example, see my page on recursion.

/*
DBMS_RANDOM.VALUE() is used to generate 100 random integers. Subsequently, MOD is used to classify each integer as even or odd.
*/
WITH random_number_generator (iteration, random_number) AS (
    SELECT
    	1,
    	ROUND(DBMS_RANDOM.VALUE(0, 100))
    FROM
    	dual
    UNION ALL
    SELECT
    	iteration + 1,
    	ROUND(DBMS_RANDOM.VALUE(0, 100))
    FROM
        random_number_generator
    WHERE
    	random_number_generator.iteration < 100  
)
SELECT
	random_number_generator.iteration,
    random_number_generator.random_number,
    CASE MOD(random_number_generator.random_number, 2)
		WHEN 0
			THEN 'Even'
		WHEN 1
    		THEN 'Odd'
    	ELSE NULL
	END AS integer_type
FROM
	random_number_generator;
iterationrandom_numberinteger_type
198Even
213Odd
316Even
428Even
545Odd
674Even
791Odd
873Odd
981Odd
1048Even
116Even
124Even
132Even
145Odd
1574Even
1613Odd
1718Even
1854Even
196Even
2066Even
2135Odd
2278Even
2331Odd
2483Odd
2575Odd
2667Odd
2755Odd
2850Even
2944Even
3027Odd
3160Even
3269Odd
3343Odd
3475Odd
3582Even
3689Odd
3755Odd
3848Even
3967Odd
4020Even
4175Odd
426Even
431Odd
4438Even
4554Even
4696Even
4747Odd
4822Even
4934Even
5042Even
5118Even
5234Even
5364Even
5461Odd
554Even
5673Odd
5780Even
5871Odd
5978Even
6089Odd
610Even
6251Odd
6341Odd
6424Even
6523Odd
6688Even
6793Odd
6858Even
6952Even
7041Odd
7133Odd
7293Odd
7361Odd
747Odd
753Odd
7690Even
7774Even
7878Even
796Even
809Odd
819Odd
8273Odd
8384Even
848Even
8588Even
8664Even
8742Even
8823Odd
8946Even
9060Even
9178Even
9256Even
9398Even
948Even
9544Even
9698Even
9794Even
9894Even
9995Odd
1007Odd

Rounding Numeric Values
One of the most basic transformations conceptually and in its implementation for numeric data is rounding. It’s highly likely that you’ve had the need to do this type of operation and have done so in other programming languages or software (e.g., Tableau, Microsoft Excel). Using SQL, the task is no more difficult. Built-in functions, such as ROUND, CEIL (i.e., ceiling), and FLOOR are typically available to RDBMS users in some form. ROUND is used to round numeric values in the most common sense and accepts two arguments, the value to be rounded and the integer value associated with the number of places to the right of the decimal point. It allows us to round a numeric value to the nearest integer value, tenth, hundredth, thousandth, etc. CEIL and FLOOR functions behave a bit differently. CEIL will round any non-whole number upward to the next integer value while FLOOR will round any non-whole number downward to the previous integer value. The following example code and corresponding result set illustrates the functionality afforded by ROUND, CEIL, and FLOOR functions given the original total sales value corresponding to each sales representative.

SELECT
    oe.orders.sales_rep_id,
    SUM(oe.orders.order_total) AS total_sales,
	ROUND(SUM(oe.orders.order_total), 0) AS round,
    CEIL(SUM(oe.orders.order_total)) AS ceiling,
    FLOOR(SUM(oe.orders.order_total)) AS floor
FROM
	oe.orders
WHERE
    oe.orders.sales_rep_id IS NOT NULL
GROUP BY
    oe.orders.sales_rep_id;
sales_rep_idtotal_salesroundceilingfloor
158156296.2156296156297156296
161661734.5661735661735661734
159151167.2151167151168151167
163128249.5128250128250128249
155134415.2134415134416134415
156202617.6202618202618202617
153114215.7114216114216114215
154171973.1171973171974171973
16088238.4882388823988238

Calculating Mean Absolute Deviation (MAD)
In statistics, the mean absolution deviation is the average distance between each data point and the mean of the data. In the example below, the common table expression (CTE) by the name of statistic is responsible for computing the average order total across all orders. Subsequently, the summation of the absolute differences between the average order total value and each order total value is computed. This row-level difference of the mean and each order total value is made possible by the CROSS JOIN. If you’re not familiar with the CROSS JOIN, then please consider checking out my page dedicated to the topic. Moving on, the result of that sum is then divided by the number of rows (i.e., observations or order total values) and finally rounded to the nearest hundredth using Oracle’s built-in function, ROUND.

WITH statistic AS (
    SELECT
    	AVG(oe.orders.order_total) AS average_order_total
    FROM
        oe.orders
)
SELECT
    ROUND(
        SUM(
        	ABS(oe.orders.order_total - statistic.average_order_total)
        )
        	/ COUNT(*),
    	2
    ) AS mad
FROM
	oe.orders
	CROSS JOIN
	statistic;
mad
32812.29

Calculating Sample Variance
Like mean absolution deviation, we can calculate the sample variance using SQL. Sample variance is a measure of variability and can be calculated by taking the average of the squared deviations from the mean. It tells us how far each data point is from the mean of the data. In the following example, I will first calculate the sample variance manually, without a built-in function. Where possible, it’s important to not allow the lack of built-in functions provided by a specific RDBMS to limit analyses. Following the manual calculation, I will use the built-in function provided by Oracle. These functions are convenient in that the heavy lifting in terms of implementing the functionality is transferred from the developer or code writer to the RDBMS.

In the example below, there exists two CTEs. The first CTE, job_statistic, calculates the average salary observed for those employees corresponding to each job ID value. The second CTE, employee_salary, simply retrieves the job ID and salary for each employee. The outputs from both CTEs are necessary to compute the difference between each employee’s salary and the average salary of those employees with the same job. This is observed in the SELECT statement following the initial CTEs. In the SELECT statement, the JOIN and corresponding ON clause ensure the correct differences are computed. Correct meaning the average salary for the job corresponding to the specific employee’s job is used. Additionally, a CASE expression is used to determine the number of employees corresponding to each job ID. When the number of employees holding a specific job is one, the sample variance value returned is NULL. Otherwise, the average of the squared deviations from the mean is returned.

-- Manually calculating sample variance of employee salaries within each department.
WITH job_statistic AS (
    SELECT
        hr.employees.job_id,
        AVG(hr.employees.salary) AS average_salary
    FROM
        hr.employees
    GROUP BY
        hr.employees.job_id
),
employee_salary AS (
    SELECT
        hr.employees.job_id,
        hr.employees.salary
    FROM
        hr.employees
)
SELECT
    job_statistic.job_id,
    CASE
        WHEN COUNT(employee_salary.job_id) = 1
            THEN NULL
        ELSE SUM(
            (employee_salary.salary - job_statistic.average_salary) * (employee_salary.salary - job_statistic.average_salary)
        )
            / (COUNT(employee_salary.job_id) - 1)
    END AS sample_variance
FROM
    job_statistic
    INNER JOIN
    employee_salary
        ON job_statistic.job_id = employee_salary.job_id
GROUP BY
    job_statistic.job_id;
job_idsample_variance
AD_VP0
FI_ACCOUNT587000
PU_CLERK57000
SH_CLERK300289.4737
HR_REP
AC_MGR
PU_MAN
ST_CLERK205552.6316
AD_ASST
IT_PROG3708000
AC_ACCOUNT
SA_MAN2325000
FI_MGR
ST_MAN1137000
AD_PRES
MK_MAN
SA_REP2323275.862
MK_REP
PR_REP

In the example code below, Oracle’s built-in function, VAR_SAMP, is used to calculate the sample variance of employee salaries within each department. Underneath the hood, the implementation is similar to that seen above in the manual calculation. Both methods return the same result.

-- Using the built-in function, VAR_SAMP to calculate the sample variance of employee salaries within each department.
SELECT
    hr.employees.job_id,
    VAR_SAMP(hr.employees.salary) AS sample_variance
FROM
    hr.employees
GROUP BY
    hr.employees.job_id;
job_idsample_variance
AD_VP0
FI_ACCOUNT587000
PU_CLERK57000
SH_CLERK300289.4737
HR_REP
AC_MGR
PU_MAN
ST_CLERK205552.6316
AD_ASST
IT_PROG3708000
AC_ACCOUNT
SA_MAN2325000
FI_MGR
ST_MAN1137000
AD_PRES
MK_MAN
SA_REP2323275.862
MK_REP
PR_REP

Calculating Sample Standard Deviation
When learning statistics, calculating variance is often tackled first because standard deviation is simply the square root of the variance number. I’ll use this fact to adjust the proposed manual solution above to calculate the standard deviation of employee salaries for each job ID value. I do cheat a little here by taking advantage of Oracle’s built-in square root function, SQRT. It’s worth noting that the units of the resulting standard deviation values are of the same units as the values used in the computation (e.g., $). Following the manual solution is a solution utilizing Oracle’s built-in function, STDDEV_SAMP.

-- Manually calculating sample standard deviation of employee salaries within each department.
WITH job_statistic AS (
    SELECT
        hr.employees.job_id,
        AVG(hr.employees.salary) AS average_salary
    FROM
        hr.employees
    GROUP BY
        hr.employees.job_id
),
employee_salary AS (
    SELECT
        hr.employees.job_id,
        hr.employees.salary
    FROM
        hr.employees
)
SELECT
    job_statistic.job_id,
    CASE
        WHEN COUNT(employee_salary.job_id) = 1
            THEN NULL
        ELSE SQRT(
    			SUM(
            	(employee_salary.salary - job_statistic.average_salary) * (employee_salary.salary - job_statistic.average_salary)
        		)
                / (COUNT(employee_salary.job_id) - 1)
    		)
    END AS sample_standard_deviation
FROM
    job_statistic
    INNER JOIN
    employee_salary
        ON job_statistic.job_id = employee_salary.job_id
GROUP BY
    job_statistic.job_id;
job_idsample_standard_deviation
AD_VP0
FI_ACCOUNT766.1592524
PU_CLERK238.7467277
SH_CLERK547.9867459
HR_REP
AC_MGR
PU_MAN
ST_CLERK453.3791257
AD_ASST
IT_PROG1925.616784
AC_ACCOUNT
SA_MAN1524.795068
FI_MGR
ST_MAN1066.302021
AD_PRES
MK_MAN
SA_REP1524.229596
MK_REP
PR_REP

As seen previously when calculating sample variance, using Oracle’s built-in functions can significantly reduce the amount of effort involved in completing a task.

-- Using the built-in function, STDDEV_SAMP to calculate the sample standard deviation of employee salaries within each department.
SELECT
    hr.employees.job_id,
    STDDEV_SAMP(hr.employees.salary) AS sample_standard_deviation
FROM
    hr.employees
GROUP BY
    hr.employees.job_id;
job_idsample_standard_deviation
AD_VP0
FI_ACCOUNT766.1592524
PU_CLERK238.7467277
SH_CLERK547.9867459
HR_REP
AC_MGR
PU_MAN
ST_CLERK453.3791257
AD_ASST
IT_PROG1925.616784
AC_ACCOUNT
SA_MAN1524.795068
FI_MGR
ST_MAN1066.302021
AD_PRES
MK_MAN
SA_REP1524.229596
MK_REP
PR_REP

While the numeric functions used above should not be considered an exhaustive list, I hope exploring those functions leads to a general understanding on the usefulness of built-in functions and the importance of being able to derive or compute something manually when a function to do so is not offered by a specific RDBMS.