COVAR_POP & COVAR_SAMP Analytic Functions in Oracle SQL
COVARIANCE is a measure of the relationship between two random variables. Measure of the variance between two variables, the variance of one variable equals the variance of the other variable.
The variance can be positive or negative values as follows:
Positive covariance: Indicates that two variables tend to move in the same direction.
Negative covariance: Reveals that two variables tend to move in inverse directions.
Difference:
COVARIANCE measures the total variation of two random variables from their expected values.
CORRELATION measures the strength of the relationship between variables.
COVAR_POP returns the population covariance of a set of data.
For population covariance formula:
COVAR_POP(expr1,expr2) = (SUM(expr1 * expr2) – SUM(expr2) * SUM(expr1) / n) / n
COVAR_SAMP returns the sample covariance of a set of data.
For sample covariance formula:
COVAR_SAMP(expr1,expr2) = (SUM(expr1 * expr2) – SUM(expr1) * SUM(expr2) / n) / (n-1)
Xi – the values of the X-variable
Yj – the values of the Y-variable
X̄ – the mean (average) of the X-variable
Ȳ – the mean (average) of the Y-variable
n – the number of the data points
Example of COVARIANCE
Employee table check the covariance with hire_data and salary going in same direction with respect to Job_id.
SELECT job_id,
COVAR_POP(SYSDATE-hire_date, salary) AS covar_pop,
COVAR_SAMP(SYSDATE-hire_date, salary) AS covar_samp
FROM employees
WHERE department_id in (50, 80)
GROUP BY job_id
ORDER BY job_id, covar_pop, covar_samp;
JOB_ID COVAR_POP COVAR_SAMP ---------- ---------- ---------- SA_MAN 660700 825875 SA_REP 579988.466 600702.34 SH_CLERK 212432.5 223613.158 ST_CLERK 176577.25 185870.789 ST_MAN 436092 545115
Take example of share market
Suppose two companies A and B having stock history in previous 4 year.
User want to check the both are going in same direction otherwise he invest in other company. Find COVARIANCE from last 4 year data.
Company stocks detail of A and B companies:
select * from stocks;
COMPANY VALUE YEAR ---------- ---------- ---------- A 1500 2015 A 1700 2016 A 1600 2017 A 2000 2018 B 700 2015 B 1000 2016 B 1100 2017 B 1200 2018
Calculate covariance with self join and COVAR_SAMP function with over().
It return positive value, its mean both company going in same direction.
select a.company,b.company, COVAR_SAMP(a.value,b.value) over() "Sample covariance",COVAR_POP(a.value,b.value) over() "Population covariance" from stocks a , stocks b
where a.year=b.year and a.company='A' and b.company='B';
COMPANY COMPANY Sample covariance Population covariance ---------- ---------- ----------------- --------------------- A B 36666.6667 27500 A B 36666.6667 27500 A B 36666.6667 27500 A B 36666.6667 27500
Example show manually calculate by formula provided above:
1. Find Mean value of both companies A and B.
A = (1500+1700+1600+2000)/4 = 1700
B = (700+1000+1100+1200)/4 = 1000
2. Difference between each value and mean price as shown in below table and multiply that difference A*B
Year A B A*B ------ ------ -------- ---------- 2015 -200 -300 60000 2016 0 0 0 2017 -100 100 -10000 2018 300 200 60000
3. Sum of multiple column(A*B).
Sum (A*B column) = 110000
4. Find sample co-variance.
*n = 4 no of rows.
Sample Coverence = 110000 /(4-1) = 36666.6667
Population Coverence = 110000/4 = 27500
Note: Its exact match with Oracle COVAR_SAMP & COVAR_POP function.