\(\newcommand{\Cov}{\mathrm{Cov}}\) \(\newcommand{\Corr}{\mathrm{Corr}}\)

## Expected Value

The expected value of a function \(h(X,Y)\) of two jointly continuous random variables with joint density \(f(x,y)\) is \(E[h(X,Y)]=\iint h(x,y)f(x,y)\,dx\,dy\).
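As a worked example (with a hypothetical joint density, not one from the text), take \(f(x,y)=4xy\) on \([0,1]^2\) and \(h(x,y)=xy\):

\[E(XY)=\int_0^1\!\int_0^1 xy\cdot 4xy\,dx\,dy=4\left(\int_0^1 x^2\,dx\right)\left(\int_0^1 y^2\,dy\right)=4\cdot\tfrac{1}{3}\cdot\tfrac{1}{3}=\tfrac{4}{9}\]

The double integral factors here only because this particular density separates into a function of \(x\) times a function of \(y\).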

## Covariance

The **covariance** between \(X\) and \(Y\) is:

\[\Cov(X,Y)=E\big[(X-E(X))(Y-E(Y))\big]\]

It is a measure of how closely the two variables move together. You could also think of it as a generalization of the variance to a pair of variables.

- A strong positive relationship gives a large positive number
- A strong negative relationship gives a large negative number
- No relationship gives a number close to 0
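A quick Monte Carlo sketch of the three cases (the data here is simulated, not from the text):

```python
import random

def sample_cov(xs, ys):
    # Sample covariance: average of (x - mean_x) * (y - mean_y)
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

random.seed(0)
n = 100_000
xs = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]

ys_pos = [x + e for x, e in zip(xs, noise)]          # strong positive relationship
ys_neg = [-x + e for x, e in zip(xs, noise)]         # strong negative relationship
ys_ind = [random.gauss(0, 1) for _ in range(n)]      # no relationship

print(sample_cov(xs, ys_pos))  # clearly positive
print(sample_cov(xs, ys_neg))  # clearly negative
print(sample_cov(xs, ys_ind))  # close to 0
```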

Another way of calculating it (the shortcut formula):

\[\Cov(X,Y)=E(XY)-E(X)E(Y)\]

Note that \(\Cov(X,X)=\sigma_{X}^{2}\)

Note that \(\Cov(X,Y)^{2}\le\sigma_{X}^{2}\sigma_{Y}^{2}\)

This follows from the Cauchy-Schwarz inequality, which applies because the covariance satisfies all the properties of an inner product.
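A numerical sanity check of the Cauchy-Schwarz bound on an arbitrary (made-up) data set — the squared sample covariance never exceeds the product of the sample variances:

```python
def mean(v):
    return sum(v) / len(v)

def cov(xs, ys):
    # Sample covariance; cov(xs, xs) is the sample variance
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

xs = [1.0, 2.0, 4.0, 7.0, 11.0]
ys = [3.0, -1.0, 2.0, 8.0, 5.0]

lhs = cov(xs, ys) ** 2
rhs = cov(xs, xs) * cov(ys, ys)  # sigma_X^2 * sigma_Y^2
print(lhs <= rhs)  # True
```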

## Correlation Coefficient

The problem with the covariance is that it depends on the units of the
variables. So instead we use the **correlation coefficient**:

\[\rho=\Corr(X,Y)=\frac{\Cov(X,Y)}{\sigma_X\sigma_Y}\]

If \(a,c\) have the same sign, then \(\Corr(aX+b,cY+d)=\Corr(X,Y)\) (if they have opposite signs, the correlation flips sign).

Also, \(-1\le\Corr(X,Y)\le1\)
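Both properties can be checked numerically. A sketch with simulated data: rescaling with \(a,c>0\) leaves the sample correlation unchanged, and the value stays in \([-1,1]\).

```python
import random

def corr(xs, ys):
    # Sample correlation: covariance divided by the product of standard deviations
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

random.seed(1)
xs = [random.gauss(0, 1) for _ in range(10_000)]
ys = [x + random.gauss(0, 1) for x in xs]

r = corr(xs, ys)
# a = 5, c = 2 (same sign), arbitrary shifts b = 3, d = -7
r_scaled = corr([5 * x + 3 for x in xs], [2 * y - 7 for y in ys])

print(abs(r - r_scaled) < 1e-9)  # True, up to floating point
print(-1 <= r <= 1)              # True
```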

- If \(|\rho|\ge0.8\), we say the correlation is strong.
- If \(0.5<|\rho|<0.8\), we say the correlation is moderate.
- If \(|\rho|\le0.5\), we say the correlation is weak.

These are rules of thumb, and they vary from discipline to discipline.

If \(X,Y\) are independent, then the correlation coefficient is 0. However, the converse need not hold: two strongly dependent random variables can still have a correlation coefficient of 0.

When \(\rho=0\), the variables are called **uncorrelated**, even if they are highly dependent.
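A classic toy example (not from the text): take \(X\) uniform on \(\{-1,0,1\}\) and \(Y=X^2\). Then \(Y\) is completely determined by \(X\), yet the covariance is exactly 0:

```python
# X uniform on {-1, 0, 1}, Y = X^2: fully dependent, yet Cov(X, Y) = 0.
xs = [-1, 0, 1]            # equally likely values of X
ys = [x ** 2 for x in xs]  # Y = X^2

ex = sum(xs) / 3                              # E(X) = 0
ey = sum(ys) / 3                              # E(Y) = 2/3
exy = sum(x * y for x, y in zip(xs, ys)) / 3  # E(XY) = E(X^3) = 0

print(exy - ex * ey)  # 0.0 -> uncorrelated, though fully dependent
```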

\(\rho=\pm 1\) iff \(Y=aX+b\) for some \(a\neq 0\), with the sign of \(\rho\) matching the sign of \(a\). Thus, \(\rho\) is a measure
of the degree of **linear** relationship.
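A sketch of the exact-linear case with made-up data (any \(Y=aX+b\) would do):

```python
def corr(xs, ys):
    # Sample correlation coefficient
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(corr(xs, [3 * x + 2 for x in xs]))   # 1.0  (a = 3 > 0)
print(corr(xs, [-3 * x + 2 for x in xs]))  # -1.0 (a = -3 < 0)
```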

Note that if \(X,Y\) are independent, then \(E(XY)=E(X)E(Y)\). Combined with \(\Cov(X,Y)=E(XY)-E(X)E(Y)\), this shows that \(\Corr(X,Y)=0\) for independent variables.