Discrete Random Variables

Random Variables

A discrete random variable is one whose set of possible values is countable.

Probability Distributions for Discrete Random Variables

The probability mass function (pmf) of a discrete rv \(X\) is \(p(x)=P(X=x)\). It satisfies \(0\le p(x)\le1\) and \(\sum_{x}p(x)=1\).

The cumulative distribution function (cdf) \(F(x)\) of a rv \(X\) with pmf \(p(x)\) is given by \(F(x)=P(X\le x)=\sum_{y\le x}p(y)\)

Note that the cdf of a discrete distribution is not continuous: it is a right-continuous step function that jumps by \(p(x)\) at each possible value \(x\).

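To get the pmf from the cdf: \(P(a\le X\le b)=F(b)-F(a^{-})\), where \(a^{-}\) is the largest possible value of \(X\) strictly less than \(a\). If \(a\) is the smallest possible value, take \(F(a^{-})=0\).

In particular, taking \(b=a\): \(P(X=a)=F(a)-F(a^{-})\).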
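The cdf relations above can be checked numerically. The following is a minimal sketch, assuming a fair six-sided die as the example pmf; the helper names `cdf`, `cdf_left` and `prob_between` are made up for illustration. Exact fractions are used so the equalities hold without rounding.

```python
from fractions import Fraction

# Assumed example: pmf of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def cdf(x, pmf):
    """F(x) = P(X <= x): sum the pmf over all possible values y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

def cdf_left(a, pmf):
    """F(a^-) = P(X < a): the cdf just below a (0 if a is the smallest value)."""
    return sum((p for y, p in pmf.items() if y < a), Fraction(0))

def prob_between(a, b, pmf):
    """P(a <= X <= b) = F(b) - F(a^-)."""
    return cdf(b, pmf) - cdf_left(a, pmf)

p_2_to_4 = prob_between(2, 4, pmf)  # three faces out of six
p_eq_3 = prob_between(3, 3, pmf)    # recovers the pmf value p(3)
```

Using \(F(a)\) instead of \(F(a^{-})\) would drop the probability at \(a\) itself, which is the usual off-by-one mistake with discrete cdfs.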

Expected Values of Discrete Random Variables

The expected value of a discrete distribution: \(E(X)=\mu_{X}=\sum_{x\in D}xp(x)\), where \(D\) is the set of possible values of \(X\). Beware of the word "expected": \(E(X)\) need not be a possible value of \(X\).

Any distribution that has a large amount of probability far from \(\mu\) is said to have a heavy tail. This is trivially true when \(\mu\) is infinite, but a distribution with finite \(\mu\) can also have a heavy tail. Be careful when making inferences about such distributions!

The variance of a distribution:

\begin{equation*} V(X)=\sigma_{X}^{2}=\sum_{x\in D}(x-\mu)^{2}p(x)=E(X^{2})-[E(X)]^{2} \end{equation*}

Let \(h(X)\) be a function of \(X\). Then \(E[h(X)]=\sum_{x\in D}h(x)p(x)\).

The variance of \(h(X)\) is \(V[h(X)]=\sum_{x\in D}\{h(x)-E[h(X)]\}^{2}p(x)\).

Linear scaling of \(X\):

\begin{equation*} E(aX+b)=a\mu_{X}+b \end{equation*}

\begin{equation*} V(aX+b)=a^{2}\sigma_{X}^{2} \end{equation*}
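The expectation, variance and linear-scaling formulas above can be verified on a small example. This is a sketch under assumptions: the pmf is a fair die, the constants `a = 3` and `b = 2` are arbitrary, and the helpers `expect` and `variance` are illustrative names only.

```python
from fractions import Fraction

# Assumed example: pmf of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

def expect(h, pmf):
    """E[h(X)] = sum over x in D of h(x) p(x)."""
    return sum((h(x) * p for x, p in pmf.items()), Fraction(0))

def variance(h, pmf):
    """V[h(X)] = sum over x in D of (h(x) - E[h(X)])^2 p(x)."""
    m = expect(h, pmf)
    return expect(lambda x: (h(x) - m) ** 2, pmf)

mu = expect(lambda x: x, pmf)      # 7/2, which is not a possible value of X
var = variance(lambda x: x, pmf)   # 35/12
shortcut = expect(lambda x: x ** 2, pmf) - mu ** 2  # E(X^2) - [E(X)]^2

a, b = 3, 2  # arbitrary constants for the linear-scaling check
mean_lin = expect(lambda x: a * x + b, pmf)   # should equal a*mu + b
var_lin = variance(lambda x: a * x + b, pmf)  # should equal a^2 * var
```

Note that the shift \(b\) moves the mean but has no effect on the variance.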

The mode of a discrete distribution is the value of \(x\) at which the pmf \(p(x)\) is maximized.

Standardized Variables

If we define \(Y=\frac{X-\mu_{X}}{\sigma_{X}}\), then \(\mu_{Y}=0,\sigma_{Y}=1\).
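A quick numerical check of standardization, as a sketch only: the three-point pmf below is an assumed example, and `mean` and `var` are illustrative helper names. Floating point is used because of the square root, so the results are compared with a tolerance rather than exactly.

```python
import math

# Assumed example pmf on the values 0, 1, 2.
pmf_x = {0: 0.25, 1: 0.5, 2: 0.25}

def mean(pmf):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    """Variance of a discrete distribution given as {value: probability}."""
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

mu_x = mean(pmf_x)
sigma_x = math.sqrt(var(pmf_x))

# Y = (X - mu_X) / sigma_X takes the rescaled values with the same probabilities.
pmf_y = {(x - mu_x) / sigma_x: p for x, p in pmf_x.items()}

mu_y = mean(pmf_y)                # approximately 0
sigma_y = math.sqrt(var(pmf_y))   # approximately 1
```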

Independence of Random Variables

If \(X\) and \(Y\) are random variables, then they are independent if and only if \(P\{X=i,Y=j\}=p_{X}(i)p_{Y}(j)\ \forall i,j\)
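The factorization condition can be tested directly on a joint pmf. The sketch below assumes the joint pmf is stored as a dictionary keyed by `(i, j)` pairs; `marginals` and `is_independent` are hypothetical helper names, and the two example distributions are made up. Exact fractions allow an equality test; with floats a tolerance would be needed.

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Marginal pmfs p_X and p_Y obtained by summing the joint pmf."""
    px, py = {}, {}
    for (i, j), p in joint.items():
        px[i] = px.get(i, 0) + p
        py[j] = py.get(j, 0) + p
    return px, py

def is_independent(joint):
    """True iff P{X=i, Y=j} = p_X(i) p_Y(j) for every pair (i, j)."""
    px, py = marginals(joint)
    return all(joint.get((i, j), 0) == px[i] * py[j]
               for i, j in product(px, py))

# Assumed example 1: two fair coin flips, which are independent.
two_coins = {(i, j): Fraction(1, 4) for i in (0, 1) for j in (0, 1)}

# Assumed example 2: Y is a copy of X, which is as dependent as possible.
copy = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

coins_independent = is_independent(two_coins)
copy_independent = is_independent(copy)
```

The check must cover every pair, including pairs with joint probability zero: in the second example the pair \((0,1)\) has probability 0 but the product of the marginals is \(1/4\).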

Markov Inequality

Let \(Y\) be a nonnegative random variable. Then for \(c>0\):

\begin{equation*} P(Y\ge c)\le\frac{\mu_{Y}}{c} \end{equation*}

Proof Sketch:

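Let \(Y\) take on values \(u_{i}\ge0\) with probabilities \(p(u_{i})=P(Y=u_{i})\). Note that:

\begin{equation*} \mu_{Y}=\sum_{i}u_{i}p(u_{i}) \end{equation*}

Split the sum into the terms with \(0\le u_{i}<c\) and the terms with \(u_{i}\ge c\). The first part is nonnegative, so dropping it gives \(\mu_{Y}\ge\sum_{u_{i}\ge c}u_{i}p(u_{i})\ge c\sum_{u_{i}\ge c}p(u_{i})=cP(Y\ge c)\). Dividing by \(c\) gives the result.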
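To see how loose the Markov bound can be, compare it with the exact tail probability. This is a sketch assuming a fair die as the nonnegative random variable; `tail` and `markov_table` are illustrative names.

```python
from fractions import Fraction

# Assumed example: a fair six-sided die, which is nonnegative.
pmf = {y: Fraction(1, 6) for y in range(1, 7)}

mu = sum((y * p for y, p in pmf.items()), Fraction(0))  # 7/2

def tail(c, pmf):
    """Exact tail probability P(Y >= c)."""
    return sum((p for y, p in pmf.items() if y >= c), Fraction(0))

# For each c = 1, ..., 6: (exact P(Y >= c), Markov bound mu / c).
markov_table = {c: (tail(c, pmf), mu / c) for c in range(1, 7)}

# The inequality should hold for every c, usually with room to spare.
markov_holds = all(exact <= bound for exact, bound in markov_table.values())
```

For \(c\le\mu_{Y}\) the bound is at least 1 and says nothing; it only becomes useful for \(c\) well above the mean.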

Chebyshev’s Inequality

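Chebyshev’s Inequality: Let \(X\) have mean \(\mu\) and standard deviation \(\sigma\). For any \(k>0\) (the bound is informative only when \(k>1\)):

\begin{equation*} P(|X-\mu|\ge k\sigma)\le\frac{1}{k^{2}} \end{equation*}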

This gives an upper bound on the probability of being at least \(k\) standard deviations from the mean; \(k\) need not be an integer.

This is true for any distribution with finite variance, discrete or not. Because of its generality, the bound is often far from tight.

Proof sketch:

Let \(Y=\frac{X-\mu_{X}}{k\sigma_{X}}\). Then \(\mu_{Y}=0,\sigma_{Y}=1/k\).

Trivially, we know that:

\begin{equation*} \sum_{|y|\ge1}y^{2}p(y)\le\sum_{y}y^{2}p(y)=\sigma_{Y}^{2}=\frac{1}{k^{2}} \end{equation*}

Now the important thing to show is that \(P(|Y|\ge1)=P(|X-\mu_{X}|\ge k\sigma_{X})\). This follows from the definition of \(Y\): \(|Y|\ge1\) exactly when \(|X-\mu_{X}|\ge k\sigma_{X}\).

Looking at the sum again, note that \(y^{2}\ge1\) for every term with \(|y|\ge1\), so

\begin{equation*} \sum_{|y|\ge1}p(y)\le\sum_{|y|\ge1}y^{2}p(y)\le\frac{1}{k^{2}} \end{equation*}

The leftmost sum is exactly \(P(|Y|\ge1)\), so \(P(|X-\mu_{X}|\ge k\sigma_{X})\le1/k^{2}\). This concludes the proof.

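Another way to prove it is to use Markov’s Inequality with \(Y=|X-\mu|^{2}\) and \(c=k^{2}\sigma^{2}\). Since \(E(Y)=\sigma^{2}\) and the event \(|X-\mu|\ge k\sigma\) is the same as \(Y\ge c\), the bound is \(\sigma^{2}/(k^{2}\sigma^{2})=1/k^{2}\).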
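The gap between the Chebyshev bound and the true tail probability can be seen on a small example. The sketch below assumes a fair die and a few arbitrary values of \(k\); `chebyshev_tail` is an illustrative name. The comparison is done on squares, \((x-\mu)^{2}\ge k^{2}\sigma^{2}\), so that exact fractions can be used without a square root.

```python
from fractions import Fraction

# Assumed example: pmf of a fair six-sided die.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

mu = sum((x * p for x, p in pmf.items()), Fraction(0))               # 7/2
var = sum(((x - mu) ** 2 * p for x, p in pmf.items()), Fraction(0))  # 35/12

def chebyshev_tail(k, pmf):
    """Exact P(|X - mu| >= k sigma), computed as P((X - mu)^2 >= k^2 sigma^2)."""
    return sum((p for x, p in pmf.items() if (x - mu) ** 2 >= k * k * var),
               Fraction(0))

# Arbitrary values of k, each paired with (exact tail, Chebyshev bound 1/k^2).
ks = [Fraction(1), Fraction(6, 5), Fraction(3, 2), Fraction(2)]
cheb_table = {k: (chebyshev_tail(k, pmf), 1 / (k * k)) for k in ks}

chebyshev_holds = all(exact <= bound for exact, bound in cheb_table.values())
```

For this distribution the exact tail is already 0 at \(k=3/2\), while the bound is still \(4/9\), which illustrates how far from tight the inequality can be.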