## Random Variables

A **discrete** random variable is one whose set of possible values is countable.

## Probability Distributions for Discrete Random Variables

The **probability mass function** (pmf) of a discrete rv \(X\) is \(p(x)=P(X=x)\),
which satisfies \(0\le p(x)\le1\) and \(\sum_{x}p(x)=1\).
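As a quick illustration, both pmf conditions can be checked directly for a hypothetical fair six-sided die:

```python
# Hypothetical pmf: a fair six-sided die, p(x) = 1/6 for x = 1..6.
from fractions import Fraction

p = {x: Fraction(1, 6) for x in range(1, 7)}

# The two pmf conditions: 0 <= p(x) <= 1 for all x, and sum_x p(x) = 1.
assert all(0 <= px <= 1 for px in p.values())
assert sum(p.values()) == 1
```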

The **cumulative distribution function** (cdf) \(F(x)\) of a rv
\(X\) with pmf \(p(x)\) is given by
\(F(x)=P(X\le x)=\sum_{y\le x}p(y)\).

Note that the cdf of a discrete distribution is *not* continuous: it is a right-continuous step function with a jump at each possible value of \(X\).

To recover probabilities from the cdf: \(P(a\le X\le b)=F(b)-F(a^{-})\), where \(a^{-}\) is the largest possible value of \(X\) that is strictly less than \(a\).

In particular, \(P(X=a)=F(a)-F(a^{-})\).
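A small sketch (with an assumed three-point pmf) showing the cdf as a sum and the pmf recovered as the jump \(F(a)-F(a^{-})\):

```python
from fractions import Fraction

# Assumed pmf for illustration.
p = {1: Fraction(1, 2), 2: Fraction(1, 3), 4: Fraction(1, 6)}

def F(x):
    """cdf: F(x) = sum of p(y) over y <= x."""
    return sum(py for y, py in p.items() if y <= x)

def F_minus(a):
    """F(a^-): the cdf just below a, i.e. the sum over y < a."""
    return sum(py for y, py in p.items() if y < a)

# The pmf is the jump of the cdf: P(X = a) = F(a) - F(a^-).
assert all(F(a) - F_minus(a) == p[a] for a in p)

# Interval probability: P(2 <= X <= 4) = F(4) - F(2^-).
assert F(4) - F_minus(2) == Fraction(1, 3) + Fraction(1, 6)
```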

## Expected Values of Discrete Random Variables

The **expected value** of a discrete distribution is
\(E(X)=\mu_{X}=\sum_{x\in D}xp(x)\). Beware of the word *expected*:
\(E(X)\) need not be a possible value of \(X\).
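A quick check with a hypothetical fair six-sided die, whose expected value 3.5 is not itself a possible outcome:

```python
from fractions import Fraction

# Assumed pmf: a fair six-sided die.
p = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum over x of x * p(x).
mu = sum(x * px for x, px in p.items())
assert mu == Fraction(7, 2)   # 3.5 -- not a possible value of X
```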

Any distribution that has a large amount of probability far from
\(\mu\) is said to have a **heavy tail**. This is trivially true
when \(\mu\) is infinite, but it need not be infinite to have a
heavy tail. Be careful when making inferences of such distributions!

The variance of a distribution: \(V(X)=\sigma_{X}^{2}=\sum_{x\in D}(x-\mu_{X})^{2}p(x)\)

Let \(h(X)\) be a function of \(X\). Then \(E[h(X)]=\sum_{x\in D}h(x)p(x)\)

The variance of \(h(X)\) is \(V[h(X)]=\sum_{x\in D}\{h(x)-E[h(X)]\}^{2}p(x)\)
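A sketch computing \(E[h(X)]\) and \(V[h(X)]\) for an assumed pmf and \(h(x)=x^{2}\):

```python
from fractions import Fraction

# Assumed pmf and assumed h for illustration.
p = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
h = lambda x: x * x

# E[h(X)] = sum h(x) p(x); V[h(X)] = sum (h(x) - E[h(X)])^2 p(x).
Eh = sum(h(x) * px for x, px in p.items())
Vh = sum((h(x) - Eh) ** 2 * px for x, px in p.items())

assert Eh == Fraction(3, 2)
assert Vh == Fraction(9, 4)
```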

Linear scaling of \(X\): \(E(aX+b)=aE(X)+b\), \(V(aX+b)=a^{2}V(X)\), and hence \(\sigma_{aX+b}=|a|\sigma_{X}\).
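Assuming a toy two-point pmf, the linear-scaling rules \(E(aX+b)=aE(X)+b\) and \(V(aX+b)=a^{2}V(X)\) can be verified directly:

```python
from fractions import Fraction

# Assumed pmf for illustration.
p = {1: Fraction(1, 2), 3: Fraction(1, 2)}
a, b = 2, 5

mu = sum(x * px for x, px in p.items())
var = sum((x - mu) ** 2 * px for x, px in p.items())

# Moments of the scaled variable aX + b, computed from scratch.
mu2 = sum((a * x + b) * px for x, px in p.items())
var2 = sum((a * x + b - mu2) ** 2 * px for x, px in p.items())

assert mu2 == a * mu + b     # E(aX + b) = a E(X) + b
assert var2 == a * a * var   # V(aX + b) = a^2 V(X)
```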

The **mode** of a discrete distribution is the value of \(x\) at which
the pmf \(p(x)\) is maximized.

## Standardized Variables

If we define \(Y=\frac{X-\mu_{X}}{\sigma_{X}}\), then \(\mu_{Y}=0,\sigma_{Y}=1\).
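A numeric sketch (with an assumed pmf) confirming that standardization yields mean 0 and standard deviation 1:

```python
import math

# Assumed pmf for illustration.
p = {0: 0.25, 1: 0.5, 2: 0.25}

mu = sum(x * px for x, px in p.items())
sigma = math.sqrt(sum((x - mu) ** 2 * px for x, px in p.items()))

# Y = (X - mu)/sigma: same probabilities, shifted and rescaled values.
ys = {(x - mu) / sigma: px for x, px in p.items()}
mu_y = sum(y * py for y, py in ys.items())
sigma_y = math.sqrt(sum((y - mu_y) ** 2 * py for y, py in ys.items()))

assert abs(mu_y) < 1e-12
assert abs(sigma_y - 1) < 1e-12
```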

## Independence of Random Variables

If \(X\) and \(Y\) are random variables, then they are
**independent** if and only if
\(P\{X=i,Y=j\}=p_{X}(i)p_{Y}(j)\ \forall i,j\)
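A minimal check of this definition, using assumed marginals: a joint built as their product is independent by construction, while perturbing that joint (even keeping the marginals fixed) breaks independence.

```python
from fractions import Fraction
from itertools import product

# Assumed marginal pmfs for illustration.
pX = {0: Fraction(1, 3), 1: Fraction(2, 3)}
pY = {0: Fraction(1, 4), 1: Fraction(3, 4)}

def independent(joint, pX, pY):
    """True iff P{X=i, Y=j} = pX(i) * pY(j) for all i, j."""
    return all(joint[(i, j)] == pX[i] * pY[j] for i, j in product(pX, pY))

# Joint as the product of marginals -- independent by construction.
joint = {(i, j): pX[i] * pY[j] for i, j in product(pX, pY)}
assert independent(joint, pX, pY)

# Shift probability between cells so the marginals are unchanged
# but the joint no longer factors.
eps = Fraction(1, 24)
dep = dict(joint)
dep[(0, 0)] += eps; dep[(1, 1)] += eps
dep[(0, 1)] -= eps; dep[(1, 0)] -= eps
assert not independent(dep, pX, pY)
```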

## Markov Inequality

Let \(Y\) be a nonnegative random variable. Then for \(c>0\):

\(P(Y\ge c)\le\frac{E(Y)}{c}\)

Proof Sketch:

Let \(Y\) take on values \(u_{i}\ge0\). Note that \(E(Y)=\sum_{i}u_{i}p(u_{i})\), with every term nonnegative.

Split the sum into \(0\le u_{i}<c\) and \(u_{i}\ge c\): dropping the first part and bounding \(u_{i}\ge c\) in the second gives \(E(Y)\ge c\sum_{u_{i}\ge c}p(u_{i})=cP(Y\ge c)\).
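As a sanity check on the inequality (with an assumed nonnegative pmf), the tail probability never exceeds \(E(Y)/c\):

```python
from fractions import Fraction

# Assumed pmf of a nonnegative rv Y for illustration.
p = {0: Fraction(1, 2), 1: Fraction(1, 4), 4: Fraction(1, 4)}

EY = sum(u * pu for u, pu in p.items())  # E(Y) = 5/4

# Markov: P(Y >= c) <= E(Y)/c for every c > 0.
for c in (1, 2, 3, 4, 5):
    tail = sum(pu for u, pu in p.items() if u >= c)
    assert tail <= EY / c
```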

## Chebyshev’s Inequality

**Chebyshev’s Inequality**: If \(k\ge 1\), then:

\(P(|X-\mu_{X}|\ge k\sigma_{X})\le\frac{1}{k^{2}}\)

This gives an upper bound on the probability of being at least \(k\) standard deviations from the mean.

This is true for *any* discrete distribution with finite variance. Because of its
generality, it is not a tight bound.

Proof sketch:

Let \(Y=\frac{X-\mu_{X}}{k\sigma_{X}}\). Then \(\mu_{Y}=0,\ \sigma_{Y}=1/k\).

Trivially, since \(\mu_{Y}=0\), we know that \(\sigma_{Y}^{2}=\sum_{y}y^{2}p(y)=\frac{1}{k^{2}}\).

Now the important thing to show is that \(P(|Y|\ge1)=P(|X-\mu_{X}|\ge k\sigma_{X})\); this follows directly from the definition of \(Y\).

Looking at the sum again, note that

\(\frac{1}{k^{2}}=\sum_{y}y^{2}p(y)\ge\sum_{y^{2}\ge1}y^{2}p(y)\ge\sum_{y^{2}\ge1}p(y)=P(|Y|\ge1)\)

So the total probability placed on \(y^{2}\ge1\) is at most \(1/k^{2}\), which is exactly the bound. This concludes the proof.

Another way to prove it is to use Markov’s Inequality with \(Y=|X-\mu_{X}|^{2}\) and \(c=k^{2}\sigma_{X}^{2}\), noting that \(E(|X-\mu_{X}|^{2})=\sigma_{X}^{2}\).
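A numeric check of Chebyshev’s bound on an assumed symmetric three-point pmf:

```python
from fractions import Fraction
import math

# Assumed pmf for illustration (symmetric about 0).
p = {-2: Fraction(1, 8), 0: Fraction(3, 4), 2: Fraction(1, 8)}

mu = sum(x * px for x, px in p.items())               # 0
var = sum((x - mu) ** 2 * px for x, px in p.items())  # sigma^2 = 1
sigma = math.sqrt(var)

# Chebyshev: P(|X - mu| >= k sigma) <= 1/k^2 for k >= 1.
for k in (1, 1.5, 2):
    tail = sum(px for x, px in p.items() if abs(x - mu) >= k * sigma)
    assert tail <= 1 / k ** 2
```

For this particular pmf the bound happens to hold with equality at \(k=2\): the tail probability is \(1/4=1/k^{2}\).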