\(\newcommand{\Cov}{\mathrm{Cov}}\) \(\newcommand{\Corr}{\mathrm{Corr}}\) \(\newcommand{\Sample}{X_{1},\dots,X_{n}}\)

## The Method of Moments

Let \(\Sample\) be a random sample from a pmf or a pdf. For
\(k=1,2,3,\dots\), the **kth population moment** is \(E(X^{k})\), and the
**kth sample moment** is \(\frac{1}{n}\sum_{i=1}^{n}X_{i}^{k}\). So the
first sample moment is the sample mean \(\bar{X}\).

If the pdf is \(f(x;\theta_{1},\dots,\theta_{m})\), then we estimate the
\(\theta_{i}\) by equating the first \(m\) sample moments to the
first \(m\) population moments and solving the resulting equations. The
solutions are called **moment estimators**.

Note that this method can produce nonsensical estimates, such as a negative value for a parameter that the distribution requires to be positive.
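As a sketch of the procedure, consider a Gamma(\(\alpha,\beta\)) sample, for which \(E(X)=\alpha\beta\) and \(E(X^{2})=\alpha(\alpha+1)\beta^{2}\). Equating these to the first two sample moments and solving gives closed-form moment estimators. The function name and the true parameter values below are illustrative, and the example assumes NumPy:

```python
import numpy as np

def gamma_moment_estimators(x):
    """Method-of-moments estimates for a Gamma(alpha, beta) sample.

    Equates the first two sample moments to the population moments
    E[X] = alpha*beta and E[X^2] = alpha*(alpha+1)*beta**2, then solves.
    """
    m1 = np.mean(x)       # first sample moment
    m2 = np.mean(x**2)    # second sample moment
    beta_hat = (m2 - m1**2) / m1   # (sample variance) / (sample mean)
    alpha_hat = m1 / beta_hat
    return alpha_hat, beta_hat

# Check on a simulated sample with known parameters (arbitrary seed/size)
rng = np.random.default_rng(0)
sample = rng.gamma(shape=2.0, scale=3.0, size=100_000)
a_hat, b_hat = gamma_moment_estimators(sample)
```

With a large sample the estimates land close to the true \(\alpha=2\), \(\beta=3\), though for small samples nothing prevents, say, a negative `beta_hat`.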

## Maximum Likelihood Estimation

This method is preferred when the sample size is large.

Let \(\Sample\) have joint pdf
\(f(x_{1},x_{2},\dots,x_{n};\theta_{1},\dots,\theta_{m})\), where the
\(\theta_{i}\) are unknown. When the \(x_{i}\) are the observed values and
this expression is regarded as a function of the \(\theta_{i}\), it is
called the **likelihood function**. The **maximum likelihood estimates**
are the values of the \(\theta_{i}\) that maximize the likelihood function.

In words, find the set of parameters that maximizes the probability of observing this particular sample.

Note that these methods may yield biased estimators.

A common trick: maximize \(\ln L\) rather than \(L\) itself. This works because the likelihood is always positive and \(\ln\) is strictly increasing, so both are maximized at the same parameter values; the logarithm also turns products into sums that are easier to differentiate.

To use the MLE, you need to know the underlying distribution.

### Example

As an example, let \(\Sample\) be a random sample from an exponential distribution with parameter \(\lambda\). Then the likelihood is

\(f(x_{1},\dots,x_{n};\lambda)=\left(\lambda e^{-\lambda x_{1}}\right)\cdots\left(\lambda e^{-\lambda x_{n}}\right)=\lambda^{n}e^{-\lambda\sum x_{i}}\)

so the log-likelihood is

\(\ln f(x_{1},\dots,x_{n};\lambda)=n\ln\lambda-\lambda\sum x_{i}\)

Differentiate with respect to \(\lambda\) and set the result to zero:

\(\frac{n}{\lambda}-\sum x_{i}=0\quad\Rightarrow\quad\hat{\lambda}=\frac{n}{\sum x_{i}}=\frac{1}{\bar{x}}\)

### Exponential MLE

The exponential MLE for \(\lambda\) is \(\hat{\lambda}=\frac{1}{\bar{x}}\).
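A quick numerical check, assuming NumPy (the seed, true \(\lambda\), and grid bounds are arbitrary): maximize the log-likelihood \(n\ln\lambda-\lambda\sum x_{i}\) over a grid and compare to the closed form \(1/\bar{x}\).

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1 / 2.5, size=5_000)  # true lambda = 2.5

def log_likelihood(lam, x):
    # ln L(lambda) = n * ln(lambda) - lambda * sum(x_i)
    return len(x) * np.log(lam) - lam * np.sum(x)

lam_hat = 1 / np.mean(x)                          # closed-form MLE
grid = np.linspace(0.5, 5.0, 10_000)
lam_grid = grid[np.argmax(log_likelihood(grid, x))]  # brute-force maximizer
```

The grid maximizer agrees with \(1/\bar{x}\) up to the grid spacing.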

### Binomial MLE

The binomial MLE for \(p\) is \(x/n\) where \(x\) is the number of successes.
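The same grid check works here: up to an additive constant the binomial log-likelihood is \(x\ln p+(n-x)\ln(1-p)\), and its maximizer matches \(x/n\). The counts \(n=50\), \(x=17\) below are made-up numbers, and the example assumes NumPy:

```python
import numpy as np

n, x = 50, 17   # n trials, x observed successes (illustrative values)

def log_likelihood(p):
    # Binomial log-likelihood up to a constant: x*ln(p) + (n-x)*ln(1-p)
    return x * np.log(p) + (n - x) * np.log(1 - p)

grid = np.linspace(0.001, 0.999, 99_901)
p_grid = grid[np.argmax(log_likelihood(grid))]  # brute-force maximizer
p_hat = x / n                                   # closed-form MLE
```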

### Normal MLE

For the normal MLE:

- \(\hat{\mu}=\bar{x}\)
- \(\hat{\sigma}^{2}=\frac{\sum\left(x_{i}-\mu\right)^{2}}{n}\)
- Note that this expression uses the unknown \(\mu\); in practice we substitute the sample mean \(\bar{x}\).
- With the sample mean substituted, this is a biased estimator of \(\sigma^{2}\) (it divides by \(n\) rather than \(n-1\)). It is, however, consistent (see below).
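The bias is easy to see by simulation. A sketch assuming NumPy (seed, sample size, and repetition count are arbitrary): for \(\sigma^{2}=4\) and \(n=10\), the divide-by-\(n\) estimator averages about \(\frac{n-1}{n}\sigma^{2}=3.6\), while the divide-by-\((n-1)\) sample variance averages about 4.

```python
import numpy as np

rng = np.random.default_rng(2)
true_var, n, reps = 4.0, 10, 20_000

# reps independent samples of size n from N(0, true_var)
x = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(reps, n))
xbar = x.mean(axis=1, keepdims=True)
ss = ((x - xbar) ** 2).sum(axis=1)   # sum of squared deviations per sample

mle_var = ss / n              # MLE of sigma^2: divide by n (biased low)
unbiased_var = ss / (n - 1)   # usual sample variance s^2 (unbiased)
```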

## The Invariance Principle

Let \(\hat{\theta}_{1},\dots,\hat{\theta}_{m}\) be the mle’s of the parameters \(\theta_{1},\dots,\theta_{m}\). Then the mle of any function \(h(\theta_{1},\dots,\theta_{m})\) is \(h(\hat{\theta}_{1},\dots,\hat{\theta}_{m})\).

Note that in some distributions, the mle of the mean is *not*
\(\bar{x}\).
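The lognormal distribution is a standard illustration of both points. Its mean is \(h(\mu,\sigma^{2})=e^{\mu+\sigma^{2}/2}\), so by the invariance principle the mle of the mean is \(e^{\hat{\mu}+\hat{\sigma}^{2}/2}\) with \(\hat{\mu},\hat{\sigma}^{2}\) the normal MLEs computed on \(\ln x_{i}\); this is not \(\bar{x}\). A simulation sketch, assuming NumPy (seed and parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 0.5, 1.2
x = rng.lognormal(mean=mu, sigma=sigma, size=100_000)

logs = np.log(x)
mu_hat = logs.mean()                       # normal MLE of mu on ln(x)
sigma2_hat = np.mean((logs - mu_hat)**2)   # normal MLE of sigma^2 (divide by n)

mle_of_mean = np.exp(mu_hat + sigma2_hat / 2)  # invariance principle
true_mean = np.exp(mu + sigma**2 / 2)
```

Both `mle_of_mean` and `x.mean()` are close to the true mean here, but they are distinct estimators.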

## A Desirable Property of the Maximum Likelihood Estimate

In general, when \(n\) is large, the mle of \(\theta\) is close to the MVUE of \(\theta\) (i.e. even if it is biased, it is approximately unbiased).

## Some Complications

Occasionally, calculus will fail you when trying to calculate the MLE; for instance, the likelihood may be maximized at a boundary of the parameter space, where the derivative is never zero.

Also, you need to know the distribution.

Some problems yield multiple solutions to the likelihood equations; in other cases, no maximum exists at all.

Occasionally, you can get a nonsensical solution.
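A classic case where calculus fails is a Uniform\((0,\theta)\) sample: the likelihood is \(\theta^{-n}\) for \(\theta\ge\max x_{i}\) and \(0\) otherwise, so it is strictly decreasing in \(\theta\) and the maximum sits at the boundary \(\hat{\theta}=\max x_{i}\). A simulation sketch assuming NumPy (seed and true \(\theta\) arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 7.0
x = rng.uniform(0.0, theta, size=1_000)

# Likelihood is theta**(-n) for theta >= max(x), else 0: decreasing in
# theta, so the maximum is at the boundary and the derivative never
# vanishes there.
theta_hat = x.max()
```

Note that \(\hat{\theta}\le\theta\) always, so this MLE is biased low, though the gap shrinks as \(n\) grows.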

## Consistency

An estimator \(\hat{\theta}\) is said to be **consistent** if
\(\forall\epsilon>0,P(|\hat{\theta}-\theta|\ge\epsilon)\rightarrow 0\)
as \(n\rightarrow\infty\).

\(\bar{X}\) is a consistent estimator of \(\mu\) whenever \(\sigma^{2}<\infty\). Sketch of proof: apply Chebyshev’s Inequality and note that \(\sigma_{\bar{X}}^{2}=\frac{\sigma^{2}}{n}\), so \(P(|\bar{X}-\mu|\ge\epsilon)\le\frac{\sigma^{2}}{n\epsilon^{2}}\rightarrow 0\).
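Consistency of \(\bar{X}\) can be seen directly by estimating \(P(|\bar{X}-\mu|\ge\epsilon)\) by simulation for growing \(n\). A sketch assuming NumPy (seed, \(\mu\), \(\sigma\), \(\epsilon\), and sample sizes arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma = 10.0, 3.0
eps = 0.5

# For each n, draw 5000 independent samples of size n and estimate
# P(|Xbar - mu| >= eps) as the fraction of sample means that miss mu
# by at least eps.
probs = []
for n in (10, 100, 1000):
    xbars = rng.normal(mu, sigma, size=(5_000, n)).mean(axis=1)
    probs.append(np.mean(np.abs(xbars - mu) >= eps))
```

The estimated probabilities shrink toward zero as \(n\) grows, as the definition requires.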

An MLE is consistent if:

- **Identification**: if the parameters of the distribution differ, then so does the distribution. In other words, no two sets of \(\{\theta_{i}\}\) result in the same distribution.
- **Compactness** of the parameter space. This ensures a unique maximum, and rules out a limit that comes arbitrarily close to the maximum without attaining it. This is not a necessary condition.
- **Continuity**: \(\ln f(x;\theta)\) is continuous in \(\theta\) for almost all values of \(x\).
- **Dominance**: there exists an integrable \(D(x)\) such that \(|\ln f(x;\theta)|<D(x)\ \forall\theta\in\Theta\).