Say we have data on home ownership in North Carolina, recorded as yes/no (1 = yes, 0 = no):
| Owns a home (1=Yes) | 
|---|
| 0 | 
| 1 | 
| 1 | 
| … | 
| 0 | 
We want to know about the proportion of home ownership in the state. Using the data, what might be our best guess?
Similarly, if we want to estimate each of the parameters below, what might be our best guess using a sample?
mean \(\mu\)
median \(\eta\)
standard deviation \(\sigma\)
These estimators (like \(\bar{x}\) for \(\mu\)) are intuitive. But sometimes it’s not so simple!
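As a small illustration of these intuitive estimators, the sketch below computes the sample proportion, mean, median, and standard deviation. Both data sets are made up for the example (they are not the North Carolina data), and the variable names are hypothetical.

```python
import statistics

# Hypothetical sample of 10 yes/no responses (1 = owns a home); not real data.
owns_home = [0, 1, 1, 0, 1, 1, 1, 0, 1, 1]

# Sample proportion: the natural guess for the population proportion p.
p_hat = sum(owns_home) / len(owns_home)

# Hypothetical numeric sample, used for the other three parameters.
x = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

x_bar = statistics.mean(x)    # sample mean, estimates mu
m_hat = statistics.median(x)  # sample median, estimates eta
s = statistics.stdev(x)       # sample SD (divides by n - 1), estimates sigma

print(p_hat, x_bar, m_hat, s)
```

Each statistic simply mirrors the population quantity it estimates, which is why these choices feel natural.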
Let \(X_1, X_2, \ldots, X_n \stackrel{iid}{\sim} F_{\theta}\), where \(F_{\theta}\) has density (or mass function) \(f_{\theta}\).
Then the joint distribution of \((X_1, \ldots, X_n)\) is given by \(f_{\theta}(x_1,\ldots,x_n)=\prod_{i=1}^{n} f_{\theta}(x_i)\).
Now, we define the likelihood function as the joint distribution evaluated at the observed data and viewed as a function of the (fixed but unknown) population parameter \(\theta\): \(L(\theta)=f_{\theta}(X_1,\ldots,X_n)=\prod_{i=1}^{n}f_{\theta}(X_i)\).
We want to find the value of \(\theta\) that maximizes the likelihood function and call it \(\hat{\theta}\), the maximum likelihood estimator (MLE).
Going back to our example of home ownership, let’s say we have \(n=100\) respondents, 70 of whom own their home. What distribution can we assume for this variable?
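We can use the Bernoulli distribution with success probability \(p\), so
\(L(p)=\prod_{i=1}^{n}p^{x_i} (1-p)^{1-x_i}=p^{\sum x_i}(1-p)^{n-\sum x_i}\), and with \(n=100\) and \(\sum x_i=70\) this gives \(L(p)=p^{70}(1-p)^{30}\).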
We can plot \(L(p)=p^{70}(1-p)^{30}\) over a range of values of \(p\)
\(L(p) = \prod_{i=1}^{n}p^{x_i} (1-p)^{1-x_i}\)
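A minimal sketch of how this likelihood could be evaluated over a grid of \(p\) values, using the counts from the example (\(n=100\), 70 owners). The grid spacing of 0.001 and the function names are choices made for this illustration. The search uses the log of \(L(p)\), which has the same maximizer and is numerically more stable.

```python
import math

n, successes = 100, 70  # counts from the home ownership example

def likelihood(p):
    """L(p) = p^70 (1 - p)^30 for the observed data."""
    return p ** successes * (1 - p) ** (n - successes)

def log_likelihood(p):
    """log L(p); maximized at the same p as L(p)."""
    return successes * math.log(p) + (n - successes) * math.log(1 - p)

# Candidate values 0.001, 0.002, ..., 0.999 (endpoints excluded to avoid log(0)).
grid = [i / 1000 for i in range(1, 1000)]

# Grid value with the largest log-likelihood.
p_hat = max(grid, key=log_likelihood)

print(p_hat, likelihood(p_hat))
```

The maximizing grid value coincides with the sample proportion \(70/100\), which matches the intuitive guess from the start of the section.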
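Bias: the difference between the estimator's expected value and the population parameter, \(E[\hat{\theta}]-\theta\); an estimator is unbiased when this is zero
Standard Error (SE): standard deviation of the estimator (i.e., SD of the sampling distribution)
Mean Squared Error (MSE): the expected squared distance from the parameter, which combines standard error and bias: \(MSE = E[(\hat{\theta}-\theta)^2] = Bias^2 + SE^2\)
Consistency: the estimator converges (in probability) to the true value as \(n \rightarrow \infty\)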
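These properties can be approximated by simulation. The sketch below repeatedly draws Bernoulli samples and looks at how the sample proportion behaves across draws. The "true" value \(p=0.7\), the number of repetitions, and the seed are all assumptions made for the illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

p_true = 0.7  # assumed "true" proportion, chosen for the simulation
n = 100       # sample size per simulated study
reps = 5000   # number of simulated samples

def sample_proportion():
    """Draw n Bernoulli(p_true) values and return their mean."""
    return sum(random.random() < p_true for _ in range(n)) / n

estimates = [sample_proportion() for _ in range(reps)]

# Bias: average estimate minus the true value.
bias = statistics.mean(estimates) - p_true
# SE: standard deviation of the simulated sampling distribution.
se = statistics.pstdev(estimates)
# MSE: average squared distance from the true value.
mse = statistics.mean([(e - p_true) ** 2 for e in estimates])

print(bias, se, mse, bias ** 2 + se ** 2)
```

The simulated bias should be close to zero, the SE close to \(\sqrt{p(1-p)/n}\), and the last two printed numbers should agree up to rounding, reflecting the decomposition of MSE into squared bias plus squared SE.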
Let \(X_1, X_2, \ldots, X_n \stackrel{iid}{\sim} Poisson(\lambda)\).
Derive the MLE, \(\hat{\lambda}\), and determine whether it is an unbiased estimator of \(\lambda\).
Recall that for a Poisson random variable, \(P(X=x)=\frac{\lambda^x e^{-\lambda}}{x !}\) for \(x = 0, 1, 2, \ldots\)
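One possible route to the answer, sketched by maximizing the log-likelihood \(\ell(\lambda)=\log L(\lambda)\) (the symbol \(\ell\) is introduced here for convenience and is not used elsewhere in these notes):

\(L(\lambda)=\prod_{i=1}^{n}\frac{\lambda^{X_i}e^{-\lambda}}{X_i!}\)

\(\ell(\lambda)=\left(\sum_{i=1}^{n}X_i\right)\log\lambda-n\lambda-\sum_{i=1}^{n}\log(X_i!)\)

\(\frac{d\ell}{d\lambda}=\frac{\sum_{i=1}^{n}X_i}{\lambda}-n=0 \;\Rightarrow\; \hat{\lambda}=\frac{1}{n}\sum_{i=1}^{n}X_i=\bar{X}\)

\(\frac{d^2\ell}{d\lambda^2}=-\frac{\sum_{i=1}^{n}X_i}{\lambda^2}<0\) whenever at least one \(X_i>0\), so the critical point is a maximum.

For the bias, \(E[\hat{\lambda}]=\frac{1}{n}\sum_{i=1}^{n}E[X_i]=\frac{1}{n}\cdot n\lambda=\lambda\), so under this derivation \(\hat{\lambda}=\bar{X}\) is unbiased for \(\lambda\).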