
We have a population. (E.g., voters in next election, who will vote Democrat or Republican).

We don't know the population mean. (E.g., fraction of voters who will vote Democrat).

We take several samples (observations). From them we want to estimate the population mean and standard deviation. (Ask 1000 potential voters; 520 say they will vote Democrat. Sample mean is .52)

We want error bounds on our estimates. (.52 plus or minus .04, 95 times out of 100)

Another application: testing whether 2 populations have the same mean. (Is this batch of Guinness as good as the last one?)

Observations cost money, so we want to do as few as possible.

This gets beyond this course, but the biggest problems may be non-math ones. E.g., how do you pick a random likely voter? In the past, phone books were used. In a famous 1936 Presidential poll, that biased the sample against poor people, who voted for Roosevelt.

In probability, we know the parameters (e.g., mean and standard deviation) of a distribution and use them to compute the probability of some event.
E.g., if we toss a fair coin 4 times what's the probability of exactly 4 heads? Answer: 1/16.
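The coin-toss computation can be checked with a short sketch using only the standard library; it evaluates the binomial formula $P[k \text{ heads in } n \text{ tosses}] = \binom{n}{k} p^k (1-p)^{n-k}$:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Fair coin, 4 tosses, exactly 4 heads:
print(binom_pmf(4, 4, 0.5))  # 0.0625 = 1/16
```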

In statistics we do not know all the parameters, though we usually know what type the distribution is, e.g., normal. (We often know the standard deviation.)
 We make observations about some members of the distribution, i.e., draw some samples.
 From them we estimate the unknown parameters.
 We often also compute a confidence interval on that estimate.
 E.g., we toss an unknown coin 100 times and see 60 heads. A good estimate for the probability of that coin coming up heads is 0.6.

Some estimators are better than others, though that gets beyond this course.
 Suppose I want to estimate the average height of an RPI student by measuring the heights of N random students.
 The mean of the highest and lowest heights of my N students would converge to the population mean as N increased.
 However the median of my sample would converge faster. Technically, the variance of the sample median is smaller than the variance of the sample midrange (that mean of the highest and lowest).
 The mean of my whole sample would converge the fastest. Technically, for a normal population the variance of the sample mean is smaller than the variance of any other unbiased estimator of the population mean. That's why we use it.
 However perhaps the population's distribution is not normal. Then one of the other estimators might be better. It would be more robust.
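The relative variances of these three estimators can be checked by simulation. This is a sketch (the sample size, trial count, and seed are arbitrary choices) that draws many samples from a standard normal population and compares the empirical variances:

```python
import random
import statistics

random.seed(42)
n, trials = 100, 2000

means, medians, midranges = [], [], []
for _ in range(trials):
    sample = [random.gauss(0, 1) for _ in range(n)]
    means.append(statistics.mean(sample))
    medians.append(statistics.median(sample))
    midranges.append((max(sample) + min(sample)) / 2)  # mean of highest and lowest

# For a normal population we expect:
#   var(sample mean) < var(sample median) < var(sample midrange)
print(statistics.variance(means))      # about 1/n = 0.01
print(statistics.variance(medians))    # about pi/(2n), roughly 0.016
print(statistics.variance(midranges))  # largest of the three
```

For a heavy-tailed population the ordering can change, which is the robustness point above: rerunning the sketch with a heavier-tailed draw in place of `random.gauss` makes the midrange much worse and can favor the median.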

(Enrichment) How to tell if the population is normal? We can do various plots of the observations and look. We can compute the probability that the observations would be this uneven if the population were normal.

An estimator may be biased. Suppose the distribution is U[0,b] for unknown b, and we take a sample of size n. The max of the sample is an estimator of b, but it is biased: its mean is $\frac{n}{n+1}b$, though it converges to b as n increases.
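The bias is easy to see by simulation. In this sketch (b, n, and the trial count are arbitrary choices) we average the maxima of many samples and compare with the theoretical mean $\frac{n}{n+1}b$; multiplying the max by $\frac{n+1}{n}$ removes the bias:

```python
import random

random.seed(1)
b, n, trials = 1.0, 10, 100_000

# Average of the sample max over many samples from U[0, b]:
maxes = [max(random.uniform(0, b) for _ in range(n)) for _ in range(trials)]
avg_max = sum(maxes) / trials

print(avg_max)                 # close to n/(n+1)*b = 10/11, i.e. biased low
print(avg_max * (n + 1) / n)   # bias-corrected, close to b = 1.0
```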

Example 8.2, page 413: One-tailed probability. This is the probability that the mean of our sample is at least so far above the population mean. $$\alpha = P[\overline{X_n} - \mu > c] = Q\left( \frac{c}{\sigma_x / \sqrt{n} } \right)$$ Q is defined on page 169: $$Q(x) = \int_x^{\infty} \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \, dt$$

Application: You sample n=100 students' verbal SAT scores, and see $ \overline{X} = 550$. You know that $\sigma=100$. If $\mu = 525$, what is the probability that $\overline{X_n} > 550$ ?
Answer: $c = 550 - 525 = 25$ and $\sigma/\sqrt{n} = 100/\sqrt{100} = 10$, so the probability is $Q(25/10) = Q(2.5) = 0.006$.
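Q has no closed form, but it can be evaluated with the standard library's complementary error function, using the identity $Q(x) = \frac{1}{2}\,\mathrm{erfc}(x/\sqrt{2})$. A sketch reproducing the numbers above:

```python
from math import erfc, sqrt

def Q(x):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * erfc(x / sqrt(2))

# SAT example: c = 550 - 525 = 25, sigma/sqrt(n) = 100/sqrt(100) = 10
print(Q(25 / 10))  # about 0.0062
```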

This means that if we take 1000 random samples of students, each with 100 students, and measure each sample's mean, then, on average, 6 of those 1000 samples will have a mean over 550.

This is often worded as: the probability that the population mean is under 525 is 0.006. That is a different statement. The problem with saying it is that it presumes some probability distribution for the population mean.

The formula also works for the other tail, computing the probability that our sample mean is at least so far below the population mean.

The two-tailed probability is the probability that our sample mean is at least this far away from the population mean in either direction. It is twice the one-tailed probability.

All this also works in reverse: when you know the probability and want to find c, the cutoff.
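Going the other way requires the inverse of Q. The standard library has no inverse error function, but since Q is strictly decreasing, a simple bisection works. A sketch using the SAT numbers, with $\alpha = 0.05$ as an illustrative choice:

```python
from math import erfc, sqrt

def Q(x):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * erfc(x / sqrt(2))

def Q_inv(alpha, lo=-10.0, hi=10.0):
    """Find x with Q(x) = alpha by bisection (Q is decreasing on [lo, hi])."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if Q(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha, sigma, n = 0.05, 100, 100
c = Q_inv(alpha) * (sigma / sqrt(n))
print(c)  # about 16.4: the sample mean exceeds mu + c only 5% of the time
```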