.. title: Engineering Probability Class 28 Mon 2018-04-30
.. slug: class28
.. date: 2018-04-29
.. tags: mathjax
.. category: class
.. link:
.. description:
.. type: text
.. sectnum::
.. contents:: Table of contents
..
.. role:: red
.. role:: blue
Grades
------
#. I think I've responded to all grade emails. Please resend any that I overlooked.
#. Any grade that hasn't been complained about is presumed to be correct.
#. The conflict exam is Thurs May 10 at 3pm, in a room TBD. It is open only to students with conflicts who wrote me. If you're one of those students, but you don't plan to write it, then please tell me. E.g., a smaller room might then suffice.
#. We'll try to get updated guaranteed grades uploaded, so you can decide whether to write the final exam.
Material from text
------------------
Hypothesis testing
======================================================
#. Say we want to test whether the average height of RPI students (the population) is 2m.
#. We assume that the distribution is Gaussian (normal) and that the standard deviation of heights is, say, 0.2m.
#. However we don't know the mean.
#. We do an experiment and measure the heights of n=100 random students. Their mean height is, say, 1.9m.
#. The question on the table is, is the population mean 2m?
#. This is different from the earlier question that we analyzed, which was this: What is the most likely population mean? (Answer: 1.9m.)
#. Now we have a hypothesis (that the population mean is 2m) that we're testing.
#. The standard way that this is handled is as follows.
#. Define a null hypothesis, called H0, that the population mean is 2m.
#. Define an alternate hypothesis, called HA, that the population mean is not 2m.
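#. Note that, if H0 is true, we observed our sample mean to be $0.5 \\sigma$ below the population mean. However, the mean of n=100 samples has standard deviation $\\sigma/\\sqrt{n} = 0.02$ m, so our observation is 5 of those standard deviations below 2m.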
#. Each time we rerun the experiment (measure 100 students) we'll observe a different number.
#. We compute the probability that, if H0 is true, our sample mean would be this far from 2m.
#. Depending on what our underlying model of students is, we might use a 1-tail or a 2-tail probability.
#. Perhaps we think that the population mean might be less than 2m but it's not going to be more. Then a 1-tail distribution makes sense.
#. That is, our assumptions affect the results.
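#. The 1-tail probability is Q(5), about $2.9 \\times 10^{-7}$, which is very small. The 2-tail probability is twice that, which is still very small.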
#. Therefore we reject H0 and accept HA.
#. We make a type-1 error if we reject H0 and it was really true. See http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
#. We make a type-2 error if we accept H0 and it was really false.
#. These two errors trade off: by reducing the probability of one we increase the probability of the other, for a given sample size.
#. E.g. in a criminal trial we prefer that a guilty person go free to having an innocent person convicted.
#. Rejecting H0 says nothing about what the population mean really is, just that it's not likely 2m.
#. Enrichment: Random sampling is hard. The US government got it wrong here: http://politics.slashdot.org/story/11/05/13/2249256/Algorithm-Glitch-Voids-Outcome-of-US-Green-Card-Lottery
#. Example 8.1 page 412.
#. Example 8.21 page 442.
#. Example 8.23.
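
The height test above can be checked numerically. Here is a minimal sketch using only the Python standard library. The names (``q_function``, ``mu0``, ``std_of_mean``, and so on) and the 5% significance level ``alpha`` are assumptions made for illustration; they are not from the text.

```python
import math


def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = P[Z > x] for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


mu0 = 2.0          # population mean under H0, in meters
sigma = 0.2        # assumed known population standard deviation, in meters
n = 100            # number of students measured
sample_mean = 1.9  # observed sample mean, in meters

# The sample mean of n independent measurements has std sigma / sqrt(n).
std_of_mean = sigma / math.sqrt(n)

# How many standard deviations (of the sample mean) we are from mu0.
z = (sample_mean - mu0) / std_of_mean

# 1-tail: only "mean is less than 2m" counts as evidence against H0.
p_one_tail = q_function(abs(z))
# 2-tail: a deviation in either direction counts.
p_two_tail = 2.0 * p_one_tail

alpha = 0.05  # ASSUMPTION: significance level chosen for this sketch
reject_h0 = p_two_tail < alpha

print(f"z = {z:.2f}")
print(f"1-tail p = {p_one_tail:.3e}, 2-tail p = {p_two_tail:.3e}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Choosing a smaller ``alpha`` lowers the chance of a type-1 error but, for the same sample size, raises the chance of a type-2 error.
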
Iclicker questions
------------------
#. Suppose that RPI students' heights have mean 1.8m and standard deviation 0.2m. (These are fictitious numbers.)
You measure a sample of 16 students, and compute the sample mean $m$.
What is E[m]?
a. 10
#. .2
#. .05
#. 9.8
#. 2.5
#. What is STD[m]?
a. 10
#. .2
#. .05
#. 9.8
#. 2.5
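
To check your answers, recall that the sample mean of n independent measurements has the same mean as the population and standard deviation $\\sigma/\\sqrt{n}$. The following is a rough sketch that computes both and compares them with a simulation. The variable names, the seed, and the number of trials are arbitrary choices for illustration.

```python
import math
import random
import statistics

mu = 1.8     # population mean height, in meters (fictitious)
sigma = 0.2  # population standard deviation, in meters (fictitious)
n = 16       # sample size

# Closed-form values for the sample mean m.
expected_m = mu
std_m = sigma / math.sqrt(n)

# Simulation: repeat the experiment many times, assuming Gaussian heights.
rng = random.Random(0)  # fixed seed so that reruns give the same numbers
trials = 20000
sample_means = []
for _ in range(trials):
    heights = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(heights) / n)

sim_mean = statistics.fmean(sample_means)
sim_std = statistics.pstdev(sample_means)

print(f"E[m]:   formula {expected_m:.3f}, simulated {sim_mean:.3f}")
print(f"STD[m]: formula {std_m:.3f}, simulated {sim_std:.3f}")
```
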
Counterintuitive things in statistics
-------------------------------------
Statistics has some surprising examples, which would appear to be impossible. Here are some.
#. Average income can increase faster in a whole country than in any part of the country.
a. Consider a country with two parts: east and west.
#. Each part has 100 people.
#. Each person in the west makes \\$100 per year; each person in the east \\$200.
#. The total income in the west is \\$10K, in the east \\$20K, and in the whole country \\$30K.
#. The average income in the west is \\$100, in the east \\$200, and in the whole country \\$150.
#. Assume that next year nothing changes except that one westerner moves east and gets an average eastern job, so he now makes \\$200 instead of \\$100.
#. The west now has 99 people @ \\$100; its average income didn't change.
#. The east now has 101 people @ \\$200; its average income didn't change.
#. The whole country's income is \\$30100 for an average of \\$150.50; that went up.
#. College acceptance rate surprise.
a. Imagine that we have two groups of people: Albanians and Bostonians.
#. They're applying to two programs at the university: Engineering and Humanities.
#. Here are the numbers. The fractions are accepted/applied.
========== ===== ===== =====
city-major Engin Human Total
========== ===== ===== =====
Albanians 11/15 2/5 13/20
Bostonians 4/5 7/15 11/20
Total 15/20 9/20 24/40
========== ===== ===== =====
E.g., 15 Albanians applied to Engin; 11 were accepted.
#. Note that in Engineering, a *smaller* fraction of Albanian applicants were accepted than Bostonian applicants.
#. Ditto in Humanities.
#. However in all, a *larger* fraction of Albanian applicants were accepted than Bostonian applicants.
#. I could go on.
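
Both surprises above can be verified with a few lines of arithmetic. This is only a sketch; the data structures and names (``applications``, ``overall``, etc.) are made up for illustration, and exact fractions are used so that no rounding hides the effect.

```python
from fractions import Fraction

# --- Income example: one westerner moves east ---
west_before, east_before = [100] * 100, [200] * 100
west_after, east_after = [100] * 99, [200] * 101


def mean(values):
    """Arithmetic mean as an exact fraction."""
    return Fraction(sum(values), len(values))


country_before = mean(west_before + east_before)  # 150
country_after = mean(west_after + east_after)     # 150.5

print("west:", mean(west_before), "->", mean(west_after))
print("east:", mean(east_before), "->", mean(east_after))
print("country:", float(country_before), "->", float(country_after))

# --- College acceptance example: (accepted, applied) per group and program ---
applications = {
    "Albanians": {"Engin": (11, 15), "Human": (2, 5)},
    "Bostonians": {"Engin": (4, 5), "Human": (7, 15)},
}

# Acceptance rate within each program.
per_major = {
    city: {major: Fraction(acc, app) for major, (acc, app) in majors.items()}
    for city, majors in applications.items()
}

# Acceptance rate over both programs combined.
overall = {
    city: Fraction(
        sum(acc for acc, _ in majors.values()),
        sum(app for _, app in majors.values()),
    )
    for city, majors in applications.items()
}

for major in ("Engin", "Human"):
    a = per_major["Albanians"][major]
    b = per_major["Bostonians"][major]
    print(f"{major}: Albanians {a} vs Bostonians {b}")
print("Total:", overall["Albanians"], "vs", overall["Bostonians"])
```

The reversal happens because most Albanians applied to the program with the higher acceptance rate, while most Bostonians applied to the one with the lower rate.
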
Relevant Xkcd comics
--------------------
#. Meteorologist
#. Significant
#. P-Values
#. Correlation
#. Linear Regression
#. Cell Phones
#. Frequentists vs. Bayesians
#. Seashell
#. Conditional Risk
#. Null Hypothesis