This is the homepage of course ECSE-2500 Engineering Probability, Rensselaer Polytechnic Institute, Spring 2011.

Syllabus

Click here for Syllabus page, or click on the section heading below.

Course calendar

This Google calendar will list course due dates. You may display it together with other Google calendars or import it into various other calendar programs, like Thunderbird.

Homeworks

There will be about a dozen homeworks. Email your solutions to wrfranklin+homework ATgmail.com , replacing AT with @.

#	Due	Questions	Answers
1	Feb 1	Homework 1	hw1sol.pdf
2	Feb 8	Homework 2	hw2sol.pdf
3	Feb 15	Homework 3	hw3sol.pdf
4	Feb 22	Homework 4	hw4sol.pdf
5	Mar 1	Homework 5	hw5sol.pdf
6	Mar 29	Homework 6	hw6sol.pdf
7	Apr 22	Homework 7	hw7sol.pdf
8	Apr 29	Homework 8	hw8sol.pdf

Lectures, Jan - Feb

Lectures March

Lecture 11, Tues Mar 1: Exam 1

Closed book, but a calculator and one 2-sided letter-paper-size note sheet are allowed.
Material is from chapters 1-3.
Questions will be based on the book, class, and homework examples and exercises.
The hard part for you may be deciding what formula to use.
Any calculations will (IMHO) be easy.
Speed should not be a problem; most people should finish in 1/2 the time.
Last year's exam is here.
This exam is here.

Lecture 12, Fri Mar 4

Anyone finding an error in this wiki (more than a simple typo) or in my lecture will get a little extra credit. I welcome corrections in class. If you find an error, come down to the front after class to sign a log. If you find an error in the web site, email me.
I switched to a new math engine for this wiki (from jsMath to mathjax, its replacement). It should be compatible. However, since you have many different browsers, please report any problems.
Chapter 4.
1. I will try to ignore most of the theory at the start of the chapter.
2. Now we will see continuous random variables. The probability of the r.v. being any exact value is infinitesimal, so we talk about the probability that it's in a range.
3. Sometimes there are mixed discrete and continuous r.v. Let X be the time to get a taxi at the airport. 80% of the time a taxi is already there, so p(X=0)=.8. Otherwise we wait a uniform time from 0 to 20 minutes, so p(a<X<b)=.01(b-a), for 0<a<b<20.
4. Remember that for discrete r.v. we have a probability mass function (pmf).
5. For continuous r.v. we now have a probability density function (pdf), f_X(x).
6. p(a<x<a+da) = f(a)da
7. For any r.v., we have a cumulative distribution function (cdf) F_X(x)
8. The subscript is interesting only when we are using more than one cdf and need to tell them apart.
9. Definition: F(x) = P(X≤x).
10. The difference between ≤ and < matters only where X has a point mass, i.e., for discrete (or mixed) r.v.
11. As usual Wikipedia isn't bad, and is deeper than we need here. http://en.wikipedia.org/wiki/Cumulative_distribution_function
12. We compute means and other moments by the obvious integrals.
iclicker. For the taxi example, what is F(0)?
1. 0
2. .2
3. .8
4. .81
5. 1
iclicker. For the taxi example, what is F(1)?
1. 0
2. .8
3. .81
4. .9
5. 1
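The taxi example can be checked numerically. Here is a minimal sketch in Python (standard library only, rather than the Matlab used in class); the function name taxi_cdf is made up for illustration.

```python
def taxi_cdf(x):
    """F(x) = P[X <= x] for the taxi waiting time X, in minutes."""
    if x < 0:
        return 0.0          # waiting times are never negative
    if x >= 20:
        return 1.0          # the wait never exceeds 20 minutes
    # Point mass of 0.8 at x = 0, plus 0.2 spread uniformly over (0, 20).
    return 0.8 + 0.2 * x / 20

print(taxi_cdf(0), taxi_cdf(1), taxi_cdf(20))
```

The jump at x=0 is the discrete part of the r.v.; the ramp from 0 to 20 is the continuous part.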
Simple continuous r.v. examples: uniform, exponential.
The exponential distribution complements the Poisson distribution. The Poisson describes the number of arrivals per unit time. The exponential describes the distribution of the times between consecutive arrivals.
The most common continuous distribution is the normal distribution.
Conditional probabilities work the same with continuous distributions as with discrete distributions.

Lecture 13, Tues Mar 8

Extracurricular hike: On Sat April 16, Jeff Trinkle, some other people, and I are leading a hike to an interesting place TBD, possibly Mt Greylock. Please RSVP to fdrc@rpi.edu. Space is limited (and this hike is being announced in various places). This is a chance for profs and students to meet informally.
If this is popular, we might lead more.
Sina will be available this Fri 4-5pm in the Flip Flop lounge to answer questions about grading. We will also set up another time after spring break.
In honor of spring break, no homework is due.
Today: more chapter 4.

Lecture 14, Fri Mar 11

Solutions to exam 1 are online: Exam1
Midterm status report emailed. If you didn't get one, give me your correct address (and check Respite; it once blocked a personal message to me from the Provost).
In that email, I wrote HW4 where I meant HW5. No bonus points for pointing that one out.
If you need more help:
1. See Hang during his office hours.
2. Email any TA to meet at other times.
3. Talk to me after class. I stay until 4pm most days (but not today).
4. Email me with questions, or to set up a phone call or a meeting at a different time.
I want everyone to succeed. That doesn't mean that this is a free ride, since I might be using a product that you design after you graduate and don't want to be killed by it. RPI grad Theodore Cooper's carelessness killed 75 people in 1907, when the Quebec Bridge collapsed. He was the most famous American bridge designer at that time, but was old and complacent and trying to save money for his client. Luckily that was the most recent example I know of. Most of our grads are excellent.
Using Matlab: Matlab, Mathematica, and Maple all will help you do problems too big to do by hand. I'll demo Matlab since IMO more of the class knows it.
Iclicker. Which of the following do you prefer to use?
1. Matlab
2. Maple
3. Mathematica
4. Paper. It was good enough for Bernoulli and Gauss; it's good enough for me.
5. Something else (please email me about it after class).

Matlab

Major functions:

 
cdf(dist,X,A,...)
pdf(dist,X,A,...)

Common cases of dist (there are many others):

 
'Binomial'
'Exponential'
'Poisson'
'Normal'
'Geometric'
'Uniform'
'Discrete Uniform'

Examples

pdf('Normal',-2:2,0,1)
cdf('Normal',-2:2,0,1)

p=0.2
n=10
k=0:10
bp=pdf('Binomial',k,n,p)
bar(k,bp)
grid on

bc=cdf('Binomial',k,n,p)
bar(k,bc)
grid on

x=-3:.2:3
np=pdf('Normal',x,0,1)
plot(x,np)

Interactive GUI to explore distributions: disttool

Random numbers:

 
rand(3)
rand(1,5)
randn(1,10)
randn(1,10)*100+500
randi(100,4)

Interactive GUI to explore random numbers: randtool

Plotting two things at once:

 
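x=-3:.2:3
n1=pdf('Normal',x,0,1)          % standard normal, sigma=1
n2=pdf('Normal',x,0,2)          % same mean, sigma=2
plot(x,n1,n2)                   % fails: plot wants (x,y) pairs
plot(x,n1,x,n2)                 % works: each curve gets its own x
plot(x,n1,'--r',x,n2,'.g')      % same, with styles: dashed red line, green dots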

Use Matlab to understand Exam 1, question 4.
Use Matlab to compute a geometric pdf w/o using the builtin function.
Matlab workspace, containing the variables: 0311.mat.

Lecture 15, Tues Mar 22

Webpage format updates: My goals are to make this site readable on everything from a wide screen laptop to a mobile device, while packing as much info as possible on the screen. It's not perfect, but is closer to that now, provided that you're using Mozilla or a related browser that implements column-width. It works on my Droid in landscape mode. If the text size is not too large, no horizontal scrolling is needed. The number of columns changes depending on your screen and font size. Sorry, Internet Explorer users. Also, now in Mozilla you don't need to click a special button to format the page for printing. (The problem was that Mozilla doesn't implement the CSS overflow attribute correctly.)
Homework 6 is out.
Review: This question is about tires on your car going flat when you drive on remote gravel roads like the Trans-Labrador Highway.
1. Use discrete probability distributions for simplicity. We'll redo this with continuous distributions later.
2. The geometric and negative binomial distributions might be relevant. However the textbook, page 116, defines negative binomial slightly differently from Matlab, on the nbinpdf help page.
3. Your car needs 4 good tires to operate.
4. The probability of one specific tire going flat in one day is {$p=0.01$} .
5. What is the expected lifetime of a tire?
6. If you have no spare tires, what is the probability that your car will be disabled in one day?
7. What is the mean time until your car is disabled?
8. If you have one spare tire, what is the probability that your car will be disabled in one day?
9. and what is the mean time until your car is disabled?
How many spare tires...
Plot the mean time to disablement vs the number of spare tires, from 0 to 5 spare tires.
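One way to get the numbers for the tire question is the small recursion below. This is only a sketch in Python (not the Matlab used in class), and the model is my own simplification: flats are independent, the 4 tires in use each go flat on a given day with probability p, a flat is swapped for a spare at once, and stored spares never go flat. The function name is made up.

```python
from math import comb

def mean_days_to_disable(spares, p=0.01, wheels=4):
    """Mean number of days until the car has had more flats than it has spares.

    Assumed model: independent flats, probability p per tire in use per day,
    immediate replacement from the spares, spares never fail in storage.
    """
    # pj[j] = P[exactly j of the tires in use go flat on one day]
    pj = [comb(wheels, j) * p**j * (1 - p)**(wheels - j) for j in range(wheels + 1)]
    need = spares + 1                      # flats needed to disable the car
    mean = [0.0] * (need + 1)              # mean[k] = mean days until k more flats
    for k in range(1, need + 1):
        # Condition on today's number of flats j; days with j = 0 change nothing.
        later = sum(pj[j] * mean[max(k - j, 0)] for j in range(1, wheels + 1))
        mean[k] = (1 + later) / (1 - pj[0])
    return mean[need]

for s in range(6):
    print(s, round(mean_days_to_disable(s), 1))
```

With no spares this reduces to the mean of a geometric r.v. with success probability 1-(1-p)^4. The printed pairs are the points to plot against the number of spares.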
Review: Markov and Chebyshev inequalities.
1. Your web server averages 10 hits/second.
2. It will crash if it gets 20 hits in one second.
3. By the Markov inequality, that has a probability at most 0.5.
4. That is way way too conservative, but it makes no assumptions about the distribution of hits.
5. For the Chebyshev inequality, assume that the variance is 10.
6. It gives the probability of crashing at under 0.1. That is tighter.
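7. Assuming the distribution is Poisson with a=10, use Matlab 1-cdf('Poisson',20,10). That gives 0.0016. (Strictly, that is the probability of more than 20 hits; for 20 or more, use 1-cdf('Poisson',19,10), which gives about 0.0035.)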
8. The more we assume, the better the answer we can compute.
9. However, our assumptions had better be correct.
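The three estimates for the web server can be reproduced in a few lines. This is a sketch in Python (standard library only, instead of Matlab's cdf); the variable and function names are made up.

```python
from math import exp, factorial

mean_hits = 10      # average hits per second
variance = 10       # variance assumed in the Chebyshev step
limit = 20          # hits per second that crash the server

# Markov: P[X >= limit] <= E[X] / limit (needs only X >= 0 and the mean).
markov = mean_hits / limit

# Chebyshev: P[|X - mean| >= d] <= variance / d^2, here with d = limit - mean.
chebyshev = variance / (limit - mean_hits) ** 2

def poisson_cdf(k, a):
    """P[X <= k] for a Poisson r.v. with mean a."""
    return sum(exp(-a) * a**i / factorial(i) for i in range(k + 1))

more_than_limit = 1 - poisson_cdf(limit, mean_hits)       # P[X > 20]
at_least_limit = 1 - poisson_cdf(limit - 1, mean_hits)    # P[X >= 20]

print(markov, chebyshev, more_than_limit, at_least_limit)
```

Each extra assumption (nonnegative with a known mean, then a known variance, then a Poisson shape) tightens the answer.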
pdf and cdf of the max of 2 random variables:
If X and Y are independent and Z=max(X,Y) then F_Z(x) = F_X(x) F_Y(x)
E.g. if X and Y are U[0,1], so F_X(x) = x for 0<=x<=1, then F_Z(x) = x²
What are the pdf and mean here? What about the max of 3 r.v.? What about the min?
Iclicker. What is the cdf (for 0<=x<=1) of the max of 3 r.v. that are each U[0,1]?
1. x
2. x²
3. x³
4. 1
5. 0
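For reference, here is the general pattern worked out, assuming {$ X_1, \ldots, X_n $} are independent and each U[0,1] (the symbols Z and W are just labels for this sketch). For the max, Z, and 0<=x<=1: {$$ F_Z(x) = P[X_1\le x]\cdots P[X_n\le x] = x^n, \quad f_Z(x) = n x^{n-1}, \quad E[Z] = \int_0^1 x \, n x^{n-1} dx = \frac{n}{n+1} $$} For the min, W, work with the complement, since W>x requires every {$ X_i > x $}: {$$ F_W(x) = 1-(1-x)^n, \quad f_W(x) = n (1-x)^{n-1}, \quad E[W] = \frac{1}{n+1} $$}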
pdf of the sum of 2 independent r.v. If Z=X+Y then {$$ f_Z(z) = \int_x f_X(x) f_Y(z-x) dx $$} E.g. If X and Y are U[0,1] then f_Z(z) = ?
What is the mean?
Section 4.7, page 184, Transform methods: characteristic function.
1. The characteristic function {$ \Phi_X(\omega) $} of a pdf f(x) is like its Fourier transform.
2. One application is that the moments of f can be computed from the derivatives of {$ \Phi $}.
3. We will compute the characteristic functions of the uniform and exponential distributions.
4. The table on p 164-5 lists a lot of characteristic functions.
For discrete nonnegative r.v., the probability generating function is more useful.
1. It's like the z-transform.
2. The pmf and moments can be computed from it.
4.8 Reliability
1. The reliability R(t) is the probability that the item is still functioning at t. R(t) = 1-F(t).
2. What is the reliability of an exponential r.v.?
3. The Mean Time to Failure (MTTF) is obvious.
4. ... for an exponential r.v.?
5. The failure rate is the probability of a widget that is still alive now dying in the next second.
6. If the failure rate is constant, the distribution is exponential.
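As a worked case, take an exponential lifetime T with rate {$ \lambda $} (a symbol chosen for this sketch), so {$ F(t) = 1-e^{-\lambda t} $} and {$ f(t) = \lambda e^{-\lambda t} $} for t>=0. Then {$$ R(t) = 1-F(t) = e^{-\lambda t}, \quad \text{MTTF} = E[T] = \int_0^\infty R(t)\, dt = \frac{1}{\lambda}, \quad r(t) = \frac{f(t)}{R(t)} = \lambda $$} The failure rate does not depend on t: an old widget that is still alive is as good as a new one.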
4.9 Generating r.v. - Ignore. It's hard to do right, but has been implemented in builtin routines. Use them.
4.10 Entropy - ignore since it's starred.

Lecture 16, Fri Mar 25

Final exam conflicts? Please email me if you have a conflict under RPI rules, e.g., with a lower-numbered course, or have at least two other exams in lower-numbered courses on the same day. Describe the conflict. Please also tell me what times you have free on nearby days.
Note on the importance of getting the fundamentals (or foundations) right: In the past 40 years, two major bridges in the Capital district have collapsed because of inadequate foundations. The Green Island Bridge collapsed on 3/15/77, see http://en.wikipedia.org/wiki/Green_Island_Bridge, http://www.cbs6albany.com/video/v/59005381001/wrgb-bridge-wrgb. The Thruway (I-90) bridge over Schoharie Creek collapsed on 4/5/87, killing 10 people.
Why RPI likes the Roeblings: none of their bridges collapsed. E.g., when designing the Brooklyn Bridge, Roebling Sr knew what he didn't know. He realized that something hung on cables might sway in the wind, in a complicated way that he couldn't analyze. So he added a lot of diagonal bracing. The designers of the original Tacoma Narrows Bridge were smart enough that they didn't need this expensive margin of safety.
We'll continue Tuesday's example of computing the pdf of the sum of two uniform r.v. The answer will be a hat function. It looks a little more like a normal distribution than the square uniform distribution did.
The sum of 3 uniform r.v. would look even more normal, and so on.
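The hat function can be checked by evaluating the convolution integral numerically. This is a rough sketch in Python (standard library only); the function names and the midpoint-rule step count are arbitrary choices of mine.

```python
def uniform_pdf(x):
    """pdf of U[0,1]."""
    return 1.0 if 0.0 <= x <= 1.0 else 0.0

def sum_pdf(z, f=uniform_pdf, g=uniform_pdf, lo=0.0, hi=1.0, steps=2000):
    """f_Z(z) for Z = X + Y, X and Y independent: integral of f(x) g(z - x) dx.

    Uses the midpoint rule on [lo, hi], which must cover the support of f.
    """
    dx = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        total += f(x) * g(z - x)
    return total * dx

for z in (0.0, 0.5, 1.0, 1.5, 2.0):
    print(z, round(sum_pdf(z), 3))
```

The values rise linearly to a peak at z=1 and fall back to 0 at z=2. Feeding the result back in as f (with a wider range) would give the sum of 3 uniforms.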
Another way to look at reliability: think of people.
1. Your reliability R(t) is the probability that you live to age t, given that you were born alive. In the US, that's 98.7% for age 20, 96.4% for 40, 87.8% for 60 (http://upload.wikimedia.org/wikipedia/commons/b/be/Excerpt_from_CDC_2003_Table_1.png)
2. MTTF is your life expectancy at birth. In the US, that's 77.5 years.
3. Your failure rate, r(t), is your probability of dying in the next dt, divided by dt, at different ages. E.g. for a 20-year-old, it's 0.13%/year for a male and 0.046%/year for a female (http://www.ssa.gov/oact/STATS/table4c6.html). For 40-year-olds, it's 0.24% and 0.14%. For 60-year-olds, it's 1.2% and 0.7%. At 80, it's 7% and 5%. At 100, it's 37% and 32%.
P190: If the failure rate is constant, then the distribution is exponential. We'll show this.
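A sketch of the argument, writing the constant failure rate as {$ \lambda $} (my notation): since {$ f(t) = -R'(t) $}, {$$ r(t) = \frac{f(t)}{R(t)} = -\frac{R'(t)}{R(t)} = \lambda \;\Rightarrow\; \frac{d}{dt} \ln R(t) = -\lambda \;\Rightarrow\; R(t) = e^{-\lambda t} $$} using R(0)=1. So {$ F(t) = 1-e^{-\lambda t} $}, which is the exponential cdf.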
If several subsystems are all necessary, e.g., are in series, then their reliabilities multiply (assuming that they fail independently). The result is less reliable.
If only one of them is necessary, e.g., they are in parallel, then their complementary reliabilities (the failure probabilities 1-R) multiply. The result is more reliable.
An application would be different types of RAIDs (Redundant Array of Inexpensive, or Independent, Disks). In one version you stripe a file over two hard drives to get increased speed, but decreased reliability. In another version you triplicate the file over three drives to get increased reliability. (You can also do a hybrid setup.)
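The two rules are easy to try out. This is a minimal sketch in Python (standard library only), assuming the drives fail independently; the per-drive reliability of 0.9 is a made-up number.

```python
from math import prod

def series(reliabilities):
    """All subsystems must work: the reliabilities multiply."""
    return prod(reliabilities)

def parallel(reliabilities):
    """At least one subsystem must work: the failure probabilities multiply."""
    return 1 - prod(1 - r for r in reliabilities)

r = 0.9   # made-up probability that one drive survives the period of interest
print(series([r, r]))        # file striped over two drives
print(parallel([r, r, r]))   # file triplicated over three drives
```

A hybrid setup is a nesting of the two, e.g. parallel([series([r, r]), series([r, r])]) for two mirrored striped pairs.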

Chapter 5, Two Random Variables

One experiment might produce two r.v. E.g.,
1. Shoot an arrow; it lands at (x,y).
2. Toss two dice.
3. Measure the height and weight of people.
4. Measure the voltage of a signal at several times.
The definitions for pmf, pdf and cdf are reasonable extensions of one r.v.
The math is messier.
The two r.v. may be dependent and correlated.
The correlation coefficient, ρ, is a dimensionless measure of linear dependence. -1<=ρ<=1.
ρ may be 0 when the variables have a nonlinear dependent relation.
Integrating (or summing) out one variable gives a marginal distribution.
We'll do some simple examples:
1. Toss two 4-sided dice.
2. Toss two 4-sided loaded dice. The marginal pmfs are uniform.
3. Pick a point uniformly in a square.
4. Pick a point uniformly in a triangle. x and y are now dependent.
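For the loaded-dice case, here is one possible joint pmf whose marginals are uniform even though the dice are dependent. It is a sketch in Python (standard library only), and the particular loading is invented for illustration.

```python
from fractions import Fraction

faces = range(1, 5)
# Made-up loaded pair: the second die shows either the same face as the first
# or the next face up (4 wraps around to 1), each with probability 1/8.
joint = {(i, j): Fraction(1, 8) if j in (i, i % 4 + 1) else Fraction(0)
         for i in faces for j in faces}

# Summing out one variable gives the marginal pmf of the other.
p_x = {i: sum(joint[i, j] for j in faces) for i in faces}
p_y = {j: sum(joint[i, j] for i in faces) for j in faces}

# Independence would require p(i,j) = p_x(i) p_y(j) in every cell.
independent = all(joint[i, j] == p_x[i] * p_y[j] for i in faces for j in faces)
print(p_x, p_y, independent)
```

Both marginals come out as 1/4 per face, so the marginals alone cannot tell this pair from two fair dice.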
The big example is a 2 variable normal distribution.
1. The pdf is messier.
2. It looks elliptical unless ρ=0.

Lecture 17, Tues Mar 29

Browsers: Google Chrome crashes while trying to display this page. However, both Firefox 3 and 4 and Internet Explorer 8 display the page fine. Google is a great company, but its SW can be buggy. (I've also had problems with Docs and My Tracks. Last spring, Docs lost 2 days of edits to a spreadsheet; luckily I'd downloaded a csv file from it. My Tracks can get wedged so badly that it has to be uninstalled and reinstalled.)
Because of the exam next week and because many of you have another exam next week, there is no homework due next week.
Other universities' probability websites. Leon-Garcia is the most widely used Probability textbook, so other universities also have lecture notes online.
1. http://dspace.mit.edu/bitstream/handle/1721.1/35860/6-041Fall-2002/OcwWeb/Electrical-Engineering-and-Computer-Science/6-041Probabilistic-Systems-Analysis-and-Applied--ProbabilityFall2002/LectureNotes/index.htm
2. http://anadolu.sdsu.edu/abut/EE553/Chap1_2006.pdf You have to download each chapter individually since the higher level directory is not publicly readable. This course used an earlier edition so the chapter numbers are different. E.g., multivariate starts in chapter 4.
In honor of exam 2, which is next Tues, Hang Zhang will hold office hours 7pm-8pm this Thursday and 10am-11am next Tuesday, in addition to his usual 7:10-8pm on Mondays.
Since the normal distribution is so important, we will work out some exercises with it. However, to keep things simple, we will use {$\mu=0,\ \ \sigma=1 $} as often as possible.
Reminder: {$$f_N(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} $$}
Show that {$ \int f(x) dx =1 $}
Show that, if X and Y are independent and normal, then so is Z=X+Y. If X and Y are N(0,1) then Z is N(0, {$\sqrt{2} $} ).
Cdf of mixed continuous - discrete random variables: section 5.3.1 on page 247. The input signal X is 1 or -1. It is perturbed by noise N that is U[-2,2] to give the output Y. What is P[X=1|Y<=0]?
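The channel question is a Bayes computation. This is a sketch in Python (standard library only); the prior P[X=1]=0.5 is an assumption on my part, as are the function names, so substitute the prior the problem actually gives.

```python
def uniform_cdf(n, lo=-2.0, hi=2.0):
    """P[N <= n] for noise N that is U[lo, hi]."""
    return min(max((n - lo) / (hi - lo), 0.0), 1.0)

def p_x1_given_y_le(y, p1=0.5):
    """P[X=1 | Y<=y] where Y = X + N and X is +1 with probability p1, else -1.

    p1 = 0.5 is an assumption. Only meaningful when P[Y<=y] > 0, i.e. y > -3.
    """
    like_plus = uniform_cdf(y - 1)    # P[Y<=y | X=+1] = P[N <= y-1]
    like_minus = uniform_cdf(y + 1)   # P[Y<=y | X=-1] = P[N <= y+1]
    total = p1 * like_plus + (1 - p1) * like_minus   # P[Y<=y], total probability
    return p1 * like_plus / total

print(p_x1_given_y_le(0))
```

With equally likely inputs, seeing Y<=0 leaves X=1 possible but much less likely than X=-1.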
Independence: Example 5.22 on page 256. Are 2 normal r.v. independent for different values of ρ?
Expected value of sum of two r.v. It sums, regardless of whether they are independent.
5.6.2 Joint moments etc
1. Work out for 2 3-sided dice.
2. Work out for tossing dart onto triangular board.
Example 5.27: correlation measures linear dependence. If the dependence is more complicated, the variables may be dependent but not correlated.
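A small discrete case shows the same effect. This sketch in Python (standard library only) is my own example, not the textbook's: X takes -1, 0, 1 with equal probability and Y=X².

```python
from fractions import Fraction

# Made-up example: X is -1, 0, or 1 with equal probability, and Y = X^2.
pmf_x = {x: Fraction(1, 3) for x in (-1, 0, 1)}
joint = {(x, x * x): p for x, p in pmf_x.items()}

def expect(g):
    """E[g(X, Y)] over the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

# Y is a function of X, so they are certainly dependent: compare one cell.
p_y1 = sum(p for (x, y), p in joint.items() if y == 1)
dependent = joint[(1, 1)] != pmf_x[1] * p_y1
print(cov, dependent)
```

The covariance is exactly 0 even though knowing X determines Y completely.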

Lecture 18, Fri Apr 1

Exam 2 topics

The exam will cover up thru last Tues, Lecture 17. It will be mostly material since Exam 1 but will include some older material. Here are some topics that may well be on the exam:

A question very similar to a question on Exam 1.
A noisy communications channel.
Parts lifetime and replacement.
Markov and Chebyshev inequalities.
A mixed continuous and discrete random variable.
Reliability.
A normal pdf integration.
Computing the {pdf | cdf} of the {sum | min | max} of two random variables.
Computing the marginal {pdf | cdf} of a 2-variable distribution.
Computing a {1st | 2nd} order moment or correlation coefficient of a 2-variable distribution.
Any of these discrete random variables: uniform, Poisson, binomial, Bernoulli, geometric.
Any of these continuous random variables: uniform, exponential, normal.
A problem that requires you to determine which distribution is the appropriate one, and then use it.

Here are some topics that will not be on the exam.

Matlab.
Characteristic and generating functions and transforms, since you get them in other courses where they are more important.
Computations that IMO are complicated. In the real world you have access to computers.

You may bring a calculator, but it probably won't help much.

Browser wars, ctd.

My mathjax test page crashes Internet Explorer 9. It also causes Chrome to give an uninformative error message, by apparently crashing the thread running that tab. Firefox and Explorer 8 are fine. I've reverted from mathjax to jsMath for this course.
Personal opinion: It's not acceptable for a public program to be crashable by user input. (Research and prototype programs are different.) The time and lines of code required to validate the input are well worth it. Also, many security exploits, such as SQL injection attacks, start with illegal input.

Probability

Review: Extend section 5.3.1 example 5.14 on page 247.
Example 5.31 on page 264. This is a noisy comm channel, now with Gaussian (normal) noise. The problems are:
1. what input signal to infer from each output, and
2. how accurate is this?
Covariance, correlation coefficient.

Lecture 19, Tues Apr 5, Exam 2

Exam 2, Exam 2 Sol. You are welcome to store and redistribute my exam and solution, provided that you keep the credits and don't charge.

You may bring 2 2-sided crib sheets, such as the one you prepared for exam 1, and a new one.

Lecture 20, Fri Apr 8

Extracurricular hike: On Sat April 16, Jeff Trinkle, some other people, and I are leading a hike to an interesting place TBD, possibly Mt Greylock. Please RSVP to fdrc@rpi.edu. Space is limited (and this hike is being announced in various places). This is a chance for profs and students to meet informally.
Section 5.7, page 261. Conditional pdf. There is nothing majorly new here; it's an obvious extension of 1 variable.
1. Discrete: Work out an example with a pair of 3-sided loaded dice.
2. Continuous: a triangular dart board. There is one little trick: P[X=x]=0 since X is continuous, so how can we compute P[Y=y|X=x] = P[Y=y & X=x]/P[X=x]? The answer is that we take the limiting probability P[x<X<x+dx] etc as dx shrinks, which nets out to using f(x) etc.
Example 5.31 on page 264. This is a noisy comm channel, now with Gaussian (normal) noise. This is a more realistic version of the earlier example with uniform noise. The application problems are:
1. what input signal to infer from each output,
2. how accurate is this, and
3. what cutoff minimizes this?
In the real world there are several ways you could reduce that error:
1. Increase the transmitted signal,
2. Reduce the noise,
3. Retransmit several times and vote.
4. Handshake: Include a checksum and ask for retransmission if it fails.
5. Instead of just deciding X=+1 or X=-1 depending on Y, have a 3rd decision, i.e., uncertain if |Y|<0.5, and ask for retransmission in that case.
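Several of these remedies can be compared numerically. This is a sketch in Python (standard library only), under assumptions of my own: the inputs ±amplitude are equally likely, the noise is zero-mean Gaussian, the cutoff is at Y=0, and repeated transmissions have independent noise. The names and the operating point amplitude=sigma=1 are made up.

```python
from math import comb, erfc, sqrt

def q(x):
    """Upper tail of the standard normal: P[N(0,1) > x]."""
    return 0.5 * erfc(x / sqrt(2))

def bit_error(amplitude=1.0, sigma=1.0):
    """P[wrong decision] when X = +/-amplitude are equally likely, the noise is
    N(0, sigma), and we decide X = +amplitude exactly when Y > 0."""
    return q(amplitude / sigma)

def vote_error(p, n=3):
    """P[majority of n independent transmissions is wrong], each wrong w.p. p."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n // 2 + 1, n + 1))

p = bit_error()                      # made-up operating point: amplitude = sigma = 1
print(p, bit_error(2.0), bit_error(1.0, 0.5), vote_error(p))
```

In this model, doubling the signal and halving the noise give the same error rate, since only the ratio amplitude/sigma matters.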
Section 5.8 page 271: Functions of two random variables.
1. We already saw how to compute the pdf of the sum and max of 2 r.v.
2. What's the point of transforming variables in engineering? E.g. in video, (R,G,B) might be transformed to (Y,I,Q) with a 3x3 matrix multiply. Y is brightness (mostly the green component). I and Q are approximately the red and blue. Since we see brightness more accurately than color hue, we want to transmit Y with greater precision. So, we want to do probabilities on all this.

Lecture 21, Fri Apr 15

Functions of 2 random variables
1. This is an important topic.
2. Example 5.44, page 275. Transform two independent Gaussian r.v. from (X,Y) to (R, {$\theta$} ).
3. Linear transformation of two Gaussian r.v.
4. Sum and difference of 2 independent Gaussian r.v. with equal variances are independent.
Section 5.9, page 278: pairs of jointly Gaussian r.v.
1. I will simplify formula 5.61a by assuming that {$\mu=0, \sigma=1$}.
  {$$ f_{XY}(x,y)= \frac{e^{ \frac{-\left( x^2-2\rho x y + y^2\right)}{2(1-\rho^2)} } }{2\pi \sqrt{1-\rho^2}} $$} .
2. The r.v. are probably dependent. {$\rho$} says how much.
3. The formula degenerates if {$ |\rho|=1 $} since {$ 1-\rho^2 $} is zero in both the exponent and the denominator. The distribution is still valid, but all the probability then lies on the line {$ y=\pm x $}, so it no longer has an ordinary 2 variable pdf.
4. The lines of equal probability density are ellipses.
5. The marginal pdf is a 1 variable Gaussian.
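The claim about the marginal can be checked numerically for the simplified formula with {$\mu=0, \sigma=1$}. This is a sketch in Python (standard library only); the integration range and step count are arbitrary choices, and the function names are mine.

```python
from math import exp, pi, sqrt

def joint_pdf(x, y, rho):
    """Jointly Gaussian pdf with zero means, unit variances, correlation rho."""
    return exp(-(x * x - 2 * rho * x * y + y * y) / (2 * (1 - rho**2))) / (
        2 * pi * sqrt(1 - rho**2))

def marginal_x(x, rho, half_width=10.0, steps=4000):
    """Integrate y out of joint_pdf with the midpoint rule on [-half_width, half_width]."""
    dy = 2 * half_width / steps
    return sum(joint_pdf(x, -half_width + (i + 0.5) * dy, rho) for i in range(steps)) * dy

def normal_pdf(x):
    """pdf of N(0,1)."""
    return exp(-x * x / 2) / sqrt(2 * pi)

print(marginal_x(0.5, 0.6), normal_pdf(0.5))
```

The two printed numbers agree whatever value of ρ (with |ρ|<1) is used: the marginal does not depend on ρ.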
Example 5.47, page 282: Estimation of signal in noise
1. This is our perennial example of signal and noise. However, here the signal is not just {$ \pm1 $} but is normal. Our job is to find the most likely input signal for a given output.
Next time: We've seen 1 r.v., we've seen 2 r.v. Now we'll see several r.v.

Lecture 22, Tues Apr 19

Hang will change his Mon office hour to Wed 2-3 this week.
Important concept in the noisy channel example (with X and N both being Gaussian): On Friday we saw that the most likely value of X given Y is not Y but is somewhat smaller, depending on the relative sizes of {$\sigma_X$} and {$\sigma_N$}. This is true in spite of {$\mu_N=0$}. It would be really useful for you to understand this intuitively. Here's one way:
If you don't know Y, then the most likely value of X is 0. Knowing Y gives you more information, which you combine with your initial info (that X is {$N(0,\sigma_X)$}) to get a new estimate for the most likely X. The smaller the noise, the more valuable is Y. If the noise is very small, then the most likely X is close to Y. If the noise is very large (on average) then the most likely X is still close to 0.
Example 5.47, page 282: Estimation of signal in noise - in more detail. I'll assume {$\sigma_X=1$}.
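For Y=X+N with X and N independent zero-mean Gaussians, the most likely X given Y=y is y scaled by {$ \sigma_X^2/(\sigma_X^2+\sigma_N^2) $}. This sketch in Python (standard library only; the function name is made up) shows how the scaling behaves as the noise grows.

```python
def most_likely_x(y, sigma_x=1.0, sigma_n=1.0):
    """Most likely X given Y=y when Y = X + N, with X ~ N(0, sigma_x) and
    N ~ N(0, sigma_n) independent (the second parameter is a standard deviation)."""
    return y * sigma_x**2 / (sigma_x**2 + sigma_n**2)

for sigma_n in (0.1, 1.0, 10.0):
    print(sigma_n, most_likely_x(2.0, 1.0, sigma_n))
```

With tiny noise the estimate stays near the observed y=2; with huge noise it falls back toward the prior mean, 0.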

Chapter 6: Vector random variables.

Skip the starred sections.
Examples:
1. arrivals in a multiport switch,
2. audio signal at different times.
pmf, cdf, marginal pmf and cdf are obvious.
conditional pmf has a nice chaining rule.
For continuous random variables, the pdf, cdf, conditional pdf etc are all obvious.
Independence is obvious.
Work out example 6.5, page 306. The input ports are a distraction. This problem reduces to a multinomial probability where N is itself a random variable.

Lecture 23, Fri Apr 22

Tutorial on probability density

Since the meaning of probability density when you transform variables is still causing problems for some people, think of changing units from English to metric. First, with one variable, X.

Let X be in feet and be U[0,1].
{$$ f_X(x) = \begin{cases} 1& \text{if } 0\le x\le1\\ 0&\text{otherwise} \end{cases} $$}
{$ P[.5\le x\le .51] = 0.01 $}.
Now change to centimeters. The transformation is {$Y=30X$}.
{$$ f_Y(y) = \begin{cases} 1/30 & \text{if } 0\le y\le30\\ 0&\text{otherwise} \end{cases} $$}
Why is 1/30 reasonable?
First, the pdf has to integrate to 1: {$$ \int_{-\infty}^{\infty} f_Y(y) dy =1 $$}
Second, {$$ \begin{align} & P[.5\le x\le .51] \\ &= \int_{.5}^{.51} f_X(x) dx \\& =0.01 \\& = P[15\le y\le 15.3] \\& = \int_{15}^{15.3} f_Y(y) dy \end{align} $$}

Now, let's do 2 variables, which is what I did in class on Tues.

We're throwing darts uniformly at a one foot square dartboard.
We observe 2 random variables, X, Y, where the dart hits (in Cartesian coordinates).
{$$ f_{X,Y}(x,y) = \begin{cases} 1& \text{if}\,\, 0\le x\le1 \cap 0\le y\le1\\ 0&\text{otherwise} \end{cases} $$}
{$$ \begin{align} &P[.5\le x\le .6 \cap .8\le y\le.9] \\& = \int_{.5}^{.6}\int_{.8}^{.9} f_{X,Y}(x,y) \, dy \, dx = 0.01 \end{align}$$}.
Transform to centimeters: {$$ \begin{bmatrix}V\\W\end{bmatrix} = \begin{pmatrix}30&0\\0&30\end{pmatrix} \begin{bmatrix}X\\Y\end{bmatrix} $$}
{$$ \begin{multline}f_{V,W}(v,w) \\ = \begin{cases} 1/900& \text{if } 0\le v\le30 \cap 0\le w\le30\\ 0&\text{otherwise} \end{cases} \end{multline}$$}
{$$ \begin{align} &P[15\le v\le 18 \cap 24\le w\le27] \\ & = \int_{15}^{18}\int_{24}^{27} f_{V,W}(v,w) \, dw \, dv \\&= \frac{ (18-15)(27-24) }{900} = 0.01\end{align} $$}.

Exam stats

The exam 1 mean was 49/68. The grades were:
20, 22, 24, 32, 34, 35, 36, 36, 38, 39, 39, 39, 40, 40, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 43, 43, 43, 44, 45, 45, 45, 46, 46, 47, 47, 47, 48, 48, 48, 49, 49, 49, 49, 49, 50, 50, 50, 50, 52, 52, 53, 53, 54, 54, 54, 54, 54, 54, 55, 56, 57, 58, 58, 58, 59, 59, 59, 59, 59, 60, 60, 61, 61, 61, 62, 64, 65, 65, 68, 68, 68, 68, 68
The exam 2 mean was 20/30. The grades were:
7, 9, 11, 11, 11, 11, 12, 12, 12, 13, 13, 14, 14, 15, 15, 15, 15, 15, 15, 16, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 27, 28, 28, 29, 29, 30, 30

Section 6.5, page 332: Estimation of random variables.

Assume that we want to know X but can only see Y, which depends on X.
This is a generalization of our long-running noisy communication channel example. We'll do things a little more precisely now.
Another application would be to estimate tomorrow's price of GOOG (X) given the prices to date (Y).
Sometimes, but not always, we have a prior probability for X.
For the communication channel we do, for GOOG, we don't.
If we do, it's a maximum a posteriori estimator.
If we don't, it's a maximum likelihood estimator. We effectively assume that that prior probability of X is uniform, even though that may not completely make sense.
Some of this is from Prof Vastola.
You toss a fair coin 3 times. X is the number of heads, from 0 to 3. Y is the position of the 1st head, from 0 to 3 (0 means no head).

(X,Y)	p(X,Y)
(0,0)	1/8
(1,1)	1/8
(1,2)	1/8
(1,3)	1/8
(2,1)	2/8
(2,2)	1/8
(3,1)	1/8

E.g., 1 head can occur 3 ways (out of 8): HTT, THT, TTH. The 1st (and only) head occurs in position 1 in one of those ways, so p(1,1)=1/8.
Conditional probabilities:

p(x|y)	y=0	y=1	y=2	y=3
x=0	1	0	0	0
x=1	0	1/4	1/2	1
x=2	0	1/2	1/2	0
x=3	0	1/4	0	0
-	-	-	-	-
g_MAP(y)	0	2	1 or 2	1
P_error(y)	0	1/2	1/2	0
p(y)	1/8	1/2	1/4	1/8

The total probability of error is 3/8.
We observe Y and want to guess X from Y. E.g., If we observe {$$ \small y= \begin{pmatrix}0\\1\\2\\3\end{pmatrix} \text{then } x= \begin{pmatrix}0\\ 2 \text{ most likely} \\ 1, 2 \text{ equally likely} \\ 1 \end{pmatrix} $$}
There are different formulae. The above one was the MAP, maximum a posteriori probability.
{$$ g_{\text{MAP}} (y) = \arg\max_x p_X(x|y) \text{ or } f_X(x|y) $$}
What if we don't know p(x|y)? If we know p(y|x), we can use Bayes. We might measure p(y|x) experimentally, e.g., by sending many messages over the channel.
Bayes requires p(x). What if we don't know even that? E.g. we don't know the probability of the different possible transmitted messages.
Then use maximum likelihood estimator, ML.
{$$ g_{\text{ML}} (y) = \arg\max_x p_Y(y|x) \text{ or } f_Y(y|x) $$}
There are other estimators for different applications. E.g., regression using least squares might attempt to predict a graduate's QPA from his/her entering SAT scores. At Saratoga in August we might attempt to predict a horse's chance of winning a race from its speed in previous races.
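The MAP and ML estimators for the 3-coin example can be compared by brute force. This is a sketch in Python (standard library only, exact fractions); the helper names are mine, and ties are reported as sets.

```python
from fractions import Fraction
from itertools import product

# Joint pmf of X = number of heads and Y = position of the first head
# (0 if there is none) for 3 tosses of a fair coin.
joint = {}
for tosses in product("HT", repeat=3):
    x = tosses.count("H")
    y = tosses.index("H") + 1 if "H" in tosses else 0
    joint[x, y] = joint.get((x, y), 0) + Fraction(1, 8)

p_x = {x: sum(p for (a, _), p in joint.items() if a == x) for x in range(4)}
p_y = {y: sum(p for (_, b), p in joint.items() if b == y) for y in range(4)}

def best(scores):
    """All x values that share the largest score (ties are reported together)."""
    top = max(scores.values())
    return {x for x, s in scores.items() if s == top}

# MAP maximizes p(x|y) = p(x,y)/p(y); ML maximizes p(y|x) = p(x,y)/p(x).
g_map = {y: best({x: joint.get((x, y), 0) / p_y[y] for x in range(4)}) for y in range(4)}
g_ml = {y: best({x: joint.get((x, y), 0) / p_x[x] for x in range(4)}) for y in range(4)}

# Guessing any one maximizer, the MAP error given y is 1 - max_x p(x|y).
p_error = sum(p_y[y] - max(joint.get((x, y), 0) for x in range(4)) for y in range(4))
print(g_map, g_ml, p_error)
```

The two estimators disagree at y=1: MAP picks x=2, while ML picks x=3, because 3 heads guarantees that the first head is in position 1 even though 3 heads is rare.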

Vector random variables, ctd.

Work out examples 6.7 - 6.11.
Section 6.3, page 316, extends the covariance to a matrix. Even with N variables, note that we're comparing only pairs of variables. If there were a complicated 3 variable dependency, which could happen (and did in a much earlier example), all the pairwise covariances could be 0.
Note the sequence.
1. First, the correlation matrix has the expectations of the products.
2. Then the covariance matrix corrects for the means not being 0.
3. Finally the correlation coefficients (not shown here) correct for the variances not being 1.
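The three steps can be traced on a small case. This sketch in Python (standard library only) reuses the joint pmf of the 3-coin example as a 2-variable vector (X,Y); the variable names R, K and rho are my labels for the correlation matrix, covariance matrix and correlation coefficients.

```python
from fractions import Fraction
from math import sqrt

# Joint pmf p(x, y) from the 3-coin example (X = heads, Y = position of 1st head).
joint = {(0, 0): Fraction(1, 8), (1, 1): Fraction(1, 8), (1, 2): Fraction(1, 8),
         (1, 3): Fraction(1, 8), (2, 1): Fraction(2, 8), (2, 2): Fraction(1, 8),
         (3, 1): Fraction(1, 8)}

def expect(g):
    """E[g(V)] where V = (X, Y) is the random vector."""
    return sum(p * g(v) for v, p in joint.items())

n = 2
mean = [expect(lambda v, i=i: v[i]) for i in range(n)]
# 1. Correlation matrix: expectations of the products.
R = [[expect(lambda v, i=i, j=j: v[i] * v[j]) for j in range(n)] for i in range(n)]
# 2. Covariance matrix: subtract the products of the means.
K = [[R[i][j] - mean[i] * mean[j] for j in range(n)] for i in range(n)]
# 3. Correlation coefficients: divide by the standard deviations.
rho = [[float(K[i][j]) / sqrt(K[i][i] * K[j][j]) for j in range(n)] for i in range(n)]
print(R, K, rho)
```

The diagonal of K holds the variances, and every entry involves only one pair of variables at a time.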

Lecture notes

Notes written on my tablet during class: 422.pdf.

Lecture 24, Tues Apr 26

Chapter 7, p 359, Sums of Random Variables

The long term goal of this section is to summarize information from a large group of random variables. E.g., the mean is one way. We will start with that, and go farther.

The next step is to infer the true mean of a large set of variables from a small sample.

Lecture notes

Notes written on my tablet during class: 426.pdf.

Lecture 25, Fri Apr 29

Starting salaries for BS grads

These are class of 2009, but still might be interesting.

	US	RPI
CSYS	$60,280	$66,659
EE	$57,603	$60,143

All those ECSE grads passed a 4 credit required probability course. You have it so easy now.

Sums of random variables ctd

Let Z=X+Y, with X and Y independent.
{$f_Z$} is convolution of {$f_X$} and {$f_Y$}: {$$ f_Z(z) = (f_X * f_Y)(z) $$} {$$ f_Z(z) = \int f_X(x) f_Y(z-x) dx $$}
Characteristic functions are useful. {$$ \Phi_X(\omega) = E[e^{j\omega X} ] $$}
{$ \Phi_Z = \Phi_X \Phi_Y $}.
This extends to the sum of n independent random variables: if {$ Z=\sum_i X_i $} then {$ \Phi_Z (\omega) = \prod_i \Phi_{X_i} (\omega) $}
E.g. Exponential with {$\lambda=1$}: {$\Phi_1(\omega) = 1/(1-j\omega) $} (page 164).
Sum of m exponentials has {$\Phi(\omega)= 1/(1-j\omega)^m $}. That's called an m-Erlang.
Example 2: sum of n iid Bernoullis. Probability generating function is more useful for discrete random variables.
Example 3: sum of n iid Gaussians. {$$ \Phi_{X_1} = e^{j\mu\omega - \frac{1}{2} \sigma^2 \omega^2} $$} {$$ \Phi_{Z} = e^{jn\mu\omega - \frac{1}{2}n \sigma^2 \omega^2} $$} I.e., mean and variance sum.
As the number increases, no matter what distribution the initial random variable has (provided that its moments are finite), {$\Phi$} for the sum starts looking like a Gaussian.
The mean {$M_n$} of n iid random variables is itself a random variable.
As {$ n\rightarrow\infty$}, {$M_n \rightarrow \mu $}.
That's a law of large numbers (LLN).
{$ E[ M_n ] = \mu $}. It's an unbiased estimator.
{$ VAR[ M_n ] = \sigma ^2 / n $}
Weak law of large numbers {$$ \forall \epsilon >0 \lim_{n\rightarrow\infty} P[|M_n-\mu| < \epsilon] = 1 $$}
How fast does it happen? We can use Chebyshev, though that is very conservative.
Strong law of large numbers {$$ P [ \lim _ {n\rightarrow\infty} M_n = \mu ] =1 $$}
As {$ n\rightarrow\infty$}, {$ F_{M_n} $} becomes Gaussian. That's the Central Limit Theorem (CLT).
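To see how fast the sample mean settles down, the Chebyshev bound can be turned into a sample-size estimate. This is a sketch in Python (standard library only, exact fractions) for iid samples; the function names and the coin-toss numbers are made up.

```python
from fractions import Fraction
from math import ceil

def var_of_mean(sigma2, n):
    """Variance of the sample mean of n iid r.v., each with variance sigma2."""
    return sigma2 / n

def chebyshev_bound(sigma2, n, eps):
    """Upper bound on P[|M_n - mu| >= eps] from the Chebyshev inequality."""
    return min(var_of_mean(sigma2, n) / eps**2, 1)

def samples_needed(sigma2, eps, delta):
    """Smallest n for which the Chebyshev bound is at most delta."""
    return ceil(sigma2 / (eps**2 * delta))

# Made-up numbers: fair-coin tosses (variance 1/4), mean within 0.04 of 1/2
# with probability at least 0.95.
print(samples_needed(Fraction(1, 4), Fraction(1, 25), Fraction(1, 20)))
```

Because Chebyshev is very conservative, a calculation based on the CLT would ask for far fewer samples.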

Viewgraph notes

Notes written on the viewgraph during class: 429.pdf.

Lecture 26, Tues May 3

Conflict final exam

If you told me about needing a conflict final exam and did not get an email this morning, tell me again.

Central limit theorem etc

Review: Almost no matter what distribution the random variable X has, {$ F_{M_n} $} quickly becomes Gaussian as n increases. n=5 already gives a good approximation.
nice applets:
1. http://onlinestatbook.com/stat_sim/normal_approx/index.html This tests how good is the normal approximation to the binomial distribution.
2. http://onlinestatbook.com/stat_sim/sampling_dist/index.html This lets you define a distribution, and take repeated samples of a given size. It shows how the means of the samples are distributed. For samples with more than a few observations, they look fairly normal.
3. http://www.umd.umich.edu/casl/socsci/econ/StudyAids/JavaStat/CentralLimitTheorem.html This might also be interesting.
Sample problems.
1. Problem 7.1 on page 402.
2. Problem 7.22.
3. Problem 7.25.
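(Enrichment) The first applet's comparison can also be done numerically. This is a sketch, not the applet's code: it compares the exact binomial CDF with the Gaussian one having the same mean and variance. The half-unit continuity correction is a standard choice that I am assuming here. The function names are made up.

```python
import math

def binomial_cdf(k, n, p):
    """Exact P[X <= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def normal_cdf(x, mu, sigma):
    """P[Y <= x] for Y ~ Gaussian(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def normal_approx_cdf(k, n, p):
    """Gaussian approximation to P[X <= k], with a half-unit continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return normal_cdf(k + 0.5, mu, sigma)

# Compare exact and approximate values for a few (n, p, k) choices.
for n, p, k in ((10, 0.5, 6), (100, 0.5, 55), (100, 0.1, 12)):
    exact = binomial_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    print(n, p, k, round(exact, 4), round(approx, 4))
```

The approximation tends to be worse when p is far from 1/2 or n is small, which is what the applet lets you explore.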

Chapter 8, Statistics

We have a population. (E.g., voters in next election, who will vote Democrat or Republican).
We don't know the population mean. (E.g., fraction of voters who will vote Democrat).
We take several samples (observations). From them we want to estimate the population mean and standard deviation. (Ask 1000 potential voters; 520 say they will vote Democrat. Sample mean is .52)
We want error bounds on our estimates. (.52 plus or minus about .03, 95 times out of 100)
Another application: testing whether 2 populations have the same mean. (Is this batch of Guinness as good as the last one?)
Observations cost money, so we want to do as few as possible.
This gets beyond this course, but the biggest problems may be non-math ones. E.g., how do you pick a random likely voter? In the past phone books were used. In a famous 1936 Presidential poll, that biased the sample against poor people, who voted for Roosevelt.
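(Enrichment) The error bound for the voter example can be sketched as follows. This assumes a truly random sample and uses the usual Gaussian approximation for a sample proportion; both are assumptions, and the second is not the only possible method. The name proportion_interval is made up for this illustration.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Return (sample proportion, half-width of the approximate confidence interval)."""
    p_hat = successes / n
    # z such that the central `confidence` mass of a standard Gaussian lies in [-z, z].
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    # Estimated standard deviation of the sample proportion.
    std_err = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat, z * std_err

# 520 of 1000 polled voters say Democrat.
p_hat, half_width = proportion_interval(520, 1000)
print(round(p_hat, 3), "plus or minus", round(half_width, 3))
```

Since the half-width shrinks like {$1/\sqrt{n}$}, halving the error bound takes about four times as many observations.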

Viewgraph notes

Notes written on the viewgraph during class: 503.pdf.

Lecture 27, Fri May 6

Extra office hours

For the rest of the course, until the exam, there will be extra TA office hours to give everyone all the help they want. The office hours will be in the Flip flop lounge. Tentatively, they are:

When	Who
Mon 1-2 (May 16)	Sina
Mon 4:30-5:30 (May 9)	Sina
Mon 7:10-8	Hang
Tues 4-5	Hang
Thurs 5-6	Sina
Fri 5:15-6:15, today and next week	Harish

DOSO letters

Anyone who has a letter from the DOSO for me, please remind me what it says. You don't need to give me the letter again, if you gave it to me before.

Statistics continued

In probability, we know the parameters (e.g., mean and standard deviation) of a distribution and use them to compute the probability of some event.
E.g., if we toss a fair coin 4 times what's the probability of exactly 4 heads? Answer: 1/16.
In statistics we do not know all the parameters, though we usually know what type the distribution is, e.g., normal. (We often know the standard deviation.)
1. We make observations about some members of the distribution, i.e., draw some samples.
2. From them we estimate the unknown parameters.
3. We often also compute a confidence interval on that estimate.
4. E.g., we toss an unknown coin 100 times and see 60 heads. A good estimate for the probability of that coin coming up heads is 0.6.
Some estimators are better than others, though that gets beyond this course.
1. Suppose I want to estimate the average height of an RPI student by measuring the heights of N random students.
2. The mean of the highest and lowest heights of my N students would converge to the population mean as N increased.
3. However the median of my sample would converge faster. Technically, the variance of the sample median is smaller than the variance of the sample hi-lo mean.
4. The mean of my whole sample would converge the fastest. Technically, for a normal population, the variance of the sample mean is smaller than the variance of any other unbiased estimator of the population mean. That's why we use it.
5. However perhaps the population's distribution is not normal. Then one of the other estimators might be better. It would be more robust.
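(Enrichment) The ranking of those three estimators can be checked by simulation. This is a sketch under assumptions: heights are taken to be Gaussian with mean 1.75m and standard deviation 0.1m (numbers I made up), and the function name is hypothetical.

```python
import random
import statistics

def estimator_variances(n, trials, rng, mu=1.75, sigma=0.1):
    """Return the variances of (sample mean, sample median, hi-lo mean) over many samples."""
    means, medians, hilo_means = [], [], []
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
        hilo_means.append((min(sample) + max(sample)) / 2.0)
    return (statistics.pvariance(means),
            statistics.pvariance(medians),
            statistics.pvariance(hilo_means))

rng = random.Random(8)  # fixed seed (arbitrary) so reruns match
for n in (10, 50):
    v_mean, v_median, v_hilo = estimator_variances(n, 2000, rng)
    print(n, v_mean, v_median, v_hilo)
```

Swapping rng.gauss for a heavy-tailed distribution is one way to see when the median becomes the more robust choice.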
(Enrichment) How to tell if the population is normal? We can do various plots of the observations and look. We can compute the probability that the observations would be this uneven if the population were normal.
An estimator may be biased. We have a distribution that is U[0,b] for unknown b. We take a sample of size n. The max of the sample has a mean n/(n+1)b, so it underestimates b, though it converges to b as n increases.
Example 8.2, page 413: One-tailed probability. This is the probability that the mean of our sample is at least so far above the population mean. {$$ \alpha = P[\overline{X_n}-\mu > c] = Q\left( \frac{c}{\sigma_x / \sqrt{n} } \right) $$} Q is defined on page 169: {$$ Q(x) = \int_x^ { \infty} \frac{1}{\sqrt{2\pi} } e^{-\frac{t^2}{2} } dt $$}
Application: You sample n=100 students' verbal SAT scores, and see {$ \overline{X} = 550 $}. You know that {$\sigma=100 $}. If {$\mu = 525 $}, what is the probability that {$ \overline{X_n} > 550 $} ?
Answer: Q(2.5) = 0.006
This means that if we take 1000 random samples of students, each with 100 students, and measure each sample's mean, then, on average, 6 of those 1000 samples will have a mean over 550.
This is often worded as the probability of the population's mean being under 525 is 0.006, which is different. The problem with saying that is that it presumes some probability distribution for the population mean.
The formula also works for the other tail, computing the probability that our sample mean is at least so far below the population mean.
The 2-tail probability is the probability that our sample mean is at least this far away from the population mean in either direction. It is twice the 1-tail probability.
All this also works when you know the probability and want to know c, the cutoff.
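(Enrichment) The SAT example can be worked numerically in both directions. This is a sketch that assumes the sample mean is Gaussian with standard deviation {$\sigma/\sqrt{n}$}. The function names are made up; Q is computed from the complementary error function.

```python
import math
from statistics import NormalDist

def q_function(x):
    """Gaussian tail probability Q(x) = P[Z > x] for a standard Gaussian Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def one_tail_alpha(c, sigma, n):
    """P[sample mean - mu > c] when each observation has standard deviation sigma."""
    return q_function(c / (sigma / math.sqrt(n)))

def cutoff_for_alpha(alpha, sigma, n):
    """Inverse problem: the cutoff c whose 1-tail probability is alpha."""
    return NormalDist().inv_cdf(1.0 - alpha) * sigma / math.sqrt(n)

# SAT example: n = 100, sigma = 100, mu = 525, observed sample mean 550, so c = 25.
alpha = one_tail_alpha(25, 100, 100)
print("1-tail:", alpha)        # Q(2.5)
print("2-tail:", 2.0 * alpha)  # either direction
print("cutoff for alpha = 0.05:", cutoff_for_alpha(0.05, 100, 100))
```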

Viewgraph notes

Notes written on the viewgraph during class: 506.pdf.

Lecture 28, Tues May 10

Grading

Grade reports:
1. They were mailed out last night. Please report any errors. We are not accepting error reports for grades listed in the previous grade report unless you are re-reporting an error that we haven't fixed.
2. The formula for the iclicker grade was: one point for each correct answer and one point for each day that we used the iclickers that you answered a question, correct or not. Some questions, such as which algebra SW did you prefer, were not graded.
Final exam notes:
1. You may bring three double-sided cheat sheets.
2. There will be a group TA office hour a few days after the exam for you to read your graded exam and ask the TAs for explanations.
3. The exam will mostly be on the later part of the course, but you'll need to know earlier material to answer these questions.
4. One question from exam 2 will be recycled, with a few changes.
5. Some homework questions may also be repeated.
6. Subjects that occupied a lot of class time are more likely to be on the final exam. E.g., the max of two random variables would be a good candidate.
7. You will be allowed to omit a question or two.
Grading formula: Tentatively, an A will be >=95, A-: >= 90, and so on every 5 points. If this appears to give a lower QPA than comparable courses, I'll raise it.
Possible exam topics:
1. Noisy communication channel: maximum a posteriori and maximum likelihood.
2. Two random variables, either discrete or continuous: covariance, correlation coefficient
3. Two Gaussian random variables.
4. Conditional and marginal probability.
5. Vector random variables.
6. Functions of random variables, e.g., transforming from feet to meters. E.g., sum, max, min.
7. Estimation of random variables; maximum a posteriori vs maximum likelihood
8. Law of large numbers, Central limit theorem.
9. Statistics: estimating the population mean from a sample mean. Putting a confidence interval on that estimate.

Hypothesis testing

Say we want to test whether the average height of an RPI student (called the population) is 2m.
We assume that the distribution is Gaussian (normal) and that the standard deviation of heights is, say, 0.2m.
However we don't know the mean.
We do an experiment and measure the heights of n=100 random students. Their mean height is, say, 1.9m.
The question on the table is, is the population mean 2m?
This is different from the earlier question that we analyzed, which was this: What is the most likely population mean? (Answer: 1.9m.)
Now we have a hypothesis (that the population mean is 2m) that we're testing.
The standard way that this is handled is as follows.
Define a null hypothesis, called H0, that the population mean is 2m.
Define an alternate hypothesis, called HA, that the population mean is not 2m.
Note that we observed our sample mean to be {$ 0.5 \sigma$} below the population mean, if H0 is true.
Each time we rerun the experiment (measure 100 students) we'll observe a different number.
We compute the probability that, if H0 is true, our sample mean would be this far from 2m.
Depending on what our underlying model of students is, we might use a 1-tail or a 2-tail probability.
Perhaps we think that the population mean might be less than 2m but it's not going to be more. Then a 1-tail distribution makes sense.
That is, our assumptions affect the results.
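The sample mean has standard deviation {$ \sigma/\sqrt{n} = 0.02 $}m, so 1.9m is 5 of those standard deviations below 2m. The 1-tail probability is Q(5), which is very small.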
Therefore we reject H0 and accept HA.
We make a type-1 error if we reject H0 and it was really true. See http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
We make a type-2 error if we accept H0 and it was really false.
These two errors trade off: by reducing the probability of one we increase the probability of the other, for a given sample size.
E.g. in a criminal trial we prefer that a guilty person go free to having an innocent person convicted.
Rejecting H0 says nothing about what the population mean really is, just that it's not likely 2m.
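(Enrichment) The height test above can be sketched in a few lines. The assumptions are the ones stated in the example: Gaussian heights and a known standard deviation. The significance level 0.05 and the name z_test are my choices for illustration; they were not fixed in class.

```python
import math

def z_test(sample_mean, mu0, sigma, n, two_tailed=True):
    """Return (z, p): the standardized distance from mu0 and its tail probability under H0."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2.0))  # Q(|z|)
    p = 2.0 * one_tail if two_tailed else one_tail
    return z, p

# H0: population mean is 2m. Observed: n = 100 students, sample mean 1.9m, sigma = 0.2m.
significance = 0.05  # assumed threshold for rejecting H0
for two_tailed in (False, True):
    z, p = z_test(1.9, 2.0, 0.2, 100, two_tailed)
    verdict = "reject H0" if p < significance else "do not reject H0"
    print("2-tail" if two_tailed else "1-tail", z, p, verdict)
```

Lowering the significance level makes a type-1 error less likely and a type-2 error more likely, for the same n.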

Viewgraph notes

Notes written on the viewgraph during class: 510.pdf.

Notes after the last lecture

The score emailed to people was out of 75. The final exam is out of 25, which will bring the total to 100. The sorted list of scores is:
14.7%, 24.9%, 29.8%, 37.8%, 38.2%, 38.7%, 39.9%, 39.9%, 39.9%, 40.3%, 41.3%, 41.6%, 42.5%, 44.0%, 44.3%, 45.2%, 45.8%, 46.0%, 46.3%, 46.8%, 47.6%, 47.6%, 47.9%, 48.5%, 48.9%, 49.9%, 50.3%, 50.5%, 50.8%, 51.9%, 51.9%, 52.2%, 52.9%, 53.3%, 53.4%, 53.4%, 54.1%, 54.3%, 54.5%, 54.8%, 54.8%, 55.5%, 55.5%, 55.5%, 55.6%, 56.4%, 56.6%, 56.7%, 56.8%, 57.1%, 57.2%, 57.3%, 57.6%, 58.5%, 58.6%, 58.8%, 61.1%, 61.2%, 61.3%, 61.6%, 61.6%, 61.8%, 61.9%, 62.1%, 62.2%, 63.2%, 63.4%, 63.5%, 63.9%, 64.0%, 64.4%, 64.7%, 64.9%, 65.7%, 65.9%, 66.5%, 66.9%, 67.3%, 69.6%, 71.6%, 71.7%, 72.4%, 75.3%
There will be no generating functions on the exam.
(Enrichment) Random sampling is hard. The US government got it wrong here: http://politics.slashdot.org/story/11/05/13/2249256/Algorithm-Glitch-Voids-Outcome-of-US-Green-Card-Lottery
For exam 2, there was no relation between the grade and the order of finishing.
On Thursday from 2 to 3pm, all 3 TAs will be available in the flipflop lounge to let you see your graded exams.

Conflict final exam, Mon May 16

2-5pm, JEC 5030.

The exam; the answers.

Sina graded questions 1-3, Hang graded 4-5, and Harish graded 6-8.

Final exam, Tues May 17

3-6pm in Darrin (aka DCC aka CC) 324.

The exam; the answers.

There was one exam w/o a name.

Grading notes

Exam 3 scores

0%, 0%, 0%, 0%, 0%, 23%, 30%, 35%, 35%, 37%, 37%, 40%, 45%, 45%, 45%, 47%, 47%, 48%, 48%, 48%, 50%, 50%, 50%, 50%, 52%, 52%, 52%, 52%, 52%, 53%, 53%, 53%, 55%, 55%, 55%, 57%, 57%, 57%, 57%, 58%, 58%, 60%, 60%, 62%, 62%, 63%, 63%, 63%, 63%, 65%, 65%, 67%, 67%, 67%, 67%, 70%, 72%, 72%, 72%, 73%, 73%, 75%, 75%, 75%, 77%, 77%, 77%, 77%, 78%, 78%, 78%, 78%, 78%, 80%, 83%, 83%, 83%, 83%, 85%, 85%, 87%, 88%, 90%, 92%, 95%, 97%

Total for course

Cutoff	Letter	Count
0.0%	FF	4
44.5%	DD	1
49.5%	DP	8
54.5%	CM	3
59.5%	CC	13
64.5%	CP	8
69.5%	BM	14
74.5%	BB	13
79.5%	BP	8
84.5%	AM	6
89.5%	AA	6
	I	1

These cutoffs are 5.5% more generous than I originally posted.

Terminology: BB->B, BP->B+, BM->B-, etc. Otherwise Excel's VLOOKUP can go wrong.

Average course QPA = 2.5.

The corrected Exam3 average percentage, ignoring the 0s, is 64%. I copied the wrong number when assembling the mail.

After the course

Feel free to contact me to ask questions or to talk.