This is the homepage of course ECSE2500 Engineering Probability, Rensselaer Polytechnic Institute, Spring 2011.


Syllabus
Click here for the Syllabus page, or click on the section heading below.
Course calendar
This Google calendar will list course due dates. You may display it together with other Google calendars or import it into various other calendar programs, like Thunderbird.
Homeworks
There will be about a dozen homeworks. Email your solutions to wrfranklin+homework AT gmail.com, replacing AT with @.
#  Due  Questions  Answers 

1  Feb 1  Homework 1  hw1sol.pdf 
2  Feb 8  Homework 2  hw2sol.pdf 
3  Feb 15  Homework 3  hw3sol.pdf 
4  Feb 22  Homework 4  hw4sol.pdf 
5  Mar 1  Homework 5  hw5sol.pdf 
6  Mar 29  Homework 6  hw6sol.pdf 
7  Apr 22  Homework 7  hw7sol.pdf 
8  Apr 29  Homework 8  hw8sol.pdf 
Lectures, Jan-Feb
Lectures March
Lecture 11, Tues Mar 1: Exam 1
 Closed book, but a calculator and one 2-sided letter-paper-size note sheet are allowed.
 Material is from chapters 1-3.
 Questions will be based on book, class, and homework examples and exercises.
 The hard part for you may be deciding what formula to use.
 Any calculations will (IMHO) be easy.
 Speed should not be a problem; most people should finish in 1/2 the time.
 Last year's exam is here.
 This exam is here.
Lecture 12, Fri Mar 4
 Anyone finding an error in this wiki (more than a simple typo) or in my lecture will get a little extra credit. I welcome corrections in class. If you find an error, come down to the front after class to sign a log. If you find an error in the web site, email me.
 I switched to a new math engine for this wiki (from jsMath to mathjax, its replacement). It should be compatible. However, since you have many different browsers, please report any problems.
 Chapter 4.
 I will try to ignore most of the theory at the start of the chapter.
 Now we will see continuous random variables. The probability of the r.v being any exact value is infinitesimal, so we talk about the probability that it's in a range.
 Sometimes there are mixed discrete and continuous r.v. Let X be the time to get a taxi at the airport. 80% of the time a taxi is already there, so p(X=0)=.8. Otherwise we wait a uniform time from 0 to 20 minutes, so p(a<X<b)=.01(b-a), for 0<a<b<20.
 Remember that for discrete r.v. we have a probability mass function (pmf).
 For continuous r.v. we now have a probability density function (pdf), f_{X}(x).
 p(a<x<a+da) = f(a)da
 For any r.v., we have a cumulative distribution function (cdf) F_{X}(x)
 The subscript is interesting only when we are using more than one cdf and need to tell them apart.
 Definition: F(x) = P(X≤x).
 The ≤ is relevant only for discrete r.v.
 As usual Wikipedia isn't bad, and is deeper than we need here. http://en.wikipedia.org/wiki/Cumulative_distribution_function
 We compute means and other moments by the obvious integrals.
 iclicker. For the taxi example, what is F(0)?
 0
 .2
 .8
 .81
 1
 iclicker. For the taxi example, what is F(1)?
 0
 .8
 .81
 .9
 1
 Simple continuous r.v. examples: uniform, exponential.
 The exponential distribution complements the Poisson distribution. The Poisson describes the number of arrivals per unit time. The exponential describes the distribution of the times between consecutive arrivals.
 The most common continuous distribution is the normal distribution.
 Conditional probabilities work the same with continuous distributions as with discrete distributions.
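The taxi example's mixed cdf can be checked with a few lines of code. A sketch in Python (the class demos use Matlab; `taxi_cdf` is a hypothetical helper name):

```python
# Mixed discrete/continuous wait time from the taxi example:
# point mass P(X=0)=0.8, plus the remaining 0.2 spread uniformly
# over (0,20) minutes, i.e., density 0.2/20 = 0.01 there.
def taxi_cdf(x):
    """F(x) = P(X <= x)."""
    if x < 0:
        return 0.0
    if x >= 20:
        return 1.0
    return 0.8 + 0.01 * x   # jump at 0, then linear

# F(0) = 0.8 (the point mass alone); F(1) = 0.81.
assert taxi_cdf(0) == 0.8
assert abs(taxi_cdf(1) - 0.81) < 1e-9
```

Note how the cdf jumps at x=0 (the discrete part) and then rises linearly (the continuous part).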
Lecture 13, Tues Mar 8
 Extracurricular hike: On Sat April 16, Jeff Trinkle, some other people, and I are leading a hike to an interesting place TBD, possibly Mt Greylock. Please RSVP to fdrc@rpi.edu. Space is limited (and this hike is being announced in various places). This is a chance for profs and students to informally meet. If this is popular, we might lead more.
 Sina will be available this Fri 4-5pm in the Flip Flop lounge to answer questions about grading. We will also set up another time after spring break.
 In honor of spring break, no homework is due.
 Today: more chapter 4.
Lecture 14, Fri Mar 11
 Solutions to exam 1 are online: Exam1
 Midterm status report emailed. If you didn't get one, give me your correct address (and check Respite; it once blocked a personal message to me from the Provost).
 On that email, I said HW4 for HW5. No bonus points for pointing that one out.
 If you need more help:
 See Hang during his office hours.
 Email any TA to meet at other times.
 Talk to me after class. I stay until 4pm most days (but not today).
 Email me with questions, or to set up a phone call or a meeting at a different time.
 Using Matlab: Matlab, Mathematica, and Maple all will help you do problems too big to do by hand. I'll demo Matlab since IMO more of the class knows it.
 Iclicker. Which of the following do you prefer to use?
 Matlab
 Maple
 Mathematica
 Paper. It was good enough for Bernoulli and Gauss; it's good enough for me.
 Something else (please email me about it after class).
 Matlab
 Major functions:
cdf(dist,X,A,...)
pdf(dist,X,A,...)
 Common cases of dist (there are many others):
'Binomial' 'Exponential' 'Poisson' 'Normal' 'Geometric' 'Uniform' 'Discrete Uniform'
 Examples
pdf('Normal',-2:2,0,1)
cdf('Normal',-2:2,0,1)
p=0.2
n=10
k=0:10
bp=pdf('Binomial',k,n,p)
bar(k,bp)
grid on
bc=cdf('Binomial',k,n,p)
bar(k,bc)
grid on
x=-3:.2:3
np=pdf('Normal',x,0,1)
plot(x,np)
 Interactive GUI to explore distributions:
disttool
 Random numbers:
rand(3)
rand(1,5)
randn(1,10)
randn(1,10)*100+500
randi(100,4)
 Interactive GUI to explore random numbers:
randtool
 Plotting two things at once:
x=-3:.2:3
n1=pdf('Normal',x,0,1)
n2=pdf('Normal',x,0,2)
plot(x,n1,n2)
plot(x,n1,x,n2)
plot(x,n1,'r',x,n2,'.g')
 Use Matlab to understand Exam 1, question 4.
 Use Matlab to compute a geometric pdf w/o using the builtin function.
 Matlab workspace, containing the variables: 0311.mat.
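The exercise above (compute a geometric pdf without the builtin) can be done in a few lines. A sketch in Python rather than Matlab; note that Matlab's 'Geometric' counts the number of failures before the first success:

```python
# Geometric pmf without a built-in: P(K = k) = p * (1-p)**k, k = 0,1,2,...
p = 0.2
pmf = [p * (1 - p) ** k for k in range(11)]   # k = 0..10

# Running sum gives the cdf; it has the closed form 1 - (1-p)**(k+1).
cdf = []
total = 0.0
for q in pmf:
    total += q
    cdf.append(total)

assert abs(cdf[10] - (1 - (1 - p) ** 11)) < 1e-12
```

Comparing these lists against `pdf('Geometric',0:10,.2)` and `cdf('Geometric',0:10,.2)` in Matlab is a good sanity check.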
Lecture 15, Tues Mar 22
 Webpage format updates: My goals are to make this site readable on everything from a wide-screen laptop to a mobile device, while packing as much info as possible on the screen. It's not perfect, but is closer to that now, provided that you're using Mozilla or a related browser that implements column-width. It works on my Droid in landscape mode. If the text size is not too large, no horizontal scrolling is needed. The number of columns changes depending on your screen and font size. Sorry, Internet Explorer users. Also, now in Mozilla you don't need to click a special button to format the page for printing. (The problem was that Mozilla doesn't implement the CSS overflow attribute correctly.)
 Homework 6 is out.
 Review: This question is about tires on your car going flat when you drive on remote gravel roads like the Trans-Labrador Highway.
 Use discrete probability distributions for simplicity. We'll redo this with continuous distributions later.
 The geometric and negative binomial distributions might be relevant. However the textbook, page 116, defines negative binomial slightly differently from Matlab, on the nbinpdf help page.
 Your car needs 4 good tires to operate.
 The probability of one specific tire going flat in one day is {$p=0.01$} .
 What is the expected lifetime of a tire?
 If you have no spare tires, what is the probability that your car will be disabled in one day?
 What is the mean time until your car is disabled?
 If you have one spare tire, what is the probability that your car will be disabled in one day?
 and what is the mean time until your car is disabled?
 Review: Markov and Chebyshev inequalities.
 Your web server averages 10 hits/second.
 It will crash if it gets 20 hits.
 By the Markov inequality, that has a probability at most 0.5.
 That is way way too conservative, but it makes no assumptions about the distribution of hits.
 For the Chebyshev inequality, assume that the variance is 10.
 It gives the probability of crashing at under 0.1. That is tighter.
 Assuming the distribution is Poisson with a=10, use Matlab: 1-cdf('Poisson',20,10). That gives 0.0016.
 The more we assume, the better the answer we can compute.
 However, our assumptions had better be correct.
 pdf and cdf of the max of 2 random variables: If Z=max(X,Y) then F_{Z}(x) = F_{X}(x) F_{Y}(x). E.g., if X and Y are U[0,1], so F_{X}(x) = x for 0<=x<=1, then F_{Z}(x) = x^{2}. What are the pdf and mean here? What about the max of 3 r.v.? What about the min?
 Iclicker. What is the cdf (for 0<=x<=1) of the max of 3 r.v. that are each U[0,1]?
 x
 x^{2}
 x^{3}
 1
 0
 pdf of the sum of 2 r.v. If Z=X+Y then {$$ f_Z(z) = \int_x f_X(x) f_Y(z-x) dx $$} E.g., if X and Y are U[0,1], then f_{Z}(z) = ? What is the mean?
 Section 4.7, page 184, Transform methods: characteristic function.
 The characteristic function {$ \Phi_X(\omega) $} of a pdf f(x) is like its Fourier transform.
 One application is that the moments of f can be computed from the derivatives of {$ \Phi $}.
 We will compute the characteristic functions of the uniform and exponential distributions.
 The table on pp. 164-5 lists a lot of characteristic functions.
 For discrete nonnegative r.v., the moment generating function is more useful.
 It's like the Laplace transform.
 The pmf and moments can be computed from it.
 4.8 Reliability
 The reliability R(t) is the probability that the item is still functioning at time t. R(t) = 1 - F(t).
 What is the reliability of an exponential r.v.?
 The Mean Time to Failure (MTTF) is obvious.
 ... for an exponential r.v.?
 The failure rate is the probability of a widget that is still alive now dying in the next second.
 If the failure rate is constant, the distribution is exponential.
 4.9 Generating r.v.: Ignore. It's hard to do right, but it has been implemented in builtin routines. Use them.
 4.10 Entropy  ignore since it's starred.
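The web-server bounds worked above (Markov, then Chebyshev, then the exact Poisson tail) can be verified with a short script. A Python sketch (the class uses Matlab's `1-cdf('Poisson',20,10)`; here the Poisson cdf is summed explicitly):

```python
import math

# Web server: mean 10 hits/s, variance 10, crashes at 20 hits.
mean, var, crash = 10, 10, 20

markov = mean / crash                      # P(X >= 20) <= 0.5
chebyshev = var / (crash - mean) ** 2      # P(|X-10| >= 10) <= 0.1

# Exact tail if X ~ Poisson(10): P(X > 20) = 1 - sum_{k=0}^{20} e^-10 10^k/k!
poisson_tail = 1 - sum(math.exp(-10) * 10**k / math.factorial(k)
                       for k in range(21))

assert markov == 0.5
assert 0.001 < poisson_tail < 0.002        # about 0.0016
```

Each added assumption (a variance, then a full distribution) tightens the bound, exactly as the notes say.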
Lecture 16, Fri Mar 25
 Final exam conflicts? Please email me if you have a conflict under RPI rules, e.g., with a lower-numbered course, or have at least two other exams in lower-numbered courses on the same day. Describe the conflict. Please also tell me what times you are free on nearby days.
 Note on the importance of getting the fundamentals (or foundations) right: In the past 40 years, two major bridges in the Capital district have collapsed because of inadequate foundations. The Green Island Bridge collapsed on 3/15/77, see http://en.wikipedia.org/wiki/Green_Island_Bridge, http://www.cbs6albany.com/video/v/59005381001/wrgbbridgewrgb. The Thruway (I90) bridge over Schoharie Creek collapsed on 4/5/87, killing 10 people. Why RPI likes the Roeblings: none of their bridges collapsed. E.g., when designing the Brooklyn Bridge, Roebling Sr knew what he didn't know. He realized that something hung on cables might sway in the wind, in a complicated way that he couldn't analyze. So he added a lot of diagonal bracing. The designers of the original Tacoma Narrows Bridge were smart enough that they didn't need this expensive margin of safety.
 We'll continue Tuesday's example of computing the pdf of the sum of two uniform r.v. The answer will be a hat function. It looks a little more like a normal distribution than the square uniform distribution did. The sum of 3 uniform r.v. would look even more normal, and so on.
 Another way to look at reliability: think of people.
 Your reliability R(t) is the probability that you live to age t, given that you were born alive. In the US, that's 98.7% for age 20, 96.4% for 40, 87.8% for 60 (http://upload.wikimedia.org/wikipedia/commons/b/be/Excerpt_from_CDC_2003_Table_1.png)
 MTTF is your life expectancy at birth. In the US, that's 77.5 years.
 Your failure rate, r(t), is your probability of dying in the next dt, divided by dt, at different ages. E.g. for a 20yearold, it's 0.13%/year for a male and 0.046%/year for a female (http://www.ssa.gov/oact/STATS/table4c6.html). For 40yearolds, it's 0.24% and 0.14%. For 60yearolds, it's 1.2% and 0.7%. At 80, it's 7% and 5%. At 100, it's 37% and 32%.
 P190: If the failure rate is constant, then the distribution is exponential. We'll show this.
 If several subsystems are all necessary, e.g., are in serial, then their reliabilities multiply. The result is less reliable. If only one of them is necessary, e.g., are in parallel, then their complementary reliabilities multiply. The result is more reliable. An application would be different types of RAIDs (Redundant Array of Independent, originally Inexpensive, Disks). In one version you stripe a file over two hard drives to get increased speed, but decreased reliability. In another version you triplicate the file over three drives to get increased reliability. (You can also do a hybrid setup.)
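The hat-function claim above (the sum of two U[0,1] r.v. has pdf f_Z(z)=z on [0,1] and 2-z on [1,2]) can be checked by approximating the convolution integral numerically. A Python sketch; `fu` and `conv` are hypothetical helper names:

```python
# Approximate f_Z(z) = integral of f_X(x) f_Y(z-x) dx for X, Y ~ U[0,1].
def fu(t):                       # density of U[0,1]
    return 1.0 if 0.0 <= t <= 1.0 else 0.0

def conv(z, n=20000):            # left-endpoint Riemann sum over [0,1)
    dx = 1.0 / n
    return sum(fu(i * dx) * fu(z - i * dx) for i in range(n)) * dx

# Triangular ("hat") shape: rises to 1 at z=1, falls back to 0 at z=2.
assert abs(conv(0.5) - 0.5) < 1e-2
assert abs(conv(1.0) - 1.0) < 1e-2
assert abs(conv(1.5) - 0.5) < 1e-2
```

The hat is visibly closer to a bell shape than the flat uniform density, illustrating the remark that sums of more uniforms look ever more normal.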
Chapter 5, Two Random Variables
 One experiment might produce two r.v. E.g.,
 Shoot an arrow; it lands at (x,y).
 Toss two dice.
 Measure the height and weight of people.
 Measure the voltage of a signal at several times.
 The definitions for pmf, pdf and cdf are reasonable extensions of one r.v.
 The math is messier.
 The two r.v. may be dependent and correlated.
 The correlation coefficient, ρ, is a dimensionless measure of linear dependence. -1<=ρ<=1.
 ρ may be 0 when the variables have a nonlinear dependent relation.
 Integrating (or summing) out one variable gives a marginal distribution.
 We'll do some simple examples:
 Toss two 4-sided dice.
 Toss two 4-sided loaded dice. The marginal pmfs are uniform.
 Pick a point uniformly in a square.
 Pick a point uniformly in a triangle. x and y are now dependent.
 The big example is a 2 variable normal distribution.
 The pdf is messier.
 It looks elliptical unless ρ=0.
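The "sum out one variable to get a marginal" idea can be made concrete with the two-dice example above. A Python sketch using exact fractions (the dictionary-of-pairs representation is my own choice, not the book's notation):

```python
from fractions import Fraction

# Joint pmf of two fair 4-sided dice: p(x,y) = 1/16 for x, y in 1..4.
p = {(x, y): Fraction(1, 16) for x in range(1, 5) for y in range(1, 5)}

# Marginals: sum out the other variable.
px = {x: sum(p[(x, y)] for y in range(1, 5)) for x in range(1, 5)}
py = {y: sum(p[(x, y)] for x in range(1, 5)) for y in range(1, 5)}

assert all(v == Fraction(1, 4) for v in px.values())   # uniform marginal
assert sum(px.values()) == 1 and sum(py.values()) == 1
```

For the loaded-dice example in the notes, the same two sums would still give uniform marginals even though the joint pmf is not uniform.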
Lecture 17, Tues Mar 29
 Browsers: Google Chrome crashes while trying to display this page. However, both Firefox 3 and 4 and Internet Explorer 8 display the page fine. Google is a great company, but its SW can be buggy. (I've also had problems with Docs and My Tracks. Last spring, Docs lost 2 days of edits to a spreadsheet; luckily I'd downloaded a csv file from it. My Tracks can get wedged so badly that it has to be uninstalled and reinstalled.)
 Because of the exam next week and because many of you have another exam next week, there is no homework due next week.
 Other universities' probability websites. Leon-Garcia is the most widely used probability textbook, so other universities also have lecture notes online.
 http://dspace.mit.edu/bitstream/handle/1721.1/35860/6041Fall2002/OcwWeb/ElectricalEngineeringandComputerScience/6041ProbabilisticSystemsAnalysisandAppliedProbabilityFall2002/LectureNotes/index.htm
 http://anadolu.sdsu.edu/abut/EE553/Chap1_2006.pdf You have to download each chapter individually since the higher level directory is not publicly readable. This course used an earlier edition so the chapter numbers are different. E.g., multivariate starts in chapter 4.
 In honor of exam 2, which is next Tues, Hang Zhang will hold office hours 7-8pm this Thursday and 10-11am next Tuesday, in addition to his usual 7:10-8pm on Mondays.
 Since the normal distribution is so important, we will work out some exercises with it. However, to keep things simple, we will use {$\mu=0,\ \ \sigma=1 $} as often as possible.
 Reminder: {$$f_N(x) = \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}} $$}
 Show that {$ \int f(x) dx =1 $}
 Show that, if X and Y are normal, then so is Z=X+Y. If X and Y are N(0,1) then Z is N(0, {$\sqrt{2} $} ).
 Cdf of mixed continuous / discrete random variables: section 5.3.1 on page 247. The input signal X is +1 or -1. It is perturbed by noise N that is U[-2,2] to give the output Y. What is P[X=+1 | Y<=0]?
 Independence: Example 5.22 on page 256. Are 2 normal r.v. independent for different values of ρ?
 Expected value of sum of two r.v. It sums, regardless of whether they are independent.
 5.6.2 Joint moments etc
 Work out for 2 3sided dice.
 Work out for tossing dart onto triangular board.
 Example 5.27: correlation measures linear dependence. If the dependence is more complicated, the variables may be dependent but not correlated.
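The "dependent but not correlated" point of Example 5.27 can be shown with a tiny exact computation (this particular X and Y = X² construction is a standard illustration, not the book's exact example):

```python
from fractions import Fraction

# X uniform on {-1, 0, 1}; Y = X**2 is completely determined by X.
xs = [-1, 0, 1]
p = Fraction(1, 3)

ex  = sum(p * x for x in xs)             # E[X]  = 0
ey  = sum(p * x * x for x in xs)         # E[Y]  = 2/3
exy = sum(p * x * (x * x) for x in xs)   # E[XY] = E[X^3] = 0
cov = exy - ex * ey

assert cov == 0     # uncorrelated, yet Y is a function of X
```

So ρ = 0, even though P(Y=1 | X=1) = 1 differs from P(Y=1) = 2/3: zero correlation does not imply independence.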
Lecture 18, Fri Apr 1
Exam 2 topics
The exam will cover up thru last Tues, Lecture 17. It will be mostly material since Exam 1 but will include some older material. Here are some topics that may well be on the exam:
 A question very similar to a question on Exam 1.
 A noisy communications channel.
 Parts lifetime and replacement.
 Markov and Chebyshev inequalities.
 A mixed continuous and discrete random variable.
 Reliability.
 A normal pdf integration.
 Computing the {pdf | cdf} of the {sum | min | max} of two random variables.
 Computing the marginal {pdf | cdf} of a 2-variable distribution.
 Computing a {1st | 2nd} order moment or correlation coefficient of a 2-variable distribution.
 Any of these discrete random variables: uniform, Poisson, binomial, Bernoulli, geometric.
 Any of these continuous random variables: uniform, exponential, normal.
 A problem that requires you to determine which distribution is the appropriate one, and then use it.
Here are some topics that will not be on the exam.
 Matlab.
 Characteristic and generating functions and transforms, since you get them in other courses where they are more important.
 Computations that IMO are complicated. In the real world you have access to computers.
You may bring a calculator, but it probably won't help much.
Browser wars, ctd.
 My mathjax test page crashes Internet Explorer 9. It also causes Chrome to give an uninformative error message, by apparently crashing the thread running that tab. Firefox and Explorer 8 are fine. I've reverted from mathjax back to jsMath for this course. Personal opinion: It's not acceptable for a public program to be crashable by user input. (Research and prototype programs are different.) The time and lines of code required to validate the input are well worth it. Also, many security exploits, such as SQL injection attacks, start with illegal input.
Probability
 Review and extend section 5.3.1, example 5.14 on page 247.
 Example 5.31 on page 264. This is a noisy comm channel,
now with Gaussian (normal) noise. The problems are:
 what input signal to infer from each output, and
 how accurate is this?
 Covariance, correlation coefficient.
Lecture 19, Tues Apr 5, Exam 2
Exam 2, Exam 2 Sol. You are welcome to store and redistribute my exam and solution, provided that you keep the credits and don't charge.
 You may bring 2 2sided crib sheets, such as the one you prepared for exam 1, and a new one.
Lecture 20, Fri Apr 8
 Extracurricular hike: On Sat April 16, Jeff Trinkle, some other people, and I are leading a hike to an interesting place TBD, possibly Mt Greylock. Please RSVP to fdrc@rpi.edu. Space is limited (and this hike is being announced in various places). This is a chance for profs and students to informally meet.
 Section 5.7, page 261. Conditional pdf. There is nothing majorly
new here; it's an obvious extension of 1 variable.
 Discrete: Work out an example with a pair of 3sided loaded dice.
 Continuous: a triangular dart board. There is one little trick: P[X=x]=0 since X is continuous, so how can we compute P[Y=y|X=x] = P[Y=y & X=x]/P[X=x]? The answer is that we take the limiting probability P[x<X<x+dx] etc. as dx shrinks, which nets out to using f(x) etc.
 Example 5.31 on page 264. This is a noisy comm channel,
now with Gaussian (normal) noise. This is a more realistic version of the earlier example with uniform noise. The application problems are:
 what input signal to infer from each output,
 how accurate is this, and
 what cutoff minimizes this?
 Increase the transmitted signal,
 Reduce the noise,
 Retransmit several times and vote.
 Handshake: Include a checksum and ask for retransmission if it fails.
 Instead of just deciding X=+1 or X=-1 depending on Y, have a 3rd decision, i.e., uncertain if |Y|<0.5, and ask for retransmission in that case.
 Section 5.8 page 271: Functions of two random variables.
 We already saw how to compute the pdf of the sum and max of 2 r.v.
 What's the point of transforming variables in engineering? E.g. in video, (R,G,B) might be transformed to (Y,I,Q) with a 3x3 matrix multiply. Y is brightness (mostly the green component). I and Q are approximately the red and blue. Since we see brightness more accurately than color hue, we want to transmit Y with greater precision. So, we want to do probabilities on all this.
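For the Gaussian-noise channel of Example 5.31, the error probability has a closed form worth playing with. A Python sketch, under the usual assumptions (X = ±1 equally likely, Y = X + N with N ~ N(0, σ²), decide X=+1 iff Y > 0); by symmetry P(error) = Q(1/σ), where Q is the standard normal tail:

```python
import math

def q(z):
    """Standard normal tail Q(z) = P(Z > z) = 0.5*erfc(z/sqrt(2))."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Error probability Q(1/sigma) for a few noise levels:
# smaller noise => exponentially smaller error probability.
assert q(1 / 0.25) < q(1 / 0.5) < q(1 / 1.0)
assert abs(q(1.0) - 0.1587) < 1e-3    # sigma = 1: about a 16% error rate
```

Increasing the transmitted signal from ±1 to ±a changes the error to Q(a/σ), which quantifies the "increase the signal / reduce the noise" remedies listed above.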
Lecture 21, Fri Apr 15
 Functions of 2 random variables
 This is an important topic.
 Example 5.44, page 275. Transform two independent Gaussian r.v. from (X,Y) to (R, {$\theta$} ).
 Linear transformation of two Gaussian r.v.
 Sum and difference of 2 Gaussian r.v. are independent.
 Section 5.9, page 278: pairs of jointly Gaussian r.v.
 I will simplify formula 5.61a by assuming {$\mu=0, \sigma=1$}.
{$$ f_{X,Y}(x,y)= \frac{e^{ -\frac{ x^2-2\rho x y + y^2 }{2(1-\rho^2)} } }{2\pi \sqrt{1-\rho^2}} $$}
 The r.v. are probably dependent. {$\rho$} says how much.
 The formula degenerates if {$ \rho=1 $} since the numerator and denominator are both zero. However the pdf is still valid. You could make the formula valid with l'Hopital's rule.
 The lines of equal probability density are ellipses.
 The marginal pdf is a 1 variable Gaussian.
 Example 5.47, page 282: Estimation of signal in noise
 This is our perennial example of signal and noise. However, here the signal is not just {$ \pm1 $} but is normal. Our job is to find the most likely input signal for a given output.
 Next time: We've seen 1 r.v., we've seen 2 r.v. Now we'll see several r.v.
Lecture 22, Tues Apr 19
 Hang will change his Mon office hour to Wed 2-3 this week.
 Important concept in the noisy channel example (with X and N both being Gaussian): On Friday we saw that the most likely value of X given Y is not Y but is somewhat smaller, depending on the relative sizes of {$\sigma_X$} and {$\sigma_N$}. This is true in spite of {$\mu_N=0$}. It would be really useful for you to understand this intuitively. Here's one way: If you don't know Y, then the most likely value of X is 0. Knowing Y gives you more information, which you combine with your initial info (that X is {$N(0,\sigma_X)$}) to get a new estimate for the most likely X. The smaller the noise, the more valuable is Y. If the noise is very small, then the most likely X is close to Y. If the noise is very large (on average) then the most likely X is still close to 0.
 Example 5.47, page 282: Estimation of signal in noise  in more detail. I'll assume {$\sigma_X=1$}.
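The shrink-toward-zero intuition above can be made quantitative. A Python sketch using the standard jointly-Gaussian result (for X ~ N(0, σ_X²) and Y = X + N with N ~ N(0, σ_N²), the most likely X given Y=y is σ_X²/(σ_X²+σ_N²)·y); `xhat` is a hypothetical helper name:

```python
# Posterior-mean (also MAP) estimate of X given Y = y for the
# Gaussian signal-in-Gaussian-noise model described in the notes.
def xhat(y, sx, sn):
    return sx**2 / (sx**2 + sn**2) * y

y = 1.0
# Tiny noise: the estimate is nearly y.  Huge noise: nearly 0.
assert xhat(y, 1.0, 0.1) > 0.99 * y
assert xhat(y, 1.0, 10.0) < 0.01 * y
```

The factor is always strictly less than 1, matching Friday's observation that the best guess for X is smaller than Y even though the noise has zero mean.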
Chapter 6: Vector random variables.
 Skip the starred sections.
 Examples:
 arrivals in a multiport switch,
 audio signal at different times.
 pmf, cdf, marginal pmf and cdf are obvious.
 conditional pmf has a nice chaining rule.
 For continuous random variables, the pdf, cdf, conditional pdf etc are all obvious.
 Independence is obvious.
 Work out example 6.5, page 306. The input ports are a distraction. This problem reduces to a multinomial probability where N is itself a random variable.
Lecture 23, Fri Apr 22
Tutorial on probability density
Since the meaning of probability density when you transform variables is still causing problems for some people, think of changing units from English to metric. First, with one variable, X.
 Let X be in feet and be U[0,1].
{$$ f_X(x) = \begin{cases} 1& \text{if } 0\le x\le1\\ 0&\text{otherwise} \end{cases} $$}  {$ P[.5\le x\le .51] = 0.01 $}.
 Now change to centimeters. The transformation is {$Y=30X$}.
 {$$ f_Y(y) = \begin{cases} 1/30 & \text{if } 0\le y\le30\\ 0&\text{otherwise} \end{cases} $$}
 Why is 1/30 reasonable?
 First, the pdf has to integrate to 1: {$$ \int_{-\infty}^{\infty} f_Y(y)\, dy =1 $$}
 Second, {$$ \begin{align} & P[.5\le x\le .51] \\ &= \int_{.5}^{.51} f_X(x) dx \\& =0.01 \\& = P[15\le y\le 15.3] \\& = \int_{15}^{15.3} f_Y(y) dy \end{align} $$}
Now, let's do 2 variables, which is what I did in class on Tues.
 We're throwing darts uniformly at a one foot square dartboard.
 We observe 2 random variables, X, Y, where the dart hits (in Cartesian coordinates).
 {$$ f_{X,Y}(x,y) = \begin{cases} 1& \text{if}\,\, 0\le x\le1 \cap 0\le y\le1\\ 0&\text{otherwise} \end{cases} $$}
 {$$ \begin{align} &P[.5\le x\le .6 \cap .8\le y\le.9] \\& = \int_{.5}^{.6}\int_{.8}^{.9} f_{X,Y}(x,y)\, dx \, dy = 0.01 \end{align}$$}.
 Transform to centimeters: {$$ \begin{bmatrix}V\\W\end{bmatrix} = \begin{pmatrix}30&0\\0&30\end{pmatrix} \begin{bmatrix}X\\Y\end{bmatrix} $$}
 {$$ \begin{multline}f_{V,W}(v,w) \\ = \begin{cases} 1/900& \text{if } 0\le v\le30 \cap 0\le w\le30\\ 0&\text{otherwise} \end{cases} \end{multline}$$}
 {$$ \begin{align} &P[15\le v\le 18 \cap 24\le w\le27] \\ & = \int_{15}^{18}\int_{24}^{27} f_{V,W}(v,w)\, dv\, dw \\&= \frac{ (18-15)(27-24) }{900} = 0.01\end{align} $$}.
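The arithmetic of this change-of-units example reduces to the Jacobian factor 30×30 = 900 dividing the density. A quick Python check of the two probabilities above:

```python
# Dartboard in feet: density 1 on the unit square.
# In cm, (V,W) = 30*(X,Y), so the density drops by the Jacobian 900.
f_xy = 1.0          # joint density, feet
f_vw = 1.0 / 900    # joint density, cm

p_feet = f_xy * (0.6 - 0.5) * (0.9 - 0.8)    # rectangle in feet
p_cm   = f_vw * (18 - 15) * (27 - 24)        # same rectangle in cm

# Corresponding regions must carry the same probability.
assert abs(p_feet - 0.01) < 1e-9
assert abs(p_cm - 0.01) < 1e-9
```

The density changes with the units, but probabilities of corresponding regions do not.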
Exam stats
 The exam 1 mean was 49/68. The grades were: 20, 22, 24, 32, 34, 35, 36, 36, 38, 39, 39, 39, 40, 40, 41, 41, 41, 41, 41, 41, 42, 42, 42, 42, 43, 43, 43, 44, 45, 45, 45, 46, 46, 47, 47, 47, 48, 48, 48, 49, 49, 49, 49, 49, 50, 50, 50, 50, 52, 52, 53, 53, 54, 54, 54, 54, 54, 54, 55, 56, 57, 58, 58, 58, 59, 59, 59, 59, 59, 60, 60, 61, 61, 61, 62, 64, 65, 65, 68, 68, 68, 68, 68
 The exam 2 mean was 20/30. The grades were: 7, 9, 11, 11, 11, 11, 12, 12, 12, 13, 13, 14, 14, 15, 15, 15, 15, 15, 15, 16, 17, 17, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22, 22, 22, 23, 23, 23, 23, 24, 24, 24, 24, 25, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27, 27, 27, 27, 28, 28, 29, 29, 30, 30
Section 6.5, page 332: Estimation of random variables.
 Assume that we want to know X but can only see Y, which depends on X.
 This is a generalization of our longrunning noisy communication channel example. We'll do things a little more precisely now.
 Another application would be to estimate tomorrow's price of GOOG (X) given the prices to date (Y).
 Sometimes, but not always, we have a prior probability for X.
 For the communication channel we do, for GOOG, we don't.
 If we do, it's a maximum a posteriori estimator.
 If we don't, it's a maximum likelihood estimator. We effectively assume that that prior probability of X is uniform, even though that may not completely make sense.
 Some of this is from Prof Vastola.
 You toss a fair coin 3 times. X is the number of heads, from 0 to 3. Y is the position of the 1st head, from 0 to 3 (Y=0 if there are no heads).
 E.g., 1 head can occur 3 ways (out of 8): HTT (first head in position 1), THT (position 2), TTH (position 3); each has p=1/8. The joint pmf:

(X,Y)  p(X,Y)
(0,0)  1/8
(1,1)  1/8
(1,2)  1/8
(1,3)  1/8
(2,1)  2/8
(2,2)  1/8
(3,1)  1/8

 Conditional probabilities:

p(x|y)      y=0  y=1  y=2     y=3
x=0         1    0    0       0
x=1         0    1/4  1/2     1
x=2         0    1/2  1/2     0
x=3         0    1/4  0       0
g_MAP(y)    0    2    1 or 2  1
P_error(y)  0    1/2  1/2     0
p(y)        1/8  1/2  1/4     1/8

 The total probability of error is 3/8.
 We observe Y and want to guess X from Y. E.g., if we observe {$$ \small y= \begin{pmatrix}0\\1\\2\\3\end{pmatrix} \text{then } x= \begin{pmatrix}0\\ 2 \text{ most likely} \\ 1, 2 \text{ equally likely} \\ 1 \end{pmatrix} $$}
 There are different formulae. The above one was the MAP, maximum a posteriori probability.
{$$ g_{\text{MAP}} (y) = \arg\max_x p_X(x|y) \text{ or } f_X(x|y) $$}  What if we don't know p(x|y)? If we know p(y|x), we can use Bayes. We might measure p(y|x) experimentally, e.g., by sending many messages over the channel.
 Bayes requires p(x). What if we don't know even that? E.g. we don't know the probability of the different possible transmitted messages.
 Then use maximum likelihood estimator, ML.
{$$ g_{\text{ML}} (y) = \arg\max_x p_Y(y|x) \text{ or } f_Y(y|x) $$}  There are other estimators for different applications. E.g., regression using least squares might attempt to predict a graduate's QPA from his/her entering SAT scores. At Saratoga in August we might attempt to predict a horse's chance of winning a race from its speed in previous races.
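The coin-toss MAP table above can be reproduced mechanically from the joint pmf. A Python sketch with exact fractions; `g_map` is a hypothetical helper name, and ties (like y=2) are broken arbitrarily:

```python
from fractions import Fraction

# Joint pmf of (X = #heads, Y = position of 1st head) from the example.
joint = {(0,0): Fraction(1,8), (1,1): Fraction(1,8), (1,2): Fraction(1,8),
         (1,3): Fraction(1,8), (2,1): Fraction(2,8), (2,2): Fraction(1,8),
         (3,1): Fraction(1,8)}

py = {}                                   # marginal pmf of Y
for (x, y), p in joint.items():
    py[y] = py.get(y, 0) + p

def g_map(y):
    """argmax_x P(X=x, Y=y), which equals argmax_x P(X=x | Y=y)."""
    cands = {x: p for (x, yy), p in joint.items() if yy == y}
    return max(cands, key=cands.get)

# Total error: sum over y of [P(Y=y) - max_x P(X=x, Y=y)].
err = sum(py[y] - max(p for (x, yy), p in joint.items() if yy == y)
          for y in py)
assert err == Fraction(3, 8)              # matches the table
```

Note that dividing the joint pmf by p(y) doesn't change which x wins, so the argmax can be taken over the joint pmf directly.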
Vector random variables, ctd.
 Work out examples 6.7  6.11.
 Section 6.3, page 316, extends the covariance to a matrix. Even with N variables, note that we're comparing only pairs of variables. If there were a complicated 3 variable dependency, which could happen (and did in a much earlier example), all the pairwise covariances would be 0.
 Note the sequence.
 First, the correlation matrix has the expectations of the products.
 Then the covariance matrix corrects for the means not being 0.
 Finally the correlation coefficients (not shown here) correct for the variances not being 1.
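The three-step sequence just described (correlation matrix, then covariance matrix, then correlation coefficients) can be traced on a toy data set. A Python sketch; the four (x, y) pairs are made up for illustration and treated as equally likely outcomes:

```python
# Step 1: correlation matrix R of raw expectations E[Xi*Xj].
# Step 2: covariance matrix C corrects for nonzero means.
# Step 3: correlation coefficient rho corrects for non-unit variances.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 5.0), (4.0, 9.0)]
n = len(data)

def e(f):  # expectation over the empirical distribution
    return sum(f(x, y) for x, y in data) / n

R = [[e(lambda x, y: x*x), e(lambda x, y: x*y)],
     [e(lambda x, y: x*y), e(lambda x, y: y*y)]]
mx, my = e(lambda x, y: x), e(lambda x, y: y)
C = [[R[0][0] - mx*mx, R[0][1] - mx*my],
     [R[1][0] - my*mx, R[1][1] - my*my]]
rho = C[0][1] / (C[0][0] ** 0.5 * C[1][1] ** 0.5)

assert -1 <= rho <= 1     # dimensionless, bounded
```

For these numbers ρ comes out near 0.96: strong, but not perfect, linear dependence.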
Lecture notes
Notes written on my tablet during class: 422.pdf.
Lecture 24, Tues Apr 26
Chapter 7, p 359, Sums of Random Variables
The long term goal of this section is to summarize information from a large group of random variables. E.g., the mean is one way. We will start with that, and go farther.
The next step is to infer the true mean of a large set of variables from a small sample.
Lecture notes
Notes written on my tablet during class: 426.pdf.
Lecture 25, Fri Apr 29
Starting salaries for BS grads
These are class of 2009, but still might be interesting.
      US       RPI
CSYS  $60,280  $66,659
EE    $57,603  $60,143
All those ECSE grads passed a required 4-credit probability course. You have it so easy now.
Sums of random variables ctd
 Let Z=X+Y.
 {$f_Z$} is the convolution of {$f_X$} and {$f_Y$}: {$$ f_Z(z) = (f_X * f_Y)(z) $$} {$$ f_Z(z) = \int f_X(x) f_Y(z-x) dx $$}
 Characteristic functions are useful. {$$ \Phi_X(\omega) = E[e^{j\omega X} ] $$}
 {$ \Phi_Z = \Phi_X \Phi_Y $}.
 This extends to the sum of n random variables: if {$ Z=\sum_i X_i $} then {$ \Phi_Z (\omega) = \prod_i \Phi_{X_i} (\omega) $}
 E.g., exponential with {$\lambda=1$}: {$\Phi_1(\omega) = 1/(1-j\omega) $} (page 164).
 The sum of m such exponentials has {$\Phi(\omega)= 1/(1-j\omega)^m $}. That's called an m-Erlang.
 Example 2: sum of n iid Bernoullis. Probability generating function is more useful for discrete random variables.
 Example 3: sum of n iid Gaussians. {$$ \Phi_{X_1} = e^{j\mu\omega - \frac{1}{2} \sigma^2 \omega^2} $$} {$$ \Phi_{Z} = e^{jn\mu\omega - \frac{1}{2}n \sigma^2 \omega^2} $$} I.e., the means and variances sum.
 As the number increases, no matter what distribution the initial random variable has (provided that its moments are finite), {$\Phi$} of the sum starts looking like a Gaussian.
 The mean {$M_n$} of n random variables is itself a random variable.
 As {$ n\rightarrow\infty$} {$M_n \rightarrow \mu $}.
 That's a law of large numbers (LLN).
 {$ E[ M_n ] = \mu $}. It's an unbiased estimator.
 {$ VAR[ M_n ] = \sigma ^2 / n $}
 Weak law of large numbers {$$ \forall \epsilon >0: \lim_{n\rightarrow\infty} P[\,|M_n-\mu| < \epsilon\,] = 1 $$}
 How fast does it happen? We can use Chebyshev, though that is very conservative.
 Strong law of large numbers {$$ P [ \lim _ {n\rightarrow\infty} M_n = \mu ] =1 $$}
 As {$ n\rightarrow\infty$}, {$ F_{M_n} $} becomes Gaussian. That's the Central Limit Theorem (CLT).
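The VAR[M_n] = σ²/n fact above can be verified exactly, with no simulation, by enumerating all outcomes of the mean of n fair dice. A Python sketch; `var_of_mean` is a hypothetical helper name:

```python
from fractions import Fraction
from itertools import product

# Exact variance of the mean M_n of n fair 6-sided dice,
# by summing over all 6**n equally likely outcomes.
def var_of_mean(n):
    outcomes = list(product(range(1, 7), repeat=n))
    p = Fraction(1, len(outcomes))
    means = [Fraction(sum(o), n) for o in outcomes]
    mu = sum(p * m for m in means)
    return sum(p * (m - mu) ** 2 for m in means)

sigma2 = var_of_mean(1)                 # variance of one die = 35/12
assert sigma2 == Fraction(35, 12)
assert var_of_mean(2) == sigma2 / 2     # VAR[M_n] = sigma^2 / n
assert var_of_mean(3) == sigma2 / 3
```

The variance of the sample mean really does shrink like 1/n, which is what drives the law of large numbers.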
Viewgraph notes
Notes written on the viewgraph during class: 429.pdf.
Lecture 26, Tues May 3
Conflict final exam
If you told me about needing a conflict final exam and did not get email this morning, tell me again.
Central limit theorem etc
 Review: Almost no matter what distribution the random variable X is, {$ F_{M_n} $} quickly becomes Gaussian as n increases. n=5 already gives a good approximation.
 nice applets:
 http://onlinestatbook.com/stat_sim/normal_approx/index.html This tests how good the normal approximation to the binomial distribution is.
 http://onlinestatbook.com/stat_sim/sampling_dist/index.html This lets you define a distribution and take repeated samples of a given size. It shows how the means of the samples are distributed. For samples with more than a few observations, the means look fairly normal.
 http://www.umd.umich.edu/casl/socsci/econ/StudyAids/JavaStat/CentralLimitTheorem.html This might also be interesting.
 Sample problems.
 Problem 7.1 on page 402.
 Problem 7.22.
 Problem 7.25.
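The first applet's point (how good the normal approximation to the binomial is) can also be checked numerically. A Python sketch for Binomial(100, 0.5), with a continuity correction (the choice of k=55 is arbitrary):

```python
import math

n, p = 100, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

def binom_cdf(k):   # exact P(X <= k), summing the binomial pmf
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k + 1))

def normal_cdf(x):  # N(mu, sigma^2) cdf via the error function
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

exact = binom_cdf(55)
approx = normal_cdf(55.5)        # continuity correction: k + 1/2
assert abs(exact - approx) < 0.01
```

With n = 100 the two agree to about four decimal places, consistent with the remark that the approximation is already good for modest n.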
Chapter 8, Statistics
 We have a population. (E.g., voters in next election, who will vote Democrat or Republican).
 We don't know the population mean. (E.g., fraction of voters who will vote Democrat).
 We take several samples (observations). From them we want to estimate the population mean and standard deviation. (Ask 1000 potential voters; 520 say they will vote Democrat. Sample mean is .52)
 We want error bounds on our estimates. (.52 plus or minus .04, 95 times out of 100)
 Another application: testing whether 2 populations have the same mean. (Is this batch of Guinness as good as the last one?)
 Observations cost money, so we want to do as few as possible.
 This gets beyond this course, but the biggest problems may be nonmath ones. E.g., how do you pick a random likely voter? In the past, phone books were used. In a famous 1936 Presidential poll, that biased the sample against poor people, who voted for Roosevelt.
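For the voter example above, the margin of error can be sketched in a few lines (the 1.96 factor is the standard 95% Gaussian quantile; the numbers are the illustrative ones from above):

```python
# Standard error and 95% margin for a sample proportion: 520 of 1000 voters.
import math

n, democrat = 1000, 520
p_hat = democrat / n                     # sample proportion, 0.52
se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat
margin = 1.96 * se                       # 95% confidence margin
print(p_hat, margin)
```

This gives roughly 0.52 plus or minus 0.03; pollsters often round such margins up, which is one way to arrive at the plus-or-minus 0.04 quoted above.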
Viewgraph notes
Notes written on the viewgraph during class: 503.pdf.
Lecture 27, Fri May 6
Extra office hours
For the rest of the course, until the exam, there will be extra TA office hours to give everyone all the help they want. The office hours will be in the flip-flop lounge. Tentatively, they are:
When  Who 

Mon 1-2 (May 16)  Sina 
Mon 4:30-5:30 (May 9)  Sina 
Mon 7:10-8  Hang 
Tues 4-5  Hang 
Thurs 5-6  Sina 
Fri 5:15-6:15, today and next week  Harish 
DOSO letters
Anyone who has a letter from the DOSO for me, please remind me what it says. You don't need to give me the letter again, if you gave it to me before.
Statistics continued
 In probability, we know the parameters (e.g., mean and standard deviation) of a distribution and use them to compute the probability of some event. E.g., if we toss a fair coin 4 times what's the probability of exactly 4 heads? Answer: 1/16.
 In statistics we do not know all the parameters, though we usually know what type the distribution is, e.g., normal. (We often know the standard deviation.)
 We make observations about some members of the distribution, i.e., draw some samples.
 From them we estimate the unknown parameters.
 We often also compute a confidence interval on that estimate.
 E.g., we toss an unknown coin 100 times and see 60 heads. A good estimate for the probability of that coin coming up heads is 0.6.
 Some estimators are better than others, though that gets beyond this course.
 Suppose I want to estimate the average height of an RPI student by measuring the heights of N random students.
 The mean of the highest and lowest heights of my N students would converge to the population mean as N increased.
 However the median of my sample would converge faster. Technically, the variance of the sample median is smaller than the variance of the sample hi-lo mean.
 The mean of my whole sample would converge the fastest. Technically, the variance of the sample mean is smaller than the variance of any other unbiased estimator of the population mean (for a normal population). That's why we use it.
 However perhaps the population's distribution is not normal. Then one of the other estimators might be better. It would be more robust.
 (Enrichment) How to tell if the population is normal? We can do various plots of the observations and look. We can compute the probability that the observations would be this uneven if the population were normal.
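The variance ranking claimed above (sample mean beats median beats hi-lo mean, for a normal population) is easy to simulate. A hypothetical sketch, with made-up height parameters:

```python
# Compare three estimators of a normal population mean by simulation.
import random
import statistics

random.seed(2)
N, trials = 25, 4000
mu, sigma = 1.75, 0.10   # illustrative "height" parameters, not real data

sample_means, medians, hilo_means = [], [], []
for _ in range(trials):
    s = [random.gauss(mu, sigma) for _ in range(N)]
    sample_means.append(statistics.mean(s))
    medians.append(statistics.median(s))
    hilo_means.append((max(s) + min(s)) / 2)   # mean of highest and lowest

# For a normal population, the sample mean wins (smallest variance).
print(statistics.variance(sample_means))
print(statistics.variance(medians))
print(statistics.variance(hilo_means))
```

Rerunning with a heavy-tailed parent distribution would show the median becoming competitive, which is the robustness point made above.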
 An estimator may be biased. Suppose we have a distribution that is U[0,b] for unknown b, and we take a sample of size n. The max of the sample has mean {$ \frac{n}{n+1} b $}, so it is biased low, though it converges to b as n increases.
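That bias is also easy to verify by simulation (a hypothetical sketch; b and n are chosen arbitrarily):

```python
# Bias of max(sample) as an estimator of b for a U[0, b] population:
# E[max] = n/(n+1) * b, slightly below the true b.
import random
import statistics

random.seed(3)
b, n, trials = 10.0, 20, 5000
maxes = [max(random.uniform(0, b) for _ in range(n)) for _ in range(trials)]

expected = n / (n + 1) * b   # 200/21, about 9.52, not 10
print(statistics.mean(maxes))
print(expected)
```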
 Example 8.2, page 413: One-tailed probability. This is the probability that the mean of our sample is at least so far above the population mean. {$$ \alpha = P[\overline{X_n}-\mu > c] = Q\left( \frac{c}{\sigma_x / \sqrt{n} } \right) $$} Q is defined on page 169: {$$ Q(x) = \int_x^ { \infty} \frac{1}{\sqrt{2\pi} } e^{-\frac{t^2}{2} } dt $$}
 Application: You sample n=100 students' verbal SAT scores, and see {$ \overline{X} = 550 $}. You know that {$\sigma=100 $}. If {$\mu = 525 $}, what is the probability that {$ \overline{X_n} > 550 $} ? Answer: Q(2.5) = 0.006
 This means that if we take 1000 random samples of students, each with 100 students, and measure each sample's mean, then, on average, 6 of those 1000 samples will have a mean over 550.
 This is often worded as the probability of the population's mean being under 525 is 0.006, which is different. The problem with saying that is that it presumes some probability distribution for the population mean.
 The formula also works for the other tail, computing the probability that our sample mean is at least so far below the population mean.
 The 2-tail probability is the probability that our sample mean is at least this far away from the population mean in either direction. It is twice the 1-tail probability.
 All this also works when you know the probability and want to know c, the cutoff.
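Since Q has no closed form, one common way to evaluate it is via the complementary error function, {$ Q(x) = \frac{1}{2}\mathrm{erfc}(x/\sqrt{2}) $}. A Python sketch of the SAT calculation above (the erfc identity is standard; the numbers come from the example):

```python
# One-tailed probability via the Gaussian Q function.
import math

def Q(x):
    """Gaussian tail probability: P[Z > x] for standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))

# SAT example: n = 100 students, sigma = 100, mu = 525, observed mean 550.
n, sigma, mu, xbar = 100, 100, 525, 550
c = xbar - mu                            # 25 points above mu
alpha = Q(c / (sigma / math.sqrt(n)))    # Q(25 / 10) = Q(2.5)
print(alpha)                             # about 0.006
```

Running this in reverse, from a target alpha back to the cutoff c, is the other direction mentioned above.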
Viewgraph notes
Notes written on the viewgraph during class: 506.pdf.
Lecture 28, Tues May 10
Grading
 Grade reports:
 They were mailed out last night. Please report any errors. We are not accepting error reports for grades listed in the previous grade report unless you are re-reporting an error that we haven't fixed.
 The formula for the iclicker grade was: one point for each correct answer, and one point for each day we used the iclickers on which you answered a question, correct or not. Some questions, such as which algebra software you preferred, were not graded.
 Final exam notes:
 You may bring three double-sided cheat sheets.
 There will be a group TA office hour a few days after the exam for you to read your graded exam and ask the TAs for explanations.
 The exam will mostly be on the later part of the course, but you'll need to know earlier material to answer these questions.
 One question from exam 2 will be recycled, with a few changes.
 Some homework questions may also be repeated.
 Subjects that occupied a lot of class time are more likely to be on the final exam. E.g., the max of two random variables would be a good candidate.
 You will be allowed to omit a question or two.
 Grading formula: Tentatively, an A will be >= 95, an A- >= 90, and so on every 5 points. If this appears to give a lower QPA than comparable courses, I'll raise it.
 Possible exam topics:
 Noisy communication channel: maximum a posteriori and maximum likelihood.
 Two random variables, either discrete or continuous: covariance, correlation coefficient
 Two Gaussian random variables.
 Conditional and marginal probability.
 Vector random variables.
 Functions of random variables, e.g., transforming from feet to meters, or taking the sum, max, or min.
 Estimation of random variables; maximum a posteriori vs maximum likelihood
 Law of large numbers, Central limit theorem.
 Statistics: estimating the population mean from a sample mean. Putting a confidence interval on that estimate.
Hypothesis testing
 Say we want to test whether the average height of an RPI student (called the population) is 2m.
 We assume that the distribution is Gaussian (normal) and that the standard deviation of heights is, say, 0.2m.
 However we don't know the mean.
 We do an experiment and measure the heights of n=100 random students. Their mean height is, say, 1.9m.
 The question on the table is, is the population mean 2m?
 This is different from the earlier question that we analyzed, which was this: What is the most likely population mean? (Answer: 1.9m.)
 Now we have a hypothesis (that the population mean is 2m) that we're testing.
 The standard way that this is handled is as follows.
 Define a null hypothesis, called H0, that the population mean is 2m.
 Define an alternate hypothesis, called HA, that the population mean is not 2m.
 Note that we observed our sample mean to be {$ 0.5 \sigma$} below the population mean, if H0 is true.
 Each time we rerun the experiment (measure 100 students) we'll observe a different number.
 We compute the probability that, if H0 is true, our sample mean would be this far from 2m.
 Depending on what our underlying model of students is, we might use a 1-tail or a 2-tail probability.
 Perhaps we think that the population mean might be less than 2m but is not going to be more. Then a 1-tail probability makes sense.
 That is, our assumptions affect the results.
 The probability is Q(5), which is very small.
 Therefore we reject H0 and accept HA.
 We make a type-1 error if we reject H0 and it was really true. See http://en.wikipedia.org/wiki/Type_I_and_type_II_errors
 We make a type-2 error if we accept H0 and it was really false.
 These two errors trade off: by reducing the probability of one we increase the probability of the other, for a given sample size.
 E.g., in a criminal trial we prefer that a guilty person go free to having an innocent person convicted.
 Rejecting H0 says nothing about what the population mean really is, just that it's not likely 2m.
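The height example above can be carried through numerically; a minimal sketch, again evaluating Q via the standard erfc identity:

```python
# Hypothesis test for the height example: H0 says mu = 2 m.
import math

def Q(x):
    """Gaussian tail probability: P[Z > x] for standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))

mu0, sigma, n, xbar = 2.0, 0.2, 100, 1.9
z = (xbar - mu0) / (sigma / math.sqrt(n))   # about -5 standard errors
p_one_tail = Q(abs(z))                      # Q(5), vanishingly small
print(z, p_one_tail)
# p is far below any usual significance level (e.g. 0.05), so reject H0.
```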
Viewgraph notes
Notes written on the viewgraph during class: 510.pdf.
Notes after the last lecture
 The score emailed to people was out of 75. The final exam is out of 25, which will bring the total to 100. The sorted list of scores is: 14.7%, 24.9%, 29.8%, 37.8%, 38.2%, 38.7%, 39.9%, 39.9%, 39.9%, 40.3%, 41.3%, 41.6%, 42.5%, 44.0%, 44.3%, 45.2%, 45.8%, 46.0%, 46.3%, 46.8%, 47.6%, 47.6%, 47.9%, 48.5%, 48.9%, 49.9%, 50.3%, 50.5%, 50.8%, 51.9%, 51.9%, 52.2%, 52.9%, 53.3%, 53.4%, 53.4%, 54.1%, 54.3%, 54.5%, 54.8%, 54.8%, 55.5%, 55.5%, 55.5%, 55.6%, 56.4%, 56.6%, 56.7%, 56.8%, 57.1%, 57.2%, 57.3%, 57.6%, 58.5%, 58.6%, 58.8%, 61.1%, 61.2%, 61.3%, 61.6%, 61.6%, 61.8%, 61.9%, 62.1%, 62.2%, 63.2%, 63.4%, 63.5%, 63.9%, 64.0%, 64.4%, 64.7%, 64.9%, 65.7%, 65.9%, 66.5%, 66.9%, 67.3%, 69.6%, 71.6%, 71.7%, 72.4%, 75.3%
 There will be no generating functions on the exam.
 (Enrichment) Random sampling is hard. The US government got it wrong here: http://politics.slashdot.org/story/11/05/13/2249256/AlgorithmGlitchVoidsOutcomeofUSGreenCardLottery
 For exam 2, there was no relation between the grade and the order of finishing.
 On Thursday from 2 to 3pm, all 3 TAs will be available in the flipflop lounge to let you see your graded exams.
Conflict final exam, Mon May 16
2-5pm, JEC 5030.
Sina graded questions 1-3, Hang graded 4-5, and Harish graded 6-8.
Final exam, Tues May 17
Grading notes
Exam 3 scores
0%, 0%, 0%, 0%, 0%, 23%, 30%, 35%, 35%, 37%, 37%, 40%, 45%, 45%, 45%, 47%, 47%, 48%, 48%, 48%, 50%, 50%, 50%, 50%, 52%, 52%, 52%, 52%, 52%, 53%, 53%, 53%, 55%, 55%, 55%, 57%, 57%, 57%, 57%, 58%, 58%, 60%, 60%, 62%, 62%, 63%, 63%, 63%, 63%, 65%, 65%, 67%, 67%, 67%, 67%, 70%, 72%, 72%, 72%, 73%, 73%, 75%, 75%, 75%, 77%, 77%, 77%, 77%, 78%, 78%, 78%, 78%, 78%, 80%, 83%, 83%, 83%, 83%, 85%, 85%, 87%, 88%, 90%, 92%, 95%, 97%
Total for course
Cutoff  Letter  Count 

0.0%  FF  4 
44.5%  DD  1 
49.5%  DP  8 
54.5%  CM  3 
59.5%  CC  13 
64.5%  CP  8 
69.5%  BM  14 
74.5%  BB  13 
79.5%  BP  8 
84.5%  AM  6 
89.5%  AA  6 
I  1 
These cutoffs are 5.5% more generous than I originally posted.
Terminology: BB means B, BP means B+, BM means B-, etc. Otherwise Excel's VLOOKUP can go wrong.
Average course QPA = 2.5.
The corrected Exam 3 average percentage, ignoring the 0s, is 64%. I copied the wrong number when assembling the mail.
After the course
Feel free to contact me to ask questions or to talk.