Engineering Probability Class 12 Mon 2021-03-08

1 Poisson vs Binomial vs Normal distributions

The binomial distribution is the exact formula for the probability of k successes from n trials (with replacement).

When n is large but p is small, so that the expected count np is moderate, the Poisson distribution with \(\lambda=np\) is a good approximation to the binomial. Roughly, n>10, k<5.

When n is large and p is not too small or too large, then the normal distribution, which we haven't seen yet, is an excellent approximation. Roughly, n>10 and \(|n-k|>2\ \sqrt{n}\).

For big n, you cannot use binomial, and for really big n, you cannot use Poisson either. Imagine that your experiment is to measure the number of atoms decaying in this uranium ore. How would you compute \(\left(10^{23}\right)!\)?

OTOH, for small n, you can compute binomial by hand. Poisson and normal probably require a calculator.
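
The rules of thumb above can be checked numerically. What follows is a minimal sketch in Python, standard library only. The language, the function names, and the example values (n=1000, p=0.002 and n=100, p=0.5) are choices made for this illustration, not part of the class. It compares the exact binomial pmf with the Poisson pmf for \(\lambda=np\), and with the normal pdf for \(\mu=np\), \(\sigma^2=np(1-p)\).

```python
import math

def binomial_pmf(k, n, p):
    """Exact probability of k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """Poisson probability of k events when the mean count is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

# Case 1 (illustrative values): large n, small p, small k -> Poisson.
n, p = 1000, 0.002
rare_cases = [(k, binomial_pmf(k, n, p), poisson_pmf(k, n * p)) for k in range(6)]

# Case 2 (illustrative values): large n, moderate p -> normal.
n2, p2 = 100, 0.5
mu, sigma = n2 * p2, math.sqrt(n2 * p2 * (1 - p2))
moderate_cases = [(k, binomial_pmf(k, n2, p2), normal_pdf(k, mu, sigma))
                  for k in range(40, 61, 5)]

for k, exact, approx in rare_cases:
    print(f"k={k:3d}  binomial={exact:.5f}  poisson={approx:.5f}")
for k, exact, approx in moderate_cases:
    print(f"k={k:3d}  binomial={exact:.5f}  normal={approx:.5f}")
```

In both cases the approximation should agree with the exact value to about three decimal places. Note that `math.comb` works with exact integers, so it stays accurate for moderately large n but becomes impractical long before n reaches \(10^{23}\).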

2 Chapter 4

  1. I will try to ignore most of the theory at the start of the chapter.

  2. Now we will see continuous random variables.

    1. The probability of the r.v. being any exact value is infinitesimal,

    2. so we talk about the probability that it's in a range.

  3. Sometimes there are mixed discrete and continuous r.v.

    1. Let X be the time to get a taxi at the airport.

    2. 80% of the time a taxi is already there, so p(X=0)=.8.

    3. Otherwise we wait a uniform time from 0 to 20 minutes. The remaining probability .2 is spread evenly over 20 minutes, a density of .2/20=.01, so p(a<X<b)=.01(b-a), for 0<a<b<20.

  4. Remember that for discrete r.v. we have a probability mass function (pmf).

  5. For continuous r.v. we now have a probability density function (pdf), \(f_X(x)\).

  6. p(a<X<a+da) = f(a)da, for an infinitesimal da.

  7. For any r.v., we have a cumulative distribution function (cdf) \(F_X(x)\).

  8. The subscript is interesting only when we are using more than one cdf and need to tell them apart.

  9. Definition: F(x) = P(X<=x).

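  10. The distinction between < and <= matters only where the r.v. has a point mass, i.e., for discrete (or mixed) r.v.; for a purely continuous r.v., P(X<x) = P(X<=x).

  11. As usual, Wikipedia isn't bad, and is deeper than we need here; see its article Cumulative_distribution_function.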

  12. We compute means and other moments by the obvious integrals.
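
The mixed taxi example in the list above can be made concrete. This is a minimal Python sketch (standard library only); the function names `taxi_cdf` and `taxi_mean` and the step count are hypothetical, chosen for this illustration. The cdf jumps to 0.8 at x=0 because of the point mass, then rises linearly with slope .01 until it reaches 1 at x=20. The mean is the discrete part plus the "obvious integral" of x times the density, approximated here by a midpoint sum.

```python
def taxi_cdf(x):
    """F(x) = P[X <= x] for the taxi waiting time X, in minutes."""
    if x < 0:
        return 0.0
    if x >= 20:
        return 1.0
    # Point mass of 0.8 at x = 0, plus density 0.01 on (0, 20).
    return 0.8 + 0.01 * x

def taxi_mean(steps=100_000):
    """E[X]: discrete part plus the integral of x * 0.01 over (0, 20)."""
    discrete_part = 0.0 * 0.8          # the point mass sits at x = 0
    dx = 20 / steps
    # Midpoint rule for the continuous part.
    continuous_part = sum((i + 0.5) * dx * 0.01 * dx for i in range(steps))
    return discrete_part + continuous_part

print(taxi_cdf(0))                   # probability of no wait at all
print(taxi_cdf(15) - taxi_cdf(5))    # p(5 < X < 15)
print(taxi_mean())                   # average wait in minutes
```

Because of the point mass at 0, P(X<0) and P(X<=0) differ here, which is exactly the case where the <= in the definition of the cdf matters.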

3 Notation

How to parse \(F_X(x)\)

  1. Uppercase F means that this is a cdf. Different letters may indicate different distributions.

  2. The subscript X is the name of the random variable.

  3. The x is an argument, i.e., an input.

  4. \(F_X(x)\) returns the probability that the random variable is less than or equal to the value x, i.e. prob(X<=x).

4 Matlab

  1. Matlab, Mathematica, and Maple all will help you do problems too big to do by hand. Sometime I'll demo one or the other.

  2. Matlab

    1. Major functions:

      cdf(dist,X,A,...)
      pdf(dist,X,A,...)
    2. Common cases of dist (there are many others):

      'Binomial'
      'Exponential'
      'Poisson'
      'Normal'
      'Geometric'
      'Uniform'
      'Discrete Uniform'
    3. Examples:

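      pdf('Normal',-2:2,0,1)      % standard normal pdf at x=-2,...,2
      cdf('Normal',-2:2,0,1)      % standard normal cdf at the same points
      
      p=0.2
      n=10
      k=0:10
      bp=pdf('Binomial',k,n,p)    % binomial pmf for k=0..10 successes
      bar(k,bp)
      grid on
      
      bc=cdf('Binomial',k,n,p)    % binomial cdf for the same k
      bar(k,bc)
      grid on
      
      x=-3:.2:3
      np=pdf('Normal',x,0,1)      % standard normal pdf on a finer grid
      plot(x,np)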
    4. Interactive GUI to explore distributions: disttool

    5. Random numbers:

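      rand(3)               % 3x3 matrix, uniform on (0,1)
      rand(1,5)             % 1x5 row vector, uniform on (0,1)
      randn(1,10)           % 10 standard normal samples (mean 0, std 1)
      randn(1,10)*100+500   % 10 normal samples with mean 500, std 100
      randi(100,4)          % 4x4 matrix of integers uniform on 1..100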
    6. Interactive GUI to explore random numbers: randtool

    7. Plotting two things at once:

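      x=-3:.2:3
      n1=pdf('Normal',x,0,1)        % sigma=1
      n2=pdf('Normal',x,0,2)        % sigma=2
      plot(x,n1,n2)                 % wrong: n2 is not paired with x values
      plot(x,n1,x,n2)               % right: one (x,y) pair per curve
      plot(x,n1,'--r',x,n2,'.g')    % same, with a line style and color per curve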
  3. Use Matlab to compute a geometric pdf w/o using the builtin function.

  4. Review. Which of the following do you prefer to use?

    1. Matlab

    2. Maple

    3. Mathematica

    4. Paper. It was good enough for Bernoulli and Gauss; it's good enough for me.

    5. Something else (please email me about it after class).
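
The exercise above, computing a geometric pdf without the builtin function, amounts to one line of arithmetic. Here is a minimal sketch of that arithmetic in Python rather than Matlab (standard library only; the function names are hypothetical). There are two common conventions, so both are shown: counting the trial on which the first success occurs (k=1,2,...), or counting the failures before the first success (k=0,1,...). Matlab's builtin 'Geometric' is documented as using the second; check which one the text uses before comparing numbers.

```python
def geometric_pmf_trials(k, p):
    """P[first success occurs on trial k], for k = 1, 2, 3, ..."""
    if k < 1:
        return 0.0
    return (1 - p) ** (k - 1) * p

def geometric_pmf_failures(k, p):
    """P[exactly k failures before the first success], for k = 0, 1, 2, ..."""
    if k < 0:
        return 0.0
    return (1 - p) ** k * p

p = 0.2  # illustrative success probability
for k in range(1, 6):
    print(k, geometric_pmf_trials(k, p), geometric_pmf_failures(k - 1, p))
```

The two columns printed are identical, since the two conventions differ only by a shift of one in k.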

4.1 My opinion

This is my opinion of Matlab.

  1. Advantages

    1. Excellent quality numerical routines.

    2. Free at RPI.

    3. Many toolkits available.

    4. Uses parallel computers and GPUs.

    5. Interactive - you type commands and immediately see results.

    6. No need to compile programs.

  2. Disadvantages

    1. Very expensive outside RPI.

    2. Once you start using Matlab, you can't easily move away when their prices rise.

    3. You must force your data structures to look like arrays.

    4. Long programs must still be developed offline.

    5. Hard to write in Matlab's style.

    6. Programs are hard to read.

  3. Alternatives

    1. Free clones like Octave are not very good.

    2. The excellent math routines in Matlab are also available free in C++ libraries.

    3. With C++ libraries using template metaprogramming, your code looks like Matlab.

    4. Programs using these libraries compile slowly.

    5. Compiler error messages are inscrutable.

    6. Executables run very quickly.

5 Chapter 4 ctd

  1. Text 4.2 p 148 pdf

  2. Simple continuous r.v. examples: uniform, exponential.

  3. The exponential distribution complements the Poisson distribution. The Poisson describes the number of arrivals per unit time. The exponential describes the distribution of the times between consecutive arrivals.

    The exponential is the continuous analog to the geometric. If the random variable is an integral number of seconds, use geometric. If the r.v. is a real-valued time, use exponential.

    Ex 4.7 p 150: exponential r.v.

  4. Properties

    1. Memoryless.

    2. \(f(x) = \lambda e^{-\lambda x}\) if \(x\ge0\), 0 otherwise.

    3. Example: time for a radioactive atom to decay.

  5. Skip 4.2.1 for now.

  6. The most common continuous distribution is the normal distribution.

  7. 4.2.2 p 152. Conditional probabilities work the same with continuous distributions as with discrete distributions.

  8. p 154. Gaussian r.v.

    1. \(f(x) = \frac{1}{\sqrt{2\pi} \cdot \sigma} e^{\frac{-(x-\mu)^2}{2\sigma^2}}\)

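    2. The cdf of the standard normal (\(\mu=0, \sigma=1\)) is usually called \(\Phi(x)\). For general \(\mu\) and \(\sigma\), \(F_X(x) = \Phi\left(\frac{x-\mu}{\sigma}\right)\).

    3. cdf complement:

      1. \(Q(x)=1-\Phi(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2} dt\), so \(P[X>x] = Q\left(\frac{x-\mu}{\sigma}\right)\).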

      2. E.g., if \(\mu=500, \sigma=100\),

        1. P[x>400]=0.84

        2. P[x>500]=0.5

        3. P[x>600]=0.16

        4. P[x>700]=0.02

        5. P[x>800]=0.001

  9. Text 4.3 p 156 Expected value

  10. Skip the other distributions (for now?).
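
Two of the facts in the list above can be checked numerically: the memoryless property of the exponential, and the Gaussian tail probabilities for \(\mu=500, \sigma=100\). This is a minimal Python sketch (standard library only). The function names and the values of `lam`, `s`, and `t` are hypothetical, chosen for this illustration. The exponential tail \(P[X>x]=e^{-\lambda x}\) follows from integrating the pdf given above. The Gaussian tail is computed with the complementary error function `math.erfc`.

```python
import math

def exponential_tail(x, lam):
    """P[X > x] for an exponential r.v. with rate lam."""
    return math.exp(-lam * x) if x >= 0 else 1.0

def gaussian_tail(x, mu, sigma):
    """P[X > x] for a Gaussian r.v. with mean mu and std deviation sigma."""
    z = (x - mu) / sigma                       # standardize
    return 0.5 * math.erfc(z / math.sqrt(2))   # tail of the standard normal

# Memoryless: having already waited s, the chance of waiting t more
# is the same as the chance of waiting t from scratch.
lam, s, t = 0.5, 3.0, 2.0                      # illustrative values
conditional = exponential_tail(s + t, lam) / exponential_tail(s, lam)
unconditional = exponential_tail(t, lam)
print(conditional, unconditional)

# Gaussian tail probabilities for mu=500, sigma=100.
tails = {x: gaussian_tail(x, 500, 100) for x in (400, 500, 600, 700, 800)}
for x, prob in tails.items():
    print(f"P[X>{x}] = {prob:.3f}")
```

Dividing the two tails in `conditional` is just the definition of conditional probability, P[X>s+t | X>s] = P[X>s+t]/P[X>s], which works the same way for continuous r.v. as for discrete ones.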