.. title: Engineering Probability Class 9 Thu 2022-02-10
.. slug: class09
.. date: 2022-02-10
.. tags: class
.. link: 
.. description: 
.. type: text
.. has_math: true

.. sectnum::
.. contents:: Table of contents
..


Poisson vs Binomial vs Normal distributions
-------------------------------------------

The binomial distribution gives the exact probability of k successes in n independent trials, each succeeding with probability p (e.g., sampling with replacement).
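
For reference, writing p for the success probability on each trial, the binomial pmf is:

.. math::

   P[X=k] = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n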

When n is large but the per-trial success probability p is small, so that the mean np is moderate, the Poisson distribution with mean np is a good approximation to the binomial.  Roughly, n>10, k<5.
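
A minimal Matlab sketch of this comparison, assuming the Statistics Toolbox ``pdf`` function; the values n=100 and p=0.02 are illustrative and not from the text::

   n = 100;                          % number of trials (illustrative)
   p = 0.02;                         % success probability (illustrative)
   k = 0:8;                          % numbers of successes to compare
   bp = pdf('Binomial', k, n, p);    % exact binomial probabilities
   pp = pdf('Poisson', k, n*p);      % Poisson approximation with mean n*p
   disp([k; bp; pp]')                % columns: k, binomial, Poisson
   max(abs(bp - pp))                 % largest absolute difference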

When n is large and p is not too small or too large, then the normal distribution, which we haven't seen yet, is an excellent approximation.  Roughly, n>10 and :math:`|n-k|>2\ \sqrt{n}` .
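
A similar sketch for the normal approximation, matching the binomial's mean np and variance np(1-p).  It assumes the Statistics Toolbox ``pdf`` function, and n=100, p=0.5 are illustrative values::

   n = 100;                              % number of trials (illustrative)
   p = 0.5;                              % success probability (illustrative)
   k = 30:70;                            % range of successes around the mean
   bp = pdf('Binomial', k, n, p);        % exact binomial probabilities
   mu = n*p;                             % mean of the binomial
   sigma = sqrt(n*p*(1-p));              % standard deviation of the binomial
   nrm = pdf('Normal', k, mu, sigma);    % normal density at the same points
   plot(k, bp, 'o', k, nrm, '-')         % circles: binomial, line: normal
   legend('binomial', 'normal')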

For big n, you cannot evaluate the binomial formula directly, and for really big n, you cannot evaluate the Poisson formula either.  Imagine that your experiment is to measure the number of atoms decaying in this `uranium ore <https://www.amazon.com/Images-SI-Uranium-Ore/dp/B000796XXM>`_ .  How would you compute :math:`\left(10^{23}\right)!` ?
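
One standard way around the factorial is to work with logarithms, using Stirling's approximation (not otherwise covered in these notes):

.. math::

   \ln n! \approx n \ln n - n + \tfrac{1}{2}\ln(2\pi n)

For :math:`n=10^{23}` this gives :math:`\ln n! \approx 10^{23}(\ln 10^{23} - 1) \approx 5.2\times10^{24}`, so :math:`n!` has roughly :math:`2\times10^{24}` decimal digits and can only be handled in log form.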

On the other hand, for small n, you can compute binomial by hand.  Poisson and normal probably require a calculator.


      
Chapter 4
---------

#. I will try to ignore most of the theory at the start of the chapter.
#. Now we will see continuous random variables.  

   a. The probability of the r.v. being any exact value is infinitesimal,
   #. so we talk about the probability that it's in a range.

#. Sometimes there are mixed discrete and continuous r.v.   

   a. Let X be the time to get a taxi at the airport.
   #. 80% of the time a taxi is already there, so P(X=0)=.8.
   #. Otherwise we wait a uniform time from 0 to 20 minutes, so P(a<X<b)=.01(b-a), for 0<a<b<20.
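
   As a sanity check, the two pieces account for all the probability, and together they give :math:`P(X\le x)` for every x:

   .. math::

      P(X=0) + P(0<X<20) = 0.8 + 0.01\cdot 20 = 1

   .. math::

      P(X\le x) = \begin{cases} 0 & x<0 \\ 0.8 + 0.01x & 0\le x\le 20 \\ 1 & x>20 \end{cases}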

#. Remember that for discrete r.v. we have a **probability mass function (pmf)**.
#. For continuous r.v. we now have a **probability density function**
   **(pdf)**, :math:`f_X(x)`.
#. :math:`P(a<X<a+da) \approx f_X(a)\,da` for small da.
#. For any r.v., we have a **cumulative distribution function (cdf)**  :math:`F_X(x)`.
#. The subscript is interesting only when we are using more than one cdf and
   need to tell them apart.
#. Definition: F(x) = P(X<=x).
#. The distinction between < and <= matters only where the r.v. has nonzero probability at a single point, i.e., for discrete or mixed r.v.
#. As usual Wikipedia isn't bad, and is deeper than we need here, `Cumulative_distribution_function <http://en.wikipedia.org/wiki/Cumulative_distribution_function>`_.
#. We compute means and other moments by the obvious integrals.


#. Text 4.2 p 148 pdf

#. Simple continuous r.v. examples: uniform, exponential.

#. The **exponential** distribution complements the Poisson distribution.  The
   Poisson describes the number of arrivals per unit time.  The exponential
   describes the distribution of the times between consecutive arrivals.

   The exponential is the continuous analog to the geometric.  If the random variable is an integer number of seconds, use geometric.  If the r.v. is a real-valued time, use exponential.

   Ex 4.7 p 150: exponential r.v.

#. Properties of the exponential distribution:

   a. Memoryless.

   #. :math:`f(x) = \lambda e^{-\lambda x}` if :math:`x\ge0`, 0 otherwise.

   #. Example: time for a radioactive atom to decay.
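
   The memoryless property follows directly from this pdf.  A short derivation, with s and t any nonnegative times:

   .. math::

      P[X>x] = \int_x^\infty \lambda e^{-\lambda u}\,du = e^{-\lambda x}

   .. math::

      P[X>s+t \mid X>s] = \frac{P[X>s+t]}{P[X>s]} = \frac{e^{-\lambda(s+t)}}{e^{-\lambda s}} = e^{-\lambda t} = P[X>t]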

#. Skip 4.2.1 for now.
   
#. The most common continuous distribution is the **normal** distribution.

#. 4.2.2 p 152. Conditional probabilities work the same with
   continuous distributions as with discrete distributions.

#. p 154.  Gaussian r.v.

   a. :math:`f(x) = \frac{1}{\sqrt{2\pi} \cdot \sigma} e^{\frac{-(x-\mu)^2}{2\sigma^2}}`

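   #. The cdf of the standard Gaussian (:math:`\mu=0, \sigma=1`) is usually called :math:`\Phi(x)`.

   #. cdf complement:

      a. :math:`Q(x)=1-\Phi(x) = \int_x^\infty \frac{1}{\sqrt{2\pi}} e^{-t^2/2} dt`, and for general :math:`\mu, \sigma`, :math:`P[X>x] = Q\left(\frac{x-\mu}{\sigma}\right)`.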

      #. E.g., if :math:`\mu=500, \sigma=100`,

         i. P[X>400]=0.84
         #. P[X>500]=0.5
         #. P[X>600]=0.16
         #. P[X>700]=0.02
         #. P[X>800]=0.001
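
   These tail probabilities can be checked in Matlab.  A minimal sketch, assuming the Statistics Toolbox ``cdf`` function::

      mu = 500;                               % mean from the example
      sigma = 100;                            % standard deviation from the example
      x = 400:100:800;                        % thresholds from the example
      q = 1 - cdf('Normal', x, mu, sigma)     % P[X > x] for each threshold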

#. Text 4.3 p 156 Expected value	    
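
For a continuous r.v. the sums used for discrete r.v. become integrals against the pdf:

.. math::

   E[X] = \int_{-\infty}^{\infty} x f_X(x)\,dx, \qquad
   \mathrm{VAR}[X] = \int_{-\infty}^{\infty} (x-E[X])^2 f_X(x)\,dx

As a brief worked example, the exponential pdf given earlier has mean

.. math::

   E[X] = \int_0^\infty x \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}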


   

Notation
--------

How to parse :math:`F_X(x)`

#. Uppercase F means that this is a cdf.   Different letters may indicate different distributions. 

#. The subscript X is the name of the random variable.

#. The x is an argument, i.e., an input.

#. :math:`F_X(x)` returns the probability that the random variable is less than or equal to the value x, i.e. prob(X<=x).
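
For example, the probability that X lands in an interval is a difference of two cdf values, :math:`P(a<X\le b) = F_X(b) - F_X(a)`.  A minimal Matlab sketch for a standard normal X, assuming the Statistics Toolbox ``cdf`` function (the interval is illustrative)::

   a = -1;                                            % lower end of the interval (illustrative)
   b = 1;                                             % upper end of the interval (illustrative)
   cdf('Normal', b, 0, 1) - cdf('Normal', a, 0, 1)    % P(a < X <= b), about 0.68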
   
   

Matlab
------

#. Matlab, Mathematica, and Maple all will help you do
   problems too big to do by hand.  Sometime I'll demo one or the other.

#. Matlab

   #. Major functions::

	cdf(dist,X,A,...)
	pdf(dist,X,A,...) 

   #. Common cases of dist (there are many others)::

	'Binomial'
	'Exponential'
	'Poisson'
	'Normal'
	'Geometric'
	'Uniform'
	'Discrete Uniform' 

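   #. Examples::

        % pdf and cdf of the standard normal at -2,-1,0,1,2
        pdf('Normal',-2:2,0,1)
        cdf('Normal',-2:2,0,1)

        % binomial pmf for n=10 trials with success probability p=0.2
        p=0.2
        n=10
        k=0:10
        bp=pdf('Binomial',k,n,p)
        bar(k,bp)
        grid on

        % binomial cdf for the same n and p
        bc=cdf('Binomial',k,n,p)
        bar(k,bc)
        grid on

        % standard normal pdf plotted as a curve
        x=-3:.2:3
        np=pdf('Normal',x,0,1)
        plot(x,np)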

   #. Interactive GUI to explore distributions:  disttool
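   #. Random numbers::

        rand(3)                 % 3x3 matrix, uniform on (0,1)
        rand(1,5)               % 1x5 row vector, uniform on (0,1)
        randn(1,10)             % 10 standard normal samples
        randn(1,10)*100+500     % 10 normal samples with mean 500, std dev 100
        randi(100,4)            % 4x4 matrix of random integers from 1 to 100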

   #. Interactive GUI to explore random numbers:  randtool
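   #. Plotting two things at once::

        x=-3:.2:3
        n1=pdf('Normal',x,0,1)
        n2=pdf('Normal',x,0,2)
        % n2 has no x of its own here, so this is probably not the plot you want
        plot(x,n1,n2)
        % each curve gets its own x,y pair
        plot(x,n1,x,n2)
        % same, with line styles: dashed red and green dots
        plot(x,n1,'--r',x,n2,'.g')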

#. Use Matlab to compute a geometric pdf without using the builtin function.
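
   One possible approach, sketched under the assumption that the r.v. counts the failures before the first success (the convention used by Matlab's ``'Geometric'``); p=0.2 and the range of k are illustrative::

      p = 0.2;                        % success probability (illustrative)
      k = 0:10;                       % number of failures before the first success
      gp = p * (1-p).^k;              % geometric pmf straight from the formula
      gb = pdf('Geometric', k, p);    % builtin, used here only as a cross-check
      max(abs(gp - gb))               % should be at round-off level
      bar(k, gp)
      grid on

   If the r.v. instead counts the trials up to and including the first success, shift k by one, i.e. use ``p * (1-p).^(k-1)`` for k=1,2,...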

#. Review.  Which of the following do you prefer to use?

   a. Matlab
   #. Maple
   #. Mathematica
   #. Paper.  It was good enough for Bernoulli and Gauss; it's good enough for me.
   #. Something else (please email me about it after class).


My opinion
==========

This is my opinion of Matlab.

#. Advantages

   #. Excellent quality numerical routines.
   #. Free at RPI.
   #. Many toolkits available.
   #. Uses parallel computers and GPUs.
   #. Interactive - you type commands and immediately see results.
   #. No need to compile programs.

#. Disadvantages

   #. Very expensive outside RPI.
   #. Once you start using Matlab, you can't easily move away when their prices rise.
   #. You must force your data structures to look like arrays.
   #. Long programs must still be developed offline.
   #. Hard to write in Matlab's style.
   #. Programs are hard to read.

#. Alternatives

   #. Free clones like Octave are not very good.
   #. The excellent math routines in Matlab are also available free in C++ libraries.
   #. With C++ libraries using template metaprogramming, your code looks like Matlab.
   #. They compile slowly.
   #. Error messages are inscrutable.
   #. Executables run very quickly.

	
Comic
----------

`Broomhilda <https://www.gocomics.com/broomhilda/2019/02/13>`_

