Engineering Probability Class 14 Mon 2021-03-15

1 Exam 2

Will be in class on Mon April 5. Same rules as before.

(I asked for opinions on Webex.)

2 Tutorial on probability density

Since the meaning of probability density when you transform variables is still causing problems for some people, think of changing units from English to metric. First, with one variable, X.

  1. Let X be in feet and be U[0,1].

    $$f_X(x) = \begin{cases} 1& \text{if } 0\le x\le1\\ 0&\text{otherwise} \end{cases}$$

  2. $P[.5\le x\le .51] = 0.01$.

  3. Now change to centimeters (using the approximation 1 ft ≈ 30 cm). The transformation is $Y=30X$.

  4. $$f_Y(y) = \begin{cases} 1/30 & \text{if } 0\le y\le30\\ 0&\text{otherwise} \end{cases}$$

  5. Why is 1/30 reasonable?

  6. First, the pdf has to integrate to 1: $$\int_{-\infty}^\infty f_Y(y)\,dy =1$$

  7. Second, $$\begin{align} & P[.5\le x\le .51] \\ &= \int_{.5}^{.51} f_X(x)\, dx \\& =0.01 \\& = P[15\le y\le 15.3] \\& = \int_{15}^{15.3} f_Y(y)\, dy \end{align}$$
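A quick numerical sanity check of the unit-change example (the U[0,1] model and the 30 cm/ft factor are from the notes; the sample size is arbitrary):

```python
import random

# X ~ U[0,1] (feet); Y = 30*X (centimeters). The same physical event
# must have the same probability under either description.
random.seed(42)
n = 1_000_000
xs = [random.random() for _ in range(n)]

p_x = sum(0.5 <= x <= 0.51 for x in xs) / n        # P[.5 <= X <= .51]
p_y = sum(15 <= 30 * x <= 15.3 for x in xs) / n    # P[15 <= Y <= 15.3]

print(p_x, p_y)  # both approximately 0.01
```

The event is the same set of outcomes, so the two estimates agree; only the density values (1 vs. 1/30) differ.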

3 Functions of a r.v.

  1. Example 4.29 page 175.

  2. Linear function: Example 4.31 on page 176.

4 Markov and Chebyshev inequalities (Section 4.6, page 181)

  1. Your web server averages 10 hits/second.

  2. It will crash if it gets 20 hits.

  3. By the Markov inequality, that has probability at most $E[X]/20 = 10/20 = 0.5$.

  4. That is way way too conservative, but it makes no assumptions about the distribution of hits.

  5. For the Chebyshev inequality, assume that the variance is 10.

  6. It bounds the probability of crashing at $P[|X-10|\ge10] \le \sigma^2/10^2 = 10/100 = 0.1$. That is tighter.

  7. Assuming the distribution is Poisson with $a=10$, use Matlab `1-cdf('Poisson',20,10)`. That gives 0.0016.

  8. The more we assume, the better the answer we can compute.

  9. However, our assumptions had better be correct.

  10. (Editorial): In the real world, and especially economics, the assumptions are, in fact, often false. However, the models still usually work (at least, we can't prove they don't work). Until they stop working, e.g., https://en.wikipedia.org/wiki/Long-Term_Capital_Management . Jamie Dimon, head of JP Morgan, has observed that the market swings more widely than is statistically reasonable.
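The web-server numbers above can be reproduced with a short script (the mean, threshold, and variance are from the notes; the Poisson tail replaces the Matlab call with a stdlib sum):

```python
import math

# Web-server example: mean load m = 10 hits/s, crash threshold a = 20,
# assumed variance var = 10.
m, a, var = 10, 20, 10

markov = m / a                        # Markov: P[X >= 20] <= E[X]/20
chebyshev = var / (a - m) ** 2        # Chebyshev: P[|X-10| >= 10] <= var/100

# Exact tail if X ~ Poisson(10): P[X > 20] = 1 - P[X <= 20]
# (the notes' Matlab call 1-cdf('Poisson',20,10)).
poisson_cdf = sum(math.exp(-m) * m**k / math.factorial(k) for k in range(21))
exact = 1 - poisson_cdf

print(markov, chebyshev, round(exact, 4))  # 0.5 0.1 0.0016
```

Each added assumption (a mean, then a variance, then the whole distribution) shrinks the bound, illustrating points 8 and 9.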

5 Reliability (section 4.8, page 189)

  1. The reliability R(t) is the probability that the item is still functioning at t. R(t) = 1-F(t).

  2. What is the reliability of an exponential r.v.? ( $F(t)=1-e^{-\lambda t}$, so $R(t)=e^{-\lambda t}$ ).

  3. The Mean Time to Failure (MTTF) is obvious. The equation near the top of page 190 should be

    $E[T] = \int_0^\infty \textbf{t} f(t) dt$

  4. What is the MTTF for an exponential r.v.? (It is $1/\lambda$.)

  5. The failure rate $r(t)=f(t)/R(t)$ is the probability per unit time that a widget still alive at $t$ dies in the next instant.
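A minimal sketch of these three quantities for the exponential case ($\lambda=0.5$ is an illustrative value, not from the notes):

```python
import math

lam = 0.5                                  # failure rate, per unit time


def R(t):
    # reliability R(t) = 1 - F(t) = e^{-lam t}
    return math.exp(-lam * t)


def r(t):
    # failure rate r(t) = f(t) / R(t); constant for the exponential
    return lam * math.exp(-lam * t) / R(t)


# MTTF = integral_0^inf t f(t) dt = 1/lam; check with a crude Riemann sum.
dt = 0.001
mttf = sum(t * lam * math.exp(-lam * t) * dt
           for t in (i * dt for i in range(200_000)))

print(round(mttf, 3), r(1.0), r(10.0))  # MTTF ~ 2 = 1/lam; r is constant
```

The constant failure rate is exactly the "memoryless" property: an old widget is no more likely to die in the next second than a new one.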

  6. The importance of getting the fundamentals (or foundations) right:

    In the past 50 years, two major bridges in the Capital District have collapsed because of inadequate foundations. The Green Island Bridge collapsed on 3/15/77; see http://en.wikipedia.org/wiki/Green_Island_Bridge . The Thruway (I-90) bridge over Schoharie Creek collapsed on 4/5/87, killing 10 people; see http://cbs6albany.com/news/local/recalling-the-schoharie-bridge-collapse-30-years-later .

    Why RPI likes the Roeblings: none of their bridges collapsed. E.g., when designing the Brooklyn Bridge, Roebling Sr knew what he didn't know. He realized that something hung on cables might sway in the wind, in a complicated way that he couldn't analyze. So he added a lot of diagonal bracing. The designers of the original Tacoma Narrows Bridge were smart enough that they didn't need this expensive margin of safety.

  7. Another way to look at reliability: think of people.

    1. Your reliability R(t) is the probability that you live to age t, given that you were born alive. In the US, that's 98.7% for age 20, 96.4% for 40, 87.8% for 60.

    2. MTTF is your life expectancy at birth. In the US, that's 77.5 years.

    3. Your failure rate, r(t), is your probability of dying in the next dt, divided by dt, at different ages. E.g. for a 20-year-old, it's 0.13%/year for a male and 0.046%/year for a female http://www.ssa.gov/oact/STATS/table4c6.html . For 40-year-olds, it's 0.24% and 0.14%. For 60-year-olds, it's 1.2% and 0.7%. At 80, it's 7% and 5%. At 100, it's 37% and 32%.

  8. Example 4.47, page 190. If the failure rate is constant, the distribution is exponential.

  9. If several subsystems are all necessary, e.g., are in serial, then their reliabilities multiply. The result is less reliable.

    If only one of them is necessary, e.g. are in parallel, then their complementary reliabilities multiply. The result is more reliable.

    An application would be different types of RAIDs (Redundant Array of Inexpensive, or Independent, Disks). In one version you stripe a file over two hard drives to get increased speed, but decreased reliability. In another version you triplicate the file over three drives to get increased reliability. (You can also do a hybrid setup.)

    (David Patterson at Berkeley invented RAID (and also RISC). He intended I to mean Inexpensive. However he said that when this was commercialized, companies said that the I meant Independent.)
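The two RAID versions are exactly the series and parallel rules; a sketch with an assumed per-disk reliability of 0.9 (not a value from the notes):

```python
# Assumed per-disk reliability (probability a disk is still working).
disk = 0.9

# Striping over two disks ("series": both must work): reliabilities multiply.
striped = disk * disk                 # less reliable than one disk

# Triplicating over three disks ("parallel": any one suffices):
# the complementary reliabilities (failure probabilities) multiply.
triplicated = 1 - (1 - disk) ** 3     # more reliable than one disk

print(round(striped, 3), round(triplicated, 3))  # 0.81 0.999
```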

  10. Example 4.49 page 193, reliability of series subsystems.

  11. Example 4.50 page 193, increased reliability of parallel subsystems.

6 4.9 Generating r.v.

Ignore. It's surprisingly hard to do right, and it has already been implemented in built-in routines. Use them.

7 4.10 Entropy

Ignore since it's starred.

8 Chapter 5, Two Random Variables

  1. One experiment might produce two r.v. E.g.,

    1. Shoot an arrow; it lands at (x,y).

    2. Toss two dice.

    3. Measure the height and weight of people.

    4. Measure the voltage of a signal at several times.

  2. The definitions for pmf, pdf and cdf are reasonable extensions of one r.v.

  3. The math is messier.

  4. The two r.v. may be *dependent* and *correlated*.

  5. The *correlation coefficient*, $\rho$, is a dimensionless measure of linear dependence. $-1\le\rho\le1$.

  6. $\rho$ may be 0 even when the variables are dependent, if the relation is nonlinear.

  7. Integrating (or summing) out one variable gives a marginal distribution.

  8. We'll do some simple examples:

    1. Toss two 4-sided dice.

    2. Toss two 4-sided "loaded" dice. The marginal pmfs are uniform.

    3. Pick a point uniformly in a square.

    4. Pick a point uniformly in a triangle. x and y are now dependent.

  9. The big example is a 2 variable normal distribution.

    1. The pdf is messier.

    2. Its contours are ellipses; they are axis-aligned iff $\rho=0$.

  10. I finished the class with a high level overview of Chapter 5, w/o any math.
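The triangle example can be simulated to see the dependence and a nonzero $\rho$; a sketch, assuming the concrete triangle $0 \le y \le x \le 1$ and using rejection sampling:

```python
import math
import random

# Pick points uniformly in the triangle 0 <= y <= x <= 1 by rejection
# sampling from the unit square, then estimate rho from the sample.
random.seed(1)
n = 200_000
pts = []
while len(pts) < n:
    x, y = random.random(), random.random()
    if y <= x:                        # keep only points inside the triangle
        pts.append((x, y))

mx = sum(x for x, _ in pts) / n       # sample means
my = sum(y for _, y in pts) / n
cov = sum((x - mx) * (y - my) for x, y in pts) / n
sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pts) / n)
sy = math.sqrt(sum((y - my) ** 2 for _, y in pts) / n)
rho = cov / (sx * sy)

print(round(rho, 2))  # x and y are dependent; rho is about 0.5 here
```

For this triangle the exact value is $\rho = 1/2$ (Cov = 1/36, each variance = 1/18), so the estimate should land near 0.5.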