.. title: Engineering Probability Class 28 Mon 2022-04-25
.. slug: class28
.. date: 2022-04-24
.. tags: class
.. link: 
.. description: 
.. type: text
.. has_math: true

.. sectnum::
.. contents:: Table of contents
..


Homework 11
-----------

#. The due date was accidentally set too late, and has been changed to Thurs.  (Assignments are not allowed in reading period.)

#. As soon as possible after that, we'll calculate what letter grade you'd get if you didn't write the final, and upload it to LMS.
      


Misc statistics topics
----------------------

Reviewing the videos.

T-test
======

#. You have 2 populations.

#. Do they have the same mean?

#. Take a sample of observations from each population.

#. Calculate the sample means.

#. They're probably different.

#. What's the probability that the sample means would be at least that different if the population means were the same?

#. **At least** can be 1 sided or 2 sided.
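
A minimal sketch of the steps above, using only the Python standard library. It assumes Welch's form of the test (the two populations may have different variances), which is one common choice and not necessarily the one in the videos. The names ``t_pdf``, ``t_tail``, and ``welch_t_test`` and the sample numbers are made up for illustration.

.. code-block:: python

   import math
   from statistics import mean, variance

   def t_pdf(x, df):
       """Density of Student's t distribution with df degrees of freedom."""
       # math.gamma overflows for very large df; fine for small samples.
       c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
       return c * (1 + x * x / df) ** (-(df + 1) / 2)

   def t_tail(t, df, steps=2000):
       """P(T > |t|), integrating the density on [0, |t|] with Simpson's rule."""
       a = abs(t)
       h = a / steps  # steps must be even for Simpson's rule
       s = t_pdf(0, df) + t_pdf(a, df)
       for i in range(1, steps):
           s += (4 if i % 2 else 2) * t_pdf(i * h, df)
       return 0.5 - s * h / 3

   def welch_t_test(xs, ys):
       """Return (t, df, one-sided p, two-sided p) for two samples."""
       nx, ny = len(xs), len(ys)
       vx, vy = variance(xs) / nx, variance(ys) / ny  # squared standard errors
       t = (mean(xs) - mean(ys)) / math.sqrt(vx + vy)
       # Welch-Satterthwaite approximation to the degrees of freedom
       df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
       one_sided = t_tail(t, df)
       return t, df, one_sided, 2 * one_sided

   # Made-up observations from two populations.
   xs = [5.1, 4.9, 5.6, 5.8, 6.0]
   ys = [4.2, 4.8, 4.5, 5.0, 4.4]
   t, df, p1, p2 = welch_t_test(xs, ys)
   print(f"t = {t:.3f}, df = {df:.1f}, 1-sided p = {p1:.4f}, 2-sided p = {p2:.4f}")

The 1-sided value is the tail probability in the direction actually observed; the 2-sided value doubles it.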

ANOVA
=====

#. **Analysis of variance**

#. Test for possible difference in several groups.

#. E.g. you're searching for a cure for lycanthropy.

#. 5 possible treatments:  aspirin, silver crosses, sunlight, being bitten by Dracula, nothing.

#. Take 100 people with lycanthropy.

#. Assign different treatments randomly.

#. Measure length of hair at next full moon.

#. Did any treatment work?

#. Real-world application: The worldwide pharma industry grosses $$10^{12}$$ dollars a year.  A new drug costs several times $$10^9$$ dollars to develop, including the costs of the failures.  To get a new drug approved, you have to prove, with trials and statistics, that it works.
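
A rough sketch of a one-way ANOVA for the treatment example, using only the Python standard library. It computes the usual F statistic (between-group variation over within-group variation). Since the standard library has no F distribution, the p-value here is estimated by a permutation test, which is a swapped-in technique and not the classical F table lookup. The function names and the hair lengths are made up for illustration.

.. code-block:: python

   import random
   from statistics import mean

   def f_statistic(groups):
       """One-way ANOVA F = between-group mean square / within-group mean square."""
       k = len(groups)
       n = sum(len(g) for g in groups)
       grand = mean(x for g in groups for x in g)
       means = [mean(g) for g in groups]
       ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
       ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
       return (ss_between / (k - 1)) / (ss_within / (n - k))

   def permutation_p_value(groups, trials=2000, seed=0):
       """Estimate P(F >= observed F) if the treatment labels did not matter."""
       rng = random.Random(seed)
       observed = f_statistic(groups)
       pooled = [x for g in groups for x in g]
       sizes = [len(g) for g in groups]
       hits = 0
       for _ in range(trials):
           rng.shuffle(pooled)  # reassign observations to treatments at random
           shuffled, start = [], 0
           for s in sizes:
               shuffled.append(pooled[start:start + s])
               start += s
           if f_statistic(shuffled) >= observed:
               hits += 1
       return hits / trials

   # Made-up hair lengths (cm) at the next full moon, one list per treatment.
   treatments = {
       "aspirin": [9.8, 10.4, 10.1, 9.6],
       "silver crosses": [7.1, 6.5, 7.8, 6.9],
       "sunlight": [10.2, 9.9, 10.6, 9.7],
       "bitten by Dracula": [10.0, 10.8, 9.5, 10.3],
       "nothing": [10.1, 9.7, 10.5, 10.2],
   }
   groups = list(treatments.values())
   print("F =", round(f_statistic(groups), 2))
   print("estimated p =", permutation_p_value(groups))

A small p-value says only that some treatment differs from the others; it doesn't say which one.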
   
   
Linear regression
=================

#. To explore possible linear relationships between several variables.

#. Several possible independent variables.

#. One dependent variable.


#. One independent variable example:

   #. student score vs time on exam 2:

   #. Independent variable: time to finish.

   #. Dependent variable: score.

   #. Is there a linear relationship?

   #. What is it?

   #. How good is it?

#. Multiple independent variables example:
   
   #.  Try to predict first year student performance at RPI.

   #. Dependent variable: first year GPA.

   #. Independent variables:

      #. high school grade

      #. high school rank

      #. number of AP

      #. fraternity?

      #. athlete?

      #. home state

      #. height

      #. weight

   #. Which one is the strongest predictor?

   #. Add the independent variables one by one in order of importance.

   #. However, independent variables may be correlated with each other.

   #. With enough independent variables you can explain anything (overfitting).

   #. What about nonlinear relationships?
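
A minimal sketch for the one-independent-variable case, using only the Python standard library. It fits the least-squares line and reports $$R^2$$, which is one common answer to "how good is it?". The function name ``fit_line`` and the exam times and scores are made up for illustration.

.. code-block:: python

   from statistics import mean

   def fit_line(xs, ys):
       """Least-squares line y = a + b*x, plus R^2 (fraction of variance explained)."""
       mx, my = mean(xs), mean(ys)
       sxx = sum((x - mx) ** 2 for x in xs)
       sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
       syy = sum((y - my) ** 2 for y in ys)
       b = sxy / sxx               # slope
       a = my - b * mx             # intercept
       r2 = sxy ** 2 / (sxx * syy) # squared correlation coefficient
       return a, b, r2

   # Made-up data: minutes to finish the exam, and the score.
   times = [35, 42, 50, 58, 65, 71, 80]
   scores = [62, 70, 75, 78, 85, 83, 90]
   a, b, r2 = fit_line(times, scores)
   print(f"score = {a:.2f} + {b:.3f} * time,  R^2 = {r2:.3f}")

An $$R^2$$ near 1 means the line explains most of the variation in the dependent variable; near 0 means it explains almost none.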


Non parametric stats
====================

#. No assumptions about the distribution, except that the observations are independent.

#. Often use order stats.

#. E.g., Wilcoxon rank-sum test (aka Mann-Whitney) to test if two pops have the same mean (more precisely, the same distribution):

   #. combine the observations from the two populations, X and Y.

   #. sort them all together

   #. see if the observations from population X are clustered at the start.

   #. by computing the U score: count the number of pairs (i, j) with $$X_i > Y_j$$.

      #. for large enough n (with n observations in each sample), U is approximately normal, with mean $$n^2/2$$ and variance $$n^2(2n+1)/12$$.

   #. what's the probability that observations would be this biased (towards the start) if the population means were the same?  I.e., that U would be this far from its mean?

#. There are many tests.

#. You need to decide what "biased" means.  I.e., pick your alternative hypothesis.

#. Not as powerful as parametric tests, but more robust.
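
A minimal sketch of the U score and its normal approximation, using only the Python standard library. It allows different sample sizes $$n_1$$ and $$n_2$$, using mean $$n_1 n_2/2$$ and variance $$n_1 n_2 (n_1+n_2+1)/12$$, which reduce to the formulas above when both sizes are n. Counting a tie as half a pair and skipping any tie correction in the variance are simplifying assumptions. The function name and data are made up for illustration.

.. code-block:: python

   import math
   from statistics import NormalDist

   def mann_whitney_u(xs, ys):
       """Return (U, z, two-sided p) using the large-sample normal approximation."""
       # U counts the pairs with x > y; a tie counts as half a pair.
       u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in xs for y in ys)
       n1, n2 = len(xs), len(ys)
       mu = n1 * n2 / 2
       sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
       z = (u - mu) / sigma
       p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
       return u, z, p_two_sided

   # Made-up observations from populations X and Y.
   xs = [1.2, 2.3, 1.9, 2.8, 1.5, 2.1, 3.0, 1.7]
   ys = [2.6, 3.4, 2.9, 3.8, 2.2, 3.1, 4.0, 3.3]
   u, z, p = mann_whitney_u(xs, ys)
   print(f"U = {u}, z = {z:.3f}, two-sided p = {p:.4f}")

With samples this small the normal approximation is crude; exact tables of U are the usual alternative.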


How to lie with statistics
==========================

https://en.wikipedia.org/wiki/How_to_Lie_with_Statistics   
	 
https://www.amazon.com/How-Lie-Statistics-Darrell-Huff/dp/0393310728


Machine learning
================

A current hot application of stats.


	 
Final exam
--------------

#. The material will go up to homework 11.

#. There will be no statistics or paradoxes, since we didn't have homeworks on that.

#. The final exam will be as specified by the registrar.

#. It will be in person, using gradescope.

#. Bring blank scratch paper.

#. You may have three (3) 2-sided crib sheets.

#. As specified in the syllabus, all 3 exams have the same weight, and the lowest will be dropped.

#. The lowest homework will also be dropped.

#. There was no final exam last year because RPI was shut down by the computer hack.

#. Here is the exam from 2 years ago:
   `Spring 2020 final exam <final-2020.html>`_. `Answers <../files/final-answers-s2020.pdf>`_ .

#. However the material covered changes somewhat each year.


   
After the course
----------------

We have a professional relationship.  I'm available to discuss any legal, ethical topic even after you graduate.

Even after I retire, you have my non-RPI email.

Parting advice: look at the famous alumni on the Darrin windows.  What can you do in later life, so your picture goes there also?



