WEBVTT 1 00:06:07.619 --> 00:06:11.879 No. 2 00:06:11.879 --> 00:06:20.158 Can you hear me now? Thank you. Okay, now the screen says "viewing my application." 3 00:06:20.158 --> 00:06:23.699 But you're not viewing my application, I don't think. 4 00:06:23.699 --> 00:06:29.908 So, Eva. 5 00:06:29.908 --> 00:06:34.319 I stop and start. 6 00:06:38.488 --> 00:06:41.999 And start. 7 00:06:44.579 --> 00:06:49.978 Silence. 8 00:06:52.319 --> 00:06:57.298 This. 9 00:07:03.778 --> 00:07:09.059 Well. 10 00:07:10.949 --> 00:07:15.658 I assume you can't see my application. 11 00:07:16.889 --> 00:07:21.569 Silence. 12 00:07:21.569 --> 00:07:25.678 Can you see my screen? 13 00:07:30.959 --> 00:07:35.428 Yeah, you see, it's trying, but no. Yeah, thank you. 14 00:07:38.819 --> 00:07:45.329 Same. Okay. Thanks, Nick. Well, I'll talk for a minute or two and then 15 00:07:45.329 --> 00:07:48.569 I think it may start up again 16 00:07:48.569 --> 00:07:54.569 spontaneously. I'll talk for a minute or two, and then we'll see. 17 00:07:54.569 --> 00:08:04.379 What I'm working from is, well, the class blog I put up for today. 18 00:08:04.379 --> 00:08:14.999 First, I've just got a paragraph on Mathematica and why I spent a class on it. It's a very powerful tool, and it's useful for the sort of stuff 19 00:08:14.999 --> 00:08:21.538 we're seeing in two variables and things like that, and also 20 00:08:21.538 --> 00:08:26.338 stuff where you've got, you know, piecewise functions, like, 21 00:08:26.338 --> 00:08:38.818 you know, the uniform from 0 to 1, that are extremely tedious to do by hand when you want to integrate them. For example, what I showed you, when you convolved two, you 22 00:08:38.818 --> 00:08:51.389 needed the density for the sum of the two uniform things. Integrating that stuff by hand, you get all these different intervals and it's a horrible mess. And Mathematica is a tool to do it. It's like, you do arithmetic... Hey, finally, it came through.
23 00:08:51.389 --> 00:08:56.759 Okay, um, computers can be really slow. 24 00:08:57.504 --> 00:09:06.714 So you can see my screen now. I'm talking about Mathematica, and so here's an analogy: use a calculator to do arithmetic, use Mathematica to do algebra. 25 00:09:07.134 --> 00:09:22.104 Well, I'm not basing any grades on it, because that's what I said in the syllabus, and also learning enough Mathematica to be useful does take some time, I admit. And it's easy to get a mathematical formula that just takes a lot of compute time to do. 26 00:09:22.104 --> 00:09:22.974 Like I showed you. 27 00:09:23.249 --> 00:09:33.359 I was doing, say, expected values for something that involved basically a double integral, and we were waiting for a minute or two. 28 00:09:33.359 --> 00:09:46.649 But, yeah, okay. What I want to do is show you a few of Radke's videos now, so you don't have to take time outside of class to watch them. It's a chance for us to watch them together and ask questions. 29 00:09:46.649 --> 00:09:54.928 The third thing is, I've got a section on counterintuitive things in statistics. A lot of surprising things happen, and I list a few of them here. 30 00:09:54.928 --> 00:10:02.278 And then I've also got written down here what I was just telling you. 31 00:10:02.278 --> 00:10:09.958 I'm noticing that what you're seeing, the screen, is not updating as fast as I'm changing things, but 32 00:10:11.219 --> 00:10:14.219 yeah, well. 33 00:10:14.874 --> 00:10:29.244 I'm not certain, early in the semester, about moving to another platform. In any case, section 3 of today's blog talks about some counterintuitive things in statistics. Section 34 00:10:29.244 --> 00:10:35.124 4 talks about what I was telling you. I've got it written down here. I've got my take on it, which may help you.
35 00:10:35.399 --> 00:10:47.578 And section 5 talks about hypothesis testing, which Professor Radke will talk about also. I've talked about it, but I think it's important to write down 36 00:10:47.578 --> 00:10:53.399 my take on it, to help you maybe understand it, because it's real-world stuff you see a lot. 37 00:10:53.399 --> 00:10:58.198 And in section six, I just have some other videos. They're not going to be, 38 00:10:58.198 --> 00:11:03.149 you're not going to be graded on them, and that's what "enrichment" means, but they are other takes on some of this stuff. 39 00:11:03.149 --> 00:11:08.938 And section 7 has some statistics videos that I thought were interesting. Um, 40 00:11:08.938 --> 00:11:15.149 well, the t-test, the Guinness brewery, data on something called analysis of variance, and so on. 41 00:11:15.149 --> 00:11:19.048 I may show a few of these things, not all of them. 42 00:11:19.048 --> 00:11:24.389 Okay, let's go back now, and let's do 43 00:11:24.389 --> 00:11:28.408 a few Radke videos, 64 through 67, 44 00:11:28.408 --> 00:11:32.129 and see his take on this. 45 00:11:36.629 --> 00:11:41.219 What I'll attempt to do is actually share just the screen. 46 00:11:43.349 --> 00:11:52.558 Now, we'll see what happens here. I have a chat window open. 47 00:11:52.558 --> 00:11:57.778 I've got two computers in front of me and an iPad I can use if necessary. 48 00:11:57.778 --> 00:12:03.269 So the second computer will have the chat window open. 49 00:12:03.269 --> 00:12:06.928 So, the theory is that when the hardware fails, 50 00:12:06.928 --> 00:12:11.908 not that the hardware ever, or the software ever, fails, then 51 00:12:11.908 --> 00:12:24.269 you can type something in the chat window, because I'll be maximizing the video in my primary window, so I won't be able to see it there. So if you cannot hear the sound or something, tell me.
52 00:12:29.099 --> 00:12:33.778 What we're going to talk about in this lesson is confidence intervals, and this is a little bit 53 00:12:33.778 --> 00:12:37.078 kind of a combination of the central limit theorem and 54 00:12:37.078 --> 00:12:42.869 the kinds of problems that you typically do if you're dealing with real data. Okay. The setup is that, 55 00:12:42.869 --> 00:12:47.908 uh, X1 through Xn are i.i.d. 56 00:12:47.908 --> 00:12:51.538 random variables, okay, with mean 57 00:12:51.538 --> 00:12:55.229 mu and 58 00:12:55.229 --> 00:13:01.288 standard deviation sigma. Okay. What I'm going to focus on right here is basically, 59 00:13:01.288 --> 00:13:04.828 uh, M sub n, which is the average 60 00:13:04.828 --> 00:13:11.339 of, um, these random variables. Okay. 61 00:13:11.339 --> 00:13:14.428 Now, I know that kind of the 62 00:13:16.438 --> 00:13:20.519 various laws of large numbers tell me, in a way, this is kind of slangy, 63 00:13:20.519 --> 00:13:23.999 that in the limit, this mean looks like mu, 64 00:13:23.999 --> 00:13:29.639 okay, with probability 1. In practice, what I want to do is say, okay, well, 65 00:13:29.639 --> 00:13:32.849 I see a bunch of data that I sampled from the distribution. 66 00:13:32.849 --> 00:13:40.168 What is a good range, an interval, that I can be pretty sure the mean is inside, right? So kind of the way I want to phrase it is: 67 00:13:40.168 --> 00:13:43.288 let's let 68 00:13:43.288 --> 00:13:47.009 1 minus alpha be some large number, 69 00:13:50.158 --> 00:13:55.349 like 95% or 99%. What I'm going to do is I want to find 70 00:13:55.349 --> 00:13:59.788 a lower bound and an upper bound 71 00:13:59.788 --> 00:14:03.239 such that the probability that 72 00:14:03.239 --> 00:14:06.538 the true mean is in 73 00:14:06.538 --> 00:14:10.708 this interval is equal to 1 minus alpha. 74 00:14:10.708 --> 00:14:15.328 That's kind of the setup. It's a little bit different than what we talked about before.
75 00:14:15.328 --> 00:14:18.808 Um, this here is called the confidence level. 76 00:14:22.048 --> 00:14:25.859 And this range is called the confidence interval. 77 00:14:30.688 --> 00:14:35.458 Okay, and the idea again comes from the central limit theorem, where we know that in the limit 78 00:14:35.458 --> 00:14:38.908 things are Gaussian. Okay, so here, 79 00:14:38.908 --> 00:14:43.198 these bounds would depend on a whole bunch of independent 80 00:14:43.198 --> 00:14:47.009 draws from the same PDF, right? 81 00:14:47.009 --> 00:14:54.149 So, what I know is that this number, which comes from the Gaussian CDF in the central limit theorem, 82 00:14:54.149 --> 00:14:57.269 is approximately the probability 83 00:14:57.269 --> 00:15:00.719 that, um, my sample mean 84 00:15:00.719 --> 00:15:05.158 minus the true mean is within 85 00:15:05.158 --> 00:15:10.558 this c sigma over square root of n band, which we spent a bunch of lessons kind of playing around with, right? 86 00:15:10.558 --> 00:15:14.099 And so, let me be more explicit. 87 00:15:14.099 --> 00:15:17.369 This is like saying that the probability 88 00:15:17.369 --> 00:15:22.589 is like this, 89 00:15:25.589 --> 00:15:29.609 which is the same as saying, if I just put mu in the middle, 90 00:15:29.609 --> 00:15:34.859 that's like saying I have mu being 91 00:15:34.859 --> 00:15:41.519 in this range, and this is, in fact, what the confidence interval 92 00:15:41.519 --> 00:15:44.578 is, okay. And so 93 00:15:45.629 --> 00:15:49.109 the lower and upper bounds I talked about are, in fact, 94 00:15:50.369 --> 00:15:57.958 basically this range. 95 00:15:57.958 --> 00:16:02.219 So, basically, this is what I see from my measurements, 96 00:16:02.219 --> 00:16:05.219 and this is the actual kind of deviation I get. 97 00:16:05.219 --> 00:16:10.469 Now, the way the, um, you know, here on the left-hand side 98 00:16:10.469 --> 00:16:15.778 I have a Q function, which is [unclear], right? So, let me rewrite this slightly.
99 00:16:15.778 --> 00:16:26.489 So this is like... If I could just, say, interject a little, since I'm not showing all the previous intervals, 100 00:16:26.489 --> 00:16:33.629 all the previous videos: what's happening up here is, sigma is the 101 00:16:33.629 --> 00:16:36.778 standard deviation of the whole population. 102 00:16:36.778 --> 00:16:42.749 And sigma over square root of n will be the standard deviation 103 00:16:42.749 --> 00:16:46.979 of a sample, that is, the standard deviation of the mean 104 00:16:46.979 --> 00:16:53.519 of the sample of size n, and that's how we get sigma divided by square root of n. 105 00:16:53.519 --> 00:16:57.149 And Z is a, um, 106 00:16:57.413 --> 00:17:11.814 is a Gaussian distribution, and so Z is a random variable for a Gaussian distribution. It's a value of a random variable. So what we're saying is the probability that this normal random variable is in a certain range. 107 00:17:11.814 --> 00:17:13.163 So that's what's going on here. 108 00:17:13.409 --> 00:17:17.878 And Q, again, is the probability of a tail 109 00:17:17.878 --> 00:17:21.298 of the Gaussian, right? So we can rewrite this as, like, 110 00:17:21.298 --> 00:17:24.719 just one thing: I have that the 111 00:17:24.719 --> 00:17:30.449 probability that my mean is in this range is equal to 1 minus 2 Q of c, 112 00:17:30.449 --> 00:17:34.469 which I earlier said was 1 minus alpha. And so here, 113 00:17:34.469 --> 00:17:39.419 basically, it's like saying my alpha is 2 Q of c. 114 00:17:39.419 --> 00:17:43.528 Another way of saying it is that my c is, 115 00:17:43.528 --> 00:17:48.388 um, you know, kind of what I want to find is c sub alpha over 2 116 00:17:48.388 --> 00:17:53.308 such that Q of this number gives me 117 00:17:53.308 --> 00:17:59.729 alpha over 2. Okay. And luckily there are tables that come from the Q function 118 00:17:59.729 --> 00:18:03.689 that give me exactly these numbers, right? So if you're doing an exam or something like this,
119 00:18:03.689 --> 00:18:11.429 I would give you kind of like an inverse Q table that says, okay, if I want to say, for example, that alpha is equal to, 120 00:18:11.429 --> 00:18:16.648 um, 0.05, then I would look for the number 121 00:18:16.648 --> 00:18:19.709 c of 0.025, 122 00:18:19.709 --> 00:18:23.459 which I find in my table is 1.96. 123 00:18:23.459 --> 00:18:30.838 That is, Q of 1.96 equals 0.025, which is the number I want, right? So, instead of having to kind of 124 00:18:30.838 --> 00:18:41.818 look at the regular Q tables and find, like, these values, you can actually just kind of look it up from this inverse table. So, let me just do a quick example to show how you use this in practice. Okay. 125 00:18:41.818 --> 00:18:46.949 So, here's the example. So, let's suppose that 126 00:18:46.949 --> 00:18:50.098 X has unknown mean 127 00:18:52.798 --> 00:18:55.828 and variance equal to 1. Okay. 128 00:18:55.828 --> 00:18:59.939 I measure X 100 times 129 00:19:01.229 --> 00:19:05.009 to obtain this empirical measurement 130 00:19:05.009 --> 00:19:09.358 that the mean of those numbers was 5.25. 131 00:19:09.358 --> 00:19:14.519 And now I want to find a 95% 132 00:19:14.519 --> 00:19:18.898 confidence interval on mu. 133 00:19:22.138 --> 00:19:26.338 Okay, well, then, it's going to be 134 00:19:26.338 --> 00:19:29.338 basically, 135 00:19:29.338 --> 00:19:33.808 the sample mean M n plus, I guess it's basically going to be 136 00:19:33.808 --> 00:19:37.828 M n, plus or minus the range, 137 00:19:37.828 --> 00:19:42.659 um, you know, c sigma over square root of n. 138 00:19:42.659 --> 00:19:46.588 And now, what do I know? Right, I know that this is like, 139 00:19:46.588 --> 00:19:50.159 um, 5.25, plus or minus, 140 00:19:50.159 --> 00:19:55.078 you know, what is my c? So, my alpha is 0.05 for 95% confidence, and that means that, 141 00:19:55.078 --> 00:20:00.088 again, I need exactly the number that I happened to derive on the previous page, right?
My c 142 00:20:00.088 --> 00:20:03.898 is going to be, I need the c of 143 00:20:03.898 --> 00:20:07.739 0.025, right, which is 1.96. 144 00:20:07.739 --> 00:20:11.878 So, it's going to be 1.96, my sigma is equal to 1, 145 00:20:11.878 --> 00:20:14.909 my square root of n is going to be equal to the square root of 100. 146 00:20:14.909 --> 00:20:19.108 So my confidence interval is basically 5.25 147 00:20:19.108 --> 00:20:23.578 minus whatever this number is, which is, like, basically almost 0.2, right? 148 00:20:23.578 --> 00:20:29.128 Yeah, 0.2, and 5.25 plus 149 00:20:29.128 --> 00:20:32.249 that amount, which is equal to 150 00:20:32.249 --> 00:20:35.638 5.05, 5.45. 151 00:20:35.638 --> 00:20:39.689 Right. So this will be my 95% confidence interval on the mean, 152 00:20:39.689 --> 00:20:43.919 given the values that I saw, right? Now, 153 00:20:43.919 --> 00:20:51.269 it's kind of weird, because in practice, it's like saying, well, under what circumstances would I know the variance but not know the mean? Right, that's kind of the setup here. 154 00:20:51.269 --> 00:21:02.519 In the case when I don't know either the variance or the mean, then you use different tables, right? So instead of using a Q table, you use a different table. And if you've ever done a problem that involves, like, the t distribution, 155 00:21:02.519 --> 00:21:08.189 that's where that kind of thing comes from. You have tables for the t distribution, tables for the chi-square distribution, 156 00:21:08.189 --> 00:21:13.348 things like that. You know, depending on the type of interval you want to find and the information that you know, 157 00:21:13.348 --> 00:21:19.979 you use different numerical tables. And so I may talk a little bit in a future video about something like the t distribution, but for the moment, 158 00:21:19.979 --> 00:21:23.038 in a probability class, I think that this is like the most 159 00:21:23.038 --> 00:21:26.128 direct-from-the-central-limit-theorem kind of problem you're likely to see.
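The confidence-interval arithmetic from the video can be checked with a short Python sketch. It assumes, as in the example, a known standard deviation and a sample large enough for the Gaussian approximation; the function name `mean_confidence_interval` is made up for illustration, and the inverse Q lookup is done with the standard library's `NormalDist.inv_cdf` in place of a printed table.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval for the true mean when sigma is known (CLT approximation)."""
    alpha = 1.0 - confidence
    # c satisfies Q(c) = alpha / 2, i.e. c is the (1 - alpha/2) Gaussian quantile.
    c = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = c * sigma / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width, c


# Numbers from the video: variance 1, n = 100 measurements, sample mean 5.25.
lower, upper, c = mean_confidence_interval(5.25, 1.0, 100)
print(f"c = {c:.2f}, interval = [{lower:.3f}, {upper:.3f}]")
# prints: c = 1.96, interval = [5.054, 5.446]
```

The result agrees with the rounded interval 5.05 to 5.45 quoted in the video.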
160 00:21:29.128 --> 00:21:34.199 Okay, any questions about that? 161 00:21:35.909 --> 00:21:39.239 No? Confidence intervals. Okay. 162 00:21:39.239 --> 00:21:48.509 I'll talk about some of these estimation techniques, even though I talked about them in class somewhat, but I think they're a reasonably important topic. So let's 163 00:21:48.509 --> 00:21:51.538 see Professor Radke's take on these. 164 00:22:03.959 --> 00:22:08.398 So, what I want to talk about now are what are called estimation problems, and the setup is pretty simple. 165 00:22:08.398 --> 00:22:12.058 So, I'm given a value 166 00:22:12.058 --> 00:22:16.469 of a random variable Y that I observed. 167 00:22:16.469 --> 00:22:20.878 I want to find the best estimate 168 00:22:24.058 --> 00:22:28.709 of some corresponding, or some correlated, random variable X. 169 00:22:28.709 --> 00:22:32.548 Okay, and these kinds of problems are, 170 00:22:32.548 --> 00:22:38.429 uh, kind of pervasive in engineering and science in general. Basically, I have some sort of underlying, 171 00:22:38.429 --> 00:22:42.419 uh, random variable X that gets put through some sort of a 172 00:22:42.419 --> 00:22:49.138 process or a system. In electrical engineering, a lot of the time you see this in the context of a communications channel, right? X 173 00:22:49.138 --> 00:22:55.048 is the bit that goes into the channel, it gets maybe noise and stuff, and what comes out is some random variable Y. 174 00:22:55.048 --> 00:22:59.278 It is no longer 1 or 0, but instead it's a continuous number, for example. And so 175 00:22:59.278 --> 00:23:03.118 then, what I want to do is I want to process what I saw with 176 00:23:03.118 --> 00:23:07.019 what I would call an estimator, which produces for me 177 00:23:07.019 --> 00:23:10.618 another number X hat, which is an estimate 178 00:23:10.618 --> 00:23:16.469 that I get by applying some deterministic function to Y. Okay, so I see Y,
179 00:23:16.469 --> 00:23:19.558 I do something to it, and now I have the best estimate of X that I can make. 180 00:23:19.558 --> 00:23:29.098 And so this concept of best varies according to different definitions of optimal, right? So the one I want to talk about in this lesson, and we'll go on to other ones in 181 00:23:29.098 --> 00:23:33.088 the next ones, the first one I'll talk about is called MAP estimation. 182 00:23:34.858 --> 00:23:39.449 And this stands for maximum a posteriori. 183 00:23:42.778 --> 00:23:46.648 So, the philosophy behind MAP estimation is simple. It basically says: 184 00:23:46.648 --> 00:23:52.679 given the value of Y, 185 00:23:52.679 --> 00:23:56.818 what value of X is most likely to have occurred? 186 00:23:59.909 --> 00:24:03.628 Okay, and saying this in a different way, 187 00:24:03.628 --> 00:24:07.709 we're going to use the conditional probability that we talked about a while back. 188 00:24:07.709 --> 00:24:10.709 The estimate I'm going to get is a function 189 00:24:10.709 --> 00:24:15.898 of Y, sub MAP, and that is going to be the x 190 00:24:16.949 --> 00:24:21.179 that gives me the maximum conditional probability 191 00:24:23.278 --> 00:24:33.088 of X given Y, right? I tell you Y, so it makes sense that what I want to do is find the corresponding maximizing x, given that I know Y happened, right? And so, to draw a picture, 192 00:24:33.088 --> 00:24:38.278 um, this is just like saying, I give you Y. 193 00:24:38.278 --> 00:24:42.538 That means that I produce, then, a new 194 00:24:42.538 --> 00:24:46.469 PDF on X, which is some 195 00:24:46.469 --> 00:24:50.429 conditional PDF, and the MAP estimate is basically 196 00:24:50.429 --> 00:24:53.818 the value of x that maximizes this. 197 00:24:53.818 --> 00:24:59.489 This is going to be my g MAP, and this is, um, 198 00:24:59.489 --> 00:25:04.558 how it works. So, to make this concrete, let me go back to our coin example, right?
199 00:25:04.558 --> 00:25:08.939 So, I have this example that I used several times before. 200 00:25:08.939 --> 00:25:16.919 The idea is, I flip a coin 3 times, and I let X be the number of heads and Y be the position of the first head. Okay. And so you can see here 201 00:25:16.919 --> 00:25:21.838 that basically, I have 4 possible values of X, 202 00:25:21.838 --> 00:25:31.199 and then I have 4 possible values of Y. And these are the joint probabilities in the table. And then these are the marginals that I get from 203 00:25:31.199 --> 00:25:39.179 adding down the columns and adding across the rows, right? So, hopefully this is a familiar example if you watched the previous lessons. Okay. So now I want to know 204 00:25:39.179 --> 00:25:44.098 the conditional probability of X given Y. What that says is, okay, I give you Y, 205 00:25:44.098 --> 00:25:54.778 and then what I should do is normalize along a given column to produce a new PDF that's on X, right? So, if I were to do that for this example, that'd be like saying, okay, 206 00:25:54.778 --> 00:25:58.979 here are the columns, and then I would say, okay, so 207 00:25:58.979 --> 00:26:04.439 now that I've normalized each of these columns, I choose the value of X that maximizes the corresponding 208 00:26:04.439 --> 00:26:11.338 conditional PMF. And so in this case, I can kind of read them off just by looking at them. So the maximum 209 00:26:11.338 --> 00:26:14.999 in this column is this, the maximum in this column is this, 210 00:26:14.999 --> 00:26:19.798 the maximum in this column is this, and the maximum here is kind of a toss-up. It doesn't really matter, I can choose 211 00:26:19.798 --> 00:26:22.919 either one of these things. So if I want to be 212 00:26:22.919 --> 00:26:26.669 rigorous, what I would say would be that the 213 00:26:26.669 --> 00:26:30.118 MAP estimate, given Y, 214 00:26:31.318 --> 00:26:35.189 depends on what Y is, right? So Y equals 0, Y equals 1,
215 00:26:35.189 --> 00:26:41.818 Y equals 2, Y equals 3. If I look back at this, it'd be like saying, well, if Y was equal to 0, I should choose X equals 0. 216 00:26:41.818 --> 00:26:44.999 If Y was equal to 1, I should choose X equals 2. And so on. 217 00:26:44.999 --> 00:26:52.679 So, let me just write down what we learned from the previous picture. Here, basically, I could choose either 1 or 2, it doesn't really matter. So I can just say, okay, 218 00:26:52.679 --> 00:26:56.459 flip a coin between 1 or 2, and this would be 1, that's it. 219 00:26:56.459 --> 00:26:59.489 And then I can get the probability of 220 00:26:59.489 --> 00:27:04.318 making a mistake with this estimate, right? Because clearly, 221 00:27:04.318 --> 00:27:07.528 you know, um, go back to this, so if 222 00:27:07.528 --> 00:27:12.028 Y is 0 or Y is 3, then I'm never going to make a mistake, because there's only one possible value of X. 223 00:27:12.028 --> 00:27:16.199 If Y is 1 or 2, then I have some nonzero probability of making a mistake, right? 224 00:27:16.199 --> 00:27:20.519 So, I can tell you what that is, and I can use the law of total probability. It's basically 225 00:27:20.519 --> 00:27:24.269 the sum of the probability of error 226 00:27:24.269 --> 00:27:27.749 given that Y is equal to k, 227 00:27:27.749 --> 00:27:31.288 times the underlying probability that Y is equal to k, 228 00:27:31.288 --> 00:27:34.888 which I can get from the marginal that I already know, right? 229 00:27:34.888 --> 00:27:38.249 And so, in this case, that's like saying, well, I have, 230 00:27:38.249 --> 00:27:44.338 um, 0 probability of error if I'm in the Y equals 0 case. 231 00:27:44.338 --> 00:27:47.669 I have 1/2 probability of error 232 00:27:47.669 --> 00:27:51.898 if I'm in the Y equals 1 case, and Y equals 1 has probability 1/2. 233 00:27:51.898 --> 00:27:57.689 I have basically 1/2 probability of error in the Y equals 2 case,
234 00:27:57.689 --> 00:28:01.499 which has probability a quarter, and I have 0 probability of error 235 00:28:01.499 --> 00:28:05.608 if I'm in the Y equals 3 case. So I can add this up and I get 0, 0, 236 00:28:05.608 --> 00:28:10.499 1/4, and 1/8, so the total probability of error here 237 00:28:10.499 --> 00:28:13.648 is 3/8. Okay, so 238 00:28:13.648 --> 00:28:19.169 that's the MAP estimate in this discrete case. To do a continuous example, what I could say is something like, 239 00:28:19.169 --> 00:28:29.009 remember the jointly Gaussian random variables, right? So if I have 240 00:28:29.009 --> 00:28:32.429 zero means and 241 00:28:32.429 --> 00:28:36.358 standard deviations equal to 1 and some 242 00:28:36.358 --> 00:28:42.358 nonzero correlation coefficient, then what we found out was that f of X given Y 243 00:28:42.358 --> 00:28:49.019 was Gaussian with mean 244 00:28:51.689 --> 00:28:57.239 rho y, and some standard deviation. Okay. 245 00:28:57.239 --> 00:29:01.439 So, I barely care about the standard deviation. All I care about is that I've got a Gaussian, 246 00:29:01.439 --> 00:29:06.298 and the center of the Gaussian is at rho y. So that would tell me 247 00:29:06.298 --> 00:29:10.679 that if I tell you Y, then rho y is the 248 00:29:13.828 --> 00:29:19.888 location on the x axis that maximizes the conditional probability. 249 00:29:19.888 --> 00:29:23.338 Right. So that's pretty easy to do just from looking at it. 250 00:29:23.338 --> 00:29:26.398 The problem with all this is that a lot of times we don't know 251 00:29:26.398 --> 00:29:32.818 X given Y, right? It's more likely that we know Y given X, right? So, let me just say for a second 252 00:29:32.818 --> 00:29:36.328 that, uh, that's why we have other ways of doing estimation. So 253 00:29:36.328 --> 00:29:40.378 we may not know or be able to obtain 254 00:29:40.378 --> 00:29:43.439 a good estimate of this 255 00:29:43.439 --> 00:29:46.439 class-conditional probability.
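The coin-flip MAP example from the video can be reproduced by brute-force enumeration. The sketch below assumes a fair coin and the convention that Y is 0 when no head appears (consistent with the four values of Y in the video); the names `joint`, `map_rule`, and `p_error` are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Joint PMF of X (number of heads) and Y (position of the first head,
# 0 if there is none) for 3 flips of a fair coin.
joint = {}
for flips in product("HT", repeat=3):
    x = flips.count("H")
    y = flips.index("H") + 1 if "H" in flips else 0
    joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 8)

map_rule = {}
p_error = Fraction(0)
for y in range(4):
    # Column of the joint table for this y. Normalizing by P(Y = y) does not
    # change which x is largest, so the argmax can be read off the joint.
    column = {x: p for (x, yy), p in joint.items() if yy == y}
    best = max(column.values())
    map_rule[y] = sorted(x for x, p in column.items() if p == best)  # ties kept
    # Law of total probability: P(error, Y = y) = P(Y = y) - P(X = x_hat, Y = y).
    p_error += sum(column.values()) - best

print(map_rule)  # {0: [0], 1: [2], 2: [1, 2], 3: [1]}
print(p_error)   # 3/8
```

The tie at Y = 2 shows up as two candidate values; either choice gives the same total error of 3/8.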
256 00:29:46.439 --> 00:29:53.608 Usually, it's the other way around. Usually what we have is f of 257 00:29:53.608 --> 00:30:00.749 Y given X, right? And that's true because, for example, let's suppose I have my communications channel. What I could do is I could just 258 00:30:00.749 --> 00:30:04.409 put a whole bunch of X's through the channel 259 00:30:04.409 --> 00:30:10.979 and then look at the Y's that come out, and I know, okay, if X went in, this is the Y I saw. So, this is really what I'm getting 260 00:30:10.979 --> 00:30:15.328 by doing this experiment where I have access to the physical system, right? 261 00:30:15.328 --> 00:30:19.288 Now, even if I had this, what I could do is, I could say, well, okay, I could get 262 00:30:19.288 --> 00:30:23.848 this by using Bayes' rule, right? So I know 263 00:30:23.848 --> 00:30:27.449 that this is true, right? 264 00:30:27.449 --> 00:30:31.828 This is basically Bayes' rule. If I move this over, I can get the joint. So this is kind of 265 00:30:31.828 --> 00:30:38.159 something where I could say, okay, well, I could observe this, and I could, 266 00:30:38.159 --> 00:30:41.249 you know, uh, measure this also 267 00:30:41.249 --> 00:30:44.519 at the end point, and this is okay if I know, 268 00:30:44.519 --> 00:30:47.939 during my testing process, 269 00:30:47.939 --> 00:30:51.959 all right, so, I mean, when I'm allowed to touch the channel, 270 00:30:51.959 --> 00:30:55.769 I know what the distribution of X is also. But 271 00:30:55.769 --> 00:30:59.578 the problem is that in the real world, 272 00:30:59.578 --> 00:31:05.459 I don't really know what the distribution of X is. If I did, I wouldn't really have this problem, right? So, 273 00:31:05.459 --> 00:31:09.929 uh, this is the tricky part: you know, in the real world, I don't know what the distribution of the 274 00:31:09.929 --> 00:31:15.179 data is. All I do is see how the system has acted on this underlying true data.
275 00:31:15.179 --> 00:31:22.709 And so what we're going to talk about next lesson is called maximum likelihood estimation, which kind of gets around this problem when you don't have a good estimate of the distribution. 276 00:31:23.729 --> 00:31:27.058 Silence. 277 00:31:27.058 --> 00:31:33.929 Questions? Okay, we'll see another one: maximum likelihood. 278 00:31:33.929 --> 00:31:38.249 You don't know 279 00:31:38.249 --> 00:31:43.469 the prior. 280 00:31:45.058 --> 00:31:53.608 So last time we talked about this idea of 281 00:31:53.608 --> 00:32:00.118 estimating a value of X given an observed value Y, and we talked about the maximum a posteriori, or MAP, estimate. 282 00:32:00.118 --> 00:32:04.499 So, remember that the MAP estimate 283 00:32:07.558 --> 00:32:12.209 was the value of x that maximized 284 00:32:12.209 --> 00:32:17.308 the conditional probability of X given Y. 285 00:32:17.308 --> 00:32:20.548 Okay, and that's really in theory what we want, right? 286 00:32:20.548 --> 00:32:24.179 Uh, I want to know, given the full joint distribution of X and Y, 287 00:32:24.179 --> 00:32:29.699 what is the best x I can choose? Now, in the real world, 288 00:32:29.699 --> 00:32:32.939 we talked at the end of the last lesson about how, um, 289 00:32:32.939 --> 00:32:40.138 it's hard to get this X given Y. So, what I could do instead would be to do what's called 290 00:32:40.138 --> 00:32:50.969 maximum likelihood, or ML, and sometimes you see this called 291 00:32:50.969 --> 00:32:54.179 ML estimation, and what I should do, 292 00:32:56.909 --> 00:33:00.598 that's like saying, maximize 293 00:33:00.598 --> 00:33:04.259 over x using the other conditional probability. Okay. 294 00:33:04.259 --> 00:33:09.479 And this is something that is usually easier to get in the real world. Okay. 295 00:33:09.479 --> 00:33:16.288 So, let me just write down the idea behind maximum likelihood estimation. 296 00:33:16.288 --> 00:33:20.489 It's saying: given the value of Y,
297 00:33:21.689 --> 00:33:27.719 given the value of Y, what value of X 298 00:33:31.108 --> 00:33:34.858 is most likely to have produced it? 299 00:33:36.239 --> 00:33:41.249 And I know this is a subtle difference between this and 300 00:33:41.249 --> 00:33:44.818 MAP estimation. 301 00:33:44.818 --> 00:33:49.169 The distinction is a little bit tricky, right? And the distinction lies in the fact that, 302 00:33:49.169 --> 00:33:52.679 um, for example, suppose that when 303 00:33:52.679 --> 00:33:56.489 X equals 2, I have super high probability that Y equals 5. 304 00:33:56.489 --> 00:34:02.578 Right. And so I might conclude with ML estimation that X equals 2 is the best thing to choose, right? 305 00:34:02.578 --> 00:34:07.679 But it could be that under the hood X equals 2 is a very unlikely thing to happen, right? 306 00:34:07.679 --> 00:34:12.208 The difference between ML and MAP is that MAP is using the underlying 307 00:34:12.208 --> 00:34:17.608 PDF, the underlying probabilities of X, to find the best decision, whereas ML is kind of 308 00:34:17.608 --> 00:34:23.489 assuming that we don't know anything about the prior probabilities of X. So sometimes you hear this called 309 00:34:23.489 --> 00:34:26.909 an uninformative 310 00:34:26.909 --> 00:34:31.318 prior, or you're agnostic about, 311 00:34:31.318 --> 00:34:35.159 um, you know, the prior, right? And so 312 00:34:35.159 --> 00:34:41.159 let's go back to the examples that we did in the previous lesson on MAP to see how things change with ML, right? 313 00:34:41.159 --> 00:34:45.358 So, here again is this PDF, uh, I guess it's actually a joint PMF, 314 00:34:45.358 --> 00:34:49.349 where I flip the coin 3 times, X is the number of heads, Y is the position of the first head. 315 00:34:49.349 --> 00:34:53.728 This is the joint PMF, right? So, now what I want to do is I want to compute 316 00:34:53.728 --> 00:34:57.478 the conditional probabilities of Y given X.
317 00:34:57.478 --> 00:35:01.768 So that's like saying, fix X, each of the col..., each of the rows, 318 00:35:01.768 --> 00:35:06.838 and then normalize each row so that every row sums to 1, right? 319 00:35:06.838 --> 00:35:10.438 And so, let me just do that on a separate piece of paper. That's like saying 320 00:35:10.438 --> 00:35:13.918 that Y given X looks like: 321 00:35:13.918 --> 00:35:17.668 Y is 0, 1, 2, 3, 322 00:35:17.668 --> 00:35:25.498 X is 0, 1, 2, 3. So now if I normalize each of these rows, the first row is like 1, 0, 0, 0. 323 00:35:25.498 --> 00:35:31.378 The next row is 0, 1/3, 1/3, 1/3, and so on, right? 324 00:35:34.619 --> 00:35:37.949 The next row is 0, 2/3, 1/3, 0. 325 00:35:37.949 --> 00:35:41.759 And the last row is this, okay? 326 00:35:41.759 --> 00:35:48.869 And so now, what is maximum likelihood telling me? It's saying, I tell you Y. So now we're back in the world of 327 00:35:48.869 --> 00:35:53.639 fixing a value of Y, a column, 328 00:35:53.639 --> 00:35:59.668 and then finding the x that produces the largest value in that column, right? So, here I have this, 329 00:35:59.668 --> 00:36:05.188 here I have this, here again I have kind of a random toss-up between these, 330 00:36:05.188 --> 00:36:08.878 and here I have this. I'm just choosing the value of x that gives me 331 00:36:08.878 --> 00:36:13.438 the biggest value in the column. And so now I can write down what is the 332 00:36:13.438 --> 00:36:16.469 ML estimate, given Y. 333 00:36:16.469 --> 00:36:23.099 Well, I'm just going to go back to this. It's 0 here, it's 3 here, 1 or 2, and then 1. 334 00:36:23.099 --> 00:36:27.208 So it's 0 if Y equals 0. 335 00:36:27.208 --> 00:36:33.869 It's 3 if Y equals 1. It's, you know, flip a coin, 1 or 2, if Y equals 2. 336 00:36:33.869 --> 00:36:37.079 And it's 1 if Y equals 3. 337 00:36:37.079 --> 00:36:41.998 So, let's go back and compare it to the MAP estimate that we got from the previous lesson.
338 00:36:41.998 --> 00:36:47.039 I can see that I got a different number, right? So here in particular, for Y equals 1, 339 00:36:47.039 --> 00:36:51.898 in the MAP case the estimate was 2; in the maximum likelihood case the estimate is 3. 340 00:36:51.898 --> 00:36:55.708 Right. So let's compute the probability of making a mistake 341 00:36:55.708 --> 00:36:58.949 using ML, right? 342 00:36:58.949 --> 00:37:02.278 So, again, I have to go back to my underlying, um, 343 00:37:02.278 --> 00:37:05.938 probabilities. So I have 0 probability of error 344 00:37:05.938 --> 00:37:09.028 if I am in the Y equals 0 case. 345 00:37:09.028 --> 00:37:12.599 I have three-quarters probability of error 346 00:37:12.599 --> 00:37:15.809 if I'm in the Y equals 1 case. 347 00:37:15.809 --> 00:37:18.929 Why is that? Right? So I go back and I look at my, um, 348 00:37:18.929 --> 00:37:22.858 joint PMF. It's like saying, well, I'm choosing this, 349 00:37:22.858 --> 00:37:30.568 right, but there's three-quarters probability left over here if I rebalance this column to sum to 1, right? It's like saying that this, 350 00:37:30.568 --> 00:37:33.599 the probability of X equals 3, is actually pretty 351 00:37:33.599 --> 00:37:39.478 unlikely in the joint case. It's only by normalizing that I kind of made this look so great. But in fact, 352 00:37:39.478 --> 00:37:44.759 the value 2 is much more likely than the value 3, right? So that's why I have a higher error for Y equals 1. 353 00:37:44.759 --> 00:37:48.509 And I have a similar error to before when I'm in the Y equals 2 case. 354 00:37:48.509 --> 00:37:53.338 And I'm not going to make an error if I'm in the Y equals 3 case. So now I add up these numbers and I get 355 00:37:53.338 --> 00:37:56.639 three eighths plus one eighth plus 0, so I get one half. 356 00:37:56.639 --> 00:38:02.849 And so my probability of error is higher for the maximum likelihood estimate than it is for the MAP estimate. 357 00:38:02.849 --> 00:38:08.068 And that's because I didn't take into account the underlying probabilities of X. Okay.
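The comparison in this example can be checked with a short sketch. It assumes, as the conditional rows read out in the lecture suggest, that X is the number of heads in three fair flips and Y is the position of the first head (0 if there is none); the function names are illustrative and not from the lecture.

```python
from fractions import Fraction
from itertools import product

# Assumed setup (inferred from the rows read out in the lecture):
# X = number of heads in 3 fair flips, Y = position of the first head (0 if none).
def joint_pmf():
    joint = {}
    for flips in product("HT", repeat=3):
        x = flips.count("H")
        y = flips.index("H") + 1 if "H" in flips else 0
        joint[(x, y)] = joint.get((x, y), Fraction(0)) + Fraction(1, 8)
    return joint

def estimators(joint):
    values = range(4)
    p_x = {x: sum(joint.get((x, y), 0) for y in values) for x in values}
    # ML maximizes P(Y=y | X=x) over x; MAP maximizes the joint P(X=x, Y=y) over x.
    # Ties go to the smallest x, which does not change the error probability here.
    ml = {y: max(values, key=lambda x: joint.get((x, y), 0) / p_x[x]) for y in values}
    map_ = {y: max(values, key=lambda x: joint.get((x, y), 0)) for y in values}
    return ml, map_

def error_probability(joint, estimate):
    # Probability that the estimate differs from the true X.
    correct = sum(joint.get((estimate[y], y), 0) for y in range(4))
    return 1 - correct

joint = joint_pmf()
ml, map_ = estimators(joint)
print("ML estimate by y: ", ml)
print("MAP estimate by y:", map_)
print("P(error) with ML: ", error_probability(joint, ml))
print("P(error) with MAP:", error_probability(joint, map_))
```

Under this assumed setup the ML rule picks X = 3 when Y = 1 while MAP picks X = 2, and the ML error probability comes out to one half, as in the lecture.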
358 00:38:08.068 --> 00:38:14.250 To go back to the other situation that we had, which was basically a continuous case, right? 359 00:38:14.250 --> 00:38:18.269 So, in a Gaussian case, again, if we have a joint Gaussian, 360 00:38:20.070 --> 00:38:23.760 the same one from before, zero mean, 361 00:38:25.019 --> 00:38:29.789 sigmas equal to 1, and then some correlation coefficient, 362 00:38:29.789 --> 00:38:34.619 so I know that the conditional of Y given X here 363 00:38:34.619 --> 00:38:39.780 is Gaussian with mean 364 00:38:41.639 --> 00:38:46.199 rho x and variance 365 00:38:47.940 --> 00:38:51.389 one minus rho squared. 366 00:38:51.389 --> 00:38:54.690 Right, and so that means my 367 00:38:54.690 --> 00:38:58.500 conditional PDF 368 00:38:58.500 --> 00:39:02.610 looks like this. 369 00:39:05.429 --> 00:39:12.480 Just writing in the definition. 370 00:39:12.480 --> 00:39:16.860 And now, unlike before, this is like saying, okay, 371 00:39:16.860 --> 00:39:20.849 I need to find the x that maximizes this. What is that? 372 00:39:20.849 --> 00:39:26.070 This I can't just necessarily read off, right? I have to actually take the derivative with respect to x. So 373 00:39:26.070 --> 00:39:29.610 I take d/dx of this 374 00:39:29.610 --> 00:39:37.739 and set it equal to 0. What's that going to be? Well, I mean, it's going to be a little messy, but the operative thing is what happens when I take the derivative up here. 375 00:39:37.739 --> 00:39:40.860 It's going to be, um, 2 times 376 00:39:40.860 --> 00:39:44.730 y minus rho x, times minus rho, 377 00:39:44.730 --> 00:39:52.469 times e to the whatever, right? Doesn't really matter, because I know I'm setting it equal to 0. The way that gets set equal to 0 is if I choose x to be 378 00:39:52.469 --> 00:39:56.550 1 over rho times y, right? This is the, 379 00:39:56.550 --> 00:40:00.360 basically, maximum likelihood estimate 380 00:40:00.360 --> 00:40:03.449 given Y; this is like my x-hat, right?
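Written out, the calculation just described looks roughly like this. It is a sketch assuming zero-mean, unit-variance jointly Gaussian X and Y with correlation coefficient ρ ≠ 0; the notation is chosen here for illustration and is not taken from the slides.

```latex
\begin{align*}
f_{Y\mid X}(y\mid x) &= \frac{1}{\sqrt{2\pi(1-\rho^2)}}
  \exp\!\left(-\frac{(y-\rho x)^2}{2(1-\rho^2)}\right),\\
\frac{\partial}{\partial x} f_{Y\mid X}(y\mid x)
  &= \frac{2(y-\rho x)(-\rho)}{-2(1-\rho^2)}\, f_{Y\mid X}(y\mid x) = 0
  \;\Longrightarrow\; y-\rho x = 0,\\
\hat{x}_{\mathrm{ML}}(y) &= \frac{y}{\rho}.
\end{align*}
```

Since the exponential factor is never zero, the derivative vanishes only where y − ρx = 0, which is why the messy prefactors can be ignored.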
381 00:40:03.449 --> 00:40:08.909 So this is definitely not the same as the MAP estimate. The MAP estimate was rho y. So 382 00:40:08.909 --> 00:40:12.929 only when the 2 variables are basically, 383 00:40:12.929 --> 00:40:16.980 uh, you know, rho equals 1, which means they're fundamentally 384 00:40:16.980 --> 00:40:22.199 the same, with no noise there at all, would I have the same estimate. Otherwise I'm going to get different estimates. So. 385 00:40:22.199 --> 00:40:25.949 So, if I want you to take away anything from this, it's basically that, 386 00:40:25.949 --> 00:40:29.130 you know, you should always try to do the MAP estimate. 387 00:40:29.130 --> 00:40:35.639 Yeah, but in cases where you don't know what the prior is, you can do the maximum likelihood estimate with the understanding that your error 388 00:40:35.639 --> 00:40:39.869 will be more than it could be if you had known the underlying distribution of X. 389 00:40:46.050 --> 00:40:57.630 Yeah, let's do this one. So, what's next? 390 00:40:57.630 --> 00:41:02.909 And maximum a posteriori estimation and detection. 391 00:41:02.909 --> 00:41:06.929 Here, so let's... 392 00:41:06.929 --> 00:41:20.550 ML... no, minimum mean square, 67. 393 00:41:24.989 --> 00:41:28.230 One more estimator. We're doing estimators here. 394 00:41:37.980 --> 00:41:44.909 So, I want to introduce one more concept related to estimation, and this kind of relates to a different concept of what you mean by the best estimate. 395 00:41:44.909 --> 00:41:49.380 Okay, so this is called minimum mean square 396 00:41:51.329 --> 00:41:55.920 estimation. Mean square estimation. 397 00:41:59.489 --> 00:42:02.639 And you see this abbreviated MMSE. 398 00:42:03.719 --> 00:42:09.329 And the philosophy here is that I have an error, which is the expected value 399 00:42:09.329 --> 00:42:12.780 of the estimate I make minus 400 00:42:12.780 --> 00:42:17.460 the true value of X, and we square that so we're always going to get a positive number.
401 00:42:17.460 --> 00:42:21.119 So, I want to minimize deviations of 402 00:42:21.119 --> 00:42:24.780 the estimate from the true value 403 00:42:24.780 --> 00:42:28.139 over all the possible functions g 404 00:42:28.139 --> 00:42:37.139 that I could use to estimate. I'm minimizing over the class of functions g, right? Which is kind of an unusual thing to do, right? 405 00:42:37.139 --> 00:42:41.400 So, let me just start with an easier problem. Right? So suppose that 406 00:42:41.400 --> 00:42:45.360 all that I would give you would be a constant function. I said, okay, 407 00:42:45.360 --> 00:42:49.590 you have to choose a constant and that constant has to do the best possible job of, 408 00:42:49.590 --> 00:42:53.730 um, getting close to X, right? So what is the constant 409 00:42:55.440 --> 00:43:00.690 c that 410 00:43:00.690 --> 00:43:06.960 minimizes this mean square error? 411 00:43:10.199 --> 00:43:13.739 Well, that's like saying I want to minimize... in this case 412 00:43:13.739 --> 00:43:19.710 it's easy to do the minimization problem, because I'm not minimizing over a class of functions here. I'm just minimizing over 413 00:43:19.710 --> 00:43:23.159 the constant function, right? So this I can do. This is like saying, 414 00:43:23.159 --> 00:43:27.300 find me the best c to minimize this, right? 415 00:43:27.300 --> 00:43:31.260 Which is the expected value of c squared minus 2cX 416 00:43:31.260 --> 00:43:37.829 plus X squared. And what is this? This is basically, 417 00:43:37.829 --> 00:43:42.960 um, c is a constant, so basically what I have is... 418 00:43:45.179 --> 00:43:52.079 So, if I take the derivative with respect to c and set it equal to 0, what I have is 2c 419 00:43:52.079 --> 00:43:55.199 minus 2 E of X equals 0. 420 00:43:55.199 --> 00:44:01.590 And so if all I gave you was the constant, then it kind of makes sense that the best constant that I should choose is 421 00:44:01.590 --> 00:44:05.519 the mean, the expected value of X. Okay. So.
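The constant-estimator step spelled out, as a short sketch that assumes X has finite second moment; c is the constant from the lecture and the rest of the notation is illustrative.

```latex
\begin{align*}
E\big[(c - X)^2\big] &= E\big[c^2 - 2cX + X^2\big] = c^2 - 2c\,E[X] + E[X^2],\\
\frac{d}{dc}\,E\big[(c - X)^2\big] &= 2c - 2E[X] = 0
  \;\Longrightarrow\; c^{*} = E[X].
\end{align*}
```

The second derivative with respect to c is 2, which is positive, so this stationary point is a minimum rather than a maximum.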
422 00:44:05.519 --> 00:44:08.639 Makes sense, right? So let's just kick it up and say, 423 00:44:08.639 --> 00:44:13.440 now, consider what we want to do, 424 00:44:13.440 --> 00:44:17.670 which is the expected value of 425 00:44:17.670 --> 00:44:21.570 some function of Y minus X, squared. 426 00:44:21.570 --> 00:44:24.989 Okay, now here's where I'm going to do a little trick. 427 00:44:24.989 --> 00:44:28.590 And again, this is like an expectation over X, right? 428 00:44:28.590 --> 00:44:32.460 So, what I could do is I could use the law of iterated expectation 429 00:44:34.679 --> 00:44:42.239 that we talked about a while back to say, okay, 430 00:44:42.239 --> 00:44:46.079 well, this is actually the same as the expected value 431 00:44:46.079 --> 00:44:49.829 of the expected value of 432 00:44:49.829 --> 00:44:53.280 g of Y minus X, squared, 433 00:44:53.280 --> 00:44:56.699 given that Y equals y. 434 00:44:57.929 --> 00:45:01.260 Do I have enough parentheses here? One more. 435 00:45:01.260 --> 00:45:05.730 So, again, this is basically like an expectation in 436 00:45:05.730 --> 00:45:09.360 X, and this is like an expectation in Y. 437 00:45:09.360 --> 00:45:14.070 Right. So now I can kind of rewrite this and say 438 00:45:14.070 --> 00:45:17.550 I have the expected value of 439 00:45:17.550 --> 00:45:20.760 g of y minus 440 00:45:20.760 --> 00:45:24.929 X, squared, given Y equals y, 441 00:45:24.929 --> 00:45:29.400 times this PDF of Y. 442 00:45:29.400 --> 00:45:33.480 And it's like saying that, okay, for every y, 443 00:45:33.480 --> 00:45:42.570 this is the thing that I want to multiply by the PDF to compute the value. Right? So if I want to minimize this whole thing, right? So I want to minimize this. 444 00:45:42.570 --> 00:45:46.920 So, one thing that I could do to make this certainly minimized is to say, 445 00:45:46.920 --> 00:45:51.000 if I could minimize this value for every value y,
446 00:45:51.000 --> 00:45:56.340 Then, for sure, when I integrate against this, I would have the minimum value for the whole integral. Right? 447 00:45:56.340 --> 00:46:02.400 So the idea is: minimize this for each 448 00:46:02.400 --> 00:46:06.960 value of y. Okay, and 449 00:46:06.960 --> 00:46:11.190 now, the kind of key insight is that if Y is fixed, right, 450 00:46:11.190 --> 00:46:14.280 then g of y is just some number. It's a constant, right? 451 00:46:14.280 --> 00:46:20.250 And so I just figured out from my previous slide, what should that be? It should be the expected value, right? 452 00:46:20.250 --> 00:46:25.079 So, the key insight is that the thing that I should do 453 00:46:25.079 --> 00:46:32.010 to get the best estimator is take the expected value of X given 454 00:46:32.010 --> 00:46:36.059 that Y equals y, right? It's basically the conditional mean of X 455 00:46:36.059 --> 00:46:42.929 given the value of Y that I saw. And so that minimizes the mean square error here. So a different way of saying this, which is basically 456 00:46:42.929 --> 00:46:54.750 not that different, is this, right? So this is a function that depends on Y; it produces a number, and that number is the expected value of X given Y. 457 00:46:54.750 --> 00:47:00.059 So one issue is that this function is likely to be 458 00:47:00.059 --> 00:47:04.380 a mess to actually compute in the real world, right? Because this is going to require me to 459 00:47:04.380 --> 00:47:11.010 do some sort of a crummy integral, and we already learned from doing expected values that often these integrals are hard to do by hand, right? 460 00:47:11.010 --> 00:47:14.309 So, next time we're going to be making an approximation to this 461 00:47:14.309 --> 00:47:17.519 that is more tractable to do kind of in the real world. 462 00:47:20.369 --> 00:47:23.699 Okay, so. 463 00:47:26.159 --> 00:47:29.639 So now just to 464 00:47:29.639 --> 00:47:32.909 expand your consciousness a little here,
465 00:47:32.909 --> 00:47:37.500 I'd like to show you some counterintuitive things 466 00:47:37.500 --> 00:47:43.619 in statistics, and 467 00:47:44.005 --> 00:47:58.585 the first one is, suppose we just look at average income in a country, and say we've got two halves of the country, the East and the West. It is possible that the average income for the country as a whole 468 00:47:58.860 --> 00:48:07.829 might rise faster than the rate of increase for the average income in either half of the country. 469 00:48:07.829 --> 00:48:14.250 So perhaps in either half the average income is increasing at 1%; 470 00:48:14.250 --> 00:48:17.820 in the country as a whole, it might increase at one and a half percent. 471 00:48:18.929 --> 00:48:22.980 It really can happen, and I've just got a simple example here to show you that. 472 00:48:24.630 --> 00:48:31.829 Nice and simple. The country's got 2 parts; each part is 100 people; the country as a whole is 200 people. 473 00:48:31.829 --> 00:48:35.610 Each person in the West makes 100 bucks a year. 474 00:48:35.610 --> 00:48:38.909 Each person in the East makes 200 bucks, let's say. 475 00:48:38.909 --> 00:48:52.170 So, the total income in the West is 100 dollars times 100 people; it's 10,000 dollars. In the East, 200 dollars times 100 people is 20,000 dollars. The whole country has a domestic product of 476 00:48:52.170 --> 00:49:01.829 30,000 dollars, and the average in the West is 100, in the East 200, and in the whole country 150: 30,000 dollars divided by 200 people. 477 00:49:01.829 --> 00:49:12.030 Now, let's imagine that one person from the West, that's the poor half, moves East. Let's say he's an average person. He moves East and gets an average job in the East. 478 00:49:12.030 --> 00:49:19.110 So, the average income of the West does not change; it stays at 100 because an average person left it. 479 00:49:19.110 --> 00:49:22.980 In the East, the average income does not change
480 00:49:22.980 --> 00:49:30.510 Because the new person, the immigrant, gets an average job. So the West stays at 100; the East stays at 200. 481 00:49:30.510 --> 00:49:34.650 However, the West now has 99 people; the East has 101. 482 00:49:34.650 --> 00:49:39.869 So, the total income in the West is 99 times 100; in the East it is 101 times 200. 483 00:49:39.869 --> 00:49:43.739 Total income for the country as a whole is 30,100. 484 00:49:43.739 --> 00:49:46.860 And the average is 150.50. The average went up 485 00:49:46.860 --> 00:49:50.099 faster than either half. 486 00:49:52.289 --> 00:49:57.929 There's another statistics example, which can surprise people. 487 00:49:57.929 --> 00:50:01.590 So, what happens here is that 488 00:50:01.590 --> 00:50:06.210 I've got 2 majors for college and 2 colleges. 489 00:50:11.039 --> 00:50:18.989 So, I've got 2 people, or 2 groups of people, applying to college. Sorry: 1 college, 2 groups of applicants, 490 00:50:18.989 --> 00:50:26.610 Albanians and Bostonians, let's say, and the university's got 2 majors, engineering and humanities. 491 00:50:26.610 --> 00:50:30.000 And here I've got a table of numbers. So 492 00:50:30.000 --> 00:50:35.969 15 Albanians apply to engineering and 11 are accepted. 493 00:50:35.969 --> 00:50:40.349 5 Albanians apply to humanities; 2 are accepted. 494 00:50:40.349 --> 00:50:44.909 So, in total 20 Albanians applied to college and 13 were accepted. 495 00:50:46.170 --> 00:50:53.039 5 Bostonians apply to engineering; 4 were accepted. 15 apply to humanities; 7 are accepted. 496 00:50:53.039 --> 00:50:58.260 In total also 20 Bostonians applied to college and 11 were accepted. 497 00:50:58.260 --> 00:51:05.610 So now, let's look at these. You could sum down also: there are 498 00:51:06.659 --> 00:51:16.289 20 people who applied to engineering, 15 accepted; 20 applied to humanities, 9 were accepted; in total, 40 applied and 24 were accepted. 499 00:51:16.289 --> 00:51:24.960 Okay, now let's look at the table in a little more detail.
Let's look at engineering. 500 00:51:24.960 --> 00:51:31.380 11 of 15 Albanians were accepted and 4 of 5 Bostonians were accepted. 501 00:51:31.380 --> 00:51:35.969 Bostonians had a higher chance of getting into engineering, so 502 00:51:35.969 --> 00:51:42.329 80% of the Bostonian applicants to engineering were accepted, but less than 80 percent 503 00:51:42.329 --> 00:51:46.889 of the Albanians. So for an engineering applicant, a Bostonian 504 00:51:46.889 --> 00:51:51.719 applicant to engineering had a higher chance of getting accepted than an Albanian applicant to engineering. 505 00:51:52.889 --> 00:51:59.730 If we look at humanities, the same thing happens: a Bostonian applicant to humanities 506 00:51:59.730 --> 00:52:05.099 had a higher chance of getting accepted than an Albanian applicant to humanities. 507 00:52:05.099 --> 00:52:09.150 The Bostonians, 7/15; the Albanians, 6/15. 508 00:52:10.440 --> 00:52:15.150 So, it's interesting. You're thinking Bostonians have a higher chance of getting into college 509 00:52:15.150 --> 00:52:19.110 than Albanians, since the 510 00:52:19.110 --> 00:52:24.239 Bostonian engineering applicants had a higher chance and the Bostonian humanities applicants had a higher chance. 511 00:52:24.239 --> 00:52:31.199 But now sum across, and what have we got? There were 20 Bostonians who applied to college and 11 got accepted. 512 00:52:31.199 --> 00:52:35.070 Twenty Albanians applied and 13 got accepted. 513 00:52:35.070 --> 00:52:39.329 What's going on? In each and every separate major, 514 00:52:39.329 --> 00:52:46.889 Bostonians had a higher chance of getting accepted, but aggregated together, Albanians had a higher chance of getting accepted. 515 00:52:46.889 --> 00:52:59.940 Totally counterintuitive, and these sorts of things might actually happen sometimes. So I'll let you look at this table and see if you can intuit what is going on. I mean, these are real. 516 00:52:59.940 --> 00:53:04.829 It can really happen.
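For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the acceptance rates from the counts read out above. The dictionary layout and function names are illustrative choices, not something from the class blog.

```python
from fractions import Fraction

# (applied, accepted) counts as read out in the lecture.
admissions = {
    "Albanians": {"engineering": (15, 11), "humanities": (5, 2)},
    "Bostonians": {"engineering": (5, 4), "humanities": (15, 7)},
}

def major_rate(group, major):
    """Acceptance rate of one group within one major."""
    applied, accepted = admissions[group][major]
    return Fraction(accepted, applied)

def overall_rate(group):
    """Acceptance rate of one group summed across both majors."""
    applied = sum(a for a, _ in admissions[group].values())
    accepted = sum(k for _, k in admissions[group].values())
    return Fraction(accepted, applied)

for major in ("engineering", "humanities"):
    print(major, {g: str(major_rate(g, major)) for g in admissions})
print("overall", {g: str(overall_rate(g)) for g in admissions})
```

Exact fractions are used instead of floats so that the within-major and overall comparisons are not affected by rounding.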
I didn't do a fake addition on you or something here. So. 517 00:53:04.829 --> 00:53:09.239 In fact, if anyone would like to, um... 518 00:53:11.579 --> 00:53:16.530 I'm opening the chat window on my second computer, if anyone would like to 519 00:53:16.530 --> 00:53:21.960 take a guess, or even unmute your mic, at what is going on here. 520 00:53:21.960 --> 00:53:28.079 You can think about it. I might ask Monday. So, my point is that statistics, 521 00:53:28.079 --> 00:53:32.039 statistics, whatever, can be really counterintuitive. 522 00:53:32.039 --> 00:53:35.369 This can be relevant to public policy, because 523 00:53:35.369 --> 00:53:39.960 I think this actually happened in the University of California system sometime, and then 524 00:53:39.960 --> 00:53:43.590 people think that there's some fakery in the numbers, but nope. 525 00:53:46.440 --> 00:53:49.889 Okay, there are other examples of that. 526 00:53:52.530 --> 00:53:57.239 And, okay, so now I want to talk about confidence intervals here, so I'm keeping it 527 00:53:57.239 --> 00:54:04.139 nonpartisan. So voters in the next election are going to vote for, say, either party. 528 00:54:04.139 --> 00:54:08.880 And that's the population of voters. 529 00:54:08.880 --> 00:54:12.239 And don't worry about likely voters, et cetera, et cetera. 530 00:54:12.239 --> 00:54:18.210 Well, we want to find the mean, the fraction who'll vote Democratic, let's say; it doesn't matter which. 531 00:54:19.380 --> 00:54:24.840 And we take observations, so we may ask 1000 voters, 532 00:54:24.840 --> 00:54:33.239 potential voters, and let's assume that we've got a fair sample, unbiased, which is actually in the real world the hardest possible problem. 533 00:54:33.239 --> 00:54:38.070 So, 520 say they'll vote Democrat. The sample mean is 52 534 00:54:38.070 --> 00:54:42.809 percent, and now we want an error bound on the actual 535 00:54:42.809 --> 00:54:46.079 mean of the population. So this is sort of like
536 00:54:46.079 --> 00:54:51.780 what he's talking about, and again, it's an example I keep bringing up. So. 537 00:54:53.519 --> 00:54:59.309 So what I'm doing here is motivating what this confidence interval means. 538 00:54:59.309 --> 00:55:06.989 Probability: so in probability we toss a fair coin 539 00:55:06.989 --> 00:55:13.170 4 times; the probability of four heads is 1/16. That's probability: we go from the known 540 00:55:13.170 --> 00:55:19.289 population parameters, heads half the time, a fair coin, 541 00:55:19.289 --> 00:55:28.260 to an observation. In statistics we go from the observation backwards to try to learn something about the parameter. So, 542 00:55:29.579 --> 00:55:34.469 like, I toss an unknown coin 100 times. I see 60 heads. 543 00:55:34.469 --> 00:55:39.929 So, what are the... so, the best estimate, 544 00:55:39.929 --> 00:55:44.130 we were talking about estimates, the estimate for the probability 545 00:55:44.130 --> 00:55:49.110 is that the coin comes up heads 60% of the time. But now, can we put an interval around that? 546 00:55:49.110 --> 00:55:54.389 And well, what's the probability, maybe, that the coin was actually fair? 547 00:55:56.099 --> 00:56:01.230 You know, we're trying to get evidence that there's a fraud going on perhaps, or that there's no fraud. So. 548 00:56:01.230 --> 00:56:05.940 And here, the thing I'm talking about in section 10 here 549 00:56:05.940 --> 00:56:11.280 is that there are different types of estimators, and some are better than others depending on 550 00:56:11.280 --> 00:56:14.460 some other conditions of the problem. 551 00:56:14.460 --> 00:56:19.409 And I mentioned the students; I've been using that for the last 2 classes: 552 00:56:19.409 --> 00:56:24.960 average height. So I take a sample of, say, 70 students. 553 00:56:24.960 --> 00:56:34.409 And there are many possible estimators I could get from that sample. I'm talking about the mean height of the 70 students.
I could just take the mean of the highest and lowest. 554 00:56:34.409 --> 00:56:38.489 Why would I want to do that? It's less work. I'm lazy. 555 00:56:38.489 --> 00:56:44.909 I could take the median, or I could take the mean of the whole sample: three different estimators. 556 00:56:44.909 --> 00:56:49.829 And if the students' height is normally distributed, the mean of the whole sample is best. 557 00:56:49.829 --> 00:56:55.320 But if the students' height is not normally distributed, then maybe that's not the best, and maybe the median 558 00:56:55.320 --> 00:57:02.460 might be better. What I mean by better is that the estimator from the sample converges to the true mean 559 00:57:02.460 --> 00:57:08.460 faster: which converges fastest, and which is most sensitive to other things, like whether the 560 00:57:08.460 --> 00:57:17.460 whole population is normal. And there are weird things: the estimator might be biased. 561 00:57:17.460 --> 00:57:26.429 Its mean might not be the true mean, but it turns out to be good. Oh. 562 00:57:26.429 --> 00:57:30.690 I'll go light on that, maybe. Well. 563 00:57:32.730 --> 00:57:36.059 One example, okay. Let me do this example here. Okay. 564 00:57:36.059 --> 00:57:40.170 We have a uniform distribution. 565 00:57:40.170 --> 00:57:45.059 It's continuous, 0 to b. We don't know the high end; we don't know b. 566 00:57:45.059 --> 00:57:49.440 And so we take a sample of maybe 100 567 00:57:49.440 --> 00:57:57.719 random variables, and a good estimator for b is the maximum of those 100 samples. 568 00:57:57.719 --> 00:58:03.179 But the mean of that maximum is n over n plus 1 times b. It's biased. It's not... 569 00:58:03.179 --> 00:58:10.739 It converges to b, but it has some bias, but it may be better than any estimator that does not converge. This gets subtle. 570 00:58:10.739 --> 00:58:16.590 One-tailed probability: this is what the professor was talking about, so this will just help you here. So.
571 00:58:19.230 --> 00:58:23.429 An example from SAT scores and so on. 572 00:58:23.429 --> 00:58:27.960 Say you take 100, 573 00:58:27.960 --> 00:58:34.380 you look at 100 SAT scores and the sample mean is 550. We know sigma is 100, let's say. 574 00:58:34.380 --> 00:58:42.329 And so if the true mean was 525, what's the probability that the sample mean 575 00:58:42.329 --> 00:58:46.289 is greater than 550? And the way you could do that: 576 00:58:46.289 --> 00:58:49.409 well, the population sigma is 100. 577 00:58:49.409 --> 00:58:53.789 The sample mean's sigma will be 100 divided by the square root of 100, so, 10. 578 00:58:53.789 --> 00:58:57.900 So, 550 579 00:58:57.900 --> 00:59:01.559 compared to 525, it's 25 more. That's two and a half sigma 580 00:59:01.559 --> 00:59:05.730 over mu. So the probability 581 00:59:05.730 --> 00:59:12.840 that the sample mean is over 550 is Q of two and a half. So it's the probability in the right tail of the distribution 582 00:59:12.840 --> 00:59:18.480 from two and a half up to infinity, and you look it up in a table: 0.6%. So. 583 00:59:18.480 --> 00:59:21.570 And I wrote down here 584 00:59:21.570 --> 00:59:24.780 in my point 15 what that means. So. 585 00:59:28.050 --> 00:59:38.010 Okay, and 16 is an informal way of saying it, which captures the idea, but is technically wrong. 586 00:59:38.010 --> 00:59:44.219 So, and the two-tail versus one-tail 587 00:59:44.219 --> 00:59:47.670 is: what do you know about the types of errors? 588 00:59:49.440 --> 00:59:56.369 Okay, okay, so that was 589 00:59:56.369 --> 00:59:59.670 some confidence intervals. Hypothesis testing 590 00:59:59.670 --> 01:00:09.840 is that... oh, okay. I don't know, vaccines have been in the news lately. And how do they test? 591 01:00:09.840 --> 01:00:14.159 I've been using this for lots of examples, but let me 592 01:00:15.179 --> 01:00:21.420 talk about a drug or something. Like, I know someone who was part of a blind test of one of the vaccines. So.
593 01:00:21.420 --> 01:00:35.880 She was given something, and she didn't know if it was a vaccine or a placebo. Neither did the person administering it. So you look at all the people, and of the people that, say, 594 01:00:35.880 --> 01:00:40.619 got the placebo, maybe, who knows, 595 01:00:40.619 --> 01:00:48.329 10% of them maybe got sick, and of the people who got the real vaccine, 5% of them got sick. 596 01:00:48.329 --> 01:00:54.960 But that might have happened just... I'm just making these numbers up, but the point is, this might have happened just by chance. Okay. 597 01:00:54.960 --> 01:00:58.440 So, the hypothesis test is that, 598 01:00:58.440 --> 01:01:01.679 if the vaccine was doing nothing, 599 01:01:01.679 --> 01:01:10.170 what's the probability that in this sample 5% of the people would get sick when the mean number getting sick was 10%, let's say? 600 01:01:10.170 --> 01:01:13.199 So this is hypothesis testing. 601 01:01:14.699 --> 01:01:19.800 Let me give you the real world again also. 602 01:01:19.800 --> 01:01:31.019 After she got the dose, she felt sick immediately, so she knew she'd gotten the vaccine, not the placebo. Okay. It actually made people sick in the short term. 603 01:01:31.019 --> 01:01:35.309 Okay, but that's the real world, in contrast to this course. 604 01:01:35.309 --> 01:01:42.570 So, my example here is the students. 605 01:01:44.190 --> 01:01:47.369 So. 606 01:01:47.369 --> 01:01:55.170 Well, let's say that the students, we assume they're 2 meters high. Obviously, I made up the numbers. It's not really... 607 01:01:55.170 --> 01:01:58.530 They're not really that tall on the average. 608 01:01:58.530 --> 01:02:03.539 But let's suppose the real population is 609 01:02:05.400 --> 01:02:08.730 2 meters, and then we measure. 610 01:02:08.730 --> 01:02:15.630 Well, we measure 100 students, let's say, and the mean height of our sample is 1.9 meters.
611 01:02:15.630 --> 01:02:23.159 Well, so we were guessing that the population was 2 meters high, but 612 01:02:23.159 --> 01:02:27.750 if we measured 100 students and got 1.9 instead of 2, 613 01:02:27.750 --> 01:02:33.690 is our original idea of 2 meters valid? Maybe it's wrong. So you want to 614 01:02:33.690 --> 01:02:38.250 test this hypothesis that the real population height was 2 meters. 615 01:02:38.250 --> 01:02:44.159 Now, the way this gets worded to make it a doable problem is that we say, 616 01:02:44.159 --> 01:02:49.679 if the true population height was 2 meters, what's the probability 617 01:02:49.679 --> 01:02:55.650 that our sample of 100 had an observed height of 1.9? So. 618 01:02:55.650 --> 01:03:01.079 So, the null hypothesis is that the true population height was really 2 meters. 619 01:03:01.079 --> 01:03:04.199 And we want the probability 620 01:03:04.199 --> 01:03:09.239 that, if the hypothesis was true, we observed 1.9. 621 01:03:11.730 --> 01:03:16.500 And the alternate hypothesis is that the true 622 01:03:16.500 --> 01:03:22.800 mean height was not 2 meters. So you have to think about this. You see, 623 01:03:22.800 --> 01:03:27.989 these definitions were chosen to make something we could actually compute with. 624 01:03:29.099 --> 01:03:32.760 Or, let me take you tossing a coin, let's say, and 625 01:03:34.920 --> 01:03:40.889 if you toss a coin 100 times and see 60 heads, 40 tails, 626 01:03:40.889 --> 01:03:47.369 well, then if it was a fair coin, what's the probability that we would see something this far off? 627 01:03:47.369 --> 01:03:51.360 So the null hypothesis is it's a fair coin, and 628 01:03:51.360 --> 01:03:57.210 if it was fair, what's the probability that we saw 60 heads? 629 01:03:57.210 --> 01:04:00.599 And the alternative hypothesis was, it's not a fair coin. 630 01:04:00.599 --> 01:04:06.210 Okay, now. 631 01:04:06.210 --> 01:04:10.769 And you can read my wording here on the blog, but so.
632 01:04:10.769 --> 01:04:15.269 We can then get... it's the same math as a confidence interval, but 633 01:04:15.269 --> 01:04:20.400 we get... so we get these errors, and then, like, if the... 634 01:04:21.809 --> 01:04:27.750 We get, you know, we might be wrong with our guess. Like, let me use the coin toss thing. 635 01:04:27.750 --> 01:04:32.039 Fair coin, tossed 100 times, 60 heads. 636 01:04:32.039 --> 01:04:36.719 Let's say the coin really is fair, but we see 60 heads. That's really unusual. 637 01:04:36.719 --> 01:04:40.199 Maybe we 638 01:04:40.199 --> 01:04:45.449 assume... we say, 60 heads out of 100, that coin is biased. That's the call. 639 01:04:45.449 --> 01:04:52.650 But maybe the coin really is fair. So a type 1 error would be when we falsely accuse the coin. 640 01:04:52.650 --> 01:04:56.280 We falsely... we reject the null hypothesis 641 01:04:56.280 --> 01:05:00.960 when, in fact, it's true. Then we get a type 2 error, 642 01:05:00.960 --> 01:05:12.119 where maybe the coin had 52 heads, 48 tails, and we say, oh, that's close enough, it's really a fair coin, but perhaps the coin was not fair. 643 01:05:12.119 --> 01:05:15.690 So a type 2 error would be: the coin was fake 644 01:05:15.690 --> 01:05:20.969 and we falsely guessed that it was genuine, was fair. That would be a type 2 error. 645 01:05:22.559 --> 01:05:27.449 And you can compute these. So. 646 01:05:29.190 --> 01:05:37.260 And these trade off. You know, you can design your cutoff. Okay, whoever's making the rules for the 647 01:05:37.260 --> 01:05:40.440 experiment gets to make the rules for the cutoff. 648 01:05:40.440 --> 01:05:46.860 In drug tests, you know, if you're too strict, 649 01:05:46.860 --> 01:05:50.400 you'll reject a new drug which is actually doing some good. 650 01:05:50.400 --> 01:05:54.960 If you're too lenient, you'll accept a new drug which is actually useless. 651 01:05:56.190 --> 01:06:02.639 And the one- versus two-tailed distinction is that... let's take the drug thing, for example.
652 01:06:02.639 --> 01:06:11.400 Maybe a new drug makes things worse. Okay. What possibilities are you looking for? Are you looking at the probability that this new drug makes things better, or 653 01:06:11.400 --> 01:06:16.079 are you looking for the probability that it makes things different, better or worse? 654 01:06:16.079 --> 01:06:23.730 Again, you have to set the rules, and depending on what the rules are, the math will be different: a one-tail versus a two-tail problem. 655 01:06:23.730 --> 01:06:27.239 I mentioned the criminal trial thing, that 656 01:06:27.239 --> 01:06:38.639 things are fuzzy, so the jury has to know what sort of error they make. Do they falsely convict an innocent person, or do they falsely free a guilty person? 657 01:06:38.639 --> 01:06:42.750 Okay, it depends where you put the dividing line. 658 01:06:45.239 --> 01:06:49.199 In any case, so this is hypothesis testing. 659 01:06:49.199 --> 01:06:58.409 And it's worded so that you ask: if the null hypothesis, that there's no difference, is in fact true, 660 01:06:58.409 --> 01:07:06.809 what is the probability we saw something as far off normal as we actually saw, and far off in one direction or either direction? 661 01:07:08.219 --> 01:07:13.320 And if we reject it, it doesn't say what the true mean is, let's say. 662 01:07:13.320 --> 01:07:17.760 It just says that the hypothesis is not very likely. 663 01:07:19.139 --> 01:07:28.710 And often you'll see in experiments, they'll talk about a 5% confidence; they'll say they want a 5% chance of being wrong or something, whatever, but that's... 664 01:07:28.710 --> 01:07:31.739 whoever designs that gets to specify, 665 01:07:31.739 --> 01:07:44.309 you know, what the cutoff is. So I'm talking about random sampling here. These things are called z-tests, so at least we know the population variance, but... 666 01:07:45.389 --> 01:07:48.780 Yeah, it makes life easier if we assume that.
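To make the one-tail versus two-tail distinction concrete, here is a minimal sketch for the coin example (60 heads in 100 tosses, null hypothesis of a fair coin). It assumes the normal approximation to the binomial with no continuity correction, which is a simplification; the function names are illustrative.

```python
import math

def q_function(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def z_test_proportion(successes, n, p0):
    """Return (z, one_tailed_p, two_tailed_p) under the null P(success) = p0.

    Uses the normal approximation to the binomial (an assumption of this sketch).
    """
    sigma = math.sqrt(n * p0 * (1 - p0))  # std. dev. of the count under the null
    z = (successes - n * p0) / sigma
    one_tailed = q_function(z)            # "this many heads or more"
    two_tailed = 2 * q_function(abs(z))   # "this far off in either direction"
    return z, one_tailed, two_tailed

z, p_one, p_two = z_test_proportion(60, 100, 0.5)
print(f"z = {z:.2f}, one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")

# The same tail function covers the SAT example from earlier: Q(2.5).
print(f"Q(2.5) = {q_function(2.5):.4f}")
```

Whether the one-tailed or the two-tailed number is the relevant one depends on the rule chosen before looking at the data, which is the point made above about whoever designs the experiment setting the cutoff.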
667 01:07:48.780 --> 01:07:58.440 If we don't know the population variance, we use the sample and do a T test, and that I'll talk about later. So, the random sampling: I mentioned the example before of the. 668 01:07:58.440 --> 01:08:07.469 Polling of American voters in 1936, and they attempted to randomly sample the voters by. 669 01:08:07.469 --> 01:08:11.880 Sampling people who had a telephone, which. 670 01:08:11.880 --> 01:08:14.909 A lot of people did not have in 1936. 671 01:08:14.909 --> 01:08:19.380 So that random sample was bad, and it got a lot of publicity. 672 01:08:19.380 --> 01:08:23.250 Here's another one; you don't have to go back to 1936. 673 01:08:23.250 --> 01:08:30.420 This is, the United States was doing a visa lottery in. 674 01:08:30.420 --> 01:08:37.020 2012. Of all the applicants, they were randomly selecting a few lucky people to get the visa. 675 01:08:38.069 --> 01:08:43.199 Unfortunately, they got it. 676 01:08:45.930 --> 01:08:49.920 I can't even get about. Yeah. Wow. 677 01:08:49.920 --> 01:08:53.550 They don't even have a valid certificate. 678 01:08:55.710 --> 01:09:04.890 And if we go back down here, they've updated it since I initially. 679 01:09:04.890 --> 01:09:10.770 Did this, but what happened is, this is the lottery, and they got the lottery wrong. 680 01:09:10.770 --> 01:09:13.920 The first time. So, yeah. 681 01:09:13.920 --> 01:09:18.569 Problems, right? I keep telling you random sampling is hard. 682 01:09:19.859 --> 01:09:32.520 Some enrichment stuff you can look at here, and what I'll show you Monday are some more videos. There's no time now to start one, but they're getting into more experimental design things. So. 683 01:09:32.520 --> 01:09:36.840 To tell if 2 populations are the same or different. 684 01:09:36.840 --> 01:09:41.640 When we don't know the variance. And ANOVA. 685 01:09:41.640 --> 01:09:47.340 Looks at a set of experiments and tries to see if 2 populations are different or not. 686 01:09:47.340 --> 01:09:51.149 Um, and.
687 01:09:51.149 --> 01:09:57.329 I don't know, whether right-handed or left-handed people get hired. 688 01:09:57.329 --> 01:10:01.859 And so we have a sample to look at the variance of people. 689 01:10:01.859 --> 01:10:08.369 Within right-handed, within left-handed, then we aggregate the 2 groups and look at the variance. That's called ANOVA. 690 01:10:08.369 --> 01:10:13.770 And we'll look at a few things like that, and continue on and maybe work out some problems. 691 01:10:14.819 --> 01:10:18.090 Okay, so that is a chance. 692 01:10:18.090 --> 01:10:21.090 We're a couple of minutes early; this is a chance. 693 01:10:21.090 --> 01:10:26.880 To ask any questions. If you don't have questions. 694 01:10:26.880 --> 01:10:32.250 Enjoy the weather. I was seeing a little snow earlier this morning. 695 01:10:32.250 --> 01:10:37.229 But it's going to warm up, so have a good weekend and see you Monday.