WEBVTT 1 00:01:44.879 --> 00:01:56.370 Okay, good afternoon, class. So I think I'm recording this and also broadcasting it, with luck. 2 00:01:56.370 --> 00:02:04.590 Take a quick look. Yeah. Okay, so, probability class, 3 00:02:04.590 --> 00:02:12.719 lecture 25. We're continuing on in chapter 8 of Leon-Garcia, talking about statistics. 4 00:02:12.719 --> 00:02:17.969 Um, a couple of things I thought I would show you today. 5 00:02:17.969 --> 00:02:22.800 First, um, I want to show you some paradoxes. 6 00:02:22.800 --> 00:02:26.490 I'll show you another one, um. 7 00:02:26.490 --> 00:02:36.210 They're not in the book, but I think looking at these paradoxes is important, because it shows you things happen in statistics that are surprising. 8 00:02:36.210 --> 00:02:41.879 That a non-technical person might even think are not possible, but yet 9 00:02:41.879 --> 00:02:50.699 they happen, and we get interesting arguments over that, when the mathematician says such and such a thing has happened, and then 10 00:02:50.699 --> 00:02:53.819 the non-mathematician says, well, that's not possible. 11 00:02:53.819 --> 00:03:04.800 So that's number one. Number two, I think I'll show you a few videos on hypothesis testing, and then we'll do some examples from the book. 12 00:03:04.800 --> 00:03:11.189 So this is actually not working. Let me see. 13 00:03:15.599 --> 00:03:25.050 Okay, so, um, my blog, where I list some counterintuitive things in statistics, I call it. Um. 14 00:03:27.719 --> 00:03:36.120 Let me show you, um, our first one here. So these are 15 00:03:36.120 --> 00:03:44.009 called paradoxes, or whatever, um. 16 00:03:45.150 --> 00:03:45.479 So, 17 00:03:45.474 --> 00:03:46.375 here's one, 18 00:03:46.405 --> 00:03:46.914 um. 19 00:04:11.064 --> 00:04:14.215 The per-capita income of a whole country can rise faster than the income in any 20 00:04:14.520 --> 00:04:18.300 part of the country, um. 21 00:04:19.350 --> 00:04:24.870 Here's a fictional country, let's say.
22 00:04:27.028 --> 00:04:30.149 And maybe we've got a, um. 23 00:04:33.209 --> 00:04:42.538 We've got a poor part of the country and a rich part of the country. And let's suppose each part has a thousand people. Um. 24 00:04:47.158 --> 00:04:53.639 Very small country. And let's say here the average income, 25 00:04:55.468 --> 00:05:00.928 income, let's say, per cap, is, whatever, say 10K. 26 00:05:00.928 --> 00:05:04.108 And let's say here the average income is 27 00:05:05.309 --> 00:05:11.189 20K. Okay. So for the country as a whole, 28 00:05:11.189 --> 00:05:15.509 um, if you look at the total income, 29 00:05:18.988 --> 00:05:23.548 um, the poor part is 1000 people at 10K, so, 30 00:05:26.788 --> 00:05:30.598 plus 1000 people at 20K, 31 00:05:32.668 --> 00:05:38.939 and then, the total, that will be 30,000,000. 32 00:05:40.259 --> 00:05:47.759 And then the per cap would be 30 million divided by, um, 33 00:05:47.759 --> 00:05:50.819 2000. 34 00:05:50.819 --> 00:06:01.019 15K per person. Okay. Makes common sense. Okay, now let's suppose, um, 35 00:06:01.019 --> 00:06:04.199 we have a group here of, um. 36 00:06:06.298 --> 00:06:12.988 So here is, let's say, here's 100 average people. 37 00:06:16.168 --> 00:06:23.459 They move, they move here, and they get average jobs. 38 00:06:27.869 --> 00:06:33.209 Um, so we'll call this, um, 39 00:06:33.209 --> 00:06:37.199 section A of the country and section B of the country. Okay. 40 00:06:37.199 --> 00:06:49.168 So, um, again, I can't split the screen, so it helps to have. 41 00:06:49.168 --> 00:06:54.869 So, um, section A's 42 00:06:57.298 --> 00:07:02.098 average income stays the same. 43 00:07:05.009 --> 00:07:09.569 It's originally 1000 people at 44 00:07:09.569 --> 00:07:13.858 10K; now there's 900 people 45 00:07:13.858 --> 00:07:17.249 at 10K. Okay. 46 00:07:17.249 --> 00:07:22.108 Um, section B, ditto. 47 00:07:23.699 --> 00:07:29.668 Originally a 1000 people at 20K.
48 00:07:29.668 --> 00:07:33.449 Now, I add in the 100 people that moved. 49 00:07:33.449 --> 00:07:40.439 They got average jobs, so it's 1100 people, and they got average 20,000-a-year jobs, at 20K. 50 00:07:40.439 --> 00:07:49.288 So the per-cap income in the two halves of the country stayed the same. Now, let's look at the country as a whole. 51 00:07:51.928 --> 00:07:58.379 Look at the whole country. Okay. Originally 52 00:07:58.379 --> 00:08:02.999 the per capita, the per cap, as I said, was 15K. 53 00:08:02.999 --> 00:08:08.668 Now it's 900 people times 10,000 54 00:08:08.668 --> 00:08:12.449 plus 1100 at 20,000, 55 00:08:12.449 --> 00:08:15.749 divided by, say, 2000 people. 56 00:08:15.749 --> 00:08:19.798 900 at 10K 57 00:08:19.798 --> 00:08:25.079 is, um, 9,000,000. 58 00:08:27.209 --> 00:08:32.308 And 1100 at 20K is 22 million. Um. 59 00:08:36.298 --> 00:08:41.278 So that totals, um, 60 00:08:42.869 --> 00:08:46.019 31 million, divided by 2000, 61 00:08:46.019 --> 00:08:55.438 equals, um, let's see. 62 00:09:04.379 --> 00:09:08.099 That is, um, 63 00:09:14.489 --> 00:09:20.639 let me just work it out here. 64 00:09:23.453 --> 00:09:44.033 Okay, 65 00:09:45.083 --> 00:09:45.894 okay. 66 00:09:51.269 --> 00:09:55.048 Okay, so the per capita income went up. 67 00:09:55.048 --> 00:10:00.808 Um, so it's up from 68 00:10:00.808 --> 00:10:06.928 15K, okay, so. 69 00:10:08.308 --> 00:10:15.899 This computer, it's too smart for its own good. As soon as I put it down, 70 00:10:15.899 --> 00:10:20.938 you know, it thinks I'm looking away from it, and it locks itself. 71 00:10:20.938 --> 00:10:31.528 So. 72 00:10:33.749 --> 00:10:38.399 Okay, um, so. 73 00:10:40.078 --> 00:10:45.328 So, for the country as a whole, the per capita income went up from 15,000 to 15 and a half thousand, 74 00:10:45.328 --> 00:10:49.739 even though the two halves of the country stayed the same.
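The migration arithmetic above can be checked with a short sketch, using the lecture's numbers (two sections of 1000 people at 10K and 20K, with 100 people moving from the poor section to average jobs in the rich one); the `per_capita` helper is just for illustration:

```python
# Per-capita income paradox from the lecture: each section's average income
# is unchanged, yet the country-wide per-capita income rises.

def per_capita(groups):
    """groups: list of (num_people, income_per_person) pairs."""
    total_income = sum(n * inc for n, inc in groups)
    total_people = sum(n for n, _ in groups)
    return total_income / total_people

# Before: section A has 1000 people at 10K, section B has 1000 at 20K.
before = per_capita([(1000, 10_000), (1000, 20_000)])  # 30M / 2000 = 15,000

# After: 100 average people move from A to B and take average (20K) jobs.
after = per_capita([(900, 10_000), (1100, 20_000)])    # 31M / 2000 = 15,500

print(before, after)  # 15000.0 15500.0
```

Each section's per-capita income (10K and 20K) is untouched; only the weighting between the sections changed.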
75 00:10:50.849 --> 00:11:00.479 And it's not something which is totally impossible, because people are going to move from a poor region to a rich region. Okay, it's like arbitrage. So that's the first, 76 00:11:00.479 --> 00:11:08.788 um, counterintuitive thing. I have it here with, like, different numbers. So, any questions? 77 00:11:08.788 --> 00:11:20.639 And, you know, this is an engineering course, not a political course, keep politics out of it, but you can see how the engineering stuff might be affecting the political discourse. Doesn't matter what side you're on. 78 00:11:20.639 --> 00:11:24.208 Um, math is useful. That's it. Okay. 79 00:11:24.208 --> 00:11:28.139 What can I say? I like math. Math is useful. 80 00:11:28.139 --> 00:11:36.208 Okay, so that's the first thing. Um, I've got a lot of them. Here's another one, since we're in a college. 81 00:11:37.769 --> 00:11:43.318 This is a college acceptance rate surprise. 82 00:11:58.109 --> 00:12:05.489 Okay, so basically, um, let's say there's two groups applying to college. Um. 83 00:12:09.119 --> 00:12:14.428 Group A, let's say, they're from Albany, 84 00:12:15.658 --> 00:12:23.009 and group B from Boston. Okay. And, uh, there are several majors. 85 00:12:26.639 --> 00:12:29.938 In the example here I have, let's say, 86 00:12:29.938 --> 00:12:34.438 engineering and humanities. Okay. 87 00:12:34.438 --> 00:12:48.448 Um, so now we look at the, um, probability of getting accepted. So, um, 88 00:12:54.808 --> 00:13:02.038 what can happen here? Um, 89 00:13:07.198 --> 00:13:16.828 the probability of a person being accepted, um. 90 00:13:16.828 --> 00:13:21.899 And let's say we look at, um, 91 00:13:21.899 --> 00:13:28.558 say, into engineering. Let's say a Bostonian 92 00:13:32.129 --> 00:13:35.428 has a larger chance of being accepted 93 00:13:38.999 --> 00:13:48.208 than an Albanian. And I'll give some numbers here, um. 94 00:13:48.208 --> 00:13:54.298 So, a Bostonian might be, um,
95 00:13:54.298 --> 00:13:58.828 4/5, versus, say, 11/15. 96 00:14:00.208 --> 00:14:06.328 And in each case, this is accepted versus applied. 97 00:14:07.739 --> 00:14:12.119 Okay, so 4/5 would be 12/15. So the Bostonian, 98 00:14:12.119 --> 00:14:20.369 a random Bostonian who applies to engineering, has a better chance of getting in than a random Albanian who applies to engineering. 99 00:14:20.369 --> 00:14:24.869 Let's look at humanities, applying into humanities. 100 00:14:26.308 --> 00:14:30.688 A Bostonian applying to the humanities 101 00:14:30.688 --> 00:14:36.058 might be, um, 7/15, 102 00:14:36.058 --> 00:14:45.359 versus, say, an Albanian might be 2/5, and these would be the number accepted versus the number applied. Okay. 103 00:14:45.359 --> 00:14:51.839 And 2/5 is 6/15. So, um, okay, um, 104 00:14:53.188 --> 00:14:58.589 so, let's see, did I get the 105 00:14:58.589 --> 00:15:03.719 numbers right? Okay, so it looks like, um, 106 00:15:03.719 --> 00:15:09.328 Bostonians have a better chance of getting into college than Albanians. 107 00:15:09.328 --> 00:15:13.948 Um, but now let's amalgamate them. 108 00:15:17.969 --> 00:15:22.048 Amalgamate, 109 00:15:22.048 --> 00:15:27.389 let me write this. 110 00:15:29.099 --> 00:15:38.249 Okay, so I've got to go to another page, but I'm going to add and I'm going to add. So, um, so, 111 00:15:41.129 --> 00:15:54.058 so, um, total, um, the total, say, applications. 112 00:15:56.548 --> 00:16:00.389 Um, so, um, basically, 113 00:16:02.009 --> 00:16:05.969 for Albany, 11 were accepted from 15 applications 114 00:16:05.969 --> 00:16:19.318 into engineering, and 2 were accepted out of 5 applications into humanities. So, 13 were accepted out of 20 applications total. We're not just adding the fractions; we add numerator and denominator. Okay. 115 00:16:19.318 --> 00:16:23.698 Um, Boston applications: 116 00:16:23.698 --> 00:16:30.328 um, 4 out of 5 were accepted into engineering,
117 00:16:30.328 --> 00:16:34.078 7 out of 15 were accepted into humanities, 118 00:16:34.078 --> 00:16:39.989 which is 11 out of 20, um, accepted in total. 119 00:16:39.989 --> 00:16:43.528 So, let me go back to the previous page. Um. 120 00:16:44.609 --> 00:16:48.389 Yeah, okay. So, in total, 121 00:16:48.389 --> 00:16:52.979 Albanians have a better chance of getting in, um. 122 00:17:02.938 --> 00:17:06.989 The Albanian has a better chance in total. 123 00:17:08.909 --> 00:17:13.679 But in engineering, 124 00:17:14.848 --> 00:17:20.368 the Albanian has the worse chance, 125 00:17:21.959 --> 00:17:25.618 and in humanities, 126 00:17:25.618 --> 00:17:28.709 the Albanian has the worse chance. 127 00:17:28.709 --> 00:17:39.419 In total, the Albanian has a better chance. The numbers are up in the blog on the right, so you can check to see if I made a mistake, but this, um. 128 00:17:39.419 --> 00:17:42.719 Um. 129 00:17:44.729 --> 00:17:52.288 So you say, what's, um, you know, what's going on here? Basically, um, 130 00:17:53.669 --> 00:17:58.229 can anyone think how you might explain it to an English major? 131 00:17:58.229 --> 00:18:06.689 Um, okay, I'll give you an explanation which I tried on an English major. Um, 132 00:18:07.888 --> 00:18:18.538 it's that engineering is harder to get into. And these are all fictitious numbers, okay. Engineering is harder to get into than the humanities, 133 00:18:18.538 --> 00:18:23.338 and more Albanians are applying to 134 00:18:23.338 --> 00:18:30.058 engineering than Bostonians are applying. So more Albanians are applying to the harder major 135 00:18:30.058 --> 00:18:34.229 than Bostonians. So, um, 136 00:18:35.608 --> 00:18:44.368 so that starts affecting things. But you see, it sounds impossible, it sounds totally crazy, but it actually happens, and there's numbers for it. 137 00:18:44.368 --> 00:18:52.318 So, counterintuitive things in statistics. And now, if you want, um,
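The amalgamation step described above (adding numerators and denominators, not the fractions themselves) can be sketched like this, using the lecture's fictitious acceptance counts; exact fractions avoid any floating-point fuzz:

```python
from fractions import Fraction

# (accepted, applied) for each group and major, from the lecture.
albany = {"engineering": (11, 15), "humanities": (2, 5)}
boston = {"engineering": (4, 5),   "humanities": (7, 15)}

def rate(accepted, applied):
    return Fraction(accepted, applied)

def total_rate(group):
    # Amalgamate: add numerators and denominators across majors.
    acc = sum(a for a, _ in group.values())
    app = sum(n for _, n in group.values())
    return Fraction(acc, app)

# Boston wins within each major...
assert rate(*boston["engineering"]) > rate(*albany["engineering"])  # 12/15 > 11/15
assert rate(*boston["humanities"])  > rate(*albany["humanities"])   # 7/15 > 6/15
# ...but Albany wins overall: 13/20 versus 11/20.
print(total_rate(albany), total_rate(boston))
```

The reversal happens because more Albanians apply to the harder major (engineering), which drags their amalgamated denominator toward the low-acceptance pool.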
138 00:18:53.909 --> 00:18:57.778 Yeah, this could happen, say, with drug trials. 139 00:18:57.778 --> 00:19:03.778 You might have two separate drug trials, and each drug trial separately 140 00:19:03.778 --> 00:19:07.078 says the drug appears to be helpful. 141 00:19:07.078 --> 00:19:14.669 But you amalgamate the data from the two tests, and the amalgamated data might say the drug is not helpful, 142 00:19:14.669 --> 00:19:17.729 even though each separate test says yes. 143 00:19:17.729 --> 00:19:22.739 Combine all of the experiments, and the combined answer is no. 144 00:19:22.739 --> 00:19:27.989 I mean, you know, this sounds impossible, but it can actually happen, 145 00:19:27.989 --> 00:19:31.949 and does actually happen. So, um, 146 00:19:34.048 --> 00:19:41.999 so these are the, um, real-world things in statistics, or "sadistics" as we call it sometimes. 147 00:19:41.999 --> 00:19:47.548 And there's lots of these examples. So, um, 148 00:19:49.108 --> 00:19:53.999 I think I got it right here. Um, in total, 149 00:19:53.999 --> 00:19:59.249 Albanians do better, but in each separate discipline, 150 00:19:59.249 --> 00:20:02.939 yeah, 151 00:20:04.588 --> 00:20:12.209 uh, in each separate discipline the Albanians do worse. Did I say that here? 152 00:20:12.209 --> 00:20:16.618 Yeah. All right. Okay. 153 00:20:16.618 --> 00:20:19.828 Weird, but true. 154 00:20:19.828 --> 00:20:29.638 Uh, I mentioned the two things here; I'm going to do some more. 155 00:20:29.638 --> 00:20:32.999 Okay. 156 00:20:35.999 --> 00:20:44.548 I'd like to actually show a video or two on, um, 157 00:20:47.969 --> 00:20:52.499 where did I put some of this, um, um. 158 00:21:02.519 --> 00:21:06.929 I'd like to show a couple of videos on hypothesis testing and so on. 159 00:21:06.929 --> 00:21:12.239 And just to refresh, I mentioned it on Monday. 160 00:21:12.239 --> 00:21:18.118 Important topic. So, you know, you want to 161 00:21:19.288 --> 00:21:22.949 decide if something is working.
162 00:21:22.949 --> 00:21:28.288 I gave the example of the mathematician Gosset, who was working for the Guinness brewery: 163 00:21:28.288 --> 00:21:35.219 you know, did a new technique raise the amount of alcohol in the beer, perhaps. 164 00:21:35.219 --> 00:21:38.459 Or again, a drug, um, 165 00:21:38.459 --> 00:21:50.519 is this new drug working on the average? It's a problem that affects different people in different ways. Now, to throw some numbers at you for the pharmaceutical industry, as I mentioned before, 166 00:21:50.519 --> 00:21:57.568 the worldwide annual gross for the pharmaceutical industry is 10 to the 12 167 00:21:57.568 --> 00:22:02.489 dollars, a terabuck. Okay, a trillion dollars 168 00:22:02.489 --> 00:22:05.788 worldwide, half of it in the United States. 169 00:22:05.788 --> 00:22:12.239 Let me write some of these numbers down here. I may have written them before; I can't remember at the moment. 170 00:22:12.239 --> 00:22:16.979 Um, so, 171 00:22:16.979 --> 00:22:26.038 hypothesis testing. Okay. 172 00:22:28.229 --> 00:22:32.999 Does a new drug work? 173 00:22:34.648 --> 00:22:38.429 Okay, um, you know, 174 00:22:40.138 --> 00:22:44.278 or work more often than not, or something. 175 00:22:49.108 --> 00:22:53.368 You know, just to show some numbers they use. So the 176 00:22:55.229 --> 00:23:01.979 worldwide pharma industry is 177 00:23:01.979 --> 00:23:08.459 10 to the 12th per year; the U.S. is, um, 178 00:23:08.459 --> 00:23:13.919 basically 5 times 10 to the 11th per year. 179 00:23:13.919 --> 00:23:17.308 And developing one new drug, 180 00:23:21.719 --> 00:23:26.818 one successful drug: 181 00:23:29.638 --> 00:23:32.909 okay, there's a big argument over the number, so maybe it's, 182 00:23:32.909 --> 00:23:38.608 um, say it's 1 to 5 times 10 to the 9 dollars. Okay. 183 00:23:38.608 --> 00:23:42.298 Six billion, who knows? 184 00:23:42.298 --> 00:23:51.058 This is including the failures. Okay.
185 00:23:53.699 --> 00:23:57.509 If, say, 10% of the drugs you try work, so maybe, 186 00:23:57.509 --> 00:24:05.429 you know, each test might be half a billion, but you have to try 10 drugs to get one working, so you have 5 billion. Okay, so. 187 00:24:08.338 --> 00:24:11.788 So you'd like to have tests which, um, 188 00:24:13.019 --> 00:24:21.449 show the thing works on the average, um. And now to the government rules: to throw some history at you, 189 00:24:21.449 --> 00:24:26.519 in the 19th century, there were a lot of things, so-called patent medicines, 190 00:24:26.519 --> 00:24:30.388 that claimed that they would do everything for you. Um, 191 00:24:30.388 --> 00:24:35.098 most of which didn't work, and some of which were actively deadly. 192 00:24:36.239 --> 00:24:41.459 You know, times change. The drugs: in the 19th century, mothers might give their kids 193 00:24:41.459 --> 00:24:44.999 some sort of opium compound to keep them quiet. 194 00:24:44.999 --> 00:24:48.538 Now we just put the kids in front of the television, so. 195 00:24:48.538 --> 00:24:56.308 Now, of course, it would be illegal. I might argue the opium was less harmful than television, but that's just my opinion. 196 00:24:56.308 --> 00:25:01.288 In any case, so there were all these things, and then the government started 197 00:25:01.288 --> 00:25:05.699 putting in standards and a drug agency and so on, the FDA, 198 00:25:05.699 --> 00:25:13.078 to try and regulate this sort of stuff, and to try and keep out the awful things that might kill you, and to 199 00:25:13.078 --> 00:25:19.378 let through the good stuff. Um, there's another reason why you'd like to have these tests. 200 00:25:19.378 --> 00:25:24.719 There's a case, some of you might have heard of it, it goes back a few decades now. There was a new drug 201 00:25:24.719 --> 00:25:28.019 called thalidomide, and 202 00:25:28.019 --> 00:25:33.749 it was prescribed for things like 203 00:25:33.749 --> 00:25:36.778 morning sickness among pregnant mothers.
204 00:25:36.778 --> 00:25:47.729 And so the drug companies sought permission to market it in many different countries: the US, Canada, Europe, and so on. 205 00:25:47.729 --> 00:25:56.909 And the United States bureaucrats said, no, we don't trust this drug. No, it's illegal to market thalidomide in the United States. 206 00:25:56.909 --> 00:26:00.179 Basically every other country said yes, 207 00:26:00.179 --> 00:26:08.638 go ahead, it looks like a very useful drug, and it was, and it's still sold for certain things, certain limited cases. 208 00:26:08.638 --> 00:26:14.608 But it had the one problem that when it was taken by a pregnant mother, 209 00:26:14.608 --> 00:26:18.598 when the baby was born, the baby might have no arms or legs. 210 00:26:19.769 --> 00:26:28.138 That was the problem. It was a good drug, but the poor kid would have, you know, the arm would be this long or something. 211 00:26:28.138 --> 00:26:34.739 Um, but not in the United States, because the United States never approved it. 212 00:26:34.739 --> 00:26:39.419 This is a canonical example of why we've got bureaucrats in the United States 213 00:26:39.419 --> 00:26:42.659 who are really careful about approving new drugs. 214 00:26:42.659 --> 00:26:48.778 Okay, so that's outside this course. The part relevant to this course is hypothesis testing. 215 00:26:48.778 --> 00:26:54.959 So, um, we have a drug that purports to 216 00:26:54.959 --> 00:26:57.959 cure the common cold, let's say, 217 00:26:57.959 --> 00:27:01.169 and you give it to people. 218 00:27:01.169 --> 00:27:11.368 And some people get better, on the average, in a week. Other people get better on the average in 7 days. It's variable: you get 7 days, 6 days, 8 days. 219 00:27:11.368 --> 00:27:20.999 Okay, so they cannot answer the question directly: does this drug 220 00:27:20.999 --> 00:27:27.989 make you get better quicker? So what they do is they look at the test people and they blind the experiment, which means
221 00:27:27.989 --> 00:27:32.338 that the patients don't know whether they're getting the real drug or the placebo. 222 00:27:32.338 --> 00:27:36.449 Usually, although for some drugs you cannot blind it. Um, 223 00:27:38.368 --> 00:27:44.848 a real example: like some types of vaccines, some that have been in the news. 224 00:27:44.848 --> 00:27:47.969 If you get the vaccine, your arm gets sore. 225 00:27:47.969 --> 00:27:51.239 If you get the placebo, your arm does not get sore. 226 00:27:51.239 --> 00:27:56.608 So, yeah, you know if you got the vaccine or the placebo, if you're in a test. Okay. 227 00:27:56.608 --> 00:28:03.808 In any case, so the way they... so they have this weird wording that they constructed. Let's set 228 00:28:03.808 --> 00:28:10.679 it up again: they look at what they observed, and they say that if 229 00:28:11.759 --> 00:28:19.798 the drug was useless, how likely is it that we would have observed what we saw? So, exactly, it's worth repeating: 230 00:28:19.798 --> 00:28:25.979 is this coin a biased coin? We tossed it 100 times, got 60 heads. 231 00:28:25.979 --> 00:28:31.138 If the coin was not biased, that is, if it was fair, what's the probability 232 00:28:31.138 --> 00:28:37.078 we would have, um, seen a result which is this far off of fair? 233 00:28:37.078 --> 00:28:44.098 That's hypothesis testing. And some new terms: the null hypothesis 234 00:28:44.098 --> 00:28:53.848 is the hypothesis that there is no effect, that the coin is fair, that the drug is not doing anything for you. The alternative hypothesis, 235 00:28:53.848 --> 00:28:56.939 the alternative hypothesis is that 236 00:28:56.939 --> 00:29:02.009 the coin is not fair, that the drug is doing something. 237 00:29:02.009 --> 00:29:06.358 And we can say, what's the probability, if 238 00:29:06.358 --> 00:29:09.479 the null hypothesis is true, 239 00:29:09.479 --> 00:29:16.138 which is that the coin is fair, the drug is not helping,
what's the probability that we would have seen this? So. 240 00:29:17.429 --> 00:29:26.068 And to construct the alternative hypothesis, you have to decide what it should be: that the coin is not fair, or that the coin is not fair in one direction, and so on. 241 00:29:26.068 --> 00:29:33.628 For the drug, we could have the alternative hypothesis that the drug helps you, or just that the drug changes you: it might hurt you, who knows. 242 00:29:33.628 --> 00:29:39.868 So, so the math you're using in this course is, um, 243 00:29:39.868 --> 00:29:44.189 is relevant. Now, with hypothesis testing, 244 00:29:44.189 --> 00:29:49.499 you can't say for sure that the coin is fair, or that the drug 245 00:29:49.499 --> 00:29:56.729 is useless, the null hypothesis. What you can say is, you can assign a probability to it. 246 00:29:56.729 --> 00:30:01.648 So we saw 60 heads out of 100. 247 00:30:01.648 --> 00:30:12.868 What we can say is that if the coin was fair, then, you know, 98% of the time we would not see this big a difference off of 50-50. 248 00:30:12.868 --> 00:30:17.759 So we have this conventional, traditional thing in a lot of engineering 249 00:30:17.759 --> 00:30:21.778 disciplines that we say, probability 5%. So that, 250 00:30:21.778 --> 00:30:29.848 you know, is the probability less than 5% that we would have seen the coin be this far off 50-50 if, in fact, it's a fair coin? 251 00:30:29.848 --> 00:30:35.729 Or, for, um, for the drug, 252 00:30:35.729 --> 00:30:44.848 then, you know, we want at least a 95 chance out of 100 that the drug is not useless. But 253 00:30:44.848 --> 00:30:49.138 you get into these issues there, um, 254 00:30:49.138 --> 00:30:58.169 and you might get a confidence interval, which is that if the coin is fair, we toss it 100 times, and 95 times out of a 100, 255 00:30:58.169 --> 00:31:02.969 the number of heads we see will be, let me make a number up, from 256 00:31:02.969 --> 00:31:08.848 45 to 55, let's say. Okay. Um.
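The coin question above, how often a fair coin would land at least this far off 50-50, can be tallied exactly with the binomial distribution; treating "this far off" as two-sided is an assumption about the alternative hypothesis, as discussed above:

```python
from math import comb

n, observed = 100, 60  # the lecture's example: 60 heads in 100 tosses

def binom_pmf(k, n, p=0.5):
    """Probability of exactly k heads in n flips of a p-coin."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Two-sided tail: outcomes at least as far from 50 as the observed 60.
p_value = sum(binom_pmf(k, n)
              for k in range(n + 1)
              if abs(k - n / 2) >= abs(observed - n / 2))
print(round(p_value, 4))  # 0.0569
```

So a deviation this large happens roughly 5.7% of the time under the null, just above the conventional 5% cutoff; the one-sided tail (60 or more heads) is about half that.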
257 00:31:11.249 --> 00:31:17.788 This also applies... okay, so I gave you examples, say, tossing coins; I gave you an example from drugs. 258 00:31:17.788 --> 00:31:22.618 Um, let me give you an example from, 259 00:31:22.618 --> 00:31:30.808 say, the law of criminal evidence: matching fingerprints is, to some extent, a probabilistic thing. 260 00:31:30.808 --> 00:31:36.028 You've got a print, you lift it off a, um, 261 00:31:36.028 --> 00:31:41.009 a bomb part after a bomb exploded, to pick a real example. In, um, 262 00:31:41.009 --> 00:31:49.739 Spain, some time ago, there was a horrible explosion, killed 200 people or something, on a subway. They lifted a print, 263 00:31:49.739 --> 00:31:54.088 and then they had a suspect in Seattle, in the United States, 264 00:31:54.088 --> 00:31:58.469 and they said, well, this print matches the suspect's finger, 265 00:31:58.469 --> 00:32:05.878 and arrested him. So there's a probabilistic thing going on there, matching, because 266 00:32:05.878 --> 00:32:15.179 the print that you see, the latent print, is not going to be perfect. So it's going to be, does it probably match on 12 different or 15 different points, or something. 267 00:32:16.378 --> 00:32:23.878 In this particular example, the guy was totally unrelated and totally innocent, but his fingerprint matched. So, okay. 268 00:32:23.878 --> 00:32:29.009 So let me show you a couple of videos talking about this hypothesis testing 269 00:32:29.009 --> 00:32:32.788 and so on, because the speaker, 270 00:32:32.788 --> 00:32:37.108 I think, may explain it better than me. So, let's see if this works. 271 00:32:44.338 --> 00:32:47.848 Um, I was going to ask the professor, 272 00:32:47.848 --> 00:32:51.298 what is hypothesis testing? 273 00:32:54.659 --> 00:33:03.538 Let's see. Um, right. Um. 274 00:33:03.538 --> 00:33:06.808 Okay, 69, 70, 275 00:33:06.808 --> 00:33:14.368 71. Okay, he's changed the numbers a little, um.
276 00:33:29.578 --> 00:33:34.108 Okay, when I get the audio working, I'll then 277 00:33:34.108 --> 00:33:47.338 restart. 278 00:33:52.259 --> 00:34:02.489 One of my latest little toys, my own speaker. 288 00:40:03.059 --> 00:40:09.360 So, there are lots of real-world applications where I see a whole bunch of runs of some 289 00:40:09.360 --> 00:40:12.809 data-generating process, and I want to 290 00:40:12.809 --> 00:40:20.489 decide... okay, for example, maybe these are flips of a coin, and I want to know, is this coin fair or not, based on the flips I see. 291 00:40:20.489 --> 00:40:31.980 Or maybe I have two sequences, and I want to know... maybe these are, like, lifetimes of batteries before and after I changed my manufacturing process. So I want to know, 292 00:40:31.980 --> 00:40:36.659 does this battery have a longer life than that battery, right, statistically? 293 00:40:36.659 --> 00:40:44.489 Or does this signal have a different distribution than that signal? So all these things are instances of what's generally called hypothesis testing. 294 00:40:48.329 --> 00:40:59.485 And you could take a whole course on hypothesis testing; that course might, for example, be called something like detection and estimation, usually at a graduate level. It's all super important in machine learning and pattern recognition, computer vision. 295 00:40:59.664 --> 00:41:08.034 And so what I'm doing in these two lessons is using some of the tools of basic probability that we've accumulated so far to answer some simple hypothesis-testing kinds of questions. 296 00:41:08.250 --> 00:41:13.199 And the first one is called a significance test.
So it's kind of like a special case. 297 00:41:18.179 --> 00:41:22.619 And the setup here is that I have what I call the null hypothesis, 298 00:41:22.619 --> 00:41:26.789 which I'm going to call H0. 299 00:41:26.789 --> 00:41:34.530 And what I have to do is decide: am I going to accept this hypothesis as true, or am I going to reject it? 300 00:41:34.530 --> 00:41:42.090 Okay, so for example, maybe my null hypothesis is that, um, I have a fair coin. 301 00:41:44.429 --> 00:41:54.269 And what I'm going to do is look at some sort of a, um, what I call a statistic involving all of my experiments. Let's suppose I look at the sum 302 00:41:54.269 --> 00:42:05.070 of a bunch of flips of this quarter, and then I say, okay, well, if I did 100 quarter flips, I know that my mean should be about 50, right? 303 00:42:05.070 --> 00:42:17.905 And so, if my number is 23 or 78, I probably know that my coin is not fair, but if it's 52 or 48, maybe I think it's okay. So, the idea is, how can we make a principled argument for whether we should accept or reject the null hypothesis? Okay. 304 00:42:17.934 --> 00:42:22.554 So, to be a little more precise, the idea is that I'm going to look at this 305 00:42:23.699 --> 00:42:28.199 number, and I'm going to design a region, and I'm going to say, okay, 306 00:42:28.199 --> 00:42:35.639 um, if S_N is in the region, then I'm going to reject the hypothesis, 307 00:42:36.869 --> 00:42:40.170 and if S_N is not in the region, 308 00:42:40.170 --> 00:42:43.679 I'm going to accept the hypothesis. 309 00:42:44.184 --> 00:42:57.655 Right. And this is kind of like a decision rule; it says, okay, if I'm in this category, reject; if I'm in this category, accept. Okay. And so, in order to design these kinds of decision rules, we need to think about the ways that our decision can be wrong. Right? 310 00:42:57.715 --> 00:42:59.934 We talked about this a little bit early on.
311 00:43:01.710 --> 00:43:05.429 So we have what's called type 1 error. This means 312 00:43:05.429 --> 00:43:09.449 that we rejected the hypothesis 313 00:43:09.449 --> 00:43:17.010 when it was true. And we have the type 2 error, 314 00:43:20.369 --> 00:43:24.630 which is that we accepted 315 00:43:24.630 --> 00:43:32.070 the hypothesis when it was false. Right? Those are the two ways that we can be wrong. 316 00:43:35.730 --> 00:43:43.230 Okay, and so basically what we have are, kind of like, um, misses and false alarms. Okay. 317 00:43:43.494 --> 00:43:57.235 So, I'm assuming that I know H0, that is, I know how the coin should behave when it's a fair coin, but there are lots of ways the coin could behave when it's unfair. Right? All I really know is what it should do in the case that H0 is true. Right? 318 00:43:57.235 --> 00:44:06.324 So really, the only thing that I can then work with is what's called the type 1 error. Right? I can't really say much about the type 2 error, because I don't know all the ways that H0 can be wrong. Right? So, 319 00:44:07.679 --> 00:44:16.019 significance testing is basically about: I put a bound on the type 1 error, and I find out what my region should be. So, um, 320 00:44:16.019 --> 00:44:20.130 to be precise, it's like saying alpha 321 00:44:20.130 --> 00:44:23.639 is going to be basically our type 1 error, 322 00:44:24.840 --> 00:44:33.239 which is, uh, well, I guess this alpha is the probability of a type 1 error, right? And I can evaluate that by saying, okay, if I 323 00:44:33.239 --> 00:44:38.159 have some region where I reject the hypothesis, right, this is like 324 00:44:38.159 --> 00:44:42.090 the rejection region, then 325 00:44:42.090 --> 00:44:49.559 I integrate my PDF under the null; this is like a class-conditional kind of probability. 326 00:44:50.610 --> 00:44:56.610 And this is the probability of making a mistake, right? That's saying that H0 was true, but I rejected it.
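One quick way to see the type 1 error is to simulate a coin that really is fair and count how often it lands in a rejection region anyway; the region |S - 50| ≥ 10 used here matches the threshold the lecture derives a bit later, and the simulation is just a sketch:

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

def reject(n=100, c=10):
    """Flip a FAIR coin n times; reject H0 if the head count is far from n/2."""
    s = sum(random.random() < 0.5 for _ in range(n))  # number of heads
    return abs(s - n // 2) >= c

trials = 20_000
alpha_hat = sum(reject() for _ in range(trials)) / trials
print(alpha_hat)  # close to the ~5% design level (exactly ≈ 0.057 for c = 10)
```

Since H0 really is true in every trial, every rejection here is a type 1 error, so the fraction of rejections estimates alpha directly.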
327 00:44:56.610 --> 00:45:02.099 And so then, uh, what I can do is I can say, okay, um, you know, a more... 328 00:45:02.099 --> 00:45:08.010 a smaller alpha would lead to a more conservative range for 329 00:45:08.010 --> 00:45:15.329 um, accepting the hypothesis, right? That's like saying I'm super scared of, um, 330 00:45:15.355 --> 00:45:29.425 rejecting the hypothesis, so I basically just, like, always say it's true, right, because I want to avoid making any mistakes; whereas maybe if I'm more willing to make mistakes, then my decision region changes. Okay. So this alpha is basically the same alpha as we had 331 00:45:29.425 --> 00:45:31.855 when we were talking about confidence intervals a few lessons ago. 332 00:45:32.190 --> 00:45:43.469 And in a similar way, we can use Q functions and so on to try to actually get numbers. So let me be concrete about this example. Right? So let's suppose that the null hypothesis is 333 00:45:43.469 --> 00:45:47.130 that the coin is fair, right? That is, 334 00:45:47.130 --> 00:45:51.750 I have a Bernoulli random variable with p equals one half. Right? 335 00:45:51.750 --> 00:45:58.829 Now, I'm going to look at S100, which is the sum of 100 flips, 336 00:45:58.829 --> 00:46:04.469 with 1 being a head and 0 being not a head. Right? So what I want to do is 337 00:46:04.469 --> 00:46:09.630 find a significance test... significance 338 00:46:10.800 --> 00:46:15.059 test at a level 339 00:46:15.059 --> 00:46:18.119 of, say, 5%. 340 00:46:18.119 --> 00:46:22.769 Okay, what this means is that I want to find, um, 341 00:46:22.769 --> 00:46:26.099 alpha equals 0.05, 342 00:46:26.099 --> 00:46:30.690 which is the probability of rejecting H0 343 00:46:32.280 --> 00:46:36.929 when it was true. And 344 00:46:36.929 --> 00:46:40.139 really this probability is related to a region 345 00:46:42.030 --> 00:46:46.650 where I'm basically going to say, 346 00:46:46.650 --> 00:46:50.610 you know, plus or minus some constant c
347 00:46:51.534 --> 00:47:03.985 Outside of the expected value, which is 50. If I'm in this region, I accept that the null is true; outside, I reject it. Right, so I can kind of, like, now work on these probabilities. 348 00:47:04.074 --> 00:47:10.824 And since 100 is a lot of coin flips, I know that the central limit theorem should give me a pretty accurate probability in the tails of the distribution. Right? 349 00:47:11.429 --> 00:47:18.690 That's like saying, I want to have 0.05 be the probability that the sum. 350 00:47:18.690 --> 00:47:22.380 Is, uh, not equal to. 351 00:47:22.380 --> 00:47:27.719 Not in this range when H0 is true. 352 00:47:27.719 --> 00:47:36.809 Right. That's like saying, I get some crazy set of coin flips that gives me 70 heads even though the coin was fair. Right? What's the probability of that happening? 353 00:47:36.809 --> 00:47:40.829 I'm just going to kind of rewrite this, and this should be familiar from the. 354 00:47:40.829 --> 00:47:44.760 Um, central limit theorem kind of. 355 00:47:44.760 --> 00:47:52.440 Lectures, right. Another way of saying this is that if I divide everything through by 100, then the mean. 356 00:47:52.440 --> 00:47:57.869 Minus one half is greater than this. 357 00:47:57.869 --> 00:48:05.460 And so now I can kind of use my Q tables to say, okay, I know what this is. This is like saying, okay, I want. 358 00:48:05.460 --> 00:48:11.670 To have, um, the sum of these 2 tails. 359 00:48:11.670 --> 00:48:17.519 Be 0.05, right? So it's like saying, I want Q of z. 360 00:48:17.519 --> 00:48:24.960 To equal 0.025; then I can look in my Q table and I can find out that the corresponding z value here. 361 00:48:24.960 --> 00:48:36.809 Is 1.96. We had this actually from some of our previous, uh, tables. I don't have it on the screen right now, but somewhere we had this table that said, what is the inverse of. 362 00:48:36.809 --> 00:48:40.380 You know, 0.025, it turned out to be this, right?
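The table lookup just described can be reproduced with Python's standard library rather than a printed Q table. A short sketch of the two-tail calculation:

```python
from statistics import NormalDist

# Two tails summing to alpha = 0.05 leave 0.025 in each tail, so
# Q(z) = 0.025 and z is the 97.5th percentile of a standard normal.
alpha = 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)
print(round(z, 2))  # 1.96
```

Here `inv_cdf` plays the role of the inverse Q-table lookup: Q(z) = 1 - CDF(z), so Q(z) = 0.025 is the same as CDF(z) = 0.975.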
363 00:48:40.380 --> 00:48:47.130 And then I can use this to say, okay, this number here corresponds to z. Uh. 364 00:48:47.130 --> 00:48:54.269 Times sigma over the square root of n. What don't I know? I don't know c, right? So really what I do is solve for c: c is equal to 100. 365 00:48:54.269 --> 00:49:01.739 Times z times sigma; sigma for a Bernoulli random variable here should be a half. 366 00:49:01.739 --> 00:49:05.670 And I divide by the square root of n, which is. 367 00:49:05.670 --> 00:49:09.570 The 100 flips. Right, so I'm going to have basically 10. 368 00:49:09.570 --> 00:49:17.280 Times what's roughly 2, divided by 2, so basically I can say that c here is going to be 10, and that tells me that my. 369 00:49:17.280 --> 00:49:23.610 Significance test at a level of 0.05 is in the range 40 to 60. Okay. 370 00:49:23.610 --> 00:49:30.119 And so this is all going good, and next time I'm going to say, well, what happens when I can also characterize. 371 00:49:30.119 --> 00:49:44.039 The alternate hypothesis. This is like saying, you know, maybe I know the coin is either fair, or it's a coin that has, uh, probability p equal to three quarters. How could I distinguish which coin I have from seeing a whole bunch of flips? So that's for next time. 372 00:50:01.465 --> 00:50:09.775 So last time we talked about, uh, significance testing, which is basically saying, can I reject some null hypothesis? Now I'm gonna talk about what's called simple hypothesis testing. 373 00:50:11.579 --> 00:50:17.340 And honestly, we already talked about this a little bit in different contexts. Right? Some of this stuff is gonna look familiar. 374 00:50:17.340 --> 00:50:22.079 But basically, now the setup is, I have H0, which I call. 375 00:50:22.079 --> 00:50:30.570 The null hypothesis, and I have a new one, this is H1, which I'm going to call the. 376 00:50:31.860 --> 00:50:40.650 Alternate hypothesis, and what I have in each case is a PDF of how the random variable should look.
377 00:50:40.650 --> 00:50:46.710 Under each of these hypotheses. So kind of like before, we were talking about these, uh. 378 00:50:46.710 --> 00:50:55.769 Under the name class-conditional probabilities, right? Remember we had this example with salmon and tuna. This is actually a standard example, just given a different way. And so again, I have. 379 00:50:55.769 --> 00:51:05.670 A decision rule that says, if this, you know, sum of x's, for example, is above a certain value, I choose H1; otherwise I choose H0. So we're gonna have a decision rule. 380 00:51:09.630 --> 00:51:17.130 And that decision rule can lead to different errors. Right? So now I'm going to give these errors a name. Right? So a type 1 error. 381 00:51:17.130 --> 00:51:20.699 Is basically saying, I decide. 382 00:51:20.699 --> 00:51:24.719 H1, but H0 is true. 383 00:51:24.719 --> 00:51:35.670 Right, this is kind of like saying, okay, usually the alternate hypothesis is something like, the patient has the disease. Right? So usually, uh, H0 means. 384 00:51:35.670 --> 00:51:43.920 That something is absent and H1 means something is present. Right? So if I decide that the thing is present when it's actually absent, that's what I would call basically a false alarm. 385 00:51:47.519 --> 00:51:52.409 And a type 2 error is when I decide. 386 00:51:52.409 --> 00:51:56.909 That nothing was happening when actually something was happening. 387 00:51:56.909 --> 00:52:00.480 That's like a miss, okay. 388 00:52:00.480 --> 00:52:08.760 And so both of these have kind of real-world implications for missing something or detecting something. Right? So we're going to talk about. 389 00:52:08.760 --> 00:52:16.170 How we assign costs in a minute. So you may remember that we talked about the maximum likelihood test. 390 00:52:16.170 --> 00:52:19.199 For making these decisions.
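The false alarm and miss probabilities for a threshold rule can be computed directly from the two class-conditional densities. This is a small sketch with made-up numbers (N(0,1) under H0, N(2,1) under H1, and a threshold of 1 are my illustrative choices, not the book's):

```python
from statistics import NormalDist

# Hypothetical class-conditional densities: under H0 the measurement
# is N(0, 1), under H1 it is N(2, 1). Decide H1 when x > threshold.
h0 = NormalDist(mu=0.0, sigma=1.0)
h1 = NormalDist(mu=2.0, sigma=1.0)
threshold = 1.0

p_false_alarm = 1 - h0.cdf(threshold)  # type 1: decide H1 when H0 is true
p_miss = h1.cdf(threshold)             # type 2: decide H0 when H1 is true
print(p_false_alarm, p_miss)  # both about 0.159 for this midpoint threshold
```

Sliding the threshold trades one error for the other: raising it lowers the false alarm rate but raises the miss rate, which is exactly the tension the costs below are meant to manage.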
391 00:52:21.059 --> 00:52:29.190 The maximum likelihood decision rule, you can refer back to the previous lesson for what this was, basically said, okay, if. 392 00:52:29.190 --> 00:52:33.329 This number, this ratio. 393 00:52:34.889 --> 00:52:39.570 Was greater than 1, then I should choose. 394 00:52:40.889 --> 00:52:45.599 H1, and if it's less than 1, I should choose. 395 00:52:46.920 --> 00:52:50.909 H0, and in the middle it's just a coin flip. Okay, it doesn't really matter. 396 00:52:50.909 --> 00:52:54.539 And so this thing here is called the likelihood ratio. 397 00:52:58.019 --> 00:53:02.489 Right. And so we talked a little about maximum likelihood decisions. Now. 398 00:53:02.489 --> 00:53:15.210 Uh, then we talked a little bit about making, uh, what's called the maximum a posteriori decision, and actually this is, uh, some form of what's called, just generally, Bayesian hypothesis testing. 399 00:53:18.179 --> 00:53:22.139 Do you remember anything from those previous lessons? I kept on saying that. 400 00:53:22.139 --> 00:53:29.039 Um, you know, having more information is always better than doing maximum likelihood. Right? So. 401 00:53:29.039 --> 00:53:37.079 This is like saying, okay, the likelihoods are the class-conditional. 402 00:53:37.079 --> 00:53:43.769 Distributions; we also have what are called prior probabilities. 403 00:53:43.769 --> 00:53:47.820 Which are the underlying probabilities. 404 00:53:47.820 --> 00:53:53.579 That I'm in each of the classes to begin with. Right? So very early on we talked about, what's the probability. 405 00:53:53.579 --> 00:54:01.110 Over the whole population that the patient has the disease in the first place, right? So I could have a very rare disease, for example, and that will skew things. 406 00:54:01.110 --> 00:54:05.219 And then we're also going to have costs, and this is kind of the new. 407 00:54:05.219 --> 00:54:10.710 Twist, right. The first one here is basically saying, I'm basically going to.
408 00:54:10.710 --> 00:54:17.789 Reward or penalize you for making right or wrong decisions. Right? So, this is like basically saying, this is the cost. 409 00:54:17.789 --> 00:54:22.139 Um, of H0 being true and I decide H0. 410 00:54:22.139 --> 00:54:25.320 So this is the cost of. 411 00:54:25.320 --> 00:54:29.010 H0 being true, and I decide. 412 00:54:30.239 --> 00:54:34.320 H1. This is like the opposite: H1 is true. 413 00:54:34.320 --> 00:54:38.039 And I decide H0. 414 00:54:38.039 --> 00:54:42.269 And this is the cost where H1 is true and I decide H1. 415 00:54:42.269 --> 00:54:46.349 So usually. 416 00:54:46.349 --> 00:54:54.989 These, you know, the first one and the last one are usually set to be 0, because it's not like you incur any penalty for making the right choice. 417 00:54:54.989 --> 00:54:58.590 This is basically the type 1 error, or false alarm. 418 00:55:20.639 --> 00:55:23.880 Here, maybe I would decide that, you know, the. 419 00:55:23.880 --> 00:55:27.150 Cost here maybe is like. 420 00:55:28.164 --> 00:55:41.934 Okay, just to use some number, right, whereas the cost here might be 100. That is to say, it's okay if I occasionally flag somebody as needing to have their bag checked, because the machine went off, but I absolutely don't want any explosives to get into the airport. Right. 421 00:55:41.934 --> 00:55:47.844 So I don't want to have any misses, and I'm willing to tolerate some false alarms. Right? And those are, you know, the kinds of. 422 00:55:48.480 --> 00:55:58.260 Considerations that people have to make in real systems all the time. And so my overall cost is going to basically be the thing that I want to minimize. Right. That's like saying, I have. 423 00:55:58.260 --> 00:56:03.630 C00 times the probability of choosing H0. 424 00:56:05.340 --> 00:56:11.400 Given that H0 is true, times the probability H0 is true in the first place. 425 00:56:12.809 --> 00:56:18.329 And then I have the other costs as well. It's like saying, I choose H1.
426 00:56:18.329 --> 00:56:21.750 Given H0. 427 00:56:21.750 --> 00:56:27.090 I choose this, and so on; basically, there are 4 possibilities. 428 00:56:34.980 --> 00:56:38.309 And what I want to do is, I want to minimize. 429 00:56:38.309 --> 00:56:41.610 The expected value of the cost. 430 00:56:41.610 --> 00:56:47.039 Right, that seems to make a lot of sense. And that's what's called the Bayesian, um. 431 00:56:47.039 --> 00:56:55.050 Decision rule, and so I'm not going to derive it, but you can show that the decision rule actually comes down to the same likelihood ratio. 432 00:56:56.190 --> 00:57:00.269 It's the same thing that we had before: I'm looking at this ratio. 433 00:57:03.389 --> 00:57:17.670 Which, again, I call the likelihood ratio, and the idea here is I have a decision rule that says I'm going to choose H1 if I'm above some threshold, and H0 if I'm below that threshold. 434 00:57:17.670 --> 00:57:21.869 And this threshold is equal to a combination of. 435 00:57:21.869 --> 00:57:26.909 Um, the prior probabilities and the costs. 436 00:57:32.880 --> 00:57:45.960 And so I'm not going to derive this, because this is kind of a little bit more advanced than a basic probability course. But the idea here is that, okay, suppose that I had, like, normal costs, like. 437 00:57:45.960 --> 00:57:50.820 These things were equal to 0, and I had equal probabilities of. 438 00:57:50.820 --> 00:57:55.619 You know, I had equal costs for making either of the errors. 439 00:57:55.619 --> 00:58:04.380 Then the costs are just going to fall out, right? This is just going to be like a ratio of the priors. We already found this in an earlier lesson. And if I also say that. 440 00:58:04.380 --> 00:58:08.280 You know, the priors are the same. 441 00:58:08.280 --> 00:58:16.650 Right, then this threshold becomes 1, and that means I'm basically doing maximum likelihood. Right, so that's why sometimes we say maximum likelihood uses. 442 00:58:16.650 --> 00:58:27.239 Uninformative priors. Right.
It's like saying, we don't really know. But suppose that, you know, H1 is really likely. 443 00:58:27.239 --> 00:58:35.789 And the probability of H1 is really high, right? That means that this threshold is going to be really low, and I'm going to choose H1 a lot. Right? 444 00:58:35.789 --> 00:58:38.940 Even if this ratio here is small. 445 00:58:38.940 --> 00:58:45.300 I'm going to basically be cranking down my threshold to say, well, you know, I think H1 happens all the time, so the prior says it's likely that you should choose that, right? 446 00:58:45.300 --> 00:58:58.344 And so this is kind of an interesting interplay between, you know, what the data is telling me, my prior probabilities, and the costs for making good or bad decisions. Right? So if you're into this kind of thing, you could take a whole course on this. 447 00:58:58.344 --> 00:59:02.755 And then there are kind of variations where maybe I don't know H1 exactly, but I know that. 448 00:59:03.030 --> 00:59:14.909 You know, it's like saying, either the coin is fair, or the coin has a higher probability of heads, but I don't know exactly what that probability is. That's what's called composite hypothesis testing. And so you can learn more about that. 449 00:59:14.909 --> 00:59:20.039 And I think that I'm going to do one worked example in the next lesson to kind of make this all a little bit more concrete. 450 00:59:22.230 --> 00:59:29.610 Let's do an example to see it, and this is from Professor Radke's videos. 451 00:59:37.704 --> 00:59:50.184 So, I want to do a worked example of simple hypothesis testing from last time. This is kind of where I ended up in the last lesson, and I want to just mention that when I have these kinds of standard costs, which are basically 0 for these correct ones.
452 00:59:50.184 --> 01:00:00.864 And 1 for these other ones, then when I have this as a rule, this is called the MAP or maximum a posteriori decision rule, which I think we've talked about in a previous lesson, although not immediately in the previous lesson. But we used that terminology before. 453 01:00:01.320 --> 01:00:05.369 So, here's the setup, and this is basically the kind of problem that you'd see as. 454 01:00:05.369 --> 01:00:11.190 One of your first communication-systems kinds of problems. So, a binary communication system. 455 01:00:12.630 --> 01:00:21.900 What this means is that the system is transmitting either plus ones or minus ones. Okay. And I'm transmitting the same bit n times. 456 01:00:21.900 --> 01:00:27.659 In a row, and. 457 01:00:27.659 --> 01:00:30.869 The bit every time goes through the channel. 458 01:00:32.280 --> 01:00:35.610 And what the channel does is, it adds noise. 459 01:00:35.610 --> 01:00:42.780 And this noise is gonna be Gaussian with, say, mean 0 and sigma equal to 1. 460 01:00:42.780 --> 01:00:57.719 Okay, so the noise dirties up these bits as they go through the channel, and so what used to be a +1 could turn into 1.25, or 7.89 with very, very low probability. Right? But what comes out is some. 461 01:00:57.719 --> 01:01:06.599 Uh, number that is a real number, right? Not just a discrete 1 or 0 value. And so the setup is, I transmit the same bit n times. 462 01:01:06.599 --> 01:01:09.630 I observe these n. 463 01:01:09.630 --> 01:01:12.840 Values, and now I want a decision rule that says. 464 01:01:12.840 --> 01:01:17.309 Am I in the +1 case or am I in the -1 case? Right? So. 465 01:01:17.309 --> 01:01:22.619 What are my hypotheses? My hypothesis H0 is that I transmitted. 466 01:01:25.260 --> 01:01:29.460 N, uh, -1 bits. Okay. 467 01:01:30.869 --> 01:01:34.110 And my alternate hypothesis is that I transmitted. 468 01:01:36.300 --> 01:01:43.019 N +1 bits. Okay, so what I have to do is, I need to characterize the.
469 01:01:43.019 --> 01:01:46.139 Hypotheses in each case, right? So the. 470 01:01:46.139 --> 01:01:52.199 PDF of these n numbers under the null hypothesis. 471 01:01:52.199 --> 01:01:59.099 Is basically, well, I have to evaluate that assuming that I'm in the -1 case. So I have. 472 01:01:59.099 --> 01:02:02.429 A product of independent, um. 473 01:02:02.429 --> 01:02:07.679 Gaussians, and then I'm going to have. 474 01:02:11.369 --> 01:02:16.829 The mean at minus one, so it's like (x + 1) squared. 475 01:02:16.829 --> 01:02:31.019 So this is skipping a couple of steps. This is just basically saying, well, I'm assuming all the bits are coming across with noise independently; I guess I should say the noise is independent every time. And so this is just a product of a bunch of those PDFs. 476 01:02:31.019 --> 01:02:37.679 And in the same way, if I'm in the H1 case, that means that my. 477 01:02:37.679 --> 01:02:41.639 Mean is +1. 478 01:02:45.960 --> 01:02:49.050 So this is my PDF. Okay. 479 01:02:49.050 --> 01:02:55.469 Now, I have to build the likelihood ratio, which is basically the ratio between this on the top and this on the bottom. 480 01:02:55.469 --> 01:03:00.360 So the 1 over root 2 pi stuff is going to cancel out, and what I'm going to get is basically. 481 01:03:00.360 --> 01:03:06.360 Um, being a little bit. 482 01:03:06.360 --> 01:03:10.349 Brief here, where x represents all the data that I saw. 483 01:03:10.349 --> 01:03:14.489 I'm going to have e to the minus one half the sum. 484 01:03:15.510 --> 01:03:19.889 Of the numerator part, which is this. 485 01:03:19.889 --> 01:03:23.670 And the denominator part, which is this. 486 01:03:25.050 --> 01:03:34.619 And I can simplify things a little bit. So I'm actually going to get x squared, um. 487 01:03:34.619 --> 01:03:38.219 Minus x squared, so those are going to cancel out. 488 01:03:38.219 --> 01:03:42.480 And I'm gonna get a minus times minus x, and another x term here. So basically I'm going to have.
489 01:03:42.480 --> 01:03:49.440 Um, a 2x, and the ones are going to cancel too. So actually, this is pretty. 490 01:03:49.440 --> 01:03:52.980 Straightforward. What I'm going to have is e to the. 491 01:03:52.980 --> 01:03:58.050 Uh, 2 times the sum. 492 01:03:58.050 --> 01:04:02.579 Okay, we've talked about this before, that actually. 493 01:04:02.579 --> 01:04:07.710 Since the log function is monotonically increasing. 494 01:04:07.855 --> 01:04:14.425 With x, maximizing the likelihood is the same as maximizing the log likelihood. Right. 495 01:04:14.454 --> 01:04:24.684 So, in some sense, what I can do is, instead of having a decision on the likelihood ratio by itself, sometimes it's helpful to look at what they call the log likelihood ratio. So I take the log of both sides. 496 01:04:24.869 --> 01:04:29.550 This is like saying, I have the log of the likelihood ratio is equal to. 497 01:04:29.550 --> 01:04:32.820 Um, just the sum of my numbers. 498 01:04:34.230 --> 01:04:38.159 And then I'm going to have a decision rule that says, well, I should choose. 499 01:04:38.159 --> 01:04:44.760 H1 if I'm above some threshold, and H0 if I'm below that threshold. 500 01:04:44.760 --> 01:04:50.429 And the threshold from before is basically going to be the log of the. 501 01:04:51.659 --> 01:04:59.880 Ratio of the priors, assuming that I don't have any costs, right? If I had unequal costs, I could throw them in here too. Right? 502 01:04:59.880 --> 01:05:05.099 And now I could say, okay, well, then what if I have, um. 503 01:05:05.099 --> 01:05:09.989 The same priors: if I have P of H0 equal to P of H1 equal to one half. 504 01:05:09.989 --> 01:05:23.369 Then I have log of 1, which is 0, and so my threshold is basically saying, if I add up these numbers and they're positive, choose the +1 case; if I add up the numbers and they're negative, choose the -1 case. Right? And that actually makes a lot of sense. And so in some sense, what I have is.
505 01:05:23.369 --> 01:05:26.639 A decision rule that says, um, you know. 506 01:05:26.639 --> 01:05:35.250 Here's my sum. Anything on this side, I choose H1; anything on this side, I choose. 507 01:05:35.250 --> 01:05:38.429 Oh, well, I guess I mixed up which one was which. 508 01:05:38.429 --> 01:05:46.829 Yeah, this side I choose H0, right? But that threshold could shift depending on this number. Right? 509 01:05:46.829 --> 01:05:53.969 So suppose that this turns out to be a positive number, right? That's saying that H0 is, um. 510 01:05:53.969 --> 01:05:57.900 The prior probability of H0 is more than the prior probability of H1, right? 511 01:05:57.900 --> 01:06:04.949 Then I have log of some positive number, and that means that this number now is going to be positive, and that's gonna shift my, you know. 512 01:06:04.949 --> 01:06:10.199 Situation over here, so I'm going to be generally more inclined to accept. 513 01:06:11.400 --> 01:06:21.719 H0, then, because the prior probability tells me that's likely to be the case. Right? And so this threshold slides back and forth, depending on the ratio of my priors. 514 01:06:21.719 --> 01:06:27.059 So hopefully that gives you some sense of kind of like a real-world, systems kind of application of this stuff. 515 01:06:30.449 --> 01:06:35.909 Okay. 516 01:06:50.070 --> 01:06:54.360 Okay. 517 01:07:07.105 --> 01:07:12.385 So, I'm showing you Professor Radke's videos so you can compare; if you prefer him, you can watch his videos. 518 01:07:12.659 --> 01:07:21.360 Or you can watch mine. Oh, okay. Probably a reasonable point to stop now, actually. So. 519 01:07:22.949 --> 01:07:26.489 I'll put a note here for people watching remotely, um. 520 01:07:38.215 --> 01:07:38.514 Oh, 521 01:07:39.235 --> 01:07:53.275 okay. 522 01:07:57.630 --> 01:08:05.340 Oh, okay. I'll put a note here. Watch. 523 01:08:05.340 --> 01:08:10.739 Anybody else.
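The whole binary communication example can be sketched in a few lines. This is my own minimal sketch, not code from the course: the rule implements the lecture's log-likelihood ratio (log LR = 2 times the sum of the received values, for unit-variance Gaussian noise), compared against half the log of the prior ratio.

```python
import random
from math import log

random.seed(1)

def decide(received, p_h0=0.5, p_h1=0.5):
    # Log-likelihood-ratio rule from the lecture: for a +/-1 bit sent n
    # times through unit-variance Gaussian noise, log LR = 2 * sum(x),
    # so choose +1 (H1) when sum(x) > log(P(H0) / P(H1)) / 2.
    threshold = log(p_h0 / p_h1) / 2
    return +1 if sum(received) > threshold else -1

# Simulate transmitting bit = +1 a total of n = 25 times.
bit, n = +1, 25
received = [bit + random.gauss(0.0, 1.0) for _ in range(n)]
print(decide(received))  # recovers +1 with overwhelming probability
```

With equal priors the threshold is 0, so the rule is just the sign of the sum, as in the lecture; a heavy prior toward H0 raises the threshold and makes the decoder more reluctant to declare +1, which is the "threshold slides with the prior ratio" behavior described above.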
524 01:08:10.739 --> 01:08:20.430 Okay, a good point to stop now. Today I showed you some counterintuitive paradoxes in statistics. 525 01:08:20.430 --> 01:08:24.750 To motivate you, to show that the mathematics does surprising things. 526 01:08:25.949 --> 01:08:30.659 And then we saw some more review of, uh, hypothesis testing. 527 01:08:30.659 --> 01:08:37.289 And we saw some videos; we'll continue on Monday with more examples, more stuff from. 528 01:08:37.289 --> 01:08:45.300 Chapter 8, statistics. So have a good weekend, enjoy the weather, a couple of days of warm weather. 529 01:08:45.300 --> 01:08:50.880 And then in 3 days, the high temperature falls by 30 degrees. 530 01:08:50.880 --> 01:08:55.350 I know, I don't believe everything I read, but. 531 01:08:55.350 --> 01:08:58.560 Yeah, right. 532 01:08:58.560 --> 01:09:03.090 Okay. 533 01:09:03.090 --> 01:09:06.750 All right. Okay, so today the high is. 534 01:09:06.750 --> 01:09:12.930 83; according to this, on Sunday the high will be 49. 535 01:09:12.930 --> 01:09:16.170 34 degrees colder, so. 536 01:09:16.170 --> 01:09:23.069 So, how should I keep you awake during class? Bring a water pistol and squirt us? 537 01:09:24.449 --> 01:09:28.409 Yeah, um. 538 01:09:31.109 --> 01:09:41.760 So, I'm trying to give you different viewpoints, you know; you may prefer someone else's style to mine, so you watch theirs, or you watch mine, or you watch both, or either. So. 539 01:09:41.760 --> 01:09:45.239 Yeah, have a good weekend.