WEBVTT 1 00:00:20.969 --> 00:00:27.089 Good afternoon, probability people. So this is class, um, whatever, 2 00:00:27.089 --> 00:00:30.239 21, I guess, Thursday, 3 00:00:30.239 --> 00:00:35.189 April 15th, 2021, and my universal question is, 4 00:00:35.189 --> 00:00:41.039 can you hear me and see the screen? 5 00:00:41.039 --> 00:00:46.200 The reason I ask that, again, is I have no way of, 6 00:00:46.200 --> 00:00:49.679 um, thank you, Nicholas, I have no way of checking. 7 00:00:49.679 --> 00:01:01.590 Um, I would need a separate RCS account, which I guess I should get some time, and sign in a second time to see what you can see; even if I sign in as myself, it would think it was a second device. And I'm sitting here actually surrounded by 8 00:01:01.590 --> 00:01:05.760 various pieces of hardware, and none of it shows me what you would see. 9 00:01:05.760 --> 00:01:15.689 Okay, so we're continuing on, just giving you highlights from the textbook. We're talking probability, and 10 00:01:15.689 --> 00:01:28.825 what we're talking about now are vectors. So we saw probability for one variable, then for two variables, like X and Y on a plane, and now we're seeing probability for a vector of variables. 11 00:01:29.424 --> 00:01:40.105 Well, it could be samples of a sound wave. And of course, this gets deeper than this course: it could be pixels on a screen, or pixels 12 00:01:40.379 --> 00:01:49.709 over time for a video. And the reason you want to do probability on this is that 13 00:01:49.709 --> 00:01:55.200 you would use things like this to devise compression techniques which are 14 00:01:55.200 --> 00:02:02.670 efficient, which compress the signal to use many fewer bits, for example. So there would be an application of probability on a vector. 15 00:02:02.670 --> 00:02:15.539 Now, this is going to be happening more, and to give you a feeling for the rest of the course, it only has a couple more weeks: the last few classes I'll be talking about statistics, 16 00:02:15.539 --> 00:02:23.280 which is the opposite of probability in a sense. With statistics, 17 00:02:23.280 --> 00:02:29.189 we have a population and we want to 18 00:02:29.189 --> 00:02:33.629 determine parameters. Like, 19 00:02:33.629 --> 00:02:38.099 now suppose you've got students at RPI, and 20 00:02:39.085 --> 00:02:50.814 I'm taking these numbers off the top of my head now, okay, 7,000 students at RPI. So this is a finite population, N equals 7,000. For this population 21 00:02:51.324 --> 00:02:58.914 you can compute means, that is, expected values; you can compute the standard deviation for this population of 7,000 students. But 22 00:03:01.979 --> 00:03:08.969 we don't have all 7,000 students, so let's say we take a random sample of maybe 23 00:03:09.895 --> 00:03:23.155 70 students, say 1%, and for those 70 students we could look at their mean height, I forgot to say height, let's say mean height and standard deviation. Then the question is, 24 00:03:24.449 --> 00:03:38.664 what can we infer from that? That's called a sample, and so we do statistics. So we take the sample of 70 students, we compute their mean height and standard deviation. 25 00:03:38.664 --> 00:03:45.175 The question is, what does this tell us about the whole population of 7,000 students, given that we looked at the sample?
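A minimal sketch of this sampling idea in Python; the 7,000 and 70 are from the lecture, while the height numbers (mean 1.75 m, sd 0.10 m) are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population: 7,000 student heights in meters.
population = rng.normal(loc=1.75, scale=0.10, size=7000)

# The true population parameters, which in practice are unknown.
print("population mean:", population.mean())
print("population sd:  ", population.std())

# Take a random sample of 70 students (1%) without replacement.
sample = rng.choice(population, size=70, replace=False)
print("sample mean:", sample.mean())       # an estimate of the population mean
print("sample sd:  ", sample.std(ddof=1))  # and of the population sd
```

Every rerun with a different seed gives a slightly different sample mean; how much it bounces around is exactly the question statistics answers.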
26 00:03:46.080 --> 00:04:00.509 And that's statistics. Or we might have two groups of students, a group of, say, students from one place and a group of students from somewhere else, and if we take a sample from each group, then, 27 00:04:02.159 --> 00:04:08.069 could you tell, you know, do the two populations have different heights, let's say? And 28 00:04:08.965 --> 00:04:16.194 that sort of thing. To tie this down to the real world without getting partisan: politics is in the news, 29 00:04:16.194 --> 00:04:30.654 and there was just a story two days ago about pollsters before the last election admitting that their polls were uniformly wrong. And so we have a question there: you see all these things, you know, the vote will go this way plus or minus 3%. 30 00:04:30.654 --> 00:04:32.514 What does that mean? So, 31 00:04:33.269 --> 00:04:45.238 again, if you sample 0.1% of the United States population, and they say they're going to vote for Joe Smith or something, then 32 00:04:45.238 --> 00:04:52.588 what does this tell you about the whole population, ignoring all the real-world things, assuming everybody is sampled at random? So, statistics. 33 00:04:52.588 --> 00:05:05.908 I've been mentioning some examples a couple of times throughout the semester. More than 100 years ago, the Guinness brewery had a mathematician, a guy named Gosset, and they had a question about the alcohol content of the batches 34 00:05:05.908 --> 00:05:13.348 they were brewing. So you take little samples and measure them, but samples cost money. 35 00:05:13.348 --> 00:05:24.119 So, you know, he developed some mathematics for that sort of thing. And so that would be statistics. And there might be: you've got two different batches of 36 00:05:24.119 --> 00:05:28.619 beer; do they have the same concentration? 37 00:05:28.619 --> 00:05:32.488 So that sort of thing is statistics, um. 38 00:05:33.504 --> 00:05:46.043 Or another example: you're a hard rock miner, let's say in Ontario, to pick an example, and they're looking for gold or copper or nickel, or whatever is currently valuable, silver. 39 00:05:46.343 --> 00:05:52.014 And so they would drill holes, and when you drill a hole, you look at what you pull up, and that's a sample. 40 00:05:52.559 --> 00:06:01.858 And then from that, you want to get some idea, in general, if you dug a mine there, how profitable the mine would be. 41 00:06:02.483 --> 00:06:11.843 And there are questions of how many holes you drill, because each hole that you drill costs a lot of money and takes time, but gives you a better idea. 42 00:06:11.874 --> 00:06:20.934 Just like with Gosset at Guinness: it costs a little money and takes a little time to measure the alcohol content of a, 43 00:06:21.629 --> 00:06:32.189 of a batch, but the more samples you do, the better an idea you get. So this is statistics: you're doing a trade-off there. And I mentioned the Ontario hard rock mining thing because 44 00:06:32.189 --> 00:06:45.358 it's interesting. What I can do is show you some real examples, maybe. So, Ontario, the province of Ontario, Canada, had a scratch-off numbers game. You'd buy a card; 45 00:06:45.358 --> 00:06:58.134 well, you'd get to look at the cards at the convenience store and pick the ones you want to buy. Some numbers you could see; you'd scratch off the other numbers, and if, say, you got a certain pattern, then you'd win some money.
46 00:06:58.793 --> 00:07:05.934 Well, it turned out that there was a statistician, a professional statistician who did statistical designs for mining companies, 47 00:07:06.418 --> 00:07:13.559 um, who used his expertise to look at these new scratch-off cards and, um, 48 00:07:14.543 --> 00:07:28.343 figured out ways that you could look at a card and, most of the time, predict whether it would win or not. And that's real-world statistics. Before I get back to probability, let me see if I can find that example. On- 49 00:07:29.249 --> 00:07:33.718 tario, um. 50 00:07:35.278 --> 00:07:41.939 Okay, good, let's see, I don't know, that one or something. 51 00:07:41.939 --> 00:07:49.288 Let's see. 52 00:07:49.288 --> 00:07:53.608 Let's see if I can find it. 53 00:07:57.478 --> 00:08:03.449 Well, here's one, for example, real-world probability and statistics. 54 00:08:03.449 --> 00:08:06.838 Not the story I was mentioning, um. 55 00:08:09.569 --> 00:08:18.509 Yeah, so they're doing statistics to look at winners in the Ontario lottery, and they determined that 56 00:08:20.098 --> 00:08:30.509 too many of the convenience store owners were winning. They applied statistics, and so you don't know exactly what the owners are doing, but there's something funny going on. 57 00:08:30.509 --> 00:08:34.109 So, real-world statistics, um. 58 00:08:34.109 --> 00:08:38.849 See, here is the one I wanted. 59 00:08:43.014 --> 00:08:57.744 Okay, you see, you can make money with the stuff you learned at RPI, sometimes with stuff they didn't teach you, but it's the sort of stuff where you could really make money. Okay, so I Googled it; you can find this, I'll put the link online. It's from some years ago. 60 00:08:58.134 --> 00:09:00.833 So you see, a geological consultant who 61 00:09:01.349 --> 00:09:06.928 did statistics for gold miners used the same logic for his, um, 62 00:09:07.979 --> 00:09:11.369 you know, used the same logic, 63 00:09:11.369 --> 00:09:16.349 you know, to predict the tickets, um, okay, picture here. So, 64 00:09:16.349 --> 00:09:22.379 now, this is Wired magazine, okay, so, 65 00:09:24.114 --> 00:09:38.783 and again, you can read it yourself. I'll give you the key: when the lottery commission designed the tickets, they wanted to be able to control the number of winners; they did not want the number of winners to be a random variable. 66 00:09:38.783 --> 00:09:43.913 So there was a pattern. And the thing is that 67 00:09:44.698 --> 00:09:50.818 Srivastava was able to figure it out. Okay. Now, the obvious question is, as soon as 68 00:09:50.818 --> 00:09:59.214 Srivastava figured it out, why didn't he just play it? And his answer to the question is that he was, sorry, well paid at his day job. 69 00:09:59.484 --> 00:10:13.374 And so if he goes to a convenience store to look at some tickets, spends a minute or so doing mental computations on each ticket, then decides to buy 10% of them or something, scratches them off, and makes a few dollars for the time he's spent, 70 00:10:14.908 --> 00:10:21.899 then basically he made more dollars an hour at his day job, so that's why he didn't do it. Instead the effort went into trying to convince the lottery commission or something. 71 00:10:21.899 --> 00:10:28.589 Okay, so that's, um, statistics could actually make you money. 72 00:10:28.589 --> 00:10:40.708 In any case, so we're back in probability; we'll get to statistics in a week. And we're talking about the textbook here; I'm walking through the textbook here. 73 00:10:40.708 --> 00:10:46.828 So, we have a vector of n random variables.
74 00:10:47.333 --> 00:10:59.813 And I'm putting in some highlights and so on, and maybe at some point... The thing is, my iPad's not connecting at the moment; it connected this morning, it's not connecting now, and I tested it before class. One can learn to hate hardware. But okay. 75 00:10:59.903 --> 00:11:03.803 And by the way, I have my chat window open, so feel free 76 00:11:04.283 --> 00:11:14.543 to type something; I've also allowed you to unmute your microphones if you'd like to say something. Okay. So there are n random variables, and they're jointly Gaussian. You know, what does this mean? 77 00:11:14.543 --> 00:11:21.774 Each one, separately, is Gaussian, and again, the reason we use Gaussians, even though the math is harder for them, is that 78 00:11:23.009 --> 00:11:30.479 almost every other random variable, if you let n get bigger, 79 00:11:30.479 --> 00:11:44.849 starts looking like a Gaussian. There are exceptions, but they're weird; most random variables really quickly, in fact, start looking Gaussian. I mean, if you just take uniform (0,1) random variables and you add, 80 00:11:44.849 --> 00:11:48.359 you know, 10 or 15 of them together, 81 00:11:48.359 --> 00:11:54.178 that's n equals 10 or 15, the sum of those 10 or 15 uniforms looks pretty Gaussian. 82 00:11:54.178 --> 00:12:01.078 It doesn't just look it: you can throw math at it and analyze it, and it'll be pretty close. Okay. 83 00:12:01.078 --> 00:12:12.178 That's why we use Gaussians. Now, jointly Gaussian means they're not necessarily independent; they can be related to each other. Now, we don't just allow any sort of relation: you 84 00:12:12.594 --> 00:12:18.803 could imagine one was the square root of another, one was one over the other, or the cosine of it. 85 00:12:19.464 --> 00:12:30.833 We're restricting them to have certain relations between them, linear relations between them, and whoever creates the field gets to define things in the field. 86 00:12:30.833 --> 00:12:36.474 And so they're jointly Gaussian if there's this particular relation with correlations and so on. 87 00:12:37.134 --> 00:12:48.203 Okay, and so this, 6.42a, is the formula. And okay, so X is a vector, and the equals sign with the delta over it means definition. 88 00:12:48.203 --> 00:12:53.244 So this is the definition for the density; again, little f, it's a density function. 89 00:12:53.244 --> 00:13:07.193 And again, just to remind you of the syntax here: capital X is the name of the random variable, and lowercase x is a particular value for that random variable with that name. 90 00:13:08.339 --> 00:13:16.739 Okay, it's defined as an exponential of a sum, and they just expand out the vectors into components. 91 00:13:16.739 --> 00:13:24.839 And it has a mean vector m, each component has its mean, so take this; and K is a covariance 92 00:13:24.839 --> 00:13:32.759 matrix that we'll see more of later, and it's the generalization of the correlation. 93 00:13:32.759 --> 00:13:46.313 It's a generalization of the standard deviation: if you had one random variable, you would have a standard deviation in the formula. Actually, in K there is a variance, so sigma squared would be K's entry. 94 00:13:46.854 --> 00:13:55.014 So if n were 1, it would be the variance, which is how spread out the thing is. When we have more than one, we get a matrix. 95 00:13:55.528 --> 00:14:01.798 It captures how spread out each one is, and also the linear relations.
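For reference, here is the density being talked about, written out; this is the standard n-dimensional jointly Gaussian form with mean vector m and covariance matrix K (assuming the textbook's 6.42a is the usual one):

```latex
% Standard n-dimensional jointly Gaussian density, mean vector m,
% covariance matrix K (assumed form of the textbook's Eq. 6.42a):
f_{\mathbf{X}}(\mathbf{x}) \;\triangleq\;
  \frac{\exp\!\left\{-\tfrac{1}{2}\,(\mathbf{x}-\mathbf{m})^{T} K^{-1}\,(\mathbf{x}-\mathbf{m})\right\}}
       {(2\pi)^{n/2}\,\lvert K \rvert^{1/2}}
```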
96 00:14:01.798 --> 00:14:09.869 It captures how they're tied together, how they correlate with each other linearly. Again, if there's a nonlinear relation, 97 00:14:09.869 --> 00:14:13.739 this isn't captured; it would only capture the linear part of it. 98 00:14:13.739 --> 00:14:16.798 Um, but 99 00:14:16.798 --> 00:14:24.688 you know, this is what we work with. So basically, this will be the definition, and it's useful. This is already getting hard enough that 100 00:14:24.688 --> 00:14:29.188 I'll fire up Mathematica at some point, um, and 101 00:14:29.188 --> 00:14:35.428 it's getting to be the point where you can't really do this sort of stuff by hand, you know. So, um, 102 00:14:35.428 --> 00:14:41.668 pick your favorite computer tool to work with this. 103 00:14:41.668 --> 00:14:48.328 That's the real world, you know; stuff in the real world is sometimes hard enough that you can't really do it by hand. Okay. 104 00:14:48.328 --> 00:15:00.778 What I intended to point out: here is K. The diagonal is the variance of each; K is the generalization of the variance. If there were one variable, 105 00:15:00.778 --> 00:15:05.489 it'd be the variance. So the diagonal here is the variance 106 00:15:05.489 --> 00:15:17.158 of each separately, and the off-diagonals are the covariances of each pair of random variables. 107 00:15:17.158 --> 00:15:26.639 Now again, if there's some more complicated relation, it won't be captured. Let's suppose X3 was always X2 plus X1. 108 00:15:27.833 --> 00:15:41.514 So the three are quite strongly related, but any two of them might not be; the relation ties together the three of them. Well, that's not going to be captured by this. So this isn't a perfect method, 109 00:15:41.849 --> 00:15:47.369 okay, but it's good enough to get a lot of work done, 110 00:15:47.369 --> 00:15:51.658 and it's simple enough that we can work with it. 111 00:15:51.658 --> 00:15:57.749 And that's the story of life working with engineering solutions: 112 00:15:57.749 --> 00:16:03.178 it's powerful enough that we can do something with it, and it's cheap enough that we can actually 113 00:16:03.178 --> 00:16:07.139 work with it. Okay. 114 00:16:07.139 --> 00:16:12.509 In any case, so K, it's called the covariance matrix. 115 00:16:13.769 --> 00:16:16.918 And we get the jointly Gaussian things now. 116 00:16:18.958 --> 00:16:25.318 And they give some examples. So the notation is also a little, 117 00:16:25.318 --> 00:16:29.458 not completely logical, and 118 00:16:29.458 --> 00:16:33.479 all I can say to defend it is that it's the real world: 119 00:16:33.479 --> 00:16:39.749 you've got to learn to work with stuff that's a little illogical. So I've got the covariance matrix K. 120 00:16:39.749 --> 00:16:46.349 If you've got only two dimensions, meaning two variables, you get X1, X2 and that's it, and n equals 2. 121 00:16:46.349 --> 00:17:00.958 Again, the diagonal is always the variance, the variance being the square of the standard deviation. The off-diagonals are sigma-1 sigma-2 rho, the two sigmas and rho. Rho, just to remind you, is called the correlation coefficient of two 122 00:17:00.958 --> 00:17:13.739 scalar random variables, and rho is dimensionless, unlike the standard deviation, which has the dimension of the underlying random variable: if you're talking about student heights and 123 00:17:13.739 --> 00:17:21.388 you're working metric, the standard deviation has units of meters; if you're working English, units of feet or whatever.
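A small sketch of this 2-by-2 case in Python, with made-up sigmas and rho, using SciPy's multivariate normal for the jointly Gaussian density:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up standard deviations and correlation coefficient.
sigma1, sigma2, rho = 1.0, 2.0, 0.5

# 2x2 covariance matrix K: variances on the diagonal,
# covariance rho*sigma1*sigma2 on the off-diagonal.
K = np.array([[sigma1**2,         rho*sigma1*sigma2],
              [rho*sigma1*sigma2, sigma2**2        ]])

m = np.array([0.0, 0.0])                 # mean vector
f = multivariate_normal(mean=m, cov=K)   # jointly Gaussian density

print(f.pdf([0.5, -1.0]))                # density at one point (x1, x2)

# Sample covariance of many draws should come back close to K.
draws = f.rvs(size=100_000, random_state=0)
print(np.cov(draws.T))
```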
124 00:17:21.388 --> 00:17:33.358 If you're talking about temperature, say whether it's expected to snow tonight, then instead it's units of degrees; pick your favorite unit for temperature, Kelvin maybe. Okay. 125 00:17:33.358 --> 00:17:45.989 So, sigma has dimensions; rho is dimensionless, and it goes from minus 1 to 1. Rho equal to 1 means that the one variable is in 126 00:17:45.989 --> 00:17:51.239 a totally linear relation with the other variable: one variable is X, the other one would be aX plus b. 127 00:17:51.239 --> 00:18:00.538 And that would be rho equal to 1; if it was negative, a being less than zero, rho would be minus 1. If it's only sort of a relation, then rho is somewhere 128 00:18:00.538 --> 00:18:07.558 less than 1 in absolute value. Okay, so that's that. And then the covariance is rho times the two sigmas. Okay. 129 00:18:09.419 --> 00:18:17.489 And you can take the determinant of that, and again, it expresses 130 00:18:17.489 --> 00:18:24.148 something about the whole thing. Okay. The exponent is there for... 131 00:18:24.148 --> 00:18:31.679 this goes in the exponent, and we can get something like this. Okay. 132 00:18:33.689 --> 00:18:38.669 Now, this example is again getting a little too much 133 00:18:38.669 --> 00:18:43.979 to work with by hand, but assume we've got jointly Gaussian variables, and they've got these sigmas and 134 00:18:43.979 --> 00:18:56.009 variances and covariances. The marginal now... here there are three, n is 3. So the thing, and I'm trying to give you the high-level meaning, the thing with 135 00:18:57.598 --> 00:19:09.719 the jointly Gaussian case is that the marginal for any subset of them is also Gaussian, or jointly Gaussian, and you can find it. 136 00:19:09.719 --> 00:19:15.239 So that's the problem here in Example 6.21, and you can just integrate out the formula, 137 00:19:15.239 --> 00:19:21.209 or you can use some knowledge about 138 00:19:21.209 --> 00:19:26.969 marginals and so on, using some earlier stuff. Okay. 139 00:19:28.919 --> 00:19:32.489 Now, here, 140 00:19:33.778 --> 00:19:38.128 6.22 is making the point that, 141 00:19:39.358 --> 00:19:50.398 well, for Gaussians, if they're not correlated, then they're also independent. And independence is a stronger concept than lack of correlation, because independence excludes any sort of weird, 142 00:19:50.398 --> 00:20:02.489 even nonlinear, relation. But in fact, if they're jointly Gaussian, then they follow that joint PDF, and that locks it down; it removes a lot of freedom. In this case, 143 00:20:02.489 --> 00:20:06.719 if they're not correlated, they're also independent. 144 00:20:06.719 --> 00:20:10.078 That's the idea they're working out here. 145 00:20:11.933 --> 00:20:25.463 Doo-de-doo, where do we go here... conditional, conditional, conditional, good, 6.23. Okay, so conditionals: we're hitting this again and again and again in probability. Just to remind you of the canonical example: 146 00:20:25.463 --> 00:20:31.134 a noisy communication channel. You transmit X, you receive Y, 147 00:20:32.969 --> 00:20:38.338 and because Y is X plus N, there's some noise in that case. Then 148 00:20:38.338 --> 00:20:47.249 you want the probability that you receive different signals conditioned on, given that, certain signals were transmitted. 149 00:20:47.249 --> 00:20:51.749 And, yeah, you know, you use that to analyze, 150 00:20:51.749 --> 00:20:58.138 you know, the noisy channel. And we saw it in simpler cases.
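To see the uncorrelated-implies-independent point concretely in two dimensions: below is the standard bivariate Gaussian density, and setting rho to 0 kills the cross term, so the density factors into the product of the two one-dimensional Gaussians, which is exactly the definition of independence:

```latex
% Bivariate Gaussian density; with \rho = 0 it factors:
f_{X_1 X_2}(x_1,x_2)
 = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}
   \exp\!\left\{-\frac{1}{2(1-\rho^2)}
     \left[\frac{(x_1-m_1)^2}{\sigma_1^2}
          -\frac{2\rho(x_1-m_1)(x_2-m_2)}{\sigma_1\sigma_2}
          +\frac{(x_2-m_2)^2}{\sigma_2^2}\right]\right\}
\xrightarrow{\ \rho=0\ }
 \frac{e^{-(x_1-m_1)^2/2\sigma_1^2}}{\sqrt{2\pi}\,\sigma_1}\cdot
 \frac{e^{-(x_2-m_2)^2/2\sigma_2^2}}{\sqrt{2\pi}\,\sigma_2}
 = f_{X_1}(x_1)\,f_{X_2}(x_2)
```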
151 00:20:58.644 --> 00:21:11.183 So here, what's happening is, and again I'm giving you the executive summary, maybe we'll drill down in a later class, but... So the notation here, if you can see me circling it with my cursor: 152 00:21:11.604 --> 00:21:22.344 the density of X-sub-n, vertical bar, X1 through X-sub-n-minus-1. So that's the probability density of X-sub-n when you have particular values for the first n minus 1. And 153 00:21:26.364 --> 00:21:40.253 the definition is here on the right: it's the density for the whole vector of n, divided by the marginal density of the first n minus 1 that you're conditioning on. And 154 00:21:40.828 --> 00:21:44.398 there are formulas for all of that, and 155 00:21:44.398 --> 00:21:47.638 okay, and 156 00:21:48.719 --> 00:21:53.159 okay, and then you can get the conditional mean and so on. 157 00:21:54.114 --> 00:21:58.493 And of course, again, looking ahead, you can do Bayes and go backwards. 158 00:21:58.493 --> 00:22:09.713 So again, just to remind you, and I like to say important things more than once, just to try and pound them into you, because they're useful: so, you know, you've got the conditional, 159 00:22:09.953 --> 00:22:23.963 the conditional density of the output Y given the input X, and you want to run it backwards using the Bayesian concept and find the conditional probabilities on the input, given that you saw a certain output. That's what you would 160 00:22:24.388 --> 00:22:29.219 use, if you saw... okay. 161 00:22:29.219 --> 00:22:38.338 Oh, let's see. I mean, I'm not doing the starred things. 6.4.1 we will skip, 162 00:22:38.338 --> 00:22:45.239 and that's part of it; 6.4.1 and 6.4.2 we will skip. 163 00:22:45.239 --> 00:22:48.929 6.5, estimation. Okay. 164 00:22:48.929 --> 00:22:59.038 Now, let me give you some examples of really noisy channels. I mean, there are channels where the rate is less than 1 bit per second. 165 00:23:00.023 --> 00:23:02.243 And, well, 166 00:23:02.243 --> 00:23:02.903 the first, 167 00:23:03.624 --> 00:23:04.013 the first 168 00:23:04.013 --> 00:23:04.284 trans- 169 00:23:04.284 --> 00:23:16.493 atlantic telegraph cable, this is 1850 or so: they were running it from England, Cornwall somewhere, or Ireland, and they ran it to Heart's Content Bay in Newfoundland. 170 00:23:16.523 --> 00:23:26.124 And I visited where the cable came ashore, by the way; there's just a little museum there. And it was laid by the largest ship in the world, 171 00:23:27.358 --> 00:23:33.239 which was the Great Eastern, and the Great Eastern was 7 times bigger than the second-largest ship in the world. 172 00:23:33.773 --> 00:23:38.064 And it was the only ship in the world big enough to carry the cable. 173 00:23:38.483 --> 00:23:52.763 And before that, they made an attempt where they used two ships: they put half the cable on each ship, and the ships sailed out to the middle of the Atlantic, got together and spliced their two cables together, and then they headed off in opposite directions. 174 00:23:53.034 --> 00:23:55.074 In any case, the Great Eastern was big enough, so they 175 00:23:56.818 --> 00:24:10.463 ran the cable. But the cable has considerable capacitance and inductance, it's 1,500 miles long or whatever, under the ocean, and they really hadn't worked out things like insulators and stuff like that; they had not worked out the math 176 00:24:10.463 --> 00:24:22.374 and the theory. Also, amplifiers had not been invented. So they used a battery at the transmitting end, sending a voltage, and the receiving signal rate was running less than 1, less than a third.
177 00:24:22.374 --> 00:24:36.443 So they were using the voltage at the receiving end to run an electromagnet that tilted a mirror, a very lightweight mirror hung by a very thin thread, and they would reflect a light beam off of it. 178 00:24:36.443 --> 00:24:44.153 And they'd look where the mirror would twist; the light beam would hit a different point. And that's how they would tell what voltage was on the cable, what signal. 179 00:24:44.153 --> 00:24:52.614 And as a transmission medium for signals, it was always very slow, less than 1 bit per second, less than a third. It would take a while for 180 00:24:55.403 --> 00:25:03.983 the mirror to stabilize. And the thing is, to improve the signal-to-noise ratio, they raised the voltage at the transmitting end. Again, 181 00:25:03.983 --> 00:25:12.233 they used a stack of batteries in series to increase the voltage, and they increased it so much that they punched right through the insulation 182 00:25:12.719 --> 00:25:24.358 on the cable somewhere under the Atlantic and destroyed the cable; they punched a hole through the insulation. Again, they did not have high-quality insulation; what they used was a sort of natural rubber for the insulation, gutta-percha. 183 00:25:24.358 --> 00:25:37.019 And so, this was not that long after the cable started working. They destroyed it, and so skeptical investors thought the thing had been a fraud from the beginning. It had been financed by an American investor called Cyrus Field, 184 00:25:37.019 --> 00:25:42.298 and they claimed that Field had been running a fraud. 185 00:25:42.298 --> 00:25:50.459 No, he was not. And so he financed, himself, with his own money, he was rich, a second cable; it took a few years. 186 00:25:50.459 --> 00:26:02.038 And with the second cable they were more careful, and it worked fine. And then he paid off all the investors in the first cable too, even though he didn't legally have to. So, in any case, that's probability, real-world engineering. 187 00:26:02.038 --> 00:26:07.138 So, okay, in any case: estimation. 188 00:26:08.699 --> 00:26:13.259 So, the thing is that, 189 00:26:13.259 --> 00:26:20.459 again, I just touched on this last time; it's good to rerun it, I like to do important things again. 190 00:26:20.459 --> 00:26:29.669 Um, okay, so what's happening here is my communication channel thing: you receive Y, and you want to estimate X. 191 00:26:30.324 --> 00:26:45.263 And, okay, so we saw two different ways to do it. The basic difference is: do we have a probability distribution on X, on the transmitted signal? 192 00:26:45.568 --> 00:26:53.548 You hope that you do, but maybe you don't, or maybe you want to devise a technique that's robust against 193 00:26:53.548 --> 00:27:04.138 different types of sources. So the rule is that the more you know, the more you can compute. So if you know the probability distribution... maybe you're trying to 194 00:27:04.138 --> 00:27:13.648 compress images of text: if you know the fraction of the pixels that are black, you can do more than if you don't know the fraction. 195 00:27:13.648 --> 00:27:23.009 However, there is a flip side here: if you think you know the fraction of pixels that are black, but you get it wrong, 196 00:27:23.009 --> 00:27:36.834 you're going to devise a technique which is really suboptimal, which is really inferior. So maybe you don't want to make assumptions about the fraction of transmitted pixels that are black; that is, maybe you don't want to assume a prior probability for X. 197 00:27:37.074 --> 00:27:39.894 So these are the differences here.
Okay. Now. 198 00:27:40.229 --> 00:27:50.189 Again, the communication channel: transmit X, receive Y, and Y is X plus noise; here N is noise, not a number. And so, 199 00:27:52.493 --> 00:28:05.993 we want to go backwards, applying Bayes and so on. We see a signal Y and we want to get some idea: we saw Y, now what was transmitted, what was the X that was transmitted? 200 00:28:06.328 --> 00:28:15.028 And so Y is fixed. Okay, we saw, maybe, a received voltage of 0.6, 201 00:28:15.294 --> 00:28:16.374 let's say. Now, 202 00:28:17.634 --> 00:28:27.084 let's take the simple case: X could be 0 or 1 or something. So we've got the probability that X has different values given that we saw Y 203 00:28:27.173 --> 00:28:29.453 being 0.6. X is 0 or 1: so, the probability X is 0 204 00:28:29.723 --> 00:28:31.493 given Y is 0.6, 205 00:28:31.493 --> 00:28:32.963 or the probability X is 206 00:28:32.963 --> 00:28:44.153 1 given Y is 0.6, and we're going to pick the greater probability. And here's the syntax here: we're trying to maximize the probability over the different choices. We want to, okay, 207 00:28:44.969 --> 00:28:55.169 the syntax for this is: we want to determine the value for X which gives the maximum probability for that value of X given that Y. It's called the MAP estimate. 208 00:28:55.169 --> 00:29:01.949 And this is a conditional, so the next thing down, this is just simply Bayes. Okay. 209 00:29:01.949 --> 00:29:08.249 And so this requires that we have the priors, the probabilities of the different values of X, down here. 210 00:29:08.249 --> 00:29:11.909 Okay. 211 00:29:12.953 --> 00:29:26.364 Good. And now, maybe, like I said, we don't know the prior probabilities for X, or maybe we don't want to use them, because we're worried that if we use something wrong, we'll develop a really bad estimator. So we don't 212 00:29:27.749 --> 00:29:32.909 know the prior probabilities of X. So then we can do something 213 00:29:32.909 --> 00:29:42.929 which, I'll admit, coming from an intellectual point of view, is a hack. If you forced me to defend it on some logical grounds, I cannot. However, 214 00:29:44.009 --> 00:29:56.519 it appears to work, so people use it. And so what we say is: for each value of X, what's the probability of getting that value Y? And we pick the value of X which is 215 00:29:56.519 --> 00:30:02.489 such that, for that X, we get the greatest probability of that Y. And that's sort of 216 00:30:02.489 --> 00:30:16.913 assuming the priors were all equal for the X's. No, I'm being sloppy here; that's not quite it. We want to find the value of X that maximizes the probability of that Y given that X. So the MAP and the maximum likelihood condition in opposite directions. 217 00:30:16.913 --> 00:30:21.144 You see the formula; it's sort of weird. Okay. 218 00:30:21.449 --> 00:30:25.679 And instead of probabilities, you can work with densities. 219 00:30:25.679 --> 00:30:31.439 And then we saw here, comparing them, and the joint, and you work it out. 220 00:30:31.439 --> 00:30:36.749 I'll work it out at some point. And I mentioned last time also that 221 00:30:37.193 --> 00:30:46.463 these maximum estimators depend on the noise. If it's a really noisy system, knowing Y does not actually tell you that much about X, because the noise swamps Y. 222 00:30:47.094 --> 00:30:52.223 If there's very little noise, then knowing Y tells you a lot about X. And that's 223 00:30:52.648 --> 00:30:56.308 what's coming in here. Okay.
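A minimal sketch of the MAP-versus-ML comparison in Python; the 0/1 transmitted values and the received voltage of 0.6 are from the lecture, while the noise sigma and the prior probabilities are made-up numbers:

```python
import numpy as np
from scipy.stats import norm

# Hypothetical channel: X is 0 or 1, Y = X + N, N ~ Normal(0, sigma).
sigma = 0.5
y = 0.6                     # the received voltage
priors = {0: 0.7, 1: 0.3}   # made-up prior probabilities on X

# Likelihood f(y | x): density of Y at y given that X = x was sent.
def likelihood(y, x):
    return norm.pdf(y, loc=x, scale=sigma)

# Maximum likelihood: ignore the priors, maximize f(y | x) over x.
x_ml = max(priors, key=lambda x: likelihood(y, x))

# MAP: maximize the posterior, proportional to f(y | x) * P(x).
x_map = max(priors, key=lambda x: likelihood(y, x) * priors[x])

print("ML estimate: ", x_ml)    # 1, since 0.6 is closer to 1 than to 0
print("MAP estimate:", x_map)   # 0 here, because the prior favors 0
```

With these made-up priors the two estimates actually disagree, which is the whole point of the distinction.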
224 00:30:56.308 --> 00:31:07.979 So, that's what they're talking about here. And now we can also compute the error, the probability that we're wrong. 225 00:31:09.328 --> 00:31:15.509 Okay, and that's what's happening here: we have the best estimator, but it's not perfect. 226 00:31:15.509 --> 00:31:25.528 Okay, but we can compute the probability that it is wrong, and that's what they're talking about with errors here. That's what's happening here. 227 00:31:25.528 --> 00:31:32.638 And, I'm waving my hands; it's a hand-waving day, sort of. 228 00:31:33.778 --> 00:31:41.909 And you can also, and this is another way to compute stuff, so we've got the error, we can try to minimize it: 229 00:31:41.909 --> 00:31:50.219 our computed value of X minimizes the error, the probable error, and we can do that also. So, 230 00:31:51.328 --> 00:32:00.749 what they're doing here is they're saying: we're going to assume that our estimate of the transmitted signal is some linear function of the received signal, 231 00:32:00.749 --> 00:32:08.939 X-hat is aY plus b, and we don't know a and b, but X-hat is a linear function of Y. And now we can compute what the 232 00:32:09.324 --> 00:32:19.403 error is for that value of X-hat, and now we can optimize: find the expected value of the error, the RMS error, root mean squared. 233 00:32:19.584 --> 00:32:24.443 And now, a and b are unknown; we can solve for the values of a and b that 234 00:32:24.719 --> 00:32:29.578 give us the minimum expected error on X. It's another way to go and do it. So, 235 00:32:29.578 --> 00:32:33.298 blah, blah, blah, go down and compute. 236 00:32:33.298 --> 00:32:42.868 Okay, and this would be called minimum mean square error, MMSE. Okay. And 237 00:32:44.189 --> 00:32:48.868 if we look at what's happening here, we've got the expectations, the means; they just 238 00:32:48.868 --> 00:32:53.249 add and subtract in the obvious way. 239 00:32:53.844 --> 00:33:06.054 And things are scaled by the respective standard deviations: Y gets divided by sigma-Y, and the thing gets multiplied by sigma-X for X. Okay, that's sort of obvious. 240 00:33:06.324 --> 00:33:12.084 The other thing is the rho here, the correlation coefficient, comes into it. And 241 00:33:13.679 --> 00:33:17.429 what this is saying is what I just told you: 242 00:33:17.429 --> 00:33:31.259 if they're very strongly correlated, which means the noise is small, then X-hat is going to be about equal to Y, suitably corrected for the means and deviations. However, 243 00:33:31.259 --> 00:33:34.858 if they're weakly correlated, which means the noise is large, 244 00:33:34.858 --> 00:33:40.229 then the X-hat that gives you the smallest error is actually a lot less than Y. 245 00:33:40.229 --> 00:33:45.689 And if rho is negative, the estimate flips sign along with it. And, 246 00:33:45.689 --> 00:33:51.659 yeah, so intuitively, what that is saying is that 247 00:33:51.659 --> 00:33:59.669 just because you see a big value for Y does not mean X is big, if there's a humongous noise. So that's what that's saying here: 248 00:33:59.669 --> 00:34:09.809 the most probable X is still quite small, even if Y is big, if the noise is big, 249 00:34:09.809 --> 00:34:16.048 if the correlation is small. So that takes a little thinking about, 250 00:34:17.309 --> 00:34:26.634 but it's actually an important thing to realize in probability.
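A sketch of that linear MMSE point, assuming a made-up channel Y = X + N; the slope rho times sigma-X over sigma-Y coming out well below 1 is the "a big Y does not mean a big X" effect:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical channel: X ~ Normal(0,1), Y = X + N, N ~ Normal(0, 2).
sigma_n = 2.0                         # big noise => weak correlation
x = rng.normal(0.0, 1.0, 100_000)
y = x + rng.normal(0.0, sigma_n, x.size)

# Best linear (MMSE) estimator: xhat = m_x + rho*(sigma_x/sigma_y)*(y - m_y).
rho = np.corrcoef(x, y)[0, 1]
a = rho * x.std() / y.std()
b = x.mean() - a * y.mean()
xhat = a * y + b

print("rho:    ", rho)                # ~0.45 here, well below 1
print("slope a:", a)                  # ~0.2: shrink Y a lot before trusting it
print("mse:    ", np.mean((x - xhat)**2))
```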
And again, you can tie it into the news, with people talking about false positives and false negatives with COVID tests, 251 00:34:26.634 --> 00:34:32.364 and so on: there's a lot of noise in the system, so even a positive test or a negative test 252 00:34:32.639 --> 00:34:36.929 might not be strongly predictive about whether you're sick or healthy. 253 00:34:36.929 --> 00:34:45.628 So, okay, skipping through a little; that's what they're talking about there. 254 00:34:45.628 --> 00:34:56.818 And in this case, the various types of estimators might, in some cases, turn out to be the same. 255 00:34:57.898 --> 00:35:07.708 And the best estimator might depend on the particular random variable's probability distribution. So here they're using this one type, exponential, 256 00:35:07.708 --> 00:35:17.849 or whatever; so the best estimator for an exponential distribution would be different, perhaps, than for a normal distribution. So that's what's happening here. 257 00:35:17.849 --> 00:35:26.039 Okay, now here we're doing a random variable uniform in the interval minus 1 to 1. 258 00:35:26.039 --> 00:35:38.068 And so, a sort of fun thing here: we've got X uniform on minus 1 to 1, and Y is X squared. 259 00:35:39.298 --> 00:35:47.608 And, okay, you know that formula: Y is X squared, you've got X, you can find Y. Okay, but the thing is, Y is nonlinear in X. 260 00:35:47.608 --> 00:35:54.869 So, what we're talking about here, this is Exercise 6.08, it's in the exercises, and 261 00:35:54.869 --> 00:36:00.119 we're playing with the tools to work with this. So here, 262 00:36:00.119 --> 00:36:13.409 we're saying we're required to have a linear estimator for Y in terms of X. We're not allowed to have the exact X squared; we're required to have a linear estimator Y-hat equals aX plus b, 263 00:36:13.409 --> 00:36:19.139 which is sort of weird, I admit; it's a somewhat artificial setup, but, you know, we play along with this. 264 00:36:19.139 --> 00:36:31.768 And, okay, the mean of X is 0, of course, by symmetry, and now for the correlation, 265 00:36:31.768 --> 00:36:35.518 we need the expected value of X times Y, 266 00:36:35.518 --> 00:36:43.498 and, and so, an 267 00:36:43.498 --> 00:36:48.869 integral, and 268 00:36:50.128 --> 00:36:53.668 the expected value of X cubed is 0. 269 00:36:54.474 --> 00:37:06.594 And I don't know why they're integrating from minus a half to 1, but let's not worry about that; the expected value of X cubed is 0 anyway. So the covariance, it's the expected value of XY minus the product of the means, 270 00:37:07.284 --> 00:37:09.893 the covariance is 0, 271 00:37:10.168 --> 00:37:13.858 which means X and Y are not linearly correlated. 272 00:37:13.858 --> 00:37:19.409 So, I go back a few equations, and take them at their word that the best estimator for 273 00:37:19.409 --> 00:37:28.648 Y is just a constant, the expected value of Y, because they're not correlated. And the mean for Y, the mean for X squared, 274 00:37:28.648 --> 00:37:36.389 well, we can integrate it out, and you can integrate it out, 275 00:37:36.389 --> 00:37:39.989 get whatever. But then, so, the best estimator... 276 00:37:41.398 --> 00:37:46.318 um, and then the error: it's the variance of Y. 277 00:37:48.329 --> 00:37:55.409 And so that's the error, the mean square error, for the best linear estimator. 278 00:37:57.119 --> 00:38:07.798 It's quite high, because the best linear estimator is just a constant, since X and Y are not correlated here. Okay, so whatever x is, your guess for Y is the same constant, the mean value of Y.
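A numerical check of this exercise, assuming X uniform on (-1, 1) and Y equal to X squared:

```python
import numpy as np

rng = np.random.default_rng(2)

# X uniform on (-1, 1); Y = X**2 is completely determined by X,
# yet X and Y are uncorrelated: E[XY] = E[X^3] = 0 by symmetry.
x = rng.uniform(-1.0, 1.0, 1_000_000)
y = x**2

print("cov(X, Y):", np.cov(x, y)[0, 1])   # ~0

# So the best *linear* estimator of Y is just the constant E[Y] = 1/3,
# even though the nonlinear estimator yhat = x**2 has zero error.
print("E[Y]:", y.mean())                  # ~1/3
print("mse of the constant:", np.mean((y - y.mean())**2))  # ~4/45
```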
279 00:38:07.798 --> 00:38:16.320 Okay. But the best estimator, if you're not restricted to linear, is Y-hat equals X squared, and the error is 0 here. 280 00:38:16.320 --> 00:38:22.469 So they constructed this example to show that when you restrict yourself to linear estimators, you may 281 00:38:22.469 --> 00:38:27.239 be suboptimal. But, you know, it's a constructed example. 282 00:38:27.239 --> 00:38:31.409 There's a broader lesson here with 6.08: 283 00:38:32.550 --> 00:38:35.940 in this field, they do a lot of stuff 284 00:38:35.940 --> 00:38:46.289 which, frankly, does not have really strong formal justification, let's say, when they say, let's do linear. 285 00:38:46.289 --> 00:38:57.925 It's good enough and it's cheap enough, and that's the defense for it. There is no defense that says linear is more likely to happen than nonlinear, or anything like that. 286 00:38:58.914 --> 00:39:12.175 And a lot of the techniques we'll see with statistics are like that: you can't actually justify them. They appear to work, okay, or you can't prove they don't work, which is a different statement, but, 287 00:39:13.650 --> 00:39:21.900 so, hey, they appear to work, and you can work with them. Okay. 288 00:39:21.900 --> 00:39:33.179 And frankly, the justification for a lot of root-mean-square things is that they were cheap enough to compute by hand, back when there weren't computers. Okay. In any case. So, jointly Gaussian: 289 00:39:34.710 --> 00:39:41.159 again, they work it out now. 290 00:39:41.159 --> 00:39:45.630 So, a vector of observations: again, you've got the vector of n 291 00:39:45.630 --> 00:39:51.449 random variables, and we have the observations Y. So then here, 292 00:39:51.449 --> 00:39:59.670 okay, so what's happening here is: the original random variable X is a scalar, 293 00:39:59.670 --> 00:40:02.849 and you're observing it n times, 294 00:40:02.849 --> 00:40:10.139 but the observations are noisy. And what's a real-world example? 295 00:40:10.139 --> 00:40:17.309 You're measuring some physical constant, like the speed of light. Okay, and every time you measure it, you get a different value. 296 00:40:17.309 --> 00:40:31.170 Well, the speed of light is a constant, assuming we're not getting into philosophical, weird theories that it changes over time or something; we're measuring it all today. But every time you measure it, it's different. Or, 297 00:40:31.170 --> 00:40:42.630 we might hypothesize something constant in your catapult, but every time it shoots a marshmallow 298 00:40:42.630 --> 00:40:53.519 it's different, and you want to know, um, some property of the catapult. Okay, so you've got the one random variable X, and you're observing it n times. 299 00:40:53.519 --> 00:40:57.900 Each observation has a random error, 300 00:40:57.900 --> 00:41:01.050 and now you want to estimate X 301 00:41:01.050 --> 00:41:11.250 from all those Y's. Okay. Now, the obvious thing is, you might say, oh, and it's also, um, 302 00:41:11.250 --> 00:41:17.099 using some function of the Y's. Now, the obvious thing you might say is, 303 00:41:17.099 --> 00:41:25.440 you know, make the estimate of X the mean of all the Y's. Okay, so we say X-hat is the mean of all the Y's, and 304 00:41:25.440 --> 00:41:32.340 for many distributions, that's good. Okay. But we want to get a touch more formal here. So, 305 00:41:32.340 --> 00:41:39.179 I mean, if X is normal and the noise is normal, then picking the mean of the Y's is
306 00:41:39.179 --> 00:41:50.130 probably optimal, actually. But you could imagine an exponential or something, and there are some weird noises out there that are not normal. So, 307 00:41:50.130 --> 00:41:57.989 um, yeah, okay. And so maybe the mean is not the most 308 00:41:58.855 --> 00:42:03.414 accurate. And okay, so we have here, 309 00:42:04.315 --> 00:42:15.565 so in the notation, g is the function you're going to apply to your vector Y of observations to get your estimated X, and what this thing is asking is: what's the best g? 310 00:42:15.869 --> 00:42:25.559 Maybe the mean, but maybe not the mean, maybe the median or something. Actually, for a few random variables that are weird enough, the median works better than the mean. 311 00:42:25.559 --> 00:42:29.670 Weird enough meaning seriously not normal. 312 00:42:29.670 --> 00:42:38.880 Okay, so they're crunching through it; I'm giving you the high-level stuff today. So, 313 00:42:40.650 --> 00:42:51.389 it's working through all this and figuring out what's happening; I just gave you the point of what they're doing. And 314 00:42:51.389 --> 00:42:59.340 then now they do mean square, and then they start getting particular: assume X and Y are jointly Gaussian. 315 00:42:59.340 --> 00:43:02.639 And if they do that, then they can go through 316 00:43:02.639 --> 00:43:07.800 and get things. Okay. And, 317 00:43:12.119 --> 00:43:25.619 here's an interesting thing: the diversity receiver. Okay, what it means is that there are two antennas, you see, and you see this sort of thing with Wi-Fi setups, and you'll see 318 00:43:25.619 --> 00:43:28.949 multiple antennas for transmitting or receiving. 319 00:43:28.949 --> 00:43:34.500 And the reason is that one antenna might be in a dead zone, and 320 00:43:34.500 --> 00:43:43.380 with two antennas, one antenna will probably not be in a dead zone. And so the idea is that you attempt to receive the signal 321 00:43:43.380 --> 00:43:58.224 from two antennas, and you look at the two received signals and you pick the better one, where we have to figure out what "better" means. Now, okay, so there's the one transmitted signal X, but two received signals, Y1 and Y2, with different noises. 322 00:43:58.530 --> 00:44:05.940 And again, real world: you know, the noise might be seriously non-Gaussian, but we're not going there. 323 00:44:05.940 --> 00:44:10.289 Okay, but in any case. 324 00:44:11.369 --> 00:44:18.750 So, real world, it might not be linear either. Okay, so now, 325 00:44:21.480 --> 00:44:27.719 what we're doing is, we're also in here making some assumption that we can estimate the noise. 326 00:44:28.195 --> 00:44:42.445 Okay, there are ways you could do that. I guess if the signal being transmitted has error-correcting parity bits in it, you could look at how many parity bits were wrong, and that would give you an idea of the noise; one thing off the top of my head. 327 00:44:42.954 --> 00:44:44.454 Okay. So, in any case, 328 00:44:46.530 --> 00:44:52.019 you're working here with the two noisy received signals, Y1 and Y2, 329 00:44:52.019 --> 00:44:56.849 and we're looking at how the two receptions correlate with each other. 330 00:44:56.849 --> 00:44:59.880 Okay, so, 331 00:44:59.880 --> 00:45:09.030 here we're assuming that the noises are zero mean, unit variance; they're really simplifying the problem for you to make it actually workable. 332 00:45:09.030 --> 00:45:14.820 But at this point, it's seriously more simple than the real world. But okay.
333 00:45:14.820 --> 00:45:24.090 And we're also assuming the noises are independent, the two N's, which again is getting a touch idealistic, 334 00:45:24.090 --> 00:45:28.800 and the noise independent of X, which is okay. Now, 335 00:45:31.019 --> 00:45:34.980 and now what we're doing is, we're looking at 336 00:45:34.980 --> 00:45:42.869 the individual received signals here, so, correlations. So you see, okay, so we have 337 00:45:43.375 --> 00:45:57.594 expected values of the Yi squared, and of Y1 Y2, and so on here. And Y1 is X plus N1 and so on. So we do that, and we can simplify things out, given what we are assuming. 338 00:46:01.349 --> 00:46:07.710 Now, what's happening down here, how we go from, say, the expected value of X squared 339 00:46:07.710 --> 00:46:22.644 to 2, is that X is known to be Gaussian with mean 0, and here its variance is such that the expected value of X squared comes out to 2. I might work it out at some point. And for this, 340 00:46:22.644 --> 00:46:23.304 so this is 341 00:46:25.980 --> 00:46:31.409 how they get those things there. Okay. Um, 342 00:46:31.409 --> 00:46:35.039 and the expected value of X times Y 343 00:46:35.039 --> 00:46:41.849 is this. And again, skipping over several steps: 344 00:46:41.849 --> 00:46:46.019 the estimator. 345 00:46:46.019 --> 00:46:54.090 If we just have one received signal, the best estimate of X is two-thirds Y1. Well, we know Y1; why isn't the estimate Y1 itself? And this is the thing about the noise: 346 00:46:54.090 --> 00:47:00.869 if we don't know anything, the optimum guess is 0. Okay. So we know Y1, 347 00:47:00.869 --> 00:47:08.940 but there's noise in there, so the optimal estimate is not as far out as Y1; it's only two-thirds Y1, 348 00:47:08.940 --> 00:47:15.599 because of the effect of the noise. Okay. And we can get the mean squared error. 349 00:47:15.599 --> 00:47:28.170 Now, if we have two antennas, and we're assuming everything is linear, blah blah, and the two noises are independent, then the optimal estimator turns out to be 350 00:47:28.170 --> 00:47:41.849 0.4 Y1 plus 0.4 Y2. So if Y1 and Y2 were the same, it comes up to 0.8 Y, versus the two-thirds Y before: the effect of the noise is smaller because we've got two observations. And the mean square error, 351 00:47:41.849 --> 00:47:48.269 it went down from two-thirds to 0.4. So having two antennas 352 00:47:49.380 --> 00:47:53.280 helped us a lot. Um, 353 00:47:53.280 --> 00:47:58.500 everything is idealized here; in the real world, with dead zones and so on, two antennas would help us even more. 354 00:47:58.500 --> 00:48:06.989 Okay, okay. Um, 355 00:48:06.989 --> 00:48:10.469 here's another one. Okay, what's happening here? 356 00:48:10.469 --> 00:48:17.340 We have an audio signal, could be speech or something. So there are samples at different times, and 357 00:48:17.340 --> 00:48:21.210 we want to capture stuff up 358 00:48:21.210 --> 00:48:33.750 to... oh, I would say traditional telephony worked with signals up to 3 kilohertz, I believe, you can correct me if I'm wrong, 300 to 3,000 hertz. 359 00:48:33.750 --> 00:48:46.349 So, to capture a 3 kilohertz thing, you've got to sample at least 6,000 times a second; that's the Nyquist rate, and really you should be doing more, but okay. 360 00:48:46.349 --> 00:48:51.239 So you've got samples every six-thousandth of a second; those are the X's. So, 361 00:48:52.889 --> 00:48:57.059 okay.
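A simulation of that diversity receiver computation; the setup here (E of X squared equal to 2, with unit-variance independent noises) is reconstructed from the numbers quoted in class, the 2/3 and the 0.4s, so treat it as a sketch under those assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000

# Assumed setup: X zero-mean Gaussian with E[X^2] = 2; the two antenna
# noises are independent, zero-mean, unit-variance, independent of X.
x  = rng.normal(0.0, np.sqrt(2.0), n)
y1 = x + rng.normal(0.0, 1.0, n)
y2 = x + rng.normal(0.0, 1.0, n)

# One antenna: best linear estimate (2/3)*Y1, mean square error 2/3.
xhat1 = (2/3) * y1
print("1 antenna MSE:", np.mean((x - xhat1)**2))   # ~0.667

# Two antennas: best linear estimate 0.4*Y1 + 0.4*Y2, MSE 0.4.
xhat2 = 0.4 * y1 + 0.4 * y2
print("2 antenna MSE:", np.mean((x - xhat2)**2))   # ~0.4
```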
362 00:48:57.059 --> 00:49:11.755 And what you want to do is predict what the next voltage will be, because if you can predict it and attach probabilities, you can do something like Huffman coding, which most of you know, and therefore it'll take less bandwidth to transmit. What we're doing 363 00:49:11.755 --> 00:49:18.295 here is second order: we want to predict the next voltage given the last two voltages. 364 00:49:18.570 --> 00:49:23.429 And the idea is that if there's a trend there, we'll capture it. If, 365 00:49:23.429 --> 00:49:34.590 if the voltage went up over the last two, we might extrapolate that it's going to continue going up, or if it went lower, that it's going to continue going down. Okay, so that's the second-order prediction. 366 00:49:34.590 --> 00:49:44.309 And so the estimator here is a X-sub-n-minus-2 plus b X-sub-n-minus-1, and a and b are unknown, 367 00:49:44.309 --> 00:49:47.400 and we want the best predictor, 368 00:49:47.400 --> 00:49:52.650 assuming it's linear. Okay, and 369 00:49:52.650 --> 00:50:02.849 we're making also two simplifying assumptions: zero mean, constant variance, and things are time-invariant, 370 00:50:02.849 --> 00:50:06.809 and the covariance between the X's doesn't change. 371 00:50:06.809 --> 00:50:19.619 So we've simplified the problem, so it's totally not really reality, but let's not go there; it's now simple enough that we can have a hope of analyzing it. And 372 00:50:20.820 --> 00:50:27.929 this is the diagram communications people like to use, if you haven't seen these diagrams before. 373 00:50:27.929 --> 00:50:39.929 We've got these three X's up here: we want to predict X-sub-n from X-sub-n-minus-1 and X-sub-n-minus-2. We take them, 374 00:50:39.929 --> 00:50:44.400 in a little data-flow chart, multiply by a and b, and add, 375 00:50:44.724 --> 00:50:57.715 and then there's a plus with a little minus there, so we're subtracting: you take X-sub-n and what we got down here, and the little hat is our predicted value; we subtract it out and we get the error. 376 00:50:58.014 --> 00:51:12.804 It's called a two-tap predictor, because we're using the last two values, and it's called a tap because at some point in the past there might have been a physical delay line where you're actually tapping values off, or something. Okay. 377 00:51:12.804 --> 00:51:14.545 So this is just to refresh: 378 00:51:14.849 --> 00:51:18.449 second-order prediction of speech. And 379 00:51:18.449 --> 00:51:21.630 you'd do it because, if your predictor is good, 380 00:51:21.630 --> 00:51:29.670 you can then compress the signal. And you might do more than second order; second order is popular for some simple image compression, let's say. 381 00:51:29.670 --> 00:51:33.449 Okay. Um, 382 00:51:33.449 --> 00:51:38.010 and they can work out the error and so on. 383 00:51:38.010 --> 00:51:51.840 Um, and they're saying that they can lower the variance to about a quarter of what it was before, by using this, for this real example. 384 00:51:53.550 --> 00:52:02.519 And these predictors, they say, are used extensively, and now they're getting fancier, 385 00:52:02.519 --> 00:52:10.980 because you're using neural nets and machine learning and so on; think of what Tesla is doing 386 00:52:10.980 --> 00:52:17.579 for their autonomous vehicle stuff: serious prediction. 387 00:52:17.579 --> 00:52:23.909 Okay, okay. I'm not doing the starred things. 388 00:52:23.909 --> 00:52:29.039 So that's prediction.
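A sketch of a two-tap predictor, with a made-up correlated signal standing in for speech; the taps a and b are fit by least squares, which is the sample version of minimizing the mean square prediction error:

```python
import numpy as np

rng = np.random.default_rng(4)

# A made-up correlated signal: low-pass filtered noise, standing in for speech.
raw = rng.normal(size=10_000)
x = np.convolve(raw, np.ones(8) / 8, mode="valid")

# Two-tap predictor: xhat_n = a * x_{n-2} + b * x_{n-1}.
# Solve for (a, b) by least squares over the observed samples.
lagged = np.column_stack([x[:-2], x[1:-1]])   # columns: x_{n-2}, x_{n-1}
target = x[2:]                                # x_n
(a, b), *_ = np.linalg.lstsq(lagged, target, rcond=None)

err = target - (a * x[:-2] + b * x[1:-1])
print("taps a, b:", a, b)
print("signal variance:          ", target.var())
print("prediction error variance:", err.var())  # much smaller for this signal
```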
389 00:52:29.039 --> 00:52:36.929 One more thing: generating random numbers, be careful. It's harder to do than you think, you know; you don't want to try to do it yourself, leave it to a separate team. 390 00:52:36.929 --> 00:52:46.139 They mention Octave here, and... that was chapter 6. Okay. 391 00:52:46.139 --> 00:52:51.210 I'll go through it again on Monday in greater detail, but again, that's it. 392 00:52:52.769 --> 00:53:00.480 And let me hit some of the points here, actually. 393 00:53:02.130 --> 00:53:11.099 So we have a vector of random variables, and they have a joint cumulative distribution function, a joint CDF; that's a capital F. 394 00:53:11.099 --> 00:53:21.690 And the CDF works with either mass or density, discrete or continuous, which is great. You've got the joint probability mass function, the probability 395 00:53:21.690 --> 00:53:33.300 for the vector of X's separately, or the density function if it's continuous, which gives the probability of a very small square, cube, okay, of size dx, okay, 396 00:53:33.300 --> 00:53:43.019 in the limit as dx goes to 0. Okay. And you can do marginal density functions and CDFs and so on: just integrate out 397 00:53:43.019 --> 00:53:46.920 or sum out the variables you're not interested in. 398 00:53:46.920 --> 00:53:55.559 And then there's a definition: they're independent if the joint probability, 399 00:53:55.559 --> 00:53:58.739 whether it's the point mass if it's 400 00:53:58.739 --> 00:54:09.809 discrete, or the density function if it's continuous, if the joint probability can be separated out into the product of the probability of each 401 00:54:09.809 --> 00:54:19.500 variable separately, and if it's true for all the values of the variables, that's independent. Okay, it's a definition. 402 00:54:20.519 --> 00:54:25.590 And then we want to talk about statistical behavior, that means means, variances, 403 00:54:25.590 --> 00:54:26.574 and so on, 404 00:54:28.974 --> 00:54:31.945 and I mentioned the conditional expectation that I defined again 405 00:54:31.945 --> 00:54:32.514 today. 406 00:54:33.025 --> 00:54:38.994 You can summarize information with the mean, the vector of means, and the covariance matrix, again, 407 00:54:38.994 --> 00:54:40.315 which gives pairwise 408 00:54:40.315 --> 00:54:42.445 linear relations between the variables, 409 00:54:42.835 --> 00:54:43.164 not 410 00:54:43.440 --> 00:54:48.900 triple-wise, only pairwise, and only the linear relations between them. 411 00:54:48.900 --> 00:54:55.559 You can transform random variables to get other random variables. 412 00:54:55.559 --> 00:55:03.929 Um, I gave you some little hints in my blog: you know, you're doing measurements in feet and you want to convert to meters; well, that's 413 00:55:03.929 --> 00:55:12.929 really simple. Or you have Cartesian notation, X and Y, and you want to work with polar: you're transforming random variables. 414 00:55:12.929 --> 00:55:21.300 Or you rotate, let's say. In these, you've got the original density function, and you want to find the new density function. 415 00:55:22.284 --> 00:55:34.284 Okay, and you want to find some best estimators for some of these things and minimize error. One thing: you may want to minimize the mean value of the squared error, 416 00:55:34.284 --> 00:55:48.505 the mean square error. Minimizing this squared error means that really large errors carry heavier weight in the formula; you're trying to lessen the chances of really big errors.
417 00:55:48.900 --> 00:55:56.550 And frankly, historically, mean square error was used because it was easy to compute. The books don't tell you that. 418 00:55:56.550 --> 00:56:07.079 Okay, and you get things, again, working with pairwise relations and stuff like that. 419 00:56:07.079 --> 00:56:21.000 And we talk about jointly Gaussian. And jointly Gaussian is not just any two Gaussian random variables: you've got individual Gaussians that have a particular relation between them, and that goes into the definition of jointly Gaussian. 420 00:56:21.000 --> 00:56:25.530 And there's a definition for what a jointly Gaussian pair, or vector, of random variables is. 421 00:56:25.530 --> 00:56:31.920 It's more than each one separately being Gaussian. Okay, that's what I'm talking about here. 422 00:56:31.920 --> 00:56:35.670 Important terms, and I've hit most of them. 423 00:56:35.670 --> 00:56:50.670 I'll do problems, I guess, Monday; until then, try the reading. 424 00:56:50.670 --> 00:56:55.199 And, 425 00:56:55.199 --> 00:57:00.989 yeah, um, 426 00:57:00.989 --> 00:57:07.980 a little high-level stuff from here: sums of random variables. You may want to find the sum of them, 427 00:57:07.980 --> 00:57:17.849 and then the expected value of the sum and the variance of the sum; well, this is along the way to finding the mean and so on. 428 00:57:17.849 --> 00:57:28.260 Expected values add regardless of the relation between the random variables; this is a particularly nice thing about expectation, that 7.2 is quite nice. 429 00:57:28.260 --> 00:57:33.179 Variances do not add in this way unless they are independent. 430 00:57:33.179 --> 00:57:36.510 Okay, um, 431 00:57:38.010 --> 00:57:43.019 the formula for the variance of the sum of X and Y: 432 00:57:44.400 --> 00:57:55.980 well, it just uses the definition of variance, blah, blah, blah, you can work it out, I can work it out, maybe it's in the book. And the variance of the sum is the sum of the variances plus twice the covariance. 433 00:57:55.980 --> 00:58:02.670 So if they're not correlated, the covariance is 0, so the variances just add. 434 00:58:02.670 --> 00:58:07.170 If they are positively correlated, the variance will be bigger, because 435 00:58:07.170 --> 00:58:13.199 they're tracking each other; if they're negatively correlated, the variance will be smaller, because they cancel each other out in the sum. 436 00:58:13.199 --> 00:58:16.920 Okay, that's what they're talking about there. 437 00:58:16.920 --> 00:58:21.300 IID means independent, identically distributed: 438 00:58:21.300 --> 00:58:28.829 they have the same mean and variance and so on, and they're independent of each other. The variances 439 00:58:28.829 --> 00:58:35.820 just add. That's what's happening there. Um, 440 00:58:35.820 --> 00:58:39.449 I'm not working with the transforms. So, 441 00:58:40.559 --> 00:58:46.920 well, what's happening here is, if you're adding two random variables, 442 00:58:46.920 --> 00:58:51.239 the density of the sum is the convolution 443 00:58:51.239 --> 00:59:00.690 of the density functions for X and Y; this was shown two chapters back or whatever. So this can be useful, 444 00:59:00.690 --> 00:59:03.900 to tell you that. And 445 00:59:04.920 --> 00:59:09.030 and this is actually a case 446 00:59:09.030 --> 00:59:23.664 where the transform, the characteristic function, is useful. But since I had to shorten the course, some of those things also got squeezed out this semester, like everything. This is a place where the characteristic function would actually be useful.
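A quick numerical version of that convolution fact, for the sum of two independent Uniform(0,1) variables; the density of the sum is the triangle on (0, 2):

```python
import numpy as np

# Density of the sum of two independent Uniform(0,1) random variables:
# the convolution of two rectangles, a triangle peaking at 1.
dx = 0.001
x = np.arange(0.0, 1.0, dx)
f = np.ones_like(x)               # Uniform(0,1) density on its support

f_sum = np.convolve(f, f) * dx    # numerical convolution of the densities
z = np.arange(len(f_sum)) * dx    # support of the sum runs from 0 to 2

print("peak at z =", z[np.argmax(f_sum)])      # ~1.0, the triangle apex
print("peak height ~", f_sum.max())            # ~1.0
print("integrates to ~", f_sum.sum() * dx)     # ~1.0, so it is a density
```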
447 00:59:24.210 --> 00:59:30.300 But that part is for you to read. Okay. 448 00:59:33.750 --> 00:59:40.260 Yeah, okay, now this is starting to get into the relevant stuff; we're approaching statistics here. Okay. 449 00:59:40.260 --> 00:59:45.000 There is a big idea here; I'll tell you what the big idea is. 450 00:59:45.000 --> 00:59:53.789 You've got some random variable X, 451 00:59:53.789 --> 00:59:57.929 and it turns out you do not know the mean. 452 00:59:57.929 --> 01:00:10.019 Okay, and the ultimate goal is we want to learn something about the mean. So we take n observations of X, but X bounces around, so they're all going to be different. Okay. 453 01:00:10.019 --> 01:00:13.110 And so that's called a sample. 454 01:00:13.110 --> 01:00:19.440 And what we do with the sample is we find the mean of the sample, 455 01:00:19.440 --> 01:00:22.559 and that's the sample mean. 456 01:00:22.559 --> 01:00:33.269 That, as I mentioned half an hour ago, tends to be a good estimator for the unknown mean of the whole population, 457 01:00:34.349 --> 01:00:39.750 you know, if the random variable's distribution is 458 01:00:39.750 --> 01:00:44.010 not too crazy. Gaussian would be good. 459 01:00:44.010 --> 01:00:56.219 Crazy would be, I mentioned something called a Cauchy distribution a long time ago. Cauchy distributions don't actually have means; you can still take a sample, and you can find the mean of the sample, but 460 01:00:57.480 --> 01:01:01.409 um, the thing is, as the sample gets bigger, the mean of the sample 461 01:01:01.409 --> 01:01:05.400 doesn't settle down. But in any case, Gaussian, let's say. 462 01:01:05.400 --> 01:01:15.780 Okay, so let me take this students thing. So the population is like 7,000 students. We do not know the mean 463 01:01:15.780 --> 01:01:27.119 height of the students. So we pick 1%, we pick 70 students, and we measure the height of those 70 students and take the mean. 464 01:01:27.119 --> 01:01:33.030 Okay, and that's called the sample mean, the mean of the sample. 465 01:01:33.030 --> 01:01:38.550 What we want to do, and of course every sample of 70 students is going to give us a different mean. Okay. 466 01:01:38.550 --> 01:01:43.289 Now, the question is, 467 01:01:43.289 --> 01:01:50.280 how good is this? Okay, intuitively it looks pretty good: 468 01:01:50.280 --> 01:01:56.940 to estimate the mean of the whole population of 7,000 students, we take the mean of our sample of 70. That's, 469 01:01:56.940 --> 01:02:06.869 just intuitively, going to tell us something about the whole population, and you're going to be right. But how good is that? So, 470 01:02:08.429 --> 01:02:16.500 if we take lots and lots of samples of 70 students, each one has a different mean. How much does the sample mean bounce around? 471 01:02:16.500 --> 01:02:28.559 Okay, and so we're going to get into things here called the law of large numbers and so on. And what that says is, as your sample gets bigger, 472 01:02:28.559 --> 01:02:37.829 it gets better. Okay, I just gave you the law of large numbers in English, but, you know, you can actually attach numbers to that. 473 01:02:37.829 --> 01:02:42.210 Okay, so the thing is here, 474 01:02:42.210 --> 01:02:52.079 we take little n, which is 70 in my case of 70 students, and so we take the sample mean of those 70 students. That's big M sub n. 475 01:02:52.434 --> 01:03:04.945 Okay, now big M sub n is a random variable itself. Okay, so we've got random variables of random variables, okay, or transformations of random variables.
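As a side note on the Cauchy remark a moment ago, here is a small numpy sketch, illustrative only (the distribution parameters are made up), comparing the running sample mean of Gaussian draws, which settles down, with Cauchy draws, which never do:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 100_000

    # Running sample mean M_k = (x_1 + ... + x_k) / k.
    def running_mean(x):
        return np.cumsum(x) / np.arange(1, len(x) + 1)

    g = running_mean(rng.normal(10.0, 2.0, n))   # Gaussian: mean exists
    c = running_mean(rng.standard_cauchy(n))     # Cauchy: no mean exists

    for k in (10, 1_000, 100_000):
        print(k, g[k - 1], c[k - 1])
    # The Gaussian column settles near 10; the Cauchy column keeps jumping.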
476 01:03:04.945 --> 01:03:08.844 We've got the 70 random variables, our 70 students, that we 477 01:03:09.420 --> 01:03:14.070 grabbed and measured the height of, 478 01:03:15.599 --> 01:03:19.079 and we have a new random variable, which is their mean. 479 01:03:20.820 --> 01:03:27.869 And so M sub n, it's a random variable, and it itself has a mean and a variance. 480 01:03:27.869 --> 01:03:34.710 And what we want to do is compute them, 481 01:03:34.710 --> 01:03:44.610 and the variance of M will tell us how much M will bounce around as we take more and more separate samples. 482 01:03:46.440 --> 01:03:53.909 And essentially, if the variance of M is smaller, then M is a more accurate estimator of the original population's mean height. 483 01:03:53.909 --> 01:04:00.179 Okay, that's the point here. 484 01:04:02.219 --> 01:04:13.409 Relative frequency: we're just saying, did something happen or not happen, so the random variable is 0 or 1, 485 01:04:13.409 --> 01:04:16.590 and we do it n times. 486 01:04:16.590 --> 01:04:22.590 Okay, and that gives a relative frequency. Okay, so, 487 01:04:22.590 --> 01:04:26.280 so what's happening here? 488 01:04:27.480 --> 01:04:33.449 Mu is the mean of the whole population; it is the true 489 01:04:33.449 --> 01:04:42.929 mean height of the students, and again, M sub n is the calculated mean of our sample of 70 students. 490 01:04:42.929 --> 01:04:47.550 The expected value of M sub n is, in fact, mu, 491 01:04:47.550 --> 01:04:54.360 and that is what we want to happen. 492 01:04:54.360 --> 01:05:08.460 Okay, we'll work it out again: because M sub n is 1 over n times the sum, and using linearity for means, the expected value of our sample mean is, in fact, mu. 493 01:05:10.139 --> 01:05:13.829 Okay, and 494 01:05:13.829 --> 01:05:26.159 so our sample mean is equal on average to the true mean, which means it's an unbiased estimator. The definition of unbiased is that the mean of the sample mean, 495 01:05:26.159 --> 01:05:34.050 you've got two "means" there, the mean of the sample mean, 496 01:05:34.050 --> 01:05:37.559 is equal to the population mean. 497 01:05:37.559 --> 01:05:41.820 Three "means" in that sentence, and if that holds, that's the definition of an unbiased estimator. 498 01:05:42.864 --> 01:05:56.394 The point here, and it's an unstated point, is that an estimator might be biased: its mean might not be equal to the true population mean. However, it might be better for some other reason. 499 01:05:56.394 --> 01:06:01.224 It might be biased but happen to have a smaller variance. And so this is why we 500 01:06:01.559 --> 01:06:08.489 talk about biased or unbiased; that's what's happening here. 501 01:06:10.380 --> 01:06:14.190 Now we want to know, okay, sorry: 502 01:06:14.190 --> 01:06:25.320 the mean of our sample mean is equal to the true population mean, good. Now, how much does it jump around? So what we want is the variance of our sample mean. 503 01:06:25.320 --> 01:06:28.559 So we want the variance of 504 01:06:28.559 --> 01:06:33.119 big M, and that's equal to the expectation of how far it 505 01:06:33.119 --> 01:06:38.400 deviates from its mean: the expected value of M sub n minus mu, squared. 506 01:06:38.400 --> 01:06:47.730 You can take this expected value and 507 01:06:48.780 --> 01:06:53.460 do some simple math, and it comes down to equation 7.18. 508 01:06:53.460 --> 01:07:02.610 I'm not hiding anything from you; it's actually that simple: the variance of the sample mean is the original population variance divided by n.
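Here is a minimal simulation of that result, with a made-up population of 7,000 "heights" (the numbers are invented, not the class's data). It checks both that the sample mean is unbiased and that its variance comes out close to the population variance divided by n:

    import numpy as np

    rng = np.random.default_rng(3)

    # Hypothetical population of 7,000 student heights in inches; since we
    # generated it, the true mu and variance are known and can be checked.
    population = rng.normal(68.0, 3.0, 7_000)
    mu, var = population.mean(), population.var()

    # Draw many samples of n = 70 and compute each sample mean M_n;
    # draw with replacement so the 70 observations are independent (iid).
    n = 70
    sample_means = np.array([
        rng.choice(population, size=n, replace=True).mean()
        for _ in range(5_000)
    ])

    print(sample_means.mean(), mu)        # unbiased: E[M_n] = mu
    print(sample_means.var(), var / n)    # Var(M_n) ~ sigma**2 / n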
509 01:07:04.530 --> 01:07:07.739 So, what this means is that 510 01:07:09.570 --> 01:07:14.489 if you sample more students, my sample's 511 01:07:14.489 --> 01:07:19.139 variance will be smaller, so my sample's mean 512 01:07:19.139 --> 01:07:32.070 will not jump around so much. Take my case: the population is 7,000 students, and I was talking about having a sample of 70. 513 01:07:32.070 --> 01:07:36.150 Well, suppose I sample 7 instead of 70. 514 01:07:36.150 --> 01:07:40.650 Its mean is still going to be the true mean, but its variance is going to be a lot larger. 515 01:07:40.650 --> 01:07:44.369 So, if I take 516 01:07:44.369 --> 01:07:48.809 a sample of only 7, it's a much less precise 517 01:07:49.860 --> 01:07:56.250 estimate of the average height of the students. I mean, the mean is going to be correct, but it's going to jump around; it's going to be less certain. 518 01:07:56.250 --> 01:08:01.650 On the other hand, if I take 700 students instead of 70 students, 519 01:08:01.650 --> 01:08:09.960 the variance of the mean of a sample of 700 students is smaller. 520 01:08:09.960 --> 01:08:16.020 So, if I take 700 students instead of 70 students, 521 01:08:16.020 --> 01:08:21.569 this is a more accurate estimate of the true population mean. 522 01:08:23.550 --> 01:08:36.329 But the bigger samples cost more money. And if I'm interested in the standard deviation, of course, that's the square root of the variance, so the standard deviation decreases as 1 over the square root of n. 523 01:08:38.069 --> 01:08:48.659 And we can formalize the probability that we're wrong. Now, what we're getting down to here is the thing that, again, you look at polling, and they'll tell you that the results are accurate 524 01:08:48.659 --> 01:08:53.640 within 3%, 95 times out of 100, or something. 525 01:08:53.640 --> 01:08:57.510 And so there, they're getting at an equation something like this one here, 526 01:08:57.510 --> 01:09:01.289 um, and 527 01:09:01.289 --> 01:09:07.979 using things like Chebyshev; you can even do better than Chebyshev if you know the probability distribution, of course. 528 01:09:07.979 --> 01:09:14.880 Um, but this is getting down to where you see it in the real world with pollsters. Okay. And again, 529 01:09:14.880 --> 01:09:24.060 if the sample for the pollster is larger, then the error bar, there's this convention called, like, the error bar or something, gets smaller, but the larger sample costs more money. 530 01:09:25.350 --> 01:09:33.960 And by the way, when I talk about this, I'm not talking about the difficulty of in fact drawing a random sample, which is the hardest part of the whole problem. 531 01:09:35.760 --> 01:09:39.689 Okay, um. 532 01:09:39.689 --> 01:09:45.359 In example 7.9, they're using a voltage thing again: 533 01:09:46.560 --> 01:09:54.000 again, there's a voltage, which isn't known, and a noise, whose statistics are known. 534 01:09:54.000 --> 01:09:57.390 And every time you measure the voltage, you get a different 535 01:09:57.390 --> 01:10:01.560 number; it's error-prone, because it's got that noise in it. 536 01:10:01.560 --> 01:10:12.149 We're assuming everything is independent and whatever, so the more samples you take, the better the estimate is of the true voltage. 537 01:10:12.149 --> 01:10:15.750 And you want to get the true voltage within some tolerance 538 01:10:15.750 --> 01:10:22.199 with a probability of at least 99%. So, how many samples do you have to measure?
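A sketch of that sample-size calculation, with hypothetical numbers; the noise sigma, the tolerance eps, and the confidence level are all assumptions for illustration, not the textbook's figures:

    import numpy as np

    # Hypothetical: noise standard deviation 1.0 volt (known), and we want the
    # sample mean within eps = 0.1 volts of the truth, 99 times out of 100.
    sigma, eps, alpha = 1.0, 0.1, 0.01

    # Chebyshev: P(|M_n - mu| >= eps) <= Var(M_n)/eps**2 = sigma**2/(n*eps**2).
    # Requiring that bound <= alpha gives a (conservative) sample size:
    n_cheb = int(np.ceil(sigma**2 / (alpha * eps**2)))
    print(n_cheb)    # 10,000 measurements

    # Knowing the distribution does better: if the noise is Gaussian, the
    # sample mean is Gaussian, and a two-sided 99% bound needs |z| <= 2.576.
    n_gauss = int(np.ceil((2.576 * sigma / eps) ** 2))
    print(n_gauss)   # about 664 measurements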
539 01:10:22.199 --> 01:10:27.600 And this is done using these ideas here. 540 01:10:27.600 --> 01:10:34.260 And the laws of large numbers say that, as your sample gets larger, 541 01:10:34.260 --> 01:10:42.000 the sample mean converges on the real mean. Okay, I'm not going to worry too much about the difference between strong and weak. 542 01:10:42.000 --> 01:10:45.270 Okay. 543 01:10:45.270 --> 01:10:56.640 Central limit theorem: I'm going to wave my hands, wave, wave, wave, and tell you what it is and not necessarily prove it. So what this is saying, 544 01:10:56.640 --> 01:11:06.840 this is the whole thing behind Gaussian, or normal, why it's called normal: if you add random variables together, the sum starts looking like a Gaussian. That's the central limit theorem. 545 01:11:06.840 --> 01:11:20.069 That's if the original random variables are independent and they've got finite mean and variance; then the sums start, really quickly, looking like a Gaussian. 546 01:11:20.069 --> 01:11:24.180 So that's the central limit theorem; there's a little numerical sketch of this down at the end. 547 01:11:24.180 --> 01:11:33.270 Okay, and the central limit theorem just puts numbers on what I said: how quickly it starts looking like that. 548 01:11:33.270 --> 01:11:36.989 Okay. 549 01:11:38.520 --> 01:11:49.380 There's an example with restaurants, where I'm guessing that the orders are... probably a reasonable point to stop now. 550 01:11:49.380 --> 01:11:55.739 So, I'll hit this in more detail. So what were we doing today? We were finishing off, 551 01:11:55.739 --> 01:12:00.060 we were finishing off the 552 01:12:01.944 --> 01:12:15.204 vector random variables, chapter 6, and we were then starting to get into chapter 7, which is starting to approach statistics; I mentioned it at the start of the class and then I came around back to it at the end of the class. 553 01:12:16.494 --> 01:12:27.444 The new idea in chapter 7 was, we're sampling a distribution, we're sampling a random variable. Every time up to now, we knew the parameters of the random variable: 554 01:12:27.444 --> 01:12:31.975 we knew it had a certain distribution, we knew it had a certain mean and variance. 555 01:12:33.204 --> 01:12:46.824 In chapter 7, we do not know that any more, and we're trying to determine it. And so this is getting into statistics. We still know the distribution, or we assume it's normal, let's say, because normal is 556 01:12:46.824 --> 01:12:59.125 so common, but we don't know the mean, and maybe sometime later we don't know the standard deviation. So we take a sample: we observe the random variable n times, and from our n observations 557 01:12:59.760 --> 01:13:10.739 we get an estimator for the true mean, and we also get an error bar on our estimator. We're trying, engineers are trying, to make this precise. So, like, 558 01:13:10.739 --> 01:13:14.609 taking the case of the students, um, 559 01:13:14.609 --> 01:13:27.925 we said, take our sample of 70 students; the mean of those 70 heights is our estimator for the true mean height, but we can also put an error bar on our estimator, so that the true 560 01:13:27.925 --> 01:13:33.444 height is within such-and-such a region 95 times out of 100. We'll talk 561 01:13:35.819 --> 01:13:49.050 about that more Monday. Okay. So have a good weekend if you can. I notice it's raining outside; as I said, I'm at my house, 10 miles from RPI at the moment, and
562 01:13:49.050 --> 01:13:57.810 it's raining hard, actually, so my solar generator's not going to be producing much energy today. But okay, so, 563 01:13:57.810 --> 01:14:04.680 okay, so that's this Thursday. A question or two? If not, 564 01:14:05.850 --> 01:14:07.829 sounds good. Bye.
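Finally, the numerical sketch of the central-limit-theorem hand-wave promised above; my own illustration, not from the textbook. Sums of n independent Uniform(0,1) variables are standardized, and their skewness and excess kurtosis head toward the Gaussian values of 0 as n grows:

    import numpy as np

    rng = np.random.default_rng(5)

    # Sum n independent Uniform(0,1) variables, standardize, and watch the
    # shape head toward a Gaussian: skewness -> 0, excess kurtosis -> 0.
    for n in (1, 2, 10, 50):
        s = rng.random((200_000, n)).sum(axis=1)
        z = (s - s.mean()) / s.std()
        skew = float(np.mean(z**3))
        ex_kurt = float(np.mean(z**4) - 3.0)
        print(n, round(skew, 3), round(ex_kurt, 3))
    # n = 1 prints excess kurtosis near -1.2 (flat uniform); by n = 50,
    # both numbers are close to 0, i.e., the sum already looks Gaussian.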