WEBVTT 1 00:00:01.110 --> 00:00:05.969 Here. 2 00:00:19.439 --> 00:00:27.839 We're going to see what's going on the screen to. 3 00:00:29.129 --> 00:00:33.750 Okay. 4 00:00:35.850 --> 00:00:41.549 I don't know what it means. My screen too, but okay. 5 00:00:43.920 --> 00:00:48.659 Okay. 6 00:00:49.740 --> 00:00:52.859 And. 7 00:00:52.859 --> 00:00:55.950 It's recording. 8 00:00:55.950 --> 00:01:01.259 So, you get the job of telling me if it's working, because I cannot actually tell from here. 9 00:01:01.259 --> 00:01:05.099 Um, so. 10 00:01:06.209 --> 00:01:16.469 And the funny thing is, if I use a 2nd computer signed in to myself to monitor what it's doing, even though I attempt to mute it. 11 00:01:16.469 --> 00:01:20.310 The 2nd microphone is still on and I can't stop it. 12 00:01:20.310 --> 00:01:23.760 Thereby causing feedback, so. 13 00:01:23.760 --> 00:01:29.310 I guess I need a separate user account. Okay. So. 14 00:01:29.310 --> 00:01:32.670 Um, last 17, March, 17. 15 00:01:32.670 --> 00:01:45.420 And thrust, so the standard template library has incorporated some of the parallel stuff from thrust in it. And what cost is still. 16 00:01:45.420 --> 00:01:48.629 I think fun to look at. 17 00:01:48.629 --> 00:01:51.870 And. 18 00:01:51.870 --> 00:02:00.959 So, the thing with trust was the central library, not have it thrust from NVIDIA has back ends and, um. 19 00:02:00.959 --> 00:02:10.710 I haven't really talked about that yet, but you can set compiler flags to make the back end to be the GPU. Let's say, which I don't think you would necessarily have. 20 00:02:10.710 --> 00:02:17.069 In the okay, so what I have here pointing to lots of examples. 21 00:02:17.069 --> 00:02:22.680 And and I got some notes on some of them here. 22 00:02:22.680 --> 00:02:30.840 And I put up another homework to play with some of so this so I've got talking about some of the things here. So I'll show you some of the examples. 23 00:02:30.840 --> 00:02:36.240 And then I'll continue on with the slides notes. I've typed up. 24 00:02:36.805 --> 00:02:47.754 And also, I've got notes, I've typed on some of the thrust examples they're worth looking at, because it's surprising what you can do. 25 00:02:47.754 --> 00:02:52.824 And this is officially in parallel clicks, log in log P time, give or take where they're at. 26 00:02:53.430 --> 00:03:00.509 Processors P threads available and you think with single program, multiple data stream. 27 00:03:00.509 --> 00:03:03.810 You'd be limited because. 28 00:03:03.810 --> 00:03:13.680 Every thread is doing the same thing, but in fact, they've got some parents, some surprising paradigms that let you do things in parallel. 29 00:03:13.680 --> 00:03:17.219 And we see some examples here. Um. 30 00:03:17.219 --> 00:03:25.530 Well, just, for example, run, length, encoding and decoding doing that in parallel past sounds, you know, think about how you would do that. 31 00:03:25.530 --> 00:03:31.110 So, I have notes that I typed in on some of these things here. 32 00:03:31.110 --> 00:03:39.330 So, um, and some notes I've got on efficiency that I've determined and okay. Um. 33 00:03:40.889 --> 00:03:46.830 Let me just, for example, perhaps I'll go over to, um. 34 00:03:46.830 --> 00:03:51.900 Let's see. 35 00:03:53.639 --> 00:03:58.319 Did not work. 36 00:03:58.319 --> 00:04:02.819 This work good. 37 00:04:02.819 --> 00:04:06.870 Okay. 38 00:04:06.870 --> 00:04:10.560 No one's doing anything 252 Jake of memory. 39 00:04:10.560 --> 00:04:16.829 Being used okay, so just show you where someone. 40 00:04:16.829 --> 00:04:22.500 Got a link here our class. 41 00:04:22.500 --> 00:04:25.858 Users or your accounts, and only, you can. 42 00:04:25.858 --> 00:04:34.048 Okay, yeah. 43 00:04:34.048 --> 00:04:39.658 Yeah, so the examples I pulled all. Okay. Um. 44 00:04:39.658 --> 00:04:45.988 Um, the examples are on the GitHub, so the video, um. 45 00:04:45.988 --> 00:04:56.999 Over time and video moves stuff around and it gets a little confusing and in fact, there's old versions of thrust and various places you have to watch when you're checking documentation. 46 00:04:56.999 --> 00:05:03.329 Even from Nvidia that they may be pointing to obsolete locations, but currently. 47 00:05:03.329 --> 00:05:06.869 If you install the current version of CUDA. 48 00:05:06.869 --> 00:05:12.928 It comes with the thrust what trust is again. It's just header files. It's not each files. 49 00:05:12.928 --> 00:05:19.139 Because all the code that Mike could execute, it is in the header files, you know, it says member functions. 50 00:05:19.139 --> 00:05:28.738 And so there's nothing to be linked at run time. The code isn't, the source code is included in your program and gets compiled. 51 00:05:28.738 --> 00:05:34.019 Okay, so that's necessary to run thrust. 52 00:05:34.019 --> 00:05:47.423 However, the executable, I mean, the examples they're on GitHub, and I've had a link in last class, I think, are the costs before, and you can just, you know, clone it. It's a GitHub repository. 53 00:05:47.963 --> 00:05:52.584 Um, what I just did is I cloned it and I grabbed this stuff and threw it in here. 54 00:05:52.798 --> 00:05:56.699 So, um. 55 00:05:56.699 --> 00:06:01.978 So, actually, I'm going to look at 1 or 2 of these and then I'll go back to the. 56 00:06:01.978 --> 00:06:09.298 The slide set that I was halfway through on Monday, because it was a, it was, there's a lot of input. It's surprisingly. 57 00:06:09.298 --> 00:06:13.499 Dan's, there's a lot of stuff that takes thinking. 58 00:06:13.499 --> 00:06:17.608 But something like minmax, perhaps. 59 00:06:17.608 --> 00:06:23.428 Now, to make the things I insert my own, make file um. 60 00:06:23.428 --> 00:06:29.189 I'll just copy, I'll just leave this basically read only and just, um. 61 00:06:29.189 --> 00:06:32.788 Copy I'm sorry. 62 00:06:32.788 --> 00:06:38.579 Okay, so exam, I'll just copy it through an example. Okay. 63 00:06:39.689 --> 00:06:44.459 Right so the only thing I did in the make file is that. 64 00:06:44.459 --> 00:06:49.738 I created a rule here. Um, I. 65 00:06:49.738 --> 00:07:01.108 1st thing is that I told, make the dots as a suffix of interest suffix. There needs to be dot CC and that's what dot O. and so on. 66 00:07:01.108 --> 00:07:05.249 So, I tell it that, um, this is a topics. 67 00:07:05.249 --> 00:07:14.519 That may be relevant to make and then, um, so this is like a, what they call it a pseudo rule or something. The next thing. 68 00:07:14.519 --> 00:07:19.918 I say is make file syntax. What this says here. 69 00:07:19.918 --> 00:07:25.709 Is that if it's searching for an executable look if I say, make food. 70 00:07:25.709 --> 00:07:37.199 Then make file will assume who is an executable that you have to compile some source file to get food and it will look for food on f. C. C. C. 71 00:07:37.199 --> 00:07:42.028 And who see, you know, it has an order that goes to looking for source files. 72 00:07:42.028 --> 00:07:45.538 And the 1st source file, it finds it will compile. 73 00:07:45.538 --> 00:07:49.709 And if it's a dot f. C CC, it has a built in rule. 74 00:07:49.709 --> 00:07:53.819 For you to compile the dots, you file I gave it the rule here. 75 00:07:53.819 --> 00:07:57.028 And it says Ron envy C. C. 76 00:07:57.028 --> 00:08:00.509 This dollar less than. 77 00:08:00.509 --> 00:08:04.199 Is a macro to expands to the name of the source file. 78 00:08:04.199 --> 00:08:08.189 That I'm compiling, it'd be food, let's say, and then. 79 00:08:08.189 --> 00:08:12.238 It just goes in literally and dollar ad signed. 80 00:08:12.238 --> 00:08:16.348 Is the macro to expand to the name of the executable that I'm generating. 81 00:08:16.348 --> 00:08:22.499 So this will tell it how to compile a doc file to make an executable. 82 00:08:22.499 --> 00:08:29.879 And you can generalize this if I wanted, I could put lots of extra macros and options in here, but I don't have to. 83 00:08:29.879 --> 00:08:33.928 So, okay. 84 00:08:33.928 --> 00:08:39.599 And if you're curious, I added a comment to the make file. 85 00:08:41.879 --> 00:08:51.538 Up here because of how I installed the CUDA, the NVIDIA stuff onto parallel. 86 00:08:51.538 --> 00:08:56.759 Basically, this is where the interesting stuff is, um. 87 00:08:56.759 --> 00:09:00.239 Dot often whatever. 88 00:09:00.239 --> 00:09:09.328 And way, way down there is where they include files are if you wish to look at them. So, in the in your cross program, you say, include. 89 00:09:09.328 --> 00:09:20.369 Food or something, and it will search a sequence of include directories to find food. And this is 1 of them. It will search just because of how I installed. 90 00:09:20.369 --> 00:09:24.958 And be, so you do not need. So I listed that in case. You wanted to look at the. 91 00:09:24.958 --> 00:09:29.038 Include files yourself. Okay. 92 00:09:30.599 --> 00:09:36.119 So, we could have something like minmax. The name sounds interesting. And let's look at that. 93 00:09:38.788 --> 00:09:44.759 And again, some of this is a chance to see some interesting ideas in C + . 94 00:09:44.759 --> 00:09:53.759 So, it's a toolbox, you know, if you, you can use these tool, it says includes up here. And, um, so you can go look at the source code. If you want. 95 00:09:53.759 --> 00:10:04.678 Currently the way if I just do the basic compile, I think it just runs it on the it doesn't actually run stuff on the GPU yet but. 96 00:10:04.678 --> 00:10:13.798 Okay, so we have a class here called min, Max pair. 97 00:10:13.798 --> 00:10:19.528 And it's a templated class and so be in mid Max pair and whatever and flow whatever. 98 00:10:19.528 --> 00:10:22.649 Um. 99 00:10:24.298 --> 00:10:32.999 And, um, and now what we're going to do is, um. 100 00:10:34.769 --> 00:10:42.089 We overload the press this operator again. It's a weird syntax, but it's incredibly powerful. 101 00:10:42.089 --> 00:10:49.828 And what this overloading here add to, this is just the construction. 102 00:10:49.828 --> 00:10:55.469 Function, so this is a function that we can call to construct a new element of class. 103 00:10:55.469 --> 00:11:00.298 And Max pair, and what you do is you give it an initial value. 104 00:11:00.298 --> 00:11:03.778 And it just loads the. 105 00:11:03.778 --> 00:11:09.688 So the class min, Max pairs 2 members, and it just loads it in. 106 00:11:09.688 --> 00:11:16.139 So this is a rather clumsy way to do it. 107 00:11:16.139 --> 00:11:24.359 Initializing things up in the definition line is actually cleaner and faster, but. 108 00:11:24.359 --> 00:11:28.558 Okay, I go back a little. 109 00:11:30.119 --> 00:11:34.589 What is happening here is. 110 00:11:36.808 --> 00:11:44.609 It constructs a new element here and then returns it and with any decent compiler. 111 00:11:44.609 --> 00:11:50.969 There does not create a temporary. It actually constructs this in the target of the assignment. 112 00:11:50.969 --> 00:11:55.019 Okay, um. 113 00:11:57.629 --> 00:12:02.519 Now, what we have here. 114 00:12:04.019 --> 00:12:13.139 Um, is your overloading parenthesis operator again but this time with 2 arguments. 115 00:12:13.139 --> 00:12:24.658 The construction operator had 1 argument, that's how the compiler can tell which it is. So if you call a minmax pair variable Francis, us with 2 arguments. 116 00:12:24.658 --> 00:12:33.239 It will call this function here and with this function here will do we'll update them in in the max. So. 117 00:12:33.239 --> 00:12:37.349 A variable of class min, Max pair is the current min and Max. 118 00:12:37.349 --> 00:12:45.538 You call it with another minmax pair variable it will update the minimum maximum the current variable. So. 119 00:12:45.538 --> 00:12:52.859 Um, so this is a powerful idea here you're working with just, you know, aggregate. 120 00:12:52.859 --> 00:13:02.099 Okay, and then the main program. 121 00:13:04.769 --> 00:13:13.678 Little things here you don't worry about size tea might say to size is like an individual a size. He's like an unsigned manager. Um. 122 00:13:15.839 --> 00:13:21.509 I'm skipping past how around year round of number generator for now um. 123 00:13:24.089 --> 00:13:30.418 They're separating the cons of the abstract concept of a random number generator from a particular distribution. 124 00:13:30.418 --> 00:13:45.149 So well, maybe this is worth talking, but so we have a distribution read the kinds of uniform distribution and then we take the distribution and we apply, or we run a random number generator on that distribution. 125 00:13:45.149 --> 00:13:54.389 I'm going to ignore that part there and this is just having a simple 4 loop. I'm not trying to do it in parallel. Um. 126 00:13:59.099 --> 00:14:02.399 And, um. 127 00:14:03.509 --> 00:14:13.379 Yeah, that what's happening here I'm trying to ignore also. It's just. 128 00:14:13.379 --> 00:14:18.958 Take the apprentices operator and give it a name actually um. 129 00:14:19.979 --> 00:14:23.188 I'm trying to I'm going to skip over that. Um. 130 00:14:27.688 --> 00:14:35.818 So, what's happening down here? Is that again? The min, Max parents a template class. 131 00:14:35.818 --> 00:14:43.168 And just create a variable and initialize with the 1st data the minute. The Max. Okay. Um. 132 00:14:43.168 --> 00:14:48.808 Now, here is the interesting thing. All the content happens here. 133 00:14:48.808 --> 00:14:52.619 And it works because of what we've set up. So. 134 00:14:54.359 --> 00:14:58.078 A, um. 135 00:14:58.078 --> 00:15:04.619 So, a transform reduce, it takes a vector. 136 00:15:04.619 --> 00:15:07.678 It applies a function to each element of the vector. 137 00:15:07.678 --> 00:15:14.099 And then it takes us output vector and reduces it. Adds it all up? Let's say. 138 00:15:14.099 --> 00:15:17.219 Okay, now. 139 00:15:19.078 --> 00:15:25.379 And the vector again is just if I, if I'm not making you see sick by scrolling back. 140 00:15:25.379 --> 00:15:32.009 The vector is, um. 141 00:15:32.009 --> 00:15:35.278 It's a device factor here, so, you know. 142 00:15:38.009 --> 00:15:45.089 Um, so just as an aside, this would be incredibly slow because we're. 143 00:15:45.089 --> 00:15:49.229 Writing to the device element by element by elements so that. 144 00:15:49.229 --> 00:15:55.078 Don't worry about that. That's it. If it's compiled to make the device a thing. Okay. So now. 145 00:15:55.078 --> 00:16:00.239 So, data is a vector of ends and it began data end. 146 00:16:00.239 --> 00:16:03.778 So, transform reduce, um. 147 00:16:03.778 --> 00:16:07.979 Takes each element of the vector that's an end. 148 00:16:07.979 --> 00:16:17.698 And turns it into a min, Max variable. So it takes it in and turns the into the into a min, Max variable. 149 00:16:17.698 --> 00:16:32.548 And the way that work it just the API for transform reduces this argument here is a function of 1 argument that gets applied to elements of the input factor. So, a urinary off is. 150 00:16:32.548 --> 00:16:41.458 It's mid Max urinary off up here and if. 151 00:16:41.458 --> 00:16:44.938 Don't bother going all the way back through here. 152 00:16:44.938 --> 00:16:51.089 It will net out if I go back far enough to be the constructor thing. 153 00:16:53.519 --> 00:16:59.908 Way right up here. So so if I ignore some of the details. 154 00:16:59.908 --> 00:17:02.999 Go back to the interesting thing. 155 00:17:02.999 --> 00:17:09.449 So, urinary off constructs a variable of class. Min, Max. 156 00:17:09.449 --> 00:17:12.989 From or whatever, it's a template it. 157 00:17:12.989 --> 00:17:19.588 Okay, and so this works because the API for transform reduced takes this thing. 158 00:17:19.588 --> 00:17:22.769 And treats it as a function of 1 argument. Okay. 159 00:17:22.769 --> 00:17:30.778 And then what transform reduce does. Okay so doing the reduction. This is just the initial. 160 00:17:30.778 --> 00:17:34.409 The thing that we're doing now, we're not doing an addition here. 161 00:17:34.409 --> 00:17:39.808 What we're doing element element what we're doing to the reduction is binary, all. 162 00:17:39.808 --> 00:17:46.259 It's in this case, it shows the power of reduction binary off. It's not an addition. 163 00:17:46.259 --> 00:17:52.739 It's an update of the min, Max, so binary up up here is, um. 164 00:17:52.739 --> 00:17:57.479 If I can scroll back, if I ever scroll back too fast, you're getting seasick. Let me know. 165 00:17:57.479 --> 00:18:02.699 And but binary off is up here. 166 00:18:02.699 --> 00:18:05.759 And what it is doing. 167 00:18:05.759 --> 00:18:11.818 Is actually, I didn't look at it, but it is updating. 168 00:18:11.818 --> 00:18:16.348 Men and Max, so, binary optics, 2 arguments. 169 00:18:16.348 --> 00:18:19.769 And which are min, Max variables. 170 00:18:19.769 --> 00:18:23.038 And it computes the amount of demands that's here. 171 00:18:23.038 --> 00:18:26.759 And the maximum access and returns the result. 172 00:18:26.759 --> 00:18:31.229 So, instead of doing addition, it's doing a minmax update. 173 00:18:31.229 --> 00:18:35.159 Of the men's and maximum access. 174 00:18:35.159 --> 00:18:38.548 Then I go back down to the interesting thing. 175 00:18:38.548 --> 00:18:44.489 Which is here and so at the end of all of this result. 176 00:18:44.489 --> 00:18:49.979 Will be a min, Max variable, which has the men of the whole vector in the max of the whole vector. 177 00:18:51.239 --> 00:18:56.098 So, it shows the power of reduce. 178 00:18:56.098 --> 00:19:04.199 Reduce the operator it applies to the elements of the vector. All the operator has to do is actually used to be associative. 179 00:19:04.199 --> 00:19:11.578 Probably committed to evolve, so I'm not even certain, but it certainly has to be associative. 180 00:19:11.578 --> 00:19:19.078 But you see, it's any any operator we can apply to elements of class minmax. So. 181 00:19:19.078 --> 00:19:30.659 So, you see the concept of reduce as powerful, you can do more powerful things and there's another example where we use this to find and closing boxes around points and 2 dimensions. 182 00:19:30.659 --> 00:19:37.378 It's like a min, Max and also and why? So we'll have a more general version of binary off, which takes to. 183 00:19:37.378 --> 00:19:42.118 Closing boxes and fine tune, closing box of the imposing boxes. 184 00:19:42.118 --> 00:19:50.788 And so we define the class, which does this, and then be compute the bounding box around everything. 185 00:19:50.788 --> 00:19:56.009 And it's just and all the work happens in a transform reduce. 186 00:19:56.009 --> 00:20:00.118 See, it's a powerful concept is a programming paradigm. 187 00:20:01.199 --> 00:20:05.608 And then at the end here, we just write everything out at the end. So. 188 00:20:07.979 --> 00:20:11.999 That's the idea. Um, I can make it. 189 00:20:19.439 --> 00:20:26.219 In BCC, surprisingly slow and then I run it. 190 00:20:27.298 --> 00:20:35.398 The reason I say dot slash is that ensures it's running in the current directory, because I never know what my execution path is. 191 00:20:35.398 --> 00:20:40.618 And the men of the men's is 10 to maximum access 97 or something. 192 00:20:42.118 --> 00:20:52.078 And let me show you the bounding box thing. I'll take this idea a site step further. We have, um, do we have founding box here? 193 00:20:54.088 --> 00:20:58.558 Founding box that CEO, so we'll take the idea. 194 00:20:58.558 --> 00:21:10.199 Okay more. Okay. And again, this works cause C + plus the overload operators. So we have a class point to D, new members X and Y. 195 00:21:10.199 --> 00:21:16.979 So, this is showing a little more modern way to initialize again. Um. 196 00:21:16.979 --> 00:21:20.098 Well, this is doing here, this is a. 197 00:21:20.098 --> 00:21:23.489 Construction this is a functional construct a new element. 198 00:21:23.489 --> 00:21:28.798 Of class point to D when there are no arguments and the syntax here. 199 00:21:28.798 --> 00:21:33.118 Is after the name of the colon and then. 200 00:21:33.118 --> 00:21:38.459 X is a member of the of the class and this just as initialize x0. 201 00:21:38.459 --> 00:21:44.519 Initialize Y, to 0. this is another construction function here. 202 00:21:44.519 --> 00:21:52.108 And, um, and this will be if you can start a new value and give the construction function 2 arguments. 203 00:21:52.108 --> 00:21:56.278 Which are underscore and justice initializes thing. So. 204 00:21:56.278 --> 00:22:03.568 Now, the semantics of this colon, whatever it is, the initialization are done in parallel, which can be useful. 205 00:22:03.568 --> 00:22:09.028 And they're all done before the body of the function is executed and here, the body is empty. 206 00:22:09.028 --> 00:22:16.439 No, okay so that's a point to the and then we have a, um. 207 00:22:16.439 --> 00:22:21.868 Class for the bounding box and, um. 208 00:22:21.868 --> 00:22:26.308 Well, 1st, you've got a construction function for it. Um. 209 00:22:26.308 --> 00:22:30.148 If said, no arguments, um. 210 00:22:32.368 --> 00:22:37.378 Well, let me go back down, go back down and know a mentor 2 here. 211 00:22:38.939 --> 00:22:45.148 The construction things. Okay so these are the 2 member of variables for the class. 212 00:22:45.148 --> 00:22:51.689 22 D, points lower left and upper right and all all the way up here are construction functions and construct. 213 00:22:51.689 --> 00:22:57.179 A bounding box, if you give it a pair of points that takes the 2 points and, um. 214 00:22:59.818 --> 00:23:03.749 If we can, if we construct a box, we give it 1 point and lower upper right? 215 00:23:03.749 --> 00:23:07.919 And so on, okay, give it 2 points. 216 00:23:07.919 --> 00:23:12.328 Then whatever it does reasonable things you can. 217 00:23:12.328 --> 00:23:17.878 Okay, now, um. 218 00:23:17.878 --> 00:23:27.929 So, what happens here is that this is overloading the Prince's operator for bounding box and if you give it to bounding boxes. 219 00:23:27.929 --> 00:23:31.318 It finds the bounding box that contains the 2 of them. 220 00:23:31.318 --> 00:23:36.298 As men's on the lower left maxes on the upper right and returns to the box. So. 221 00:23:36.298 --> 00:23:43.528 And then the main program generate some points I'm trying to ignore how there are. A number of generator works. 222 00:23:43.528 --> 00:23:47.249 And, um, create initial bounding box. 223 00:23:47.249 --> 00:23:52.288 And, oops. 224 00:23:52.288 --> 00:23:58.469 Okay. 225 00:23:58.469 --> 00:24:02.098 So, just as a threat to reduce on the bounding boxes. 226 00:24:02.098 --> 00:24:06.598 It takes the vector of points. They've already been made into 2 D points. 227 00:24:06.598 --> 00:24:10.828 And then just applies binary off on them and binary off was the thing that. 228 00:24:10.828 --> 00:24:16.769 Combines bounding boxes, so again, so just reduce operator is quite powerful. 229 00:24:16.769 --> 00:24:20.638 And I could run a laptop. 230 00:24:35.489 --> 00:24:38.729 Whatever the points were, and so on. 231 00:24:38.729 --> 00:24:44.608 Okay, some simple things here. 232 00:24:44.608 --> 00:24:48.598 To get the idea, I think it's time to move back to the. 233 00:24:50.548 --> 00:24:55.558 Slides here. Okay. Um. 234 00:24:55.558 --> 00:25:01.588 So, I already okay, so the new thing here, just let me refresh a little. 235 00:25:08.519 --> 00:25:15.719 Well, we looked at last time is that you want to work with structures of a raise. 236 00:25:15.719 --> 00:25:23.338 Because you can do coalesce reads and rights with them and it's a lot that's which is good. 237 00:25:23.338 --> 00:25:28.318 So, what happens if you have something like a 3 dimensional point. 238 00:25:28.318 --> 00:25:34.979 And so what I just showed you, that example, was actually not a structure of a raise. It was in a rate structures. 239 00:25:34.979 --> 00:25:39.778 Um, so the way to do it efficiently. 240 00:25:39.778 --> 00:25:49.528 Yeah, let me go back several slides here. So the 3 D point is that so we'd have the gray or the X's the. 241 00:25:49.528 --> 00:25:55.949 You know, like purple or the wise in the dark purple or the disease, and the 3 separate arrays. 242 00:25:55.949 --> 00:26:01.888 But you want to conceptually access a point of an X Y, and Z um. 243 00:26:03.358 --> 00:26:08.189 And the irrelevant class they're using, it's a 2 fold. 244 00:26:08.189 --> 00:26:13.348 And a 2 full contains a small number of objects. Don't have to be the same class. So, here, they're going to end. 245 00:26:13.348 --> 00:26:17.699 So, what we do is we're going to use something called a. 246 00:26:17.699 --> 00:26:22.588 To Poland and a zip operator, and we're going to end up with a pointer. 247 00:26:22.588 --> 00:26:27.749 That when you de reference, it gives you a tool of an X Y, and Z. 248 00:26:27.749 --> 00:26:33.088 So, what it does is it pulls together the X Y, and Z from the 3 separate arrays. 249 00:26:33.088 --> 00:26:40.169 And presents it to you as 1 tool that you can work on, you can pull the component and it's right or you can read and you can write. 250 00:26:40.493 --> 00:26:55.104 And and it's called a or and if you up, and if you add 1 to the zip iterate, or you'll now have a pointed to a twofold with the next corresponding X Y, and Z. so you can use your good abstract programming techniques. But underneath it. 251 00:26:55.378 --> 00:27:01.169 It will compile to fast code so it's a win for everyone. Once you get over. 252 00:27:01.169 --> 00:27:06.118 The unnecessarily complex syntax, um. 253 00:27:07.528 --> 00:27:13.979 And so we have, so the class is a tool full of a float float float or whatever and. 254 00:27:13.979 --> 00:27:24.689 To get the, the 1 problem with tools, you kind of iterate over the element number, because it could be different classes. So you have to get 0 zeros. 1st and 2nd and so on. 255 00:27:24.689 --> 00:27:29.278 So, the messy, complicated syntax. 256 00:27:29.278 --> 00:27:34.558 So, X, Y, and Z. yeah. So to begin pointer to those arrows element. 257 00:27:34.558 --> 00:27:39.749 And so if you make a twofold takes 3 argument, takes. 258 00:27:39.749 --> 00:27:46.919 You know, 1 to 9 arguments and makes it to people. So now this makes a total of 3 begin pointers. 259 00:27:46.919 --> 00:27:58.769 And then makes it better, turns it into a zip iterate or which, again, the point of zip writer, you can de reference it. You'll get a tutorial and you can add 1 to what you're going to. 260 00:27:58.769 --> 00:28:04.499 Pointer to the next 1 so we make it better writers for the beginning and the ends. 261 00:28:04.499 --> 00:28:09.598 And now transform now, it's got a vector of a beginning and an end. 262 00:28:09.598 --> 00:28:13.169 And it applies rotate 2 full to each element of this. 263 00:28:13.169 --> 00:28:21.628 Virtual zipped vector that doesn't actually exist in memory because it's lazy evaluated and just compiled. 264 00:28:21.628 --> 00:28:25.138 Just created it and so on, as you need it. 265 00:28:25.138 --> 00:28:29.638 And there and reasons why this is. 266 00:28:29.638 --> 00:28:35.578 Interesting is it compiles to very efficient code. 267 00:28:35.578 --> 00:28:43.858 And and that's it, that's why to me, at least why it's interesting. So. 268 00:28:43.858 --> 00:28:50.159 And now transform, so transform, sees vector, beginning and end. It's a virtual vector but. 269 00:28:50.159 --> 00:28:53.578 Transform doesn't care. It's this thing with functional programming. 270 00:28:53.578 --> 00:28:58.229 Okay, so this was a review, um. 271 00:28:58.229 --> 00:29:02.398 And just a quick review again, since you have your pointers. 272 00:29:02.398 --> 00:29:07.648 The point is, can be lazy evaluated, so the referencing can call it a function. 273 00:29:07.648 --> 00:29:12.239 And it might return a constant range when incremental range. So there's a constant generator. 274 00:29:12.239 --> 00:29:15.989 You add 1 to it you referenced it you get some constant. 275 00:29:15.989 --> 00:29:21.628 And you add 1 to it, it's still points to that kind of counting it. Or you do reference it at Council. 276 00:29:21.628 --> 00:29:26.189 Okay, um, and. 277 00:29:26.189 --> 00:29:30.179 Yes, I think a again, um. 278 00:29:31.798 --> 00:29:37.318 Nothing interesting here I just went through all of that with the previous examples. Um. 279 00:29:39.328 --> 00:29:51.028 So min, Max again, I just showed you them in Max code slightly. This is a slightly different version of the same min, Max code. Um, it's using to pull the example I showed you before did not use. 280 00:29:51.028 --> 00:29:54.449 So, um. 281 00:29:54.449 --> 00:30:03.118 Example you before it, a specialized class min, Max, this does not have to create the specialized classes is the 2 fold. 282 00:30:03.118 --> 00:30:09.628 Um, and. 283 00:30:09.628 --> 00:30:14.548 Interesting thing there this emphasizes the components of a tool can need to be different classes. 284 00:30:14.548 --> 00:30:18.808 Okay, that's weird. 285 00:30:19.828 --> 00:30:32.848 Skip over that. Okay. Um, so any case so the just remind you the best things we like to fuse functions together. Their functions been applied to vectors. 286 00:30:32.848 --> 00:30:38.969 If you have 2 functions, you want to have the composed function and apply it. 287 00:30:38.969 --> 00:30:46.798 And optimize the composition of the 2 functions director of raised. And if I mentioned implicit sequences, Nothing's actually stored. 288 00:30:46.798 --> 00:30:52.618 Okay, so here is a nice new example, which is going to take me a while. 289 00:30:52.618 --> 00:30:55.828 To go through it as it relates to my research, coincidentally. 290 00:30:55.828 --> 00:31:00.568 2 dimensional geometry. 291 00:31:00.568 --> 00:31:07.048 We have what I call a uniform grid 3 by 3 uniform grids. So the universe has these colored points. 292 00:31:07.048 --> 00:31:10.798 Um, and. 293 00:31:10.798 --> 00:31:19.949 The points, and what we want to do is sort the points into the pocket. So called these are called pockets. Let's say buckets through sales is 9 of them. 294 00:31:19.949 --> 00:31:28.048 And what we want to do is at the end of everything, we want to know the points inside each pocket. 295 00:31:28.048 --> 00:31:33.028 Now, what makes us interesting. 296 00:31:33.028 --> 00:31:40.469 Is that the buckets? Some buckets are empty like pocket 7 and 8 and other pockets have a lot of stuff and I'm like. 297 00:31:40.469 --> 00:31:48.419 Bucket 3 so if I stop right there and you think, how would you do this? If programming contest. 298 00:31:48.419 --> 00:31:52.858 Well, the 1st thing you'd say, well, the grid is just a 2 dimensional. 299 00:31:52.858 --> 00:31:56.638 All right. That's no problem. Um. 300 00:31:58.199 --> 00:32:02.999 But what how do you store the points in each bucket? 301 00:32:02.999 --> 00:32:09.509 Well, you could just say, you know, have a fixed size array, but what do you fix it at? Because you don't know what the biggest. 302 00:32:09.509 --> 00:32:14.999 Buckets going to have so you say, okay, hey, is cool. We'll have a vector. 303 00:32:14.999 --> 00:32:18.388 In each for each pocket. 304 00:32:18.388 --> 00:32:29.608 It's the obvious thing to do now, 1 thing about me is whenever I say something is the obvious thing to do, I follow it with, but it's a bad idea. 305 00:32:33.473 --> 00:32:48.413 Well, having a vector for each bug, it's not awful, but you have this enormous, unnecessary overhead because of vector, you know, you construct a vector it's putting stuff up on the heap and it's got some old overhead. I know what it's 16 bytes. I don't know what it is for each vector. 306 00:32:48.659 --> 00:32:57.118 And and you're doing a lot of constructions here, just to construct the vector for each bucket. 307 00:32:57.118 --> 00:33:05.098 Um, and another thing, if this is wants to be a parallel program, you got 1 global heat. 308 00:33:05.098 --> 00:33:09.388 Modifications of the shop here that he'd have to be serialized. 309 00:33:09.388 --> 00:33:21.989 And because if you're trying to grab a chunk off the heap, you can't have 2 different threads simultaneously, grabbing a chunk off the heat. It has to get serialized. So, bangles a lot of your parallel performance. 310 00:33:23.128 --> 00:33:27.298 Next problem is, as I said. 311 00:33:27.298 --> 00:33:32.729 We don't know how many points are going to be in each bucket. So you so you're going to construct. 312 00:33:32.729 --> 00:33:43.499 A small vector for each bucket and then if you have to you'll grow it. The nice thing about factories is you can reallocate them, you can grow with them and make them bigger. That's really nice feature. You don't need to reallocate the size. 313 00:33:43.499 --> 00:33:52.769 However, what does that mean for the heat? You have to grab a new chunk off the heap and copy stuff over then free the old small chunk. 314 00:33:52.769 --> 00:34:04.378 So every time you grow the vector, you know, you've got heap operations that have to be serialized and you fragment the heat got lots of little pieces on the heat. And the thing was, the heat. 315 00:34:04.378 --> 00:34:08.248 And it's got an internal data structure to keep track of all the fragments. So. 316 00:34:08.248 --> 00:34:12.119 Working with chunks on the heap is super linear. 317 00:34:12.119 --> 00:34:17.159 The more chunks you have, the more time it takes to create and free each chunk. 318 00:34:17.724 --> 00:34:31.373 So you don't want to have millions of little things on the heat but it's something that I'm doing here is my grid might be 5,000 by 5,000. perhaps, you know, could have millions of buckets. Which would mean, if you had vectors, millions of vectors that keep getting grown. 319 00:34:33.088 --> 00:34:37.108 And the see, yeah, yeah you and you want to do it in parallel. So that's. 320 00:34:37.108 --> 00:34:42.119 Why I would say, having a vector for each bucket would work. 321 00:34:42.119 --> 00:34:46.918 But it doesn't scale up to big examples and doesn't. So okay. 322 00:34:46.918 --> 00:34:50.128 Is that idea? Doesn't work? 323 00:34:51.659 --> 00:34:55.588 You think of other things you might do? Um. 324 00:34:55.588 --> 00:34:58.619 Other ideas suggestions. 325 00:34:58.619 --> 00:35:03.809 So, there's no, really good way, but this will show you to actually 2 competing methods. I think. 326 00:35:03.809 --> 00:35:07.168 Depending on how many points you think are in each. 327 00:35:07.168 --> 00:35:20.789 Pocket, um, oh, 1 other thing, if you assume the points are random uniform, independent, blah, blah, blah, blah stuff. I'm going to teach in 2 hours that probability. 328 00:35:20.789 --> 00:35:27.389 Then the statistics, the number of points in a bucket is across all random variable. 329 00:35:27.389 --> 00:35:30.929 So, if all of that stuff, then. 330 00:35:30.929 --> 00:35:41.759 You know, how many points there are, you know, how many buckets are, you know, the mean number of points pocket points divided by pockets then the distribution for the number of points in any bucket plus on distributed. So you can. 331 00:35:41.759 --> 00:35:47.909 Do your fun statistics on that any case so this is what you want to do, how are you going to do it? 332 00:35:47.909 --> 00:35:57.119 Um, so here's a really cool idea I think, which deserves to be known more um. 333 00:35:58.679 --> 00:36:06.449 Is that we have the point we can get a month so we make 2 passes through the point. 334 00:36:06.449 --> 00:36:11.489 The 1st pass through the points all we do is we count the size of each bucket so. 335 00:36:11.489 --> 00:36:17.579 We'll create so, let's suppose this is a grid g cells by g. cell g goes free here. 336 00:36:17.579 --> 00:36:22.079 So, I just create a G. g array of. 337 00:36:23.188 --> 00:36:28.829 And I run through the points, and I just count the number of points in each pocket. 338 00:36:30.088 --> 00:36:37.619 And then using this distribution count, I then create a data structure to hold the points themselves. 339 00:36:37.619 --> 00:36:42.389 So, I'll have to read the points 1 time, count the number of points for bucket updating accounts. 340 00:36:42.389 --> 00:36:48.898 And then a 2nd, time construct the data structure, and then read the points the 2nd time and populate it. 341 00:36:48.898 --> 00:36:59.219 Now, the updating the counts, you're gonna have to worry about that. It's like, it looks like it's going to be serialized, which is not nice. 342 00:36:59.219 --> 00:37:02.878 Any case so this is 1 way to implement it. 343 00:37:02.878 --> 00:37:09.778 And then later, we'll see a 2nd way, just create the read the points and just. 344 00:37:09.778 --> 00:37:16.438 Create new your accounts so, instead of doing a reduction, some on each separate, um. 345 00:37:16.438 --> 00:37:20.398 Okay, but we haven't seen how to do that. Uh. 346 00:37:22.228 --> 00:37:29.938 The size of each bucket well, where to generate the random input I'm trying to avoid that. 347 00:37:29.938 --> 00:37:38.998 Maybe you want a pseudo random thing that takes the index high and randomizes I ticket or end point. Perhaps that will work in parallel. 348 00:37:38.998 --> 00:37:42.539 How to compute the size of each bucket. 349 00:37:42.539 --> 00:37:47.938 Well, what those mean later. 350 00:37:47.938 --> 00:37:53.909 And again, I'm skipping over the details of how this random. 351 00:37:53.909 --> 00:37:59.309 Point generator works, but in any case. 352 00:37:59.309 --> 00:38:02.818 Generate outputs vector. 353 00:38:02.818 --> 00:38:06.418 Calls a function for each element function text. 354 00:38:06.418 --> 00:38:09.509 No arguments and creates elements to the vector. 355 00:38:09.509 --> 00:38:16.949 Um, 1, minor thing, um. 356 00:38:16.949 --> 00:38:20.579 Yeah, this is a built in random, random number. 357 00:38:20.579 --> 00:38:27.630 Occasionally the built in random number of generators that got awful bad for manufacturers. You have to check that. So. 358 00:38:27.630 --> 00:38:37.920 It's a very nice thing called myosin twister, but this other, this simpler things called linear congruent. 359 00:38:37.920 --> 00:38:47.460 Which are actually not good and I could detail what not good means. Okay. Generator. We create a host factor pocket to the device. Um. 360 00:38:49.860 --> 00:38:55.710 Or another way, um, generated directly on this again. Let me skip over that. Um. 361 00:38:57.480 --> 00:39:02.250 In any case, Here's something interesting. So transform. 362 00:39:02.250 --> 00:39:06.869 We give it accounting iterate or virtual vector. You see so. 363 00:39:06.869 --> 00:39:15.239 It's, um, so all of this does accounting iterate or of I just returns. I, so you might think that's a little clumsy you want to say I. 364 00:39:15.239 --> 00:39:20.010 Yeah, but this is in the paradigm of a vector that we can operate on. 365 00:39:21.300 --> 00:39:25.260 I can create the random ignore the random points. 366 00:39:27.000 --> 00:39:39.510 Ignore that okay now sorry about that. Well. 367 00:39:39.510 --> 00:39:44.670 Okay, you're going to mute the phone. Okay. 368 00:39:44.670 --> 00:39:49.139 To do a, um. 369 00:39:49.139 --> 00:39:56.460 Class point to bucket index, um, to get the bucket index it's just the module operator. 370 00:39:56.460 --> 00:40:01.980 Uh, what it's doing here is it's a 2 D bucket index X and Y, it's converting into a 1 D index. 371 00:40:01.980 --> 00:40:05.579 You do 2 D job where did you do that? Sort of thing? All the time. 372 00:40:05.579 --> 00:40:12.780 Okay, classify each point. That's a module operator. I'm going to skip it so it takes a point. 373 00:40:12.780 --> 00:40:19.440 And W, nature of the size of the grid and then convert, I'm going to skip that. Okay. Um. 374 00:40:19.440 --> 00:40:26.099 In any case, um. 375 00:40:28.050 --> 00:40:40.260 Okay, let me go back here we have a vector of points. Okay. But this creates is a vector of bucket indices and the index is the number of the. So, the bucket at each point is in 0, 2. 8. okay. 376 00:40:40.260 --> 00:40:48.329 So, can imagine how you do that now. So now we've got what we have here we have now 2 vectors. 377 00:40:48.329 --> 00:40:56.909 That conceptually or parallel to each other, we get a vector of points coordinates and a vector of the points indices in the bucket. 378 00:40:56.909 --> 00:41:00.000 Now, what sort by key does. 379 00:41:00.000 --> 00:41:03.630 Is it takes this pair of vectors. 380 00:41:03.630 --> 00:41:15.150 And sorts both vectors together, depending on the 1st factor. The 1st factor is the key. The 2nd vector is the elements, the interesting data that's coordinates. 381 00:41:15.150 --> 00:41:18.210 And sort by key. 382 00:41:18.210 --> 00:41:21.360 Sorts them and brings all the points. 383 00:41:21.360 --> 00:41:24.690 In the same bucket together and. 384 00:41:24.690 --> 00:41:29.429 Points about this. It's, um. 385 00:41:29.429 --> 00:41:39.840 It doesn't care if some buckets are bigger than other buckets. It just sorts this paired vectors of bucket index and key and points brings them all. So this. 386 00:41:39.840 --> 00:41:43.769 It doesn't care. Some buckets are big in some buckets or small. 387 00:41:43.769 --> 00:41:52.380 And it will probably go in parallel. If you're bucket indices are a plain old data type. It'll do a Radix sort. 388 00:41:52.380 --> 00:41:57.119 A is faster than a quick sort of. 389 00:41:57.119 --> 00:42:00.630 So, I know they love teaching quick sword in. 390 00:42:00.630 --> 00:42:06.599 This 1, I guess, maybe cause quick sorta tonight, for example, of recursion, et cetera. 391 00:42:06.599 --> 00:42:13.949 Um, except that if your key or sorting is simple, like an inter upload, Radix sort blows it over the water. 392 00:42:13.949 --> 00:42:21.119 So the thing is that record doesn't generalize the real complex keys. 393 00:42:21.119 --> 00:42:31.469 For simple keys panel, data type 2 Radix sort and that's what thrust we'll do. Okay if it can. Okay, so we start we bring all the points with the same key together. 394 00:42:31.469 --> 00:42:35.489 Um, okay. 395 00:42:38.280 --> 00:42:46.650 Now, here's an example of how the map produced paradigm can do surprising things. 396 00:42:51.150 --> 00:42:57.630 We want to find the number of points in each bucket. 397 00:42:58.980 --> 00:43:04.710 And so I told you, we have, you know, you had the reduced function. 398 00:43:04.710 --> 00:43:09.960 The reduce function of the sums up a whole. 399 00:43:09.960 --> 00:43:16.170 A whole vector, let's say, reduce by key. 400 00:43:17.699 --> 00:43:27.150 Is a more powerful version of reduce it has the vector that it's going to reduce, but it has a parallel. It has an associated. 401 00:43:27.150 --> 00:43:33.750 Vector of keys and what it does is it reduces. 402 00:43:33.750 --> 00:43:40.469 The data vector only overruns of the same key value. 403 00:43:41.610 --> 00:43:50.579 So, let me use the thing right here to show you, just as an example, look at keys and back here guys have just all 1. 404 00:43:50.579 --> 00:43:57.869 Just to make it easy here are the keys so reduced by key will do on this pair of keys and values. 405 00:43:57.869 --> 00:44:05.159 It will say we got to run of zeroes. We're going to reduce the values for this, this round of 2 elements. What? I will give you a 2. 406 00:44:05.159 --> 00:44:11.670 Here we've got a 2 here, so the run here that's just 1 element reduce over that. 407 00:44:11.670 --> 00:44:21.960 Here we've got 3 threes we'll reduce over this run of corresponding data elements. We've got a 4 reduce over that 1, got a 6 reduce over that 1. 408 00:44:22.980 --> 00:44:29.730 So reduced by key, or we'll do is a whole set of runs that are all different lengths. It doesn't matter. 409 00:44:29.730 --> 00:44:33.000 And just because the keys. 410 00:44:33.000 --> 00:44:37.320 Or a key changes its value that's a cut. 411 00:44:37.320 --> 00:44:40.320 In the values factor and it, it doesn't reduce over. 412 00:44:42.360 --> 00:44:46.980 So reduced by keys it's powerful. We'll see examples using it. 413 00:44:46.980 --> 00:44:55.320 Now, another thing is that reduced by keys. 414 00:44:55.320 --> 00:45:03.630 Is passed in parallel, so reduce the basic reducing. I think I showed you how you do it log in time. 415 00:45:03.630 --> 00:45:08.489 If there's any elements in the vector and you got a sufficient number of processors. 416 00:45:08.489 --> 00:45:17.880 Well, reduced by keys also goals in parallel you take the reduced tree and you basically put fences in it where the key changes. 417 00:45:17.880 --> 00:45:23.519 And you don't, um, you don't accumulate a sum across offense. 418 00:45:23.519 --> 00:45:29.159 So, reduced by keys goes fast in parallel. 419 00:45:30.239 --> 00:45:35.550 Okay, um. 420 00:45:36.809 --> 00:45:40.769 And what reduced by keys will output um. 421 00:45:42.090 --> 00:45:46.530 Effectively is, I'm waving my hand slightly. Um. 422 00:45:46.530 --> 00:45:50.250 2 smaller vectors, um. 423 00:45:50.250 --> 00:45:55.469 Well, eventually it will happen is a vector of the unique output keys. 424 00:45:55.469 --> 00:45:58.739 And another factor of the length. 425 00:45:58.739 --> 00:46:01.889 Of the reduction for each value. 426 00:46:01.889 --> 00:46:06.000 So key 0, here, we summed the 2 ones we got it 2. 427 00:46:07.139 --> 00:46:11.670 Key to we summed the 11 and we got 1. 428 00:46:11.670 --> 00:46:16.590 P3, we saw the 3 threes and got a 3. 429 00:46:16.590 --> 00:46:21.269 He's 46 and 7 each had 1 element. 430 00:46:21.269 --> 00:46:26.610 So reduced by keys, we'll take a vector of keys, which I've runs in them. 431 00:46:26.610 --> 00:46:33.630 And for each run of the same key will sum up that corresponding sub vector in the values thing. 432 00:46:34.650 --> 00:46:38.880 Okay, and it's a built in function and. 433 00:46:41.400 --> 00:46:48.389 It looks a little weird, but our design philosophy is we want stuff to be map reduced. 434 00:46:48.389 --> 00:46:53.099 And so this is mapping a function over some vectors. 435 00:46:53.099 --> 00:46:59.849 Because if we make things, look like map reduce, we know how to do them in parallel, assuming these. 436 00:46:59.849 --> 00:47:05.219 Functions are associative and we can compose them and so on. Okay. 437 00:47:06.809 --> 00:47:18.210 So, how is this useful in this particular thing? And we want to do what we had is this bucket. Okay. There's buckets, there's grid points then we wanted to put points into pockets. Um. 438 00:47:18.210 --> 00:47:24.239 Let me go back a page, so we had points. 439 00:47:24.239 --> 00:47:30.719 Um, we computed the bucket index of each point. That's a map function. 440 00:47:30.719 --> 00:47:34.349 And then we started the pair of vectors. 441 00:47:34.349 --> 00:47:39.300 And we produced a sorted pair, sorted bucket and disease, and the associated points. 442 00:47:39.300 --> 00:47:42.449 So, we've got runs in the bucket indices. 443 00:47:42.449 --> 00:47:49.320 I'm going to be 0 than a 1 to 2. we 3 three's and so on. So. 444 00:47:49.320 --> 00:47:58.409 So, after the sort by key, we've got the data, the variables of vectors that we can throw into. The next thing I just talked about on the following slide. 445 00:47:58.409 --> 00:48:07.019 So, what we can do is we got a paired vector vector of points that have been sorted by bucket and a vector in the pocket indices. 446 00:48:07.019 --> 00:48:10.380 We then apply reduced by key. 447 00:48:10.380 --> 00:48:15.150 And the output from reduced by key will be the size of each pocket. 448 00:48:16.380 --> 00:48:26.219 Um, so after this, we will have, we will know the size of each pocket, which is 1 of the things we wanted. 449 00:48:27.239 --> 00:48:33.420 Um, 1 thing that's anticipating a little that sort of. 450 00:48:33.420 --> 00:48:39.030 In your program, if you've got large vectors, that's far because the slowest part of the whole program. 451 00:48:39.030 --> 00:48:44.610 As I said, Radix sort is fast. Well, yeah, it's passed, but pass to the quick sort. 452 00:48:44.610 --> 00:48:47.699 But it's still slower than the simple maps. 453 00:48:47.699 --> 00:48:53.070 That produces is the Radix sort involves making several passes through the data. 454 00:48:53.070 --> 00:48:57.900 And reading and writing stuff, and that's just flow. 455 00:48:57.900 --> 00:49:02.880 Okay, um, compute the size of each bucket. Um. 456 00:49:08.820 --> 00:49:12.900 Well, that up there computed the size of each pocket. Okay. Um. 457 00:49:12.900 --> 00:49:21.179 Yeah, I'll put value to the size of each pocket, the output keys, and they reduced by key. The argument list. Um. 458 00:49:21.179 --> 00:49:26.429 Well, we got the, um. 459 00:49:26.429 --> 00:49:29.460 Indices, um. 460 00:49:29.460 --> 00:49:32.639 These are the index, the bucket number for each point. 461 00:49:32.639 --> 00:49:36.929 And again, make constant iterate that's a vector of all ones. And again. 462 00:49:36.929 --> 00:49:43.860 It looks weird, but the point is, it makes it into the paradigm of vectors and functions and reductions. 463 00:49:43.860 --> 00:49:47.760 And then the output from reduced by key to the next 2 arguments. So. 464 00:49:47.760 --> 00:49:52.530 Why it doesn't return a twofold of vectors I don't know. Um. 465 00:49:53.909 --> 00:49:58.769 Um, that's the way I would right that's the API I would do, but they didn't ask me. 466 00:49:58.769 --> 00:50:06.030 And you better assume that these 2 output factors are big enough. 467 00:50:07.500 --> 00:50:13.079 If they're not big enough, if you're lucky your program crashes, if you're not lucky, your program does not crash. 468 00:50:13.079 --> 00:50:18.659 You just get garbage. Okay. Um. 469 00:50:21.059 --> 00:50:30.690 So that was 1 way to get the size of each pocket. Here's a competing way to get the size of each pocket. So the 1st way involved this. 470 00:50:30.690 --> 00:50:35.130 Board couple with sorts is they take time, um. 471 00:50:35.130 --> 00:50:44.159 Because they involve here's a competing way to get the size of each bucket. 472 00:50:45.179 --> 00:50:49.590 If I don't make you sick by scrolling back 1 page. 473 00:50:49.590 --> 00:50:54.119 We have here a vector of keys. 474 00:50:54.119 --> 00:50:59.909 Okay, so I is the bucket that points to buy is in. Okay. 475 00:50:59.909 --> 00:51:08.099 So, um, if we want to find the number of points in bucket 3 oh, keys assorted. 476 00:51:08.099 --> 00:51:11.880 Now, if I want to find the number of points in bucket 3. 477 00:51:11.880 --> 00:51:18.929 I could find where the run of 3 starts and where the run of threes ends and subtract. Okay. 478 00:51:18.929 --> 00:51:21.989 So, the rental 3 starts. 479 00:51:21.989 --> 00:51:26.699 01, actually, coincidentally, at point 3 of the keys. 480 00:51:26.699 --> 00:51:31.320 And the row threes ends here at point 6. 481 00:51:33.360 --> 00:51:41.280 So, if I can find where the run of 3 starts, and where the run of threes ends and subtract, I've got the number of points in bucket 3. 482 00:51:43.949 --> 00:51:47.909 Now, I can find that with a binary search. 483 00:51:49.500 --> 00:51:54.420 And that's what the next thing will talk about a binary search. Um. 484 00:51:57.599 --> 00:52:01.230 And I'm going to waive my hands and. 485 00:52:01.230 --> 00:52:08.309 Skip over details, if you trust me that fundamentally, the program is sound, it actually works. 486 00:52:08.309 --> 00:52:18.059 Uh, basically, we binary search for the start of each run and the end of each and we subtract. 487 00:52:18.059 --> 00:52:23.610 And I'm skipping some details here and lower bounce or they do searches. 488 00:52:23.610 --> 00:52:28.170 The trick was a binary searches, there's gaps up here. 489 00:52:28.170 --> 00:52:34.739 There is no, um, key 5 never occurs here. For example, neither just. 490 00:52:34.739 --> 00:52:38.250 And that makes the search a little more sophisticated. 491 00:52:38.250 --> 00:52:42.599 Any case it does this and these go in parallel. 492 00:52:42.599 --> 00:52:47.130 And, um, okay. 493 00:52:47.130 --> 00:52:50.639 In any case we can do that. 494 00:52:50.639 --> 00:52:56.219 Binary searching for beginning and ending and we subtract and we get the size of each bucket. 495 00:52:58.559 --> 00:53:01.769 Or what I would say, a trivial example. 496 00:53:05.010 --> 00:53:08.579 And performance, um. 497 00:53:11.130 --> 00:53:17.130 I told you the sword was slow. 498 00:53:20.309 --> 00:53:26.400 And they're doing reduction different, comparing them and so on. Okay. Um. 499 00:53:28.050 --> 00:53:31.110 Interesting yeah. 500 00:53:31.110 --> 00:53:38.070 I don't find that hosted device once takes much time. It's going to also be a synchronous. 501 00:53:38.070 --> 00:53:45.090 If you're generating points and ran on the. 502 00:53:45.090 --> 00:53:50.369 On the device in parallel you want to point to actually to be independent of each other. 503 00:53:50.369 --> 00:53:58.019 I would disagree here if he is a cryptographic hash, it's going to be excellent quality. 504 00:53:58.019 --> 00:54:03.239 You call him and generate the random points of I, let's say. 505 00:54:03.239 --> 00:54:07.980 Yeah, and that's going to be beautiful quality. 506 00:54:10.320 --> 00:54:18.420 I did say that sort is slow and I did say it's asked for these things are called pods, plain old data types. 507 00:54:19.619 --> 00:54:26.670 Well, the way the great except works, um. 508 00:54:28.349 --> 00:54:34.079 So imagine when I used to say, have paper exams in a class, so I could say 100 students in a class. 509 00:54:34.079 --> 00:54:47.010 And now I got all their exams that came in and random order, and I want to hand them out back to the class. I want to sort them alphabetically. So I got the stack of 100 exams and I'm going to have to say several stacks here. 510 00:54:47.010 --> 00:54:52.469 This will be, you know, students say to see what else I might have say. 511 00:54:52.469 --> 00:55:07.344 5, stacks here, and I might take my 100 students and go here here. Here here here here whatever. Now I've got 5 stacks. I'll give or take 20 students seats and then I take the stack of 20 students, and I may do it into piles or whatever. 512 00:55:07.344 --> 00:55:08.635 Then I assembled the files. 513 00:55:08.880 --> 00:55:17.280 And at the end of it, I've got 1 sorted stack of 100 exams. That's sort of what it does. It's a little more sophisticated, but not much. 514 00:55:19.289 --> 00:55:24.239 Okay, um, analysis not, um. 515 00:55:24.239 --> 00:55:29.340 They said, in this case, the binary search is good. It's not always good. 516 00:55:30.360 --> 00:55:37.559 Okay minutes about lots of other examples. Oh, okay. Um. 517 00:55:41.550 --> 00:55:50.969 Yeah, um, this is obsolete. The slides that okay um, I gave you the links tomorrow up to date. 518 00:55:50.969 --> 00:55:58.679 Okay, so that was, um, this nice tutorial. 519 00:55:58.679 --> 00:56:03.719 If I go back to my things, I was typing in. 520 00:56:03.719 --> 00:56:11.340 Come on here and. 521 00:56:13.440 --> 00:56:22.440 Okay, so I finish that. Um, this is a GitHub interesting, um, tool if you want um. 522 00:56:23.909 --> 00:56:28.949 This is a web based compiler explore, you can compile stuff on the web and run it on the web here. 523 00:56:28.949 --> 00:56:36.960 Okay. Um, let me show you some other crazy things you can do with the. 524 00:56:38.219 --> 00:56:41.460 Parallel with this. 525 00:56:41.460 --> 00:56:49.530 Tiled range, um. 526 00:57:03.329 --> 00:57:12.210 Okay, so we have this vector at the start 10, 2030, 40 and this, this makes copies of it. 527 00:57:15.480 --> 00:57:20.400 So, the question is, how can we do that as a map reduce. 528 00:57:21.750 --> 00:57:32.340 And, um, so the function tiled range, it takes the vector and it takes a repeat count. 529 00:57:32.340 --> 00:57:37.320 And it tiles the vector K times, like, 3 times or something. 530 00:57:37.320 --> 00:57:42.750 Um, and. 531 00:57:42.750 --> 00:57:48.360 I imagine going to go straight to my optimization of it. 532 00:57:48.360 --> 00:57:53.099 Look good um. 533 00:57:54.329 --> 00:58:00.389 Okay. 534 00:58:08.400 --> 00:58:14.489 Okay, um. 535 00:58:14.489 --> 00:58:18.360 I'm sorry, I just think I'm a good program or I'm sorry um. 536 00:58:18.360 --> 00:58:24.869 Call me hour ago. Yeah, it will have to be our good about the programming. Okay. 537 00:58:24.869 --> 00:58:28.619 So this is cool here. 538 00:58:32.070 --> 00:58:35.400 Make counting iterate or, um. 539 00:58:35.400 --> 00:58:38.429 I've got to write this down, um. 540 00:58:41.670 --> 00:58:47.130 Okay, let's see if this works. Um. 541 00:58:55.019 --> 00:58:58.559 It does not, um. 542 00:59:04.110 --> 00:59:09.539 Trying to think how I can. 543 00:59:11.909 --> 00:59:16.320 That's parallel of course. 544 00:59:16.320 --> 00:59:21.719 Trouble is, is, um, jumping, um. 545 00:59:21.719 --> 00:59:26.880 Around from computer to computer and all the software. 546 00:59:30.210 --> 00:59:35.519 Okay. 547 00:59:38.969 --> 00:59:42.300 Um. 548 00:59:43.710 --> 00:59:47.789 Um. 549 00:59:47.789 --> 00:59:52.139 Hello. 550 00:59:52.139 --> 00:59:55.559 Let me go back to them. Okay. 551 00:59:55.559 --> 00:59:58.650 What's going on in here? 552 01:00:05.550 --> 01:00:11.670 Or the attempt is to make resolutions work. 553 01:00:13.889 --> 01:00:17.550 Wow. 554 01:00:18.809 --> 01:00:28.440 It actually works mostly. Okay so what we're trying to do is, um. 555 01:00:34.440 --> 01:00:38.429 This is tiled range. 556 01:00:40.980 --> 01:00:51.030 Sorry for the bad handwriting, but there's a lag between when I move the pencil and when it writes on to the screen. 557 01:00:51.030 --> 01:00:54.360 So, I'm actually not getting good feedback. 558 01:00:54.360 --> 01:00:57.960 Okay, so we want this this is the goal. 559 01:00:57.960 --> 01:01:01.710 We got some input vector. I looked at it. 560 01:01:01.710 --> 01:01:09.179 And that may be say, 3145 or something, and we got a repeat count or maybe. 561 01:01:10.530 --> 01:01:15.389 3, and then what we want is, the output is 3 1, 4 5. 562 01:01:15.389 --> 01:01:18.900 314 or 5, 3, 1, 4 or 5. 563 01:01:18.900 --> 01:01:25.380 And what to do it in parallel. Okay. So, again, showing some nice techniques. Um. 564 01:01:26.760 --> 01:01:31.019 The key is the observation is that, um. 565 01:01:32.130 --> 01:01:35.159 Call us the output vector um. 566 01:01:37.619 --> 01:01:41.309 I'm not a, let's say. 567 01:01:41.309 --> 01:01:45.960 And the key is that, um. 568 01:01:50.969 --> 01:01:54.510 A sub. 569 01:01:54.510 --> 01:02:03.059 Um, I, Ahmad are. 570 01:02:03.059 --> 01:02:07.320 Okay. 571 01:02:08.730 --> 01:02:15.570 So, it works. Okay. So how do so we want to generate. 572 01:02:19.019 --> 01:02:24.210 Want to generate a vector. 573 01:02:24.210 --> 01:02:27.480 I so Ahmad are. 574 01:02:29.159 --> 01:02:35.849 Okay, um, so how do we do that? So we've accounting it operator. 575 01:02:38.760 --> 01:02:43.949 Um, 012345678. 576 01:02:43.949 --> 01:02:47.159 And what we do is we do a map. 577 01:02:47.159 --> 01:02:51.840 We map with the function. 578 01:02:51.840 --> 01:02:57.179 Are. 579 01:02:57.179 --> 01:03:02.039 Okay, now using placeholder notation. 580 01:03:02.039 --> 01:03:05.369 That would be underscore 1. 581 01:03:09.449 --> 01:03:12.840 That's a 1. 582 01:03:12.840 --> 01:03:16.530 And that's a vertical bar. Okay. 583 01:03:16.530 --> 01:03:30.630 That's an underscore. Okay, so please orientation. This is a definition of a function for character long definition. Underscore 1 vertical bar. Are you apply? It takes 1 argument and it. 584 01:03:30.630 --> 01:03:33.690 Returns the argument module are okay. 585 01:03:33.690 --> 01:03:37.409 So, what we do is. 586 01:03:38.940 --> 01:03:42.239 We take accounting generator here. 587 01:03:42.239 --> 01:03:46.170 And I said our, in this case was great. 588 01:03:46.170 --> 01:03:50.969 I take my accounting iterate or and then I apply this function to it. 589 01:03:50.969 --> 01:03:54.929 And what do I get? I mean, put the function in blue just for 1. 590 01:03:54.929 --> 01:03:58.769 012012012. 591 01:03:58.769 --> 01:04:04.440 Okay, the next thing I do. 592 01:04:04.440 --> 01:04:08.550 Is I do a, um. 593 01:04:08.550 --> 01:04:13.739 And do a gather operator. 594 01:04:13.739 --> 01:04:17.340 And what a gather operator does. 595 01:04:18.719 --> 01:04:23.489 It it takes a, a, again, so let's suppose. 596 01:04:23.489 --> 01:04:29.190 Gather it takes 2 input input 1 is a vector. 597 01:04:29.190 --> 01:04:35.550 B, input to are a set of indices. 598 01:04:35.550 --> 01:04:39.510 Call them. 599 01:04:39.510 --> 01:04:43.889 And then the output you or something. 600 01:04:43.889 --> 01:04:48.989 And W, so by equals V sub. 601 01:04:48.989 --> 01:04:53.909 It randomly gathers and brings that back together. 602 01:04:53.909 --> 01:05:05.969 And gathers are fast in parallel because we're reading in different. It's a little random, but the thing again on parallel machines, there may be a latency, like, on the GPU. But, um. 603 01:05:05.969 --> 01:05:10.920 It doesn't matter the IO is fast. Okay those are the ideas. 604 01:05:10.920 --> 01:05:15.570 Let me go back to the slides that let me go back to the program. 605 01:05:15.570 --> 01:05:28.079 Shh. 606 01:05:30.449 --> 01:05:36.239 Okay, um. 607 01:05:36.239 --> 01:05:49.320 All that work is in the 1 for line thing right down here. Okay. 608 01:05:49.320 --> 01:05:52.650 Everything I talked about happens right here. 609 01:05:55.409 --> 01:06:07.349 Okay, sorry my mind is going module of course, in this and C + plus percent not vertical bar. 610 01:06:07.349 --> 01:06:11.219 I'm thinking, okay, so. 611 01:06:11.219 --> 01:06:17.369 What's happening here make counting iterate or 012345 and so on. 612 01:06:18.449 --> 01:06:23.699 Uh, and then transforming it with this, um. 613 01:06:25.409 --> 01:06:28.500 Here underscore 1% and. 614 01:06:28.500 --> 01:06:32.940 Transforms it by divided by doing each element of the counting iterate or mod and. 615 01:06:32.940 --> 01:06:43.320 So so make counting iterate, or is everyone can now make transform iterate or. 616 01:06:43.320 --> 01:06:47.190 Can't highlight stuff here, so well. 617 01:06:49.050 --> 01:06:55.289 And there it worked cool. Okay. So transform generator. 618 01:06:55.289 --> 01:06:58.619 What it does is it takes a factor. 619 01:06:58.619 --> 01:07:06.000 And it takes a function, and it returns a virtual vector or the function applied to the original background element. 620 01:07:06.000 --> 01:07:09.090 And when it returns iterate, or it's a pointer. 621 01:07:09.090 --> 01:07:12.929 And you de, referenced that you get elements of the transformed vector. 622 01:07:12.929 --> 01:07:19.170 Okay, but we did this fusion. So now this when I've highlighted. 623 01:07:19.170 --> 01:07:23.369 Will is a virtual vector of. 624 01:07:23.369 --> 01:07:27.900 You know, 0 to 1-1 0 to 1-1 going as far as you want. 625 01:07:27.900 --> 01:07:31.349 And the elements are generated as you need them. 626 01:07:31.349 --> 01:07:39.269 And so we have this on transform iterative 0 transmitter and time C ends. The length of the vector sees the repeat count. 627 01:07:39.269 --> 01:07:47.849 And what we have here is a virtual vector of indices if I go back over to here. 628 01:07:47.849 --> 01:07:52.500 The thing I had and blew up here and that's what that does. 629 01:07:54.150 --> 01:07:57.929 Okay. 630 01:07:57.929 --> 01:08:04.110 And that's a matrix. So now, what we have is a vector of these iterate are, is. 631 01:08:04.110 --> 01:08:09.780 Of these indices data is the input data. 632 01:08:09.780 --> 01:08:13.440 Gather goals and. 633 01:08:13.440 --> 01:08:19.289 For each, and these are the indices the 1st, 2 arguments are the vector of indices. 634 01:08:19.289 --> 01:08:24.359 The data and gather goals, gathered the data from those random locations. 635 01:08:24.359 --> 01:08:32.189 And writes it into the, and the effect of all of that is, um. 636 01:08:32.189 --> 01:08:38.279 To do what I do up here example. 637 01:08:38.279 --> 01:08:41.819 File range okay. To repeat the range. 638 01:08:41.819 --> 01:08:47.670 So, it shows are really compact things that you can do with, um. 639 01:08:49.470 --> 01:09:00.989 With this map reduced paradigm, it's why I like it it's worth forcing your problem into a map produced paradigm because then you can call on all these powerful tools. 640 01:09:03.029 --> 01:09:06.659 Um, so the whole body of the program is, is. 641 01:09:06.659 --> 01:09:10.079 1 is 4 lines of code, right there. 642 01:09:10.079 --> 01:09:19.260 Um, also goes in parallel nicely, so gathers obviously in parallel. 643 01:09:19.260 --> 01:09:27.420 And make transform, that's lazy evaluated so you need to do it. And again, the compiler takes everything and. 644 01:09:27.420 --> 01:09:33.119 You know, pops it all in place optimizes it and it goes fast. 645 01:09:36.060 --> 01:09:39.840 And, um. 646 01:09:39.840 --> 01:09:48.119 And just to show you why I honestly think I'm better than some of the other examples. 647 01:09:48.119 --> 01:09:51.899 Let me show you the original program before I modified it. 648 01:09:53.760 --> 01:09:58.859 Sorry, go to. 649 01:09:58.859 --> 01:10:06.720 Well, we have this mess here of the doctor idea where you declare a new class, and you overload the apprentices operator. 650 01:10:06.720 --> 01:10:10.500 It's a month thing, and then you create a new offer, so. 651 01:10:12.270 --> 01:10:23.670 And counting iterate, or is, this is why auto was invented as a class name. 652 01:10:23.670 --> 01:10:29.880 These these are the explicit names for what a numerator for a different classes. 653 01:10:29.880 --> 01:10:33.689 And C, +, plus, you know. 654 01:10:35.189 --> 01:10:38.939 Awful and, um. 655 01:10:39.989 --> 01:10:49.829 There's all sorts of mess all that stuff up there that I did not have any of it. Um. 656 01:10:52.050 --> 01:10:55.439 Copying and Tyler and John. 657 01:10:55.439 --> 01:11:03.569 The actual thing that does the work was where I don't even know where it is in here. 658 01:11:03.569 --> 01:11:09.060 Somewhere in there. Yeah, so I think my program is more compact so. 659 01:11:09.060 --> 01:11:14.250 What else do we have other ways to do it? 660 01:11:17.850 --> 01:11:24.779 Um. 661 01:11:24.779 --> 01:11:29.130 Nothing different here actually, um. 662 01:11:30.239 --> 01:11:34.529 Okay, um. 663 01:11:37.409 --> 01:11:42.569 Nothing different here, so oh. 664 01:11:47.340 --> 01:11:59.760 What's new here? A universal vector is not a hosted device factor to unified memory factor and you don't have to copy stuff from hosted device. Um. 665 01:11:59.760 --> 01:12:03.029 But you still got these other classes. 666 01:12:03.029 --> 01:12:12.930 So, you can do declare universal back during it will float between the host and revise, like, a datum and virtual memory, which it is. 667 01:12:30.354 --> 01:12:40.465 The more the map reduce paradigm hub and it works on parallel computing. It's a surprisingly powerful paradigm. It can handle. 668 01:12:40.739 --> 01:12:53.880 Sets of varying sizes, like the, the bucket sorting of points, and a 2 D grids different number of points in each bucket. It can handle that nicely things like reduced by key. 669 01:12:53.880 --> 01:12:57.659 And things like these virtual vectors and so on. So, um. 670 01:13:03.149 --> 01:13:06.420 Okay, so I was talking about that. 671 01:13:06.420 --> 01:13:12.000 So, in fact, here is this tiled range I just typed in what I told, you. 672 01:13:12.000 --> 01:13:16.289 Child range 3 I don't know what happened to it, but. 673 01:13:18.180 --> 01:13:26.010 Um, these other ways you can do this. There's a, there's something called a permutation iterate. Are. 674 01:13:26.010 --> 01:13:30.659 And what a permutation at a rate, or does is you give it a vector. 675 01:13:30.659 --> 01:13:33.840 And you give it a vector of indices. 676 01:13:33.840 --> 01:13:40.319 And you reference permutation iterated goes and grabs that right? Or random element from the vector. 677 01:13:40.319 --> 01:13:44.220 So, in a sense, it's a complicated way of saying, a sub. 678 01:13:44.220 --> 01:13:51.090 But it's a vector, it goes into the map produce paradigm. So. 679 01:13:51.090 --> 01:14:03.600 Okay, um, slightly different variant keyboard just went dead. 680 01:14:04.859 --> 01:14:13.829 Hello. 681 01:14:13.829 --> 01:14:18.090 That may be a good time to stop the class and the keyboard just went dead. So. 682 01:14:18.090 --> 01:14:24.210 In case, you see some of the ideas and will continue on with other ideas. 683 01:14:24.210 --> 01:14:36.300 So, I have a homework on, you're taking some of these examples and converting them to use placeholder notation and Lambda notation and so on just to have fun with that. 684 01:14:36.300 --> 01:14:42.539 Um, the NVIDIA GTC, the GPU technology conference is. 685 01:14:42.539 --> 01:14:47.279 Next week, I guess this week next week, so the next homework will be to. 686 01:14:47.279 --> 01:14:51.300 Find something interesting on that, and maybe give a talk about it in class. 687 01:14:51.300 --> 01:14:55.470 So, and next topics after this, um. 688 01:14:55.470 --> 01:15:00.210 So, we finish off thrust and so on. 689 01:15:01.289 --> 01:15:09.810 Well, later in the class, we get into quantum computing, I'll give you a summary of my quantum computing class and somewhere in between, we'll get on some Amazon. 690 01:15:11.100 --> 01:15:19.770 He's and so on, that's a reasonable point to stop have a good weekend. It's warming up. No more skiing. Apparently I'm annoyed. 691 01:15:19.770 --> 01:15:23.670 But, um and. 692 01:15:23.670 --> 01:15:27.119 Have fun. 693 01:15:27.119 --> 01:15:31.170 Hello. 694 01:15:32.729 --> 01:15:39.750 Work. 695 01:15:39.750 --> 01:15:43.710 Hang on. 696 01:15:43.710 --> 01:15:47.670 I think parallel just crashed or something. 697 01:15:55.979 --> 01:16:00.479 And see, who's here. 698 01:16:00.479 --> 01:16:06.329 Goodbye. 699 01:16:07.800 --> 01:16:08.543 Wow.