WEBVTT 1 00:08:16.139 --> 00:08:22.108 Okay, good afternoon in parallel people. So I guess this is. 2 00:08:22.108 --> 00:08:25.918 Sort of class 19, April fool's day. April. 1st. 3 00:08:25.918 --> 00:08:32.489 2021, so I see if I can share my screen. 4 00:08:32.489 --> 00:08:42.119 And Eva. 5 00:08:43.798 --> 00:08:48.928 So, I'll 1st question compulsory question is. 6 00:08:50.849 --> 00:08:56.548 Is just a 2nd here can you hear me? 7 00:08:59.908 --> 00:09:04.558 Anyone hey, Connor and Isaac. 8 00:09:06.869 --> 00:09:10.168 So. 9 00:09:12.448 --> 00:09:16.139 To see here this 1. 10 00:09:18.058 --> 00:09:22.948 Up that 1. 11 00:09:24.149 --> 00:09:28.078 Paramount. 12 00:09:28.078 --> 00:09:31.708 This 1. okay. 13 00:09:32.783 --> 00:09:44.274 So check ins, so it's not completely good the way I'm doing this with a chat window up at the same time as the rest of the screen. 14 00:09:44.303 --> 00:09:56.484 So, if you post something on me, and I don't respond and unmute yourself and ask the question, I'll try to keep something working here. So what we're talking about today. 15 00:09:57.563 --> 00:10:10.403 Is thrust and just oh, let me before I get to thrust I put up a homework question, doing some programming things with thrust to modify them to change. 16 00:10:10.433 --> 00:10:14.423 I mean, the trust examples, use the overloading operator. 17 00:10:14.609 --> 00:10:18.749 Parentheses, which I think is that obsolete technique I can't. 18 00:10:18.749 --> 00:10:31.583 It has some advantages in that the variables of that class of local States you can, you can do a closure, but if you're not using that, then using lamb does and notations a lot better. And I'll show you some examples. 19 00:10:31.913 --> 00:10:34.524 I'd like you to do it to some other programs. 20 00:10:34.769 --> 00:10:38.849 That's number 1, number 2. 21 00:10:41.033 --> 00:10:52.073 Good. And so next Thursday will be wellness day. So There'll be no class snow homework do. So it takes a day off. 22 00:10:52.134 --> 00:10:56.033 And many classes are doing that. 23 00:10:56.458 --> 00:10:59.668 That Tuesday and Thursday, next week. 24 00:10:59.668 --> 00:11:03.448 If you're wondering why. 25 00:11:03.448 --> 00:11:10.828 I did not have a spring break and then suddenly, at the last minute, has this. 26 00:11:11.634 --> 00:11:25.884 You should know that this has nothing to do with the faculty these decisions are made way above the faculties pay grade and last December of the faculty Senate said that we need a spring break and then they reiterated to January, 27 00:11:25.884 --> 00:11:27.082 we need a spring break. 28 00:11:27.144 --> 00:11:37.163 There was no spring break assigned and way at the end of the semester on 1 week's notice, bang this happens. Luckily, I scheduled my classes loosely enough that it's not a problem. 29 00:11:37.163 --> 00:11:50.394 If I had a strict sequence of labs and homeworks and exams and lectures, that would be a real problem. But that just I just want you to the students. The students should be clear. The faculty are not responsible for this chaos. 30 00:11:50.609 --> 00:11:54.418 We objected to it. Okay. 31 00:11:54.418 --> 00:12:00.359 Getting back okay to thrust and so it's functional notation. 32 00:12:00.359 --> 00:12:14.038 And I'll, I'll demonstrate it with examples. So what I have here as I just have my notes, and it helped me to understand it. So there's some intellectual work here. These are very. 33 00:12:14.038 --> 00:12:24.629 Concise programming paradigms, they're powerful their brief and but to take some thinking and they let you do things that. 34 00:12:24.629 --> 00:12:28.619 Are surprising so now. 35 00:12:28.619 --> 00:12:37.379 Where, to get where to get thrust and saw and if I go to this window here, it's logged in to parallel and just. 36 00:12:37.379 --> 00:12:40.798 Make it a little bigger so you can actually see it. 37 00:12:40.798 --> 00:12:45.269 Okay, um. 38 00:12:45.269 --> 00:12:49.859 Stay in parallel and. 39 00:12:49.859 --> 00:12:54.509 Class. Okay so now. 40 00:12:54.509 --> 00:13:02.099 If we go up here, this is a get repository. If anyone would like me to summarize, get. 41 00:13:02.099 --> 00:13:05.188 It's basically it's a distributed. 42 00:13:05.663 --> 00:13:20.214 Programme revision thing that keeps track of changes and you can have repositories. You can clone repository, send to other computers. You can pull changes and push changes. And I use it as a backup system for my own for my own files. 43 00:13:20.458 --> 00:13:29.278 Actually, and 1 thing and video, for example, uses it and then 1 way to find. 44 00:13:29.278 --> 00:13:37.589 Remote command, so this here is on top in video thrust get and you can clone this. 45 00:13:37.589 --> 00:13:40.769 And I'll show you what I did actually. 46 00:13:41.183 --> 00:13:53.783 This was the command here I did it from a higher level directory, and it just took this remote GitHub repository and now I have a local copy of it and I can pull if there's any changes from NVIDIA, I can pull the changes here. 47 00:13:53.783 --> 00:13:59.094 I can make local changes if I had permission to, I could push them back and so on. And a lot of my. 48 00:14:00.744 --> 00:14:15.083 Local repositories, I've actually got copies of them in several different computers just in case. Something happens any case. So, this is the latest thing from Nvidia and the doc has some oxygen stuff and so, and I. 49 00:14:15.359 --> 00:14:20.308 Okay, examples are the interesting thing here. Oh, trust here. 50 00:14:21.869 --> 00:14:28.198 So this are the include files and again thrust is it's only header files and. 51 00:14:28.198 --> 00:14:35.068 So this is it here and because there's code well, the code is included as. 52 00:14:35.068 --> 00:14:41.158 It's their class functions, so, and also it's I have it on. 53 00:14:43.349 --> 00:14:54.899 Local CUDA and could include, for example, you look down here has trust so you just run programs and. 54 00:14:55.374 --> 00:15:05.844 I mean, if your account is set up the way mine is, you don't need to do anything special with include things and so on it will include automatically. So we go to examples here. Here are a lot of examples. 55 00:15:06.203 --> 00:15:10.793 And 1 thing I've added is to make file that I showed you last time. 56 00:15:14.063 --> 00:15:24.264 And all it does, it looks for, if I say, make food, it will look for food and to compile it to make the output file food. 57 00:15:24.624 --> 00:15:38.903 It will also there are now make file has dozens and dozens of default rules, for example, to compile files. And so suppose I said something like, let's touch and I said make and clue. 58 00:15:39.474 --> 00:15:52.764 You see it all and well, I say, make food. It'll look, that's the name of and execute. It will look for source files of all different exclusion types of food. I'd see you food food, food. I'd see. 59 00:15:52.764 --> 00:15:59.813 Whatever it finds a food and it would do that. Make ends says show me what you would do, but don't really do it. 60 00:16:00.389 --> 00:16:05.068 Okay, I also have. 61 00:16:05.068 --> 00:16:09.089 See here actually. 62 00:16:10.798 --> 00:16:18.568 Okay, these are some of my modifications. 63 00:16:18.568 --> 00:16:22.259 To some of these things, a few modifications. Okay. 64 00:16:23.609 --> 00:16:31.109 So, let me just save that directory. Oh, okay. Now, if I go back to website. 65 00:16:31.109 --> 00:16:38.009 Not this 1 this 1. okay. Just show you some examples here. 66 00:16:41.908 --> 00:16:50.188 I'll get you started with something fairly vague on gap and due to do tile to range. 67 00:16:52.619 --> 00:16:56.578 And. 68 00:16:56.578 --> 00:17:09.719 Documentation shows what it does here. You see, you gave it a factor and you get rid of repetition count and it repeats each element of the vector. 69 00:17:09.719 --> 00:17:13.919 That many times, so if repetition count is 3. 70 00:17:13.919 --> 00:17:18.028 three's or it repeats the whole range 3 times. Okay. 71 00:17:18.028 --> 00:17:23.398 Sorry, repeated range, that's the opportunity to and we saw a little of it last time. 72 00:17:24.778 --> 00:17:33.598 What, if I go back up here, if we look at the output thing here and ask, how would we do it? Well, there's 4 elements in the source factor here. 73 00:17:33.598 --> 00:17:37.979 And so that if we have output element. 74 00:17:37.979 --> 00:17:45.719 So the output out, so we look at the input elements, elements, 1230123 and solid. So if you've output element K. 75 00:17:45.719 --> 00:17:49.288 That will be a copy of input element. K mod. 76 00:17:49.288 --> 00:18:02.903 For so it actually you might think this is a complicated way to do a simple object. You just say output was input. So I'm on 4 but this is again. It's showing you the general style. 77 00:18:02.903 --> 00:18:06.834 That's so you could do it as 1 little for loop obviously as a program but. 78 00:18:07.048 --> 00:18:11.818 As the overall promoting Prince, these to do this. 79 00:18:11.818 --> 00:18:19.709 What is happening here is that dysfunction is operating on the index of the input indexes. I'll put in Texas. 80 00:18:19.709 --> 00:18:25.648 But tile size, and a lot of board upgrade is not particularly necessary. 81 00:18:25.648 --> 00:18:35.189 And none of that's particular. 82 00:18:37.259 --> 00:18:41.759 And. 83 00:18:41.759 --> 00:18:45.719 Where the interesting thing is. 84 00:18:46.949 --> 00:18:50.189 Let's see. Whoops. 85 00:18:50.189 --> 00:18:56.969 Okay, so what we've done is we've got the class tiled range and then. 86 00:18:58.739 --> 00:19:06.594 What we can go down here and the count, we have an input which is say, we have some sort of counting it or 8 or this is a little over. 87 00:19:09.054 --> 00:19:14.663 Now the permutation or 8 or I haven't showed you before a permutation editor. 88 00:19:14.939 --> 00:19:24.749 Takes it's a random access, so takes the vector of indices. 89 00:19:24.749 --> 00:19:28.648 And it reads those indices from the input. 90 00:19:28.648 --> 00:19:35.788 Um, from the input factor, and does it start strictly a permutation? Because a permutation is a. 91 00:19:35.788 --> 00:19:40.288 Like, a 1 to 1 mapping just doesn't strictly have to be a 1 to 1 mapping. 92 00:19:40.288 --> 00:19:43.858 But in any case, I'm ex. 93 00:19:43.858 --> 00:19:47.848 Basically. 94 00:19:47.848 --> 00:19:56.699 What's going to happen down here? Now? Here's a beautiful example of we've composed 3 functions. 95 00:19:56.699 --> 00:20:04.558 The counting editor just returns 012345 and so on. 96 00:20:04.558 --> 00:20:09.479 A transform editor does a transformation. 97 00:20:11.009 --> 00:20:24.778 And then a, and what it's doing is this here is working on industry so the counting, it's, you're turning 012345 we're treating these as indices. 98 00:20:24.778 --> 00:20:28.769 Okay, the transport and uses these are indices end to the. 99 00:20:28.769 --> 00:20:33.148 Input vector you might say the transform editor. 100 00:20:33.148 --> 00:20:36.568 Computes what the output index would be. 101 00:20:36.568 --> 00:20:42.298 For each input index and the transform editor. 102 00:20:46.943 --> 00:20:58.794 Yeah, it's defined a little further up, but it's using that mod thing. So the output from transform counting are the indices that we. 103 00:20:59.308 --> 00:21:04.318 The sequence of indices that we need to copy to the output basically now. 104 00:21:04.318 --> 00:21:10.318 Permutation does this random cough, so takes the vector. 105 00:21:10.318 --> 00:21:16.739 Of data, and then takes a vector of indices and it in parallel. 106 00:21:16.739 --> 00:21:24.148 Um, well, it's also, okay I am, I skipped 2. 107 00:21:24.148 --> 00:21:29.429 A little to transform iterate or takes some indices and it takes some data. 108 00:21:29.429 --> 00:21:38.398 And, in fact, I'm waving my hands on, but the effect of all of this is that we've copied the desired input vector elements. 109 00:21:38.398 --> 00:21:44.909 To the output with the permutation editor and then it just. 110 00:21:46.048 --> 00:21:49.919 What just returns here is a number of elements effectively. 111 00:21:49.919 --> 00:22:00.538 The, and the ended, or K points to 1 class last element is the output. How many elements are there in the output is a number of elements and the input. 112 00:22:00.538 --> 00:22:07.078 Times the repetition factor, which in tiles that we added to the beginning pointer in this gives us a pointer of the output. 113 00:22:08.999 --> 00:22:15.358 So, what this is doing is and doing the replication, and it's doing. 114 00:22:15.358 --> 00:22:18.659 And it's writing the data out. 115 00:22:18.659 --> 00:22:22.618 As I'm using a copy. 116 00:22:22.618 --> 00:22:32.638 Thrust copy thing and what it's doing is copy. Actually, I showed it with a race. It's actually copying with. 117 00:22:32.638 --> 00:22:36.659 And so we're copying from the data, which is where the. 118 00:22:36.659 --> 00:22:40.439 Put data was put and. 119 00:22:43.019 --> 00:22:46.709 And it's actually the output is. 120 00:22:48.479 --> 00:22:56.249 It's copying to the it make the output is pointing to the Australian that writes the data that writes the input data. 121 00:22:57.568 --> 00:23:05.848 And here again, wasting my hands, although we take the input so we're calling tiled range and it will duplicate. 122 00:23:05.848 --> 00:23:10.348 They range twice and copy it to the output. So. 123 00:23:10.348 --> 00:23:14.909 3, thrice and so on I can make. 124 00:23:16.858 --> 00:23:23.368 Oh, I already just approve. I can actually compile it. I'll remove the compiled thing and make it again. 125 00:23:26.308 --> 00:23:31.858 You'll notice that NBC is extremely slow. 126 00:23:31.858 --> 00:23:39.088 Because it's taking the program, it's split it into CUDA stuff and C plus plus stuff and compiling them separately. Then merging the result. 127 00:23:39.088 --> 00:23:46.558 Okay, there's the result we right we repeat the range, the input range we repeated twice and thrice and so on. 128 00:23:46.558 --> 00:23:51.659 I go back to my blog blog. 129 00:23:51.659 --> 00:24:03.179 And, I mean, it's a little bigger. Okay. So now, let me show you how I can make it a little shorter with child range. 2. 130 00:24:04.888 --> 00:24:09.719 So do I have it here? Yes. 131 00:24:11.278 --> 00:24:22.348 So, I use play, sold a notation. Okay. That's just printing the thing out. I put it in the function here. 132 00:24:23.848 --> 00:24:27.628 Oops, the, and again. 133 00:24:27.628 --> 00:24:31.138 And this is it right here? Um. 134 00:24:31.138 --> 00:24:40.499 It's a little shorter you might say. So, what did I do here? I'm using a new thing called gather and this thing gather photo has the same function as. 135 00:24:40.673 --> 00:24:46.614 As the permutation gather will gather a random set of elements from a vector. 136 00:24:46.614 --> 00:25:00.743 So it takes a data vector it takes an index factor of random indices, and it goes and gathers them from the source data vector and puts them sequentially and to the target data vector. 137 00:25:01.134 --> 00:25:04.253 So, here's how we do this to replicate the thing. 138 00:25:07.709 --> 00:25:11.999 So, basically of accounting it or 8 or. 139 00:25:14.939 --> 00:25:19.019 Basically, so here this place on a notation. 140 00:25:19.019 --> 00:25:30.239 Is that means takes 1 hour it's a function takes 1 argument and does it mod in? And here is a local variable. 141 00:25:30.239 --> 00:25:33.449 And and that equals 4. okay, so. 142 00:25:35.759 --> 00:25:40.709 So, the transformed, it takes a vector. 143 00:25:40.709 --> 00:25:48.868 And it returns a virtual vector, which is each element of the original vector transformed somehow. 144 00:25:48.868 --> 00:26:02.219 So, the original vector is accounting 012345 and it transform and it's a virtual thing. It's not stored. It transforms the elements as it needs them. And the transformation is to take each element in. 145 00:26:03.449 --> 00:26:16.229 So, this here is a, it's an inter to a virtual vector. Let's say it was 3 we'll do 012012012 and so on and this and this here. 146 00:26:16.229 --> 00:26:20.699 Will be a pointer to the end of that so I will do this. 147 00:26:20.699 --> 00:26:27.088 Until we have vector in time see long. So. 148 00:26:27.088 --> 00:26:32.038 Data dot began is our source data up here. 1020, 30, 40. 149 00:26:32.038 --> 00:26:35.548 And ve, dot began is our output. 150 00:26:35.548 --> 00:26:40.648 It's up here. Oh, okay. So this thing says here. 151 00:26:40.648 --> 00:26:44.638 Is it rights into output? Vector? V. 152 00:26:45.898 --> 00:26:50.638 And it the ice element of the. 153 00:26:50.638 --> 00:26:54.088 Is the I mod and element of data. 154 00:26:54.088 --> 00:26:57.959 License Apple much simpler than how NVIDIA does it come? 155 00:26:57.959 --> 00:27:04.528 I write better code than in video. The video people do that. Trust people do. 156 00:27:05.818 --> 00:27:12.749 And writing code for a while. So okay. And that is that here. 157 00:27:12.749 --> 00:27:17.128 And I can, I have it here compiled so. 158 00:27:19.888 --> 00:27:24.028 And it does pile data, for example, um. 159 00:27:25.288 --> 00:27:30.868 I can prove that it really works that I haven't totally can totally canned example. 160 00:27:30.868 --> 00:27:34.019 Let's see. 161 00:27:34.019 --> 00:27:37.888 Silence. 162 00:27:37.888 --> 00:27:42.118 Silence. 163 00:27:42.118 --> 00:27:45.568 I had a few more things to it. 164 00:27:46.888 --> 00:27:50.459 Silence. 165 00:27:53.878 --> 00:27:57.209 Silence. 166 00:28:14.759 --> 00:28:20.068 Oh, okay. We just modified it a little. 167 00:28:20.068 --> 00:28:24.749 Guess what I needed to do. 168 00:28:24.749 --> 00:28:29.338 You try to do an example. Oh, oh. 169 00:28:29.338 --> 00:28:33.148 Of course, just a sec. 170 00:28:33.148 --> 00:28:38.969 All of these. 171 00:28:38.969 --> 00:28:43.348 Silence. 172 00:28:43.348 --> 00:28:48.598 Time set a date. 173 00:28:48.598 --> 00:28:52.288 Oh, okay. And I can do it. 174 00:28:52.288 --> 00:28:56.249 Kill it. 175 00:28:56.249 --> 00:29:01.348 Oh, okay. 176 00:29:07.019 --> 00:29:18.689 That's better what I was deleting was stuff that compile for a specific version of CODA, which which I had to do at 1 point, but not now. 177 00:29:18.689 --> 00:29:22.499 Yeah, you say actually works. 178 00:29:22.499 --> 00:29:25.558 Nice and things occasionally work. 179 00:29:25.558 --> 00:29:31.648 A child range 2 count range 3. 180 00:29:33.568 --> 00:29:36.929 Let me take a look at the where I put that. 181 00:29:36.929 --> 00:29:40.858 Let me see. 182 00:29:40.858 --> 00:29:45.298 Silence. 183 00:29:50.009 --> 00:29:53.338 What I'm doing here, I'm using the Z shell. 184 00:29:53.338 --> 00:30:03.598 I guess said shell, if you're from Great Britain and star star is a wildcard that expands an arbitrary number of directories deep. So, this is a fast way to do. 185 00:30:03.598 --> 00:30:07.169 I am good. I put that. 186 00:30:10.858 --> 00:30:16.739 That is annoying. 187 00:30:16.739 --> 00:30:22.588 Really. 188 00:30:22.588 --> 00:30:26.338 Hello. 189 00:30:29.788 --> 00:30:33.058 Silence. 190 00:30:36.749 --> 00:30:42.239 Okay, I just want I'm getting stubborn. Where did I put it? 191 00:30:42.239 --> 00:30:52.618 It's weird. Oh, okay. Forget about that. Are there other ways to improve it? 192 00:30:53.909 --> 00:30:58.108 What I what I would be, I'll find it for next time and so on. 193 00:30:58.108 --> 00:31:01.558 Okay, repeated range. Let me show you. 194 00:31:01.558 --> 00:31:04.739 Other example of this. 195 00:31:05.759 --> 00:31:10.048 Silence. 196 00:31:10.048 --> 00:31:13.048 Okay. 197 00:31:17.009 --> 00:31:21.058 Okay. 198 00:31:21.058 --> 00:31:27.148 Okay, so that repeats each element several times. 199 00:31:27.148 --> 00:31:32.638 And if you're if I'm talking to solely for, you could also use a mind yellow. 200 00:31:32.638 --> 00:31:39.088 Sort of placeholder thing, but again, so here. 201 00:31:39.088 --> 00:31:45.568 An output index is actually an input index. I divided by the repeat count. 202 00:31:45.568 --> 00:31:54.058 The floor down to the next lower end. Okay. So if I look up here output index, um, this is. 203 00:31:54.058 --> 00:32:02.459 For this would be 4 divided by 3, which is 1, would be input index 1 counting from 0. 204 00:32:02.459 --> 00:32:07.199 Okay, so and this is done. 205 00:32:07.199 --> 00:32:11.578 Again, the same sorts of ideas is accounting at or 8 or. 206 00:32:11.578 --> 00:32:21.209 The repeat factor is what does the division so the counting 01234 to transform it or take set. 207 00:32:21.209 --> 00:32:28.019 And transforms the things by dividing each element by and effectively the permutation editor. 208 00:32:28.019 --> 00:32:32.489 Takes 1st and. 209 00:32:32.489 --> 00:32:37.078 Which is a pointer to the input data and returns. 210 00:32:37.433 --> 00:32:49.163 To a virtual data set, or the AI element of the output is the AI divided by an element of 1st and it's a point. So it's returning an iterate or turning the pointer. 211 00:32:49.374 --> 00:32:53.153 And if you do reference it, you'll get the desired elements that you want. 212 00:32:54.628 --> 00:33:00.239 And you did, did you do we're also doing a copy just for fun to show that. 213 00:33:00.239 --> 00:33:07.709 Writing data by doing it this is sort of weird, but copying too an output, which is oh stream. 214 00:33:07.709 --> 00:33:13.769 Okay, and then we call repeated range down here. 215 00:33:13.769 --> 00:33:17.939 So, on okay, twice and thrice. 216 00:33:17.939 --> 00:33:23.159 Thrice is old English for 3 times. This is twice as 2 times. 217 00:33:23.159 --> 00:33:27.659 And we can make it. 218 00:33:27.659 --> 00:33:35.909 So, you notice the compiling, you don't need to explicitly specify any. 219 00:33:36.953 --> 00:33:51.114 Any include files and so they're in the right place on the system. Now I'm operating directly in this in the par class directly. You don't have right access to this directory. So I hope you don't. So, if you want to do is copy the whole thing over to your own account. 220 00:33:51.419 --> 00:34:01.828 And then repeat it range and it's doing what I said it. Okay. Do I have a local version of it? 221 00:34:04.199 --> 00:34:08.099 This 1. 222 00:34:08.099 --> 00:34:15.838 We, we do, so if I look at repeated range, my version of that. 223 00:34:15.838 --> 00:34:23.159 So, what I did, I'm using placeholder notation. 224 00:34:23.159 --> 00:34:24.534 Make counting it, 225 00:34:24.534 --> 00:34:34.434 or the nice thing was these make functions is we don't have to be explicit about what the return data type base it to turning some complicated class, 226 00:34:34.434 --> 00:34:37.793 which is not the name of the class is not fit for human consumption, 227 00:34:38.003 --> 00:34:39.204 but you don't have to consume it. 228 00:34:39.204 --> 00:34:49.164 Okay it just because as an input to the next function, so counting it or is there 1234 here's the placeholder notation to divide by. See where it sees a variable that's known in the scope. 229 00:34:49.164 --> 00:34:55.554 So, make returns virtual 00111222 saying, and takes a sequence of embassies. 230 00:35:01.318 --> 00:35:07.289 And applies it to data dot begin and it returns returning this. 231 00:35:07.289 --> 00:35:19.559 New output. All right. Okay. And now notice what's happening here? There's no temporary storage. Okay. We got the input. The only. 232 00:35:19.559 --> 00:35:26.219 That thing that's stored is our input data. There's no other array stored in the whole system. 233 00:35:26.219 --> 00:35:36.418 Everything is just, you know, it's this composition 3 functions, deep composed together and yet and what we have here is an, that when we do reference it. 234 00:35:36.418 --> 00:35:43.829 We get the appropriate elements of data so this is the power of this of this functional notation. 235 00:35:45.293 --> 00:35:58.643 Or quasi function, location at functional location. Pierce would argue this isn't really functional notation if you want to get into that debate. Well, okay, but otherwise, I'm just pointing out I'm aware of the debate. 236 00:35:58.643 --> 00:36:01.014 I just disagreed with it and then here. 237 00:36:01.289 --> 00:36:05.969 So, we never start the output at all. What we do is we. 238 00:36:05.969 --> 00:36:13.798 Outputs it puts not the output array. It's an, a fancy and we print the thing directly. 239 00:36:13.798 --> 00:36:17.068 We never store it, so that is efficiency. 240 00:36:17.068 --> 00:36:20.369 Kay, and we talk and talk about it here. 241 00:36:20.369 --> 00:36:25.528 And I talk about what's happening here. Oh, so that. 242 00:36:27.239 --> 00:36:39.028 Okay, so those things are sort of regular you might think they're obvious, but let me show you some things that are not regular at all. Do do. 243 00:36:46.619 --> 00:36:50.728 Expand no questions. 244 00:36:50.728 --> 00:36:54.898 Okay, okay. Okay. 245 00:36:55.949 --> 00:37:00.929 Expand those run length. 246 00:37:00.929 --> 00:37:07.079 It does run lights, end coding or sorry I run my I. D. coding. 247 00:37:07.079 --> 00:37:12.268 I'm backwards. Okay. What we have here expand takes 2 arguments. 248 00:37:12.268 --> 00:37:21.059 And the 1st argument is a factor of run lines. The 2nd argument is the vector data elements for the run. 249 00:37:21.059 --> 00:37:27.838 Let's look at the 1st, analog here so this says we're going to have a run of 1 a 3 B's and c's. Okay. 250 00:37:29.009 --> 00:37:43.074 So the question is, so nothing is regular here. So the question is, how can we do it fast by the all the other stuff? I, should it all works fast and parallel by the way. I mean, that's the point of it. 251 00:37:43.074 --> 00:37:52.673 So, all this stuff takes, like, log in time, those scans and repeats and they're very fast and parallel. This will be fast and parallel. So, even though things are. 252 00:37:52.949 --> 00:38:05.548 Any event I'm going to get you some highlights of this and again, the programming style is God awful, but these are all these weird data types. Okay. Let me show you the. 253 00:38:06.989 --> 00:38:11.519 To let me show you my where did I put it? 254 00:38:12.750 --> 00:38:25.230 Okay, this will be the high level thing. We have a vector of data and factor of counts. Okay. And notice account might be 0 set data element. 255 00:38:25.230 --> 00:38:29.849 Ok, so the 1st question is and this 1, we're actually going to say. 256 00:38:29.849 --> 00:38:39.570 Perhaps right, the output factor, we could just have return of smart reference. You get out but elements, but this will right to factor just. 257 00:38:39.570 --> 00:38:44.340 To be different 1st thing is, is how long is the output? 258 00:38:44.340 --> 00:38:49.289 Factor going to be. 259 00:38:49.289 --> 00:38:57.269 Excuse me, it's a sum of all the run lane so we do we do a some reduction on C. 260 00:38:57.269 --> 00:39:00.630 And we get the length of the output. Good. 261 00:39:01.920 --> 00:39:05.159 So, now we can construct that vector if we need. 262 00:39:05.159 --> 00:39:10.469 The 2nd thing is that we, we want adult factor. 263 00:39:10.469 --> 00:39:16.920 Which, and again what adult factor is a vector of bases and it shows where. 264 00:39:16.920 --> 00:39:20.460 Adult factor is when you have a rag, what I call it a ragged are right. 265 00:39:20.460 --> 00:39:26.550 It's, it's a vector of variable inspectors and you. 266 00:39:26.550 --> 00:39:33.059 Add to use that sort of thing a lot in my own parallel geometry programming. But here, the thing here is your output. 267 00:39:33.059 --> 00:39:40.889 And the dope vector will point to the started this arrows to the start of the Kansas started. The therapies has started the forties. 268 00:39:40.889 --> 00:39:45.269 And there started the twenty's would be in here and it's none of them. 269 00:39:45.269 --> 00:39:53.610 So, the 1st element adult don't factor. Sub. 0 is 00. so dope factors of 1 is 2 and so on. 270 00:39:54.715 --> 00:40:08.605 Okay, so the adult factor we get by doing an exclusive scan on this exclusive scan. The 1st output element is always 0, because it'll factor always starts off at 0. then the 2nd output element is to the 3rd 1 is 3. 271 00:40:08.605 --> 00:40:14.724 the 4th 1 is 3 the next 1 is 6. the next 1 is a slow scan. It's just a vector of partial sums here. 272 00:40:15.000 --> 00:40:18.840 Where it starts at 0. 273 00:40:18.840 --> 00:40:22.739 Yeah, okay. So that's what we have down here. 274 00:40:25.679 --> 00:40:36.690 Okay, the next thing is a scatter if we saw gather, gather, goes and grabs random elements into a factor. 275 00:40:36.690 --> 00:40:43.230 Scatter does the opposite scatter scattered the elements of a vector into random places. 276 00:40:43.230 --> 00:40:46.829 And so. 277 00:40:46.829 --> 00:40:58.980 Gather and gather may gather the same element into more than 1 place. So scatter may scatter the same old. So scattered input is sequential. And its output is random. 278 00:40:58.980 --> 00:41:11.190 Whereas for gather, the input is random, the output is sequential you might say so they're complementary and by the both of these things go fast and parallel we wouldn't be talking about. Okay so. 279 00:41:11.190 --> 00:41:18.059 So so scatter scattered a sequential vector into random places. 280 00:41:18.059 --> 00:41:23.610 However, it doesn't scatter them all. 281 00:41:23.610 --> 00:41:29.670 It has what's called a stencil, which is a vector of booleans. 282 00:41:30.175 --> 00:41:41.755 And it scattered, only elements where the value is Paul is non 0. so it's a conditional scattered. 283 00:41:41.755 --> 00:41:45.204 So it scattered some elements of the input vector and drops others. 284 00:41:47.039 --> 00:41:51.960 Okay, and what we're going to do is here, so. 285 00:41:53.010 --> 00:42:05.309 This gets weird, but it's that this is worth your while to see how this things works. That's why I wrote this down here. In fact, I wrote it down so I could understand it because if you look at the code. 286 00:42:05.309 --> 00:42:09.119 If you may, you want to understand this took some thinking, but it's worth the time. 287 00:42:09.119 --> 00:42:13.349 Okay, so how do we do this coding? 288 00:42:13.349 --> 00:42:20.130 Um, so I've created the adult factor. 289 00:42:21.989 --> 00:42:31.289 So this is where each round will start the run of these arrows element will start here. The run of the 1st element will start here. The 1st element is attends. 290 00:42:31.289 --> 00:42:41.400 The 2nd element we'll start, or the twenties will start here, but there won't be a 0 actually the run of the 30 s will start our position. 3. 291 00:42:41.400 --> 00:42:44.909 Around the 40 s will start a position 6. 292 00:42:44.909 --> 00:42:49.739 Okay, now this gets weird what we do. 293 00:42:49.739 --> 00:42:54.480 Is that we just take accounting. 294 00:42:54.480 --> 00:43:00.000 And we scatter it into an output vector. 295 00:43:02.039 --> 00:43:06.869 Where each counting number like. 296 00:43:06.869 --> 00:43:14.429 Say, 3 goes to the position and the output vector where the 3rd run will start. 297 00:43:15.570 --> 00:43:20.070 So this is the 0 gets copied to where the 01 will start. Well. 298 00:43:20.070 --> 00:43:23.849 Doesn't much matter the 1 gets copied to the position. 299 00:43:23.849 --> 00:43:26.969 Where the 1st run will start. 300 00:43:26.969 --> 00:43:36.000 The 2, well, won't get copied because that the 2nd run is lane 0. the 3 will get copied to where the 3rd run will start. 301 00:43:36.000 --> 00:43:40.050 And the 4 will get copied to where the 4th run it will start. 302 00:43:40.050 --> 00:43:50.280 How we do that is we take this counting vector, which, of course, it's a virtual 1 and where we scatter each element. 303 00:43:50.280 --> 00:43:56.610 Is with the dope factors, and we scattered them and who it was originally all zeros. 304 00:43:56.610 --> 00:44:03.150 So these don't factor elements get scattered into. 305 00:44:03.150 --> 00:44:09.480 It scattered into here, for example, the 4 gets scattered into the 6th position. 306 00:44:11.610 --> 00:44:21.239 Okay, so this is what we get and the 2 didn't get scattered anywhere because there's a stencil factor, which will be. 307 00:44:21.239 --> 00:44:25.349 1, only if the runs have a positive length. 308 00:44:28.320 --> 00:44:37.590 Okay, now what's the next thing we do. 309 00:44:37.590 --> 00:44:47.579 We're going to do an inclusive scan here. 310 00:44:47.579 --> 00:44:54.389 So, yeah, the, the closest can by default does a lot of partial sums. 311 00:44:54.389 --> 00:45:05.909 Here we're going to do a lot of partial maxes, but we're effectively going to do what we want to do is run down this factor and make each element the maximum of all the preceding elements. 312 00:45:08.010 --> 00:45:12.059 So, it's an inclusive scan. We only care about the final. 313 00:45:12.059 --> 00:45:17.250 The final output well, that is that. Okay, so. 314 00:45:17.250 --> 00:45:28.079 So, the output, if here, some access, this is so what happens is that 0 gets replaced by a 3 the 1 gets replaced by 3. 315 00:45:28.079 --> 00:45:31.199 So, the, the, the inclusive scan. 316 00:45:31.199 --> 00:45:34.949 On this, we'll end up with this. 317 00:45:37.050 --> 00:45:40.469 So, let's think about what happened this here. 318 00:45:40.469 --> 00:45:44.340 Is a vector or or the start of each run. 319 00:45:44.340 --> 00:45:48.210 Has the number the ID number of the run in it? 320 00:45:48.210 --> 00:45:57.510 So, the 1st, element of each run has the number of the run and all the all the other elements of the runner is 0. 321 00:45:59.699 --> 00:46:05.880 When we run down this, doing a maximum, then we get a vector where each element. 322 00:46:05.880 --> 00:46:10.139 Now, has the number of the run that that element is in. 323 00:46:11.250 --> 00:46:14.969 And again, now, this is interesting because it's maximum thing. 324 00:46:14.969 --> 00:46:18.599 Surprising as it is, can be done and log time. 325 00:46:18.599 --> 00:46:28.170 I mean, to be obvious, I could just run down the array doing linear time and do Maxima. But the thing is, we can do it and log time. 326 00:46:28.170 --> 00:46:39.715 I'll let you think about how you do that I mean, you saw how we could do a summer. I never I saw it I guess I showed you how you can do the scan in log time. You build up a trace harder. 327 00:46:40.135 --> 00:46:43.405 Well, instead of doing some addition, we're doing maximum. 328 00:46:44.070 --> 00:46:47.489 You can do your scan over anything that's an associate of. 329 00:46:47.489 --> 00:46:53.610 Function don't even know if it has to be commuter to probably should be actually, but oh, okay. 330 00:46:53.610 --> 00:47:03.480 So, we do this now, and now, what we've got is a vector. Each each element has the number of the run that I will have in it. 331 00:47:05.039 --> 00:47:09.809 And now you just take this, add, consider this as a vector of indices. 332 00:47:09.809 --> 00:47:17.489 And you do a gather and you build up the output factor. That's how you do run lengths decoding run, lights, end coding. 333 00:47:18.659 --> 00:47:22.769 And it's just a sequence of several. 334 00:47:24.480 --> 00:47:31.590 You know, several of these functional functions. Basically the only thing I didn't show you is how you find this, how you find the. 335 00:47:33.059 --> 00:47:38.579 How you find the stencil thing this is a really powerful. 336 00:47:38.579 --> 00:47:50.639 It's really compact. This is the thing about the functional programming potentially it's really compact and it can be compiled to run past and parallel. This is long time and. 337 00:47:51.809 --> 00:47:57.989 It's just that, like, most compact. 338 00:47:57.989 --> 00:48:02.010 Programming schemes like, Matlab would be another example. 339 00:48:02.010 --> 00:48:12.030 The joke is, it's it's a right only language or you have to seriously think about how it works, but it's, it's such a powerful metaphor it's worth thinking about. So. 340 00:48:13.739 --> 00:48:17.909 And so this is my executive summary of what's going on. 341 00:48:22.559 --> 00:48:27.090 Oh, there it is again and do do do do do. 342 00:48:29.340 --> 00:48:36.989 This is the space now this again could be coded to be much briefer than they have for their crap programming style. Here. 343 00:48:39.389 --> 00:48:48.869 And the only thing I didn't show you was the thing. 344 00:48:48.869 --> 00:48:56.909 And the stencil that's a vector of where the element is 1. if the I run. 345 00:48:56.909 --> 00:49:01.829 Has length as a positive length and. 346 00:49:03.059 --> 00:49:07.469 Well, that she just takes the vector of counts here and. 347 00:49:07.469 --> 00:49:11.699 You say, you know, count I greater than 0. that's your pencil back to that one's easy. 348 00:49:11.699 --> 00:49:14.940 Okay, so did it. 349 00:49:14.940 --> 00:49:20.099 And. 350 00:49:20.099 --> 00:49:24.239 That's it, you know, I'm waving my hand, but that's the basic idea here. 351 00:49:24.239 --> 00:49:28.860 Need something slightly different. Okay. Questions. 352 00:49:31.050 --> 00:49:36.510 Eva. 353 00:49:36.510 --> 00:49:39.869 Okay, another complicated 1. 354 00:49:42.239 --> 00:49:45.869 Is set operations and. 355 00:49:48.449 --> 00:49:52.320 Never run expand for you. I never did. 356 00:49:52.320 --> 00:49:57.300 I want to show you, it actually works. 357 00:50:01.289 --> 00:50:04.440 Now, this is slow. 358 00:50:07.650 --> 00:50:13.949 And it actually works. Oh, okay. I won't bother modifying. And if you believe me that it works. 359 00:50:13.949 --> 00:50:19.469 What do you. 360 00:50:19.469 --> 00:50:22.860 Oh, okay. So. 361 00:50:22.860 --> 00:50:26.340 Area a machine is mostly idol here, so. 362 00:50:29.429 --> 00:50:37.170 Dual 14 core system on 2 things that's 56 some hyper threads here. 363 00:50:38.190 --> 00:50:41.699 Okay. 364 00:50:41.699 --> 00:50:45.360 And again. 365 00:50:45.360 --> 00:50:50.070 Nice query. Yeah, that's the 1. 366 00:50:51.210 --> 00:50:55.860 Hi, Sarah yeah. Did you. 367 00:50:57.510 --> 00:51:04.469 48 gigabytes a memory. Okay. Okay. Set operations here. 368 00:51:08.219 --> 00:51:19.920 So, what we have is a set 2 sets of elements, and they're just a sequence of the members of the set. So, it's a vector in the back to just list the members of the set. 369 00:51:19.920 --> 00:51:25.170 And we want to do a union up 2 factors, or an intersection, or a difference, or. 370 00:51:25.170 --> 00:51:28.980 Symmetric difference. Okay. Now. 371 00:51:28.980 --> 00:51:35.760 There are 2 different versions here, depending on whether space or time is more important. 372 00:51:35.760 --> 00:51:41.670 The thing is that the output is also variable size. Okay. 373 00:51:41.670 --> 00:51:46.019 So, it's talking about stuff here. 374 00:51:47.789 --> 00:51:51.869 Before I go to my notes, I can read their notes here. 375 00:51:51.869 --> 00:51:57.000 Um, you see, you're doing Union, let's say the output. 376 00:51:57.000 --> 00:52:03.000 At the biggest could be the, some of the sizes of the 2 input factors, but if there's a lot of duplicates the output. 377 00:52:03.000 --> 00:52:14.550 Size could be much less, so we might perhaps just allocate the Max an output factor. That's the biggest possible size and then shrink it. If it turns out to be too big. 378 00:52:14.550 --> 00:52:27.960 Another way if possible is, we might process the input twice as an idea. I like to do. Actually, we read the data and we somehow compute how many output elements there are without storing them. 379 00:52:27.960 --> 00:52:31.530 If there's a way to do that, and then we. 380 00:52:32.574 --> 00:52:42.684 Allocate exactly the right size of data, and we read in the data 2nd time and populate the protector. So we're reading and processing the data twice. 381 00:52:42.715 --> 00:52:53.545 We're writing it only once now computation I said is often free on a gpo and reading is a lot faster than writing because it's reading sequentially down the vectors. And so. 382 00:52:53.969 --> 00:53:01.769 The Reed twice and right once it's not just a bad idea and it might actually be the fastest way. So, that's what they're talking about here. 383 00:53:02.849 --> 00:53:09.119 So, 1 way to do it. So if I go to my notes here. 384 00:53:10.289 --> 00:53:13.349 Yeah, so the 2 ways, um. 385 00:53:13.349 --> 00:53:18.989 Just construct the maximum sized output factor and then erase it down. 386 00:53:18.989 --> 00:53:24.179 Um, and or the 2nd thing is. 387 00:53:24.179 --> 00:53:34.530 Read the data twice, we assume the data sorted. We suddenly factors sorted so you want to walk down the 2 vectors perhaps and. 388 00:53:34.530 --> 00:53:37.619 You know, just count duplicates. Okay. 389 00:53:37.619 --> 00:53:49.110 So, and so what we're doing is that we're using some built and. 390 00:53:50.340 --> 00:53:56.460 Merge is a stressed thing that merges to assorted array so it does the Union. 391 00:53:56.460 --> 00:53:59.519 Or you don't have to program it. Okay. 392 00:54:00.599 --> 00:54:06.360 So you have to give it a big enough factor and thrust vectors have the usual. 393 00:54:06.360 --> 00:54:09.510 Class function start size, et cetera. 394 00:54:09.510 --> 00:54:13.409 Okay, so the emergency easy way here. 395 00:54:15.929 --> 00:54:19.079 So, set Union. 396 00:54:19.405 --> 00:54:33.385 Is well, there is a set Union built in so we're talking about how it can work here and so on the set Union will do this and it will return and iterated to the end of the vector and the end. 397 00:54:33.385 --> 00:54:40.284 You don't know where it is, because it could be duplicates and then erase to erase things. Okay. 398 00:54:42.179 --> 00:54:47.219 And there's some built in set intersections and so on do the obvious thing. 399 00:54:48.269 --> 00:54:51.840 Yeah, so these are some of these are built in, but that helps. 400 00:54:51.840 --> 00:54:56.429 But say what you're thinking about here is how we might do it. 401 00:54:58.440 --> 00:55:02.489 Okay, the next thing is this. 402 00:55:02.489 --> 00:55:11.460 Oh, and those mergers and so on going parallel, of course, you might think how they could possibly do that. Okay. Sparse vector is. 403 00:55:11.460 --> 00:55:17.940 We have a vector that's most zeros and some of my research has worked with a matrix as most of these arrows. 404 00:55:17.940 --> 00:55:25.139 Example of a sparse matrix. Suppose I'm doing some of the flash and PDA and I have a heat flow. 405 00:55:25.139 --> 00:55:28.855 Pda let's get other names also. 406 00:55:28.855 --> 00:55:40.224 Each element is the average average for neighbors, you've got temperature in a square plate and so it's an end by infector of temperatures, let's say, and in the steady state, each. 407 00:55:40.500 --> 00:55:49.800 Point is the average of its 4 neighbors, so you could construct as a set of linear equations. If it's an end by end array there's N, squared unknowns. 408 00:55:50.875 --> 00:56:04.135 And so, this could be a, it's a linear system with N squared unknowns C or a vectors N squared by in square to send us the 4th elements. But each role, which is N squared log is only. 409 00:56:06.269 --> 00:56:11.099 Which is the 4th is N squared unknowns. 410 00:56:11.099 --> 00:56:14.369 So it's an N squared by N squared coefficient. 411 00:56:14.369 --> 00:56:20.070 Matrix, but each row, which is N, squared long, has only 5 and. 412 00:56:20.070 --> 00:56:28.920 Non zeros, which is N squared long has 5 and non. 0 So it's that's extremely sparse. Okay. So. 413 00:56:28.920 --> 00:56:36.059 We may want to say, add 2 as far as factors and this is actually working with sort of the union and whatever. 414 00:56:37.349 --> 00:56:40.800 So, there's 2 ways to do this. 415 00:56:42.329 --> 00:56:47.849 So, the sparse factors again, it's a pair it's a factor of indices vector data. 416 00:56:47.849 --> 00:56:53.159 So we want to add 2 sparse factors. We could essentially merge the index are raised. 417 00:56:53.159 --> 00:56:56.159 I am waving on my hands and and so on, but. 418 00:56:56.159 --> 00:57:03.119 And, and then a reduced by key, or we could. 419 00:57:03.119 --> 00:57:08.460 Take the 21 way we could just takes it to input factors and catenate them combine them. 420 00:57:08.460 --> 00:57:19.380 And then we sort by Index, so if the 2 spar sectors have the same index of the same non 0 element, those will be adjacent in this catenate sorted array. 421 00:57:19.380 --> 00:57:23.670 And now we can find those duplicates and. 422 00:57:23.670 --> 00:57:27.059 Doing in our product and to reduce by key. 423 00:57:27.059 --> 00:57:32.730 I'm waiting hands and so on. Oh, okay. 424 00:57:35.039 --> 00:57:44.010 And to show you what would happen here, and this is part sector. 425 00:57:45.030 --> 00:57:50.309 Don't let me 1st, make the thing. 426 00:57:50.309 --> 00:58:00.329 Silence. 427 00:58:03.360 --> 00:58:06.840 Silence. 428 00:58:06.840 --> 00:58:11.219 Okay, what happened. 429 00:58:11.219 --> 00:58:14.940 Set operations. 430 00:58:17.699 --> 00:58:22.980 Okay sector. 431 00:58:22.980 --> 00:58:26.760 Again, it's a. 432 00:58:28.619 --> 00:58:37.679 There's various type of merge by key of merging things here and which you can work with and. 433 00:58:42.150 --> 00:58:49.050 This is interesting here number of unique indices. 434 00:58:49.050 --> 00:58:52.349 What's happening here is as follows. 435 00:58:55.920 --> 00:58:59.639 If I have an array, let me just say, write something. I don't know. 436 00:59:02.880 --> 00:59:07.320 Didn't even be different. 437 00:59:09.000 --> 00:59:13.260 I didn't want to okay. Something like that. 438 00:59:13.260 --> 00:59:19.019 A way to find, and we want to find the number of unique indicies and something like that. 439 00:59:19.019 --> 00:59:23.820 What we do is that we look for places. 440 00:59:23.820 --> 00:59:29.610 Where there where an element of this array is different from its. 441 00:59:29.610 --> 00:59:35.429 Successor so 0, is the same as 00 is different from 1. that's why. 442 00:59:36.264 --> 00:59:48.534 Here at 1111 is different from 22 is different from 44 is different from 8 so we look at changes. The way we do that is we take the vector and we compare it to the vectors shifted by 1. 443 00:59:50.699 --> 00:59:54.630 And if the vectors and elm is different from. 444 00:59:54.630 --> 00:59:58.980 The shift by 1 element, then we just found a boundary. 445 00:59:58.980 --> 01:00:02.909 So, we do that and we add up. 446 01:00:02.909 --> 01:00:06.780 We add up all of those places where it says a difference and that gives us. 447 01:00:06.780 --> 01:00:11.250 And possibly had a 1 to the result and that gives us the number of. 448 01:00:11.250 --> 01:00:14.309 He might say this is a number of runs actually. 449 01:00:14.309 --> 01:00:18.179 What this is a faster way of doing it, then actually doing a run length and coating. 450 01:00:19.375 --> 01:00:33.264 And this is done actually by a dot product so we do a DoD product, or this factor against itself shifted by 1. but the DoD product you've seen before is a times and ad here, it's a compare. 451 01:00:33.295 --> 01:00:43.795 And it's a difference. And ad, DoD products can work with any pair of 2 operators instead of times. And plus this is not equal. And plus. 452 01:00:44.010 --> 01:00:48.059 And so that's how we count the number of. 453 01:00:48.059 --> 01:00:51.599 A number of fronts here. Okay. 454 01:00:52.619 --> 01:01:01.079 Another interesting paradigm thing. I'll show you some other examples here. I showed you run lines, decoding, run, length and coding. 455 01:01:01.079 --> 01:01:07.769 Let's say, and that would be. 456 01:01:11.639 --> 01:01:17.519 Let's see. 457 01:01:21.059 --> 01:01:25.409 Well, certainly run lanes encoding. 458 01:01:28.829 --> 01:01:33.030 So. 459 01:01:34.679 --> 01:01:43.590 So, Here's this, and we want to reduce it to around a 31 C2 d's lots of these and 2 apps. 460 01:01:44.909 --> 01:01:49.920 So, given what I told you about, what you can do here. 461 01:01:49.920 --> 01:01:53.460 So 1st, you got to find where the boundaries are. 462 01:01:53.460 --> 01:01:57.389 And the length of each, so you find the boundaries of we take this factor. 463 01:01:57.389 --> 01:02:04.289 We shifted by 1, which means you just add 1 to your and you compare this. 464 01:02:04.289 --> 01:02:07.619 2 itself shifted by 1. 465 01:02:07.619 --> 01:02:11.699 And where there is a difference, you just found the start of a run. 466 01:02:11.699 --> 01:02:19.380 Okay, so you do that and just write the new vector out. Now you've got a vector pointing to the start of each run. 467 01:02:20.730 --> 01:02:25.349 And now you can do things, some fail ends with a max, et cetera, et cetera. 468 01:02:25.349 --> 01:02:31.349 And that's basically how you do the encoding so. 469 01:02:31.349 --> 01:02:35.880 Okay, I'm just waiting my hands. I'll little, but. 470 01:02:37.650 --> 01:02:45.480 Other things here, other ideas. 471 01:02:45.480 --> 01:02:53.880 Ways he was a crazy suppose I have 16,000 arrays of a 1000. 472 01:02:53.880 --> 01:02:57.719 Of a 1000 numbers. 473 01:02:59.130 --> 01:03:03.210 See, where my examples here is. 474 01:03:12.840 --> 01:03:21.690 So, what I'm doing here is my pseudo random thing, just the way creating a pseudo random thing. 475 01:03:23.309 --> 01:03:29.639 We have here what is and. 476 01:03:29.639 --> 01:03:35.070 Ends a nice size number. It is 100. 477 01:03:35.070 --> 01:03:42.150 So, I create a vector on the host of size 100,000,000. 478 01:03:42.150 --> 01:03:45.300 I load it with random. 479 01:03:45.300 --> 01:03:51.780 Values, pseudo, pseudo randomization of the index. I just doing that sequentially on the host. 480 01:03:51.780 --> 01:03:55.440 Um. 481 01:03:55.440 --> 01:04:02.010 And what I'm doing is I'm just transforming iterate numbers. 482 01:04:04.170 --> 01:04:11.309 Adam, to and I'm what I'm doing here is I am starting it on the host and doing sorting things. 483 01:04:11.309 --> 01:04:14.969 What I'm doing here is I'm comparing. 484 01:04:14.969 --> 01:04:19.739 Sorting the thing on the host with. 485 01:04:19.739 --> 01:04:25.769 Copying it to the device and sorting it on the device and copying it back. 486 01:04:25.769 --> 01:04:30.630 Using various weird things here. 487 01:04:32.070 --> 01:04:36.539 Silence. 488 01:04:40.590 --> 01:04:47.010 What am I doing here? 489 01:04:47.010 --> 01:04:59.369 Just a 2nd. 490 01:04:59.369 --> 01:05:03.119 Silence. 491 01:05:03.119 --> 01:05:17.820 Silence. 492 01:05:19.230 --> 01:05:24.210 Yeah, I forget it. 493 01:05:27.690 --> 01:05:32.099 Silence. 494 01:05:32.099 --> 01:05:37.530 Silence. 495 01:05:41.460 --> 01:05:45.269 Not give up. 496 01:05:45.269 --> 01:05:54.960 Get this working later, just different ways to do this. So, in any case, it's a trick where, if we sort. 497 01:05:54.960 --> 01:06:06.989 16, the, what this example, if I get it working will show you, is that there's an overhead to starting up to calling each. 498 01:06:06.989 --> 01:06:19.019 Josh Fox is an overhead included a starting up each colonel, which is if you can do some sort of fusion thing. So you've got fewer number of kernels. It can be faster. 499 01:06:20.369 --> 01:06:24.599 And so Here's a crazy way. I want to sort the. 500 01:06:24.599 --> 01:06:30.539 Again, there's 16,000 sets of a 1000 numbers eats at 16Million numbers. 501 01:06:30.539 --> 01:06:39.929 The obvious way is we sort each 1000 element factor once, and we're calling 16,000 sorts in this example, when it was actually working for 10 seconds. 502 01:06:39.929 --> 01:06:43.289 Um, just for the fun of it. 503 01:06:45.389 --> 01:06:50.880 Silence. 504 01:06:55.559 --> 01:06:59.789 I could recover sorry form, but. 505 01:07:01.139 --> 01:07:13.559 No worry about that. Okay. Or Here's another crazy thing. 506 01:07:13.559 --> 01:07:17.579 Is that I create a key vector? 507 01:07:17.579 --> 01:07:27.300 Listing the set that each point is in. So the 1st, 1000 points has said here on the next 1000 points is that 1 so I've now got 2 factors vector as a data. 508 01:07:27.300 --> 01:07:31.590 That's the 16,000 sets and another vector. 509 01:07:31.914 --> 01:07:41.844 1000 zeros a 1000 ones and so on and we sort these things basically together so we sort all the numbers together with the keys and then we started by key. 510 01:07:42.324 --> 01:07:47.695 So the numbers with any setter still sorted but we've now separated out the 16,000 sets again. 511 01:07:48.894 --> 01:07:59.244 It sounds crazy instead of 1 sort. We've, we've done 2 sorts and we sorted twice as much data, but the thing is, we've only got 1 parallel colonel instead of 16,000 parallel kernels. 512 01:07:59.244 --> 01:08:04.644 And when I ran the thing, it went down front, it went down by a factor of a 250. 513 01:08:07.230 --> 01:08:17.939 An intermediate thing as I could inside thrust, I could actually do a nested sort and thrust would actually do this. 514 01:08:17.939 --> 01:08:24.989 Fairly efficient, and it, it's times between the 2 times, it's 30 times faster than the naive way but. 515 01:08:24.989 --> 01:08:30.869 So, 8 times slower than the efficient acid so it's invested sort sort of thing. 516 01:08:30.869 --> 01:08:35.939 Oh, okay. 517 01:08:35.939 --> 01:08:39.060 Other examples here. 518 01:08:43.350 --> 01:08:46.859 Let's see got proud. 519 01:08:46.859 --> 01:08:50.939 Dot products. 520 01:09:02.220 --> 01:09:06.869 So, what we have here is we've got. 521 01:09:06.869 --> 01:09:10.649 3 D, factors, and we want you to do a product of them. 522 01:09:11.760 --> 01:09:15.720 But the thing is, I want to do it with. 523 01:09:15.720 --> 01:09:21.630 A structure of a raise, instead of an array of structures basically. 524 01:09:23.640 --> 01:09:29.369 Something sort of like that, and I showed this was on 1 of the slides that. 525 01:09:30.600 --> 01:09:37.140 Or we have a 2 Paul, a B, and we get the elements of the 2 ball. 526 01:09:38.520 --> 01:09:44.670 Well, we have 232 poles actually, and this gets the elements here. 527 01:09:44.670 --> 01:09:48.539 So, get 0, gets the. 528 01:09:48.539 --> 01:09:54.510 Zeros all 1st and 2nd, and again, these get, it's a template. 529 01:09:54.510 --> 01:10:02.430 Class and its specializing, the template was 01 or 2 and what's inside. The angled brackets are Constance. 530 01:10:02.430 --> 01:10:08.970 You cannot iterate over this thing here. So so I could not hear write a loop. 531 01:10:08.970 --> 01:10:16.170 Unless the loop was expanded at compile time, I could not have get from Sara too, which is actually somewhat annoying. 532 01:10:16.170 --> 01:10:23.850 You or you could write a loop would be a compile time you would have to do, but okay. Any case. So, this here. 533 01:10:25.050 --> 01:10:30.300 Does they dot product of 2? 534 01:10:30.300 --> 01:10:39.930 3 D points. Okay. And it's done overloading operator here, which means you can do it faster with. 535 01:10:39.930 --> 01:10:45.449 Okay, so. 536 01:10:45.449 --> 01:10:54.329 We have down here just wider, so you can see what's going on. Okay. 537 01:10:56.159 --> 01:11:00.449 And it's saying we want to use the structure of a raise approach. 538 01:11:00.449 --> 01:11:08.430 The qualifications here, this problem is actually getting a little smaller with current versions of the, but it's still an issue. 539 01:11:09.569 --> 01:11:14.789 Okay, so random vectors is creating random vectors. So, um. 540 01:11:16.619 --> 01:11:22.260 So basically a 0, 1 and 2. 541 01:11:22.260 --> 01:11:29.460 It's some base again, 3.01 and 2 and 3 D points. 542 01:11:29.460 --> 01:11:35.039 But it's a, we got 6 factors factors of X Y, Z, X Y, Z. 543 01:11:35.039 --> 01:11:43.319 Okay, so what's happening down here? 544 01:11:43.319 --> 01:11:49.649 Is that, um. 545 01:11:49.649 --> 01:11:53.430 This is creating a tip at a writer. 546 01:11:53.430 --> 01:11:59.579 Pointing to the points in a but the X Y, and Z is have been brought together. 547 01:11:59.579 --> 01:12:02.880 So, a 1st. 548 01:12:03.055 --> 01:12:17.064 Is pointing to the 1st, 3 D point a, and it did it by using the separate or taking to a 0 to begin a 1 dot began a 2 dot began, make them into a total and making that into a zip. 549 01:12:19.260 --> 01:12:33.239 And that's so theaters is horrible type up here. I would say auto and a dot last and beat up. 1st. Okay. So a, a, 1st, it's again, it's pointing to a virtual. 550 01:12:34.439 --> 01:12:37.770 No, virtual vector of 3 D points. 551 01:12:40.770 --> 01:12:51.689 And now transform is going to do a doc product. So Trent, so we want to do is we want to do doc products of corresponding points in a, and B. 552 01:12:51.689 --> 01:12:59.159 The 0, say, dotted this arrow speed the 1st day dotted the 1st be installed and we want to output a factor of floats. 553 01:12:59.159 --> 01:13:06.329 Okay, and it's being done with the transform so this version of the transform. 554 01:13:06.329 --> 01:13:11.340 Transforms. 555 01:13:11.340 --> 01:13:20.340 Transforms takes pairs of vectors. So this is the 1st, this is the 2nd factor. 556 01:13:20.340 --> 01:13:25.890 The course pairs a car spotting elements are combined with Dot product. 557 01:13:25.890 --> 01:13:31.680 And it's written into result. Okay. 558 01:13:34.619 --> 01:13:39.899 Yeah, and that right? Stuff and dot product again was the thing up. 559 01:13:39.899 --> 01:13:46.560 Up here, which I would write using a Lambda or. 560 01:13:46.560 --> 01:13:57.090 I sold a notation K post device because the method might be wants to be compiled on both the hosting the device. Okay. 561 01:13:57.090 --> 01:14:02.760 1st method here. 562 01:14:04.380 --> 01:14:07.710 They've noticed that it can be a little Congress. 563 01:14:09.779 --> 01:14:15.029 Um, now. 564 01:14:16.829 --> 01:14:25.949 What we're doing here. Okay so, what we did up here is that we created the. 565 01:14:25.949 --> 01:14:32.670 And then call transform, what we can do is we can put this function call here. 566 01:14:32.670 --> 01:14:38.939 And zap it right down into here, let's say, you know, you can create 1, really long, complicated statement. 567 01:14:38.939 --> 01:14:46.649 Instead of explicitly creating generators and transform up. So this method is reasonable, but you can pack it all. 568 01:14:47.850 --> 01:14:56.010 And 21 thing like here so this is 1 line. Okay. Are here. 569 01:14:56.010 --> 01:15:03.060 Advantage of this is we don't have explicit variables of type whatever the output it makes that better is for example. 570 01:15:03.060 --> 01:15:17.279 Okay, and it'll work so this is showing zip and they can put them into transform and sod 2 dot product. 2 vectors a 3 D points stored as structure of a race. 571 01:15:17.279 --> 01:15:21.300 Ok, talking about that here. 572 01:15:22.979 --> 01:15:26.189 Arbitrary transformation. 573 01:15:31.050 --> 01:15:34.350 Oh, I probably should compile it. 574 01:15:35.819 --> 01:15:41.729 It actually works. 575 01:15:43.260 --> 01:15:48.600 I'm assuming the mass is correct. 576 01:15:50.010 --> 01:15:53.010 Ok, arbitrary transformation. 577 01:15:54.750 --> 01:16:00.569 Um. 578 01:16:00.569 --> 01:16:10.229 So, here, we might want to do a function on perhaps 3 inputs. But how do we do that? Because transform. 579 01:16:10.229 --> 01:16:16.409 Can transform actually 1 of the elements of 1 factor, or the elements of only. 580 01:16:16.409 --> 01:16:24.000 2 vectors pair it up, but it was suppose you wanted to a function on 3 vectors. 581 01:16:24.000 --> 01:16:30.270 So, what we do is they're going to sip everything together, so. 582 01:16:31.409 --> 01:16:34.439 Say, Andre plus B. okay. 583 01:16:38.100 --> 01:16:41.909 Okay. 584 01:16:43.229 --> 01:16:46.350 So, how do you do something like this? Um. 585 01:16:46.350 --> 01:16:49.529 So, we got 3 input vectors, kind of an output factor. 586 01:16:49.529 --> 01:16:56.579 So, we take to each of the 4 vectors we zip it up into 1. zip. 587 01:16:56.579 --> 01:17:00.390 Whose components are pointing to all 4 relevant factors. 588 01:17:04.255 --> 01:17:17.935 And what's going to happen with arbitrary factor? 1 of their names? Some silly names this will take. So this thing here will take an input of is it better? Who's components are the 4 relevant factors. 589 01:17:18.510 --> 01:17:21.689 And it does the, a plus B times C. 590 01:17:21.689 --> 01:17:27.029 And this is using the fact that zip, the components of a zip can be written to. 591 01:17:27.029 --> 01:17:33.239 And language terms, their L values as well as our values. 592 01:17:33.239 --> 01:17:42.569 So so the documentation and trust is a touch light on this very important thing. But sometimes. 593 01:17:42.569 --> 01:17:48.779 But not always these fancy are El values that means they can be written to. 594 01:17:48.779 --> 01:17:52.829 D referenced and written to it. Okay. So. 595 01:17:54.029 --> 01:18:02.189 So, if we can step up to our 4 relevant factors, and we call arbitrary factor 1 on it. 596 01:18:02.189 --> 01:18:05.760 That will do this operation and right entity. So. 597 01:18:07.380 --> 01:18:11.729 And we could create arbitrary factor 2 that adds the 3 of them. And so on. 598 01:18:11.729 --> 01:18:15.300 Okay, I'll do do do do do do. 599 01:18:15.300 --> 01:18:23.789 There are more efficient ways to do that any case. So, this is a, for each thing, which does the obvious thing. 600 01:18:25.079 --> 01:18:28.289 It is having fun showing you different possibilities. 601 01:18:28.289 --> 01:18:34.079 Okay, we make a total of a B. C and D. we make it into a separate or 8 or. 602 01:18:34.079 --> 01:18:37.500 Or began, and then we call arbitrary factor 1. 603 01:18:37.500 --> 01:18:40.800 On the for each so. 604 01:18:42.060 --> 01:18:46.260 Okay, arbitrary factor too. The same thing. 605 01:18:48.689 --> 01:18:56.310 Silence. 606 01:18:57.630 --> 01:19:05.010 Yeah. Okay. Good. 607 01:19:05.010 --> 01:19:12.029 Here's a crazy idea. Suppose I want to find a bounding box. 608 01:19:13.079 --> 01:19:19.050 And I'll stop after this 1, a founding box around and to the points in parallel. 609 01:19:20.550 --> 01:19:28.470 So, we'll do a reduce, but the reduce instead of summation the reduce is abounding box. 610 01:19:28.470 --> 01:19:37.949 So, we found we find bounding boxes around individual points. I just the point founding box around 2 bounding boxes. 611 01:19:37.949 --> 01:19:46.739 Is the obvious idea so, instead of adding and working our way up the tree, you might say we're finding nested founding boxes. 612 01:19:46.739 --> 01:19:52.229 And it takes a lot of time so this functional notation it's paid. 613 01:19:52.229 --> 01:19:59.399 Is very powerful, so the reduce can do any associative operation and probably should be committed to. 614 01:19:59.399 --> 01:20:03.449 And nested bounding boxes associative. So. 615 01:20:03.449 --> 01:20:07.319 See, how that works. 616 01:20:09.270 --> 01:20:15.000 Structure it's a point. So, here, it's in a. 617 01:20:15.000 --> 01:20:19.229 A structure of array. Well, basically. 618 01:20:19.229 --> 01:20:30.090 It's a 2 D point. This is a constructor. The 2 D point has local member data members X and Y, this is your constructor. 619 01:20:30.090 --> 01:20:33.210 It takes, this is a default constructor. 620 01:20:33.210 --> 01:20:43.590 It takes no arguments and initializes X and Y, to 0. and this is a notation here that in parallel initializes data members of the class. 621 01:20:43.590 --> 01:20:53.100 This is the name and this is the value happens in parallel and it's all done before the body of the function is executed. The body's empty here. 622 01:20:53.100 --> 01:20:58.050 Here is, and we know it's a constructor is the name of the function is the class name. 623 01:20:58.524 --> 01:21:04.944 Here okay, 624 01:21:04.944 --> 01:21:08.845 this is a constructor where we give it alum, 625 01:21:08.845 --> 01:21:10.074 we give it the X and Y, 626 01:21:10.074 --> 01:21:17.875 data their arguments here up before there were no arguments and this is using this initialization idea. 627 01:21:18.180 --> 01:21:23.130 So, each, each data member gets Initialized with the argument. 628 01:21:23.130 --> 01:21:31.890 And there's no body to the function and this is in some senses preferable to putting explicit coffee statements in the body of the function. 629 01:21:31.890 --> 01:21:37.170 So so that's a class for a point to class for a bounding box. 630 01:21:37.170 --> 01:21:40.739 1st and empty bounding. 631 01:21:43.020 --> 01:21:46.319 The here are the. 632 01:21:48.510 --> 01:22:02.430 It's got 2 data elements are in here somewhere, they're lower left an upper right and the constructor. So you could construct a bounding box with no argument. You can construct a bounding box with 1 argument. 633 01:22:02.430 --> 01:22:05.760 So, lower left and up for right? Just get the same point. 634 01:22:05.760 --> 01:22:10.979 Uh, okay. 635 01:22:12.300 --> 01:22:16.649 And I can do it, this will do an assignment on bounding box says. 636 01:22:16.649 --> 01:22:24.000 And if you assign a, if I copy a point to abounding box, and it just does the obvious thing. 637 01:22:24.000 --> 01:22:28.470 Founding box in pairs of points. 638 01:22:30.119 --> 01:22:41.609 Okay, now here is construction. This is a construct an operator for a bounding box. Some parent where the arguments are 2 bounding boxes. 639 01:22:41.609 --> 01:22:52.979 Using the fact that these things get result to compile time and does them in the lower left, the max of the upper right and so on and returns to bounding box. 640 01:22:52.979 --> 01:23:00.359 Okay, I still haven't found in here where the data elements are stored there somewhere. 641 01:23:01.470 --> 01:23:06.810 Yeah okay. Okay. So the, um. 642 01:23:08.279 --> 01:23:12.899 And it's using 1 of these random number of things that I just sort of alluded to. 643 01:23:12.899 --> 01:23:21.960 Okay, this is sequential for loop on the host. Just allocate constructing points with random values. 644 01:23:23.159 --> 01:23:29.340 So that the BB reduction, just the binary operate Ops. 645 01:23:30.689 --> 01:23:36.060 Okay. 646 01:23:38.609 --> 01:23:43.079 And all the fun happens on this 1 line right here. 647 01:23:43.079 --> 01:23:46.079 We give it the points factor. 648 01:23:46.734 --> 01:24:00.925 And we just got awful names. The binary operator is a BB reduction and again, baby reduction is a class and this creates a default variable of that class. 649 01:24:00.925 --> 01:24:13.015 But the apprentice operators overloaded. Any initialization founding buys an empty bounding box based some, sir. Okay. And bang, and this in the 1 line, finds the bounding box around the end points. 650 01:24:13.260 --> 01:24:18.359 In long time course, let's compile it. 651 01:24:19.560 --> 01:24:26.100 And. 652 01:24:29.640 --> 01:24:37.649 I was assuming that's correct. Okay. So what did we see today? Just because, you know, an hour and 20 minutes of a long time ago. 653 01:24:37.649 --> 01:24:41.520 Um. 654 01:24:41.520 --> 01:24:47.939 I gave another homework next Thursday's all day. 655 01:24:47.939 --> 01:24:52.260 And lots more thrust examples. Okay. 656 01:24:52.260 --> 01:25:05.489 Got as far as bounding back, so what will happen Monday is continue with more thrust examples and I don't know, maybe we'll finish thrust by Monday and then next time we start Tom. 657 01:25:05.489 --> 01:25:08.939 Quantum computing. 658 01:25:10.050 --> 01:25:14.819 A questions if there's no questions. 659 01:25:14.819 --> 01:25:19.500 Then have a good weekend. 660 01:25:19.500 --> 01:25:23.520 Until it's still snowing outside or just raining. 661 01:25:23.520 --> 01:25:29.430 Okay, see, you. 662 01:25:29.430 --> 01:25:33.329 Silence.