WEBVTT 1 00:05:10.858 --> 00:05:43.918 Silence. 2 00:06:12.658 --> 00:06:22.918 Silence. 3 00:06:25.168 --> 00:06:29.189 Okay. 4 00:06:29.189 --> 00:06:36.598 Silence. 5 00:06:37.798 --> 00:06:41.249 Silence. 6 00:06:41.249 --> 00:06:49.798 She. 7 00:06:49.798 --> 00:06:54.119 Okay. 8 00:06:54.119 --> 00:06:58.918 Okay, so good afternoon class. 9 00:06:58.918 --> 00:07:08.338 Can you hear me? Great thanks, Isaac. 10 00:07:08.338 --> 00:07:14.129 So, I'm not looking at a mirroring version. 11 00:07:14.129 --> 00:07:20.459 Of my session today, so if you cannot see things. 12 00:07:20.459 --> 00:07:24.689 Mirrored and so on, then, let me know and I'll. 13 00:07:24.689 --> 00:07:27.838 See, if I can fix that so. 14 00:07:27.838 --> 00:07:33.298 Arrow computing class 18 and. 15 00:07:33.298 --> 00:07:39.509 March 29th 2022nd question. 16 00:07:39.509 --> 00:07:42.658 And is going on here. 17 00:07:44.249 --> 00:07:56.399 1st question is what the class, like an optional day off next week, or so, just I gave probabilities day off last Thursday because of the spring town meeting. 18 00:07:56.399 --> 00:08:02.038 Parallel class would like a day off. I'm okay. And just. 19 00:08:02.038 --> 00:08:09.899 Let me know, you could post and webx you could post now and I don't care which day. So I'll be just a chance to do nothing. 20 00:08:11.189 --> 00:08:25.769 Okay, so what's happening now, is we're talking about thrust, which is a mid level parallel programming tool. It's above the level of raw code or you're not looking at. 21 00:08:25.769 --> 00:08:33.749 Registers and global and device routines and so on but it's below the high level thing where you lead to compiler do everything. It's a. 22 00:08:33.749 --> 00:08:42.089 Functional programming paradigm and you apply functions and composed functions and. 23 00:08:43.619 --> 00:08:56.219 And it's widely used and it's alive, which is nice. So today we'll learn some more programming paradigms. It's also a good chance to see some paradigms for how you do parallel functional programming. 24 00:08:56.219 --> 00:09:05.458 And and so these paradigms of wider use outside thrust, of course, it will see lots of examples and see crazy things you can do. 25 00:09:05.458 --> 00:09:10.288 And not going to get to all this stuff today, but. 26 00:09:10.793 --> 00:09:22.703 So we'll finish off this lecture 8 from Stanford, and because the Stanford classes, 10 years old, but this part of it hasn't changed the later Stanford classes, all the to you to look at. 27 00:09:22.703 --> 00:09:28.673 If you want, they're talking more about various parallel algorithms for stuff and parcel differential equations and so on. 28 00:09:28.948 --> 00:09:35.428 And we'll do a bit from another thrust tutorial, which has some interesting slides. 29 00:09:35.428 --> 00:09:47.788 And the next thing here, new topic, Linux, heterogeneous memory management, it's an operating system tool, which makes devices. 30 00:09:47.788 --> 00:09:53.938 Gives devices access to the memory space. This idea has been around for many decades, but doing it. 31 00:09:53.938 --> 00:09:59.668 And in the press 2 for it from memory and a little tricky, it's an enrichment thing, I'll just give you the ideas. 32 00:09:59.668 --> 00:10:04.229 Okay, so Stanford class. 33 00:10:04.229 --> 00:10:09.058 8 lecture 22. see. 34 00:10:12.568 --> 00:10:16.979 Yeah. 35 00:10:19.528 --> 00:10:22.708 Just. 36 00:10:25.619 --> 00:10:30.328 Okay. 37 00:10:31.734 --> 00:10:46.073 A little review here. Last time we saw 3 different ways you can do functions and C plus plus 1 of them was that you have a class or a struct, and you overload the operator per in. So, this is a review. 38 00:10:46.073 --> 00:10:49.313 These 3 slides are review, but the interesting idea. 39 00:10:49.708 --> 00:10:54.298 So you have a variable of class is greater than where you both loaded operator. 40 00:10:54.298 --> 00:10:58.889 Prince, the variable has a member, a threshold and. 41 00:10:58.889 --> 00:11:09.808 So, when you construct a variable of class is greater than and you give it a value for the threshold. Now, what you've done is, you've created a predicate is greater than with a threshold. 42 00:11:09.808 --> 00:11:24.479 Is embedded in the credit and every here's the cool thing. Now, you can create it's a class you can create lots of variables of that class for each variable. You give it a different value for threshold when you construct that variable but you could also change it. It's not. 43 00:11:24.479 --> 00:11:34.769 A constant, the social is a really cool idea, and it's called a closure so is greater than and you defined threshold and you wrap it in and you create. 44 00:11:34.769 --> 00:11:46.134 Essentially a new predicates, so we have it down here. So this is a definition up here at the top and and then we say is greater than that's the class name and credit kit. That's a new variable. 45 00:11:46.134 --> 00:11:49.793 And 10 is an initial value and it's calling the constructor. 46 00:11:50.009 --> 00:12:02.964 Up here, which we see the constructor at all it does is it takes the argument it stores it in threshold. So now you can call so predicates a variable, but you call it as a function with 1 argument and it's using an overload of Princess right here in the count. 47 00:12:02.964 --> 00:12:16.943 F, which does the obvious thing you give it bread and prayer into account the number of so this will count the number of elements, the factor that are bigger than 10 and if you constructed and if you change threshold inside prayed or constructed new variable, you'd have a new special. 48 00:12:17.219 --> 00:12:20.879 So, recap is we got these generic. 49 00:12:21.323 --> 00:12:36.203 Got this generic function generic means they can take different types for their arguments and it's statically dispatched. So it can be hosted can be device different variance of these functions that are different numbers of arguments and compiler determines which 1 to call. 50 00:12:36.563 --> 00:12:40.553 So, it takes a little compiler time, but execution time, there's no overhead. 51 00:12:40.859 --> 00:12:55.288 An example of different parents down here, reduce the basic version and reduce you give it to begin and end it or eighter or you can give it lots of things such as which binary opt you want to use to production. And if about and so these are different versions of reduce. 52 00:12:55.288 --> 00:12:59.099 And it's the compiler chooses which 1. 53 00:12:59.099 --> 00:13:09.208 Okay, the next thing is point into vectors, but you got fancy. I don't actually have data. So these are read only. 54 00:13:09.208 --> 00:13:19.469 For the most part, and but the thing is, they're giving you their reading out from vectors that don't actually exist that they're lazy evaluated their. 55 00:13:19.469 --> 00:13:29.249 The values are created as you as you read from them, and these are, you can add 1 deter it points to the next element of the virtual. Right? And so on. 56 00:13:29.249 --> 00:13:37.198 Constant here, it just always returns a concept and you construct your constant. 57 00:13:37.198 --> 00:13:47.278 It's a, it's an integrator over a factor events, and it doesn't matter if it's hosted device, because it doesn't actually exist. And its name is began initial value 10. and then you can just. 58 00:13:47.278 --> 00:13:51.749 subscript at the same thing as adding to the pointer and so on. 59 00:13:51.749 --> 00:14:00.989 So this looks like so, what's the point? The point of it is some of these other functions, when composing functions, you may need a constant and. 60 00:14:00.989 --> 00:14:04.438 Created as a constant in this programming paradigm. 61 00:14:07.014 --> 00:14:20.783 Counting it or discounts up 12345 or something every time you increment begin you get to the next element, and just shows here that starts off at 10 and like any, or you can add and subtract. We can certainly add. 62 00:14:20.813 --> 00:14:26.754 And sometimes you can subtract it just it doesn't add bites. It adds elements. So, this is 3 elements past weekend. 63 00:14:28.649 --> 00:14:34.048 And so this reduces 3 things. Okay. 64 00:14:34.048 --> 00:14:37.349 So, we saw constant, we saw this. 65 00:14:37.349 --> 00:14:50.938 Counting, and what it does is it takes an integrator pointing to some factor and a function with 1 argument and it returns to that transformed vector. 66 00:14:50.938 --> 00:14:55.979 But it doesn't actually transform the whole thing and store to transform as the elements as you read them. 67 00:14:57.269 --> 00:15:04.499 So, you take this factor X here and f, X and it's it constructs system virtual. 68 00:15:04.499 --> 00:15:12.568 This new virtual vector, but it doesn't take any extra memory. So you see, we're saving on memory and we're saving on. 69 00:15:12.568 --> 00:15:15.958 I here would be an example of it. Um. 70 00:15:15.958 --> 00:15:29.129 Now, the syntax here is clumsy. I can't defend it when I, in fact, I've done. I've actually used thrust a good fit for several years and I write my own wrapper routines around this to make this easier. 71 00:15:29.129 --> 00:15:43.019 So so any case what we have here, we've got to factor device factor is going to be hosted device. You got a device factor 3 elements, and you can initialize it here. 72 00:15:43.019 --> 00:15:47.938 That's going to be very slow accessing device, make yourself available, but. 73 00:15:47.938 --> 00:15:59.578 Doesn't matter you're doing it once here. So then we've got the beginning and ending for factor and we're going to create a new virtual vector, which is a factor with each element negated. 74 00:15:59.578 --> 00:16:05.009 Well, there's a unitary built in function called negate takes an argument and the Gates, it. 75 00:16:05.009 --> 00:16:18.178 And then we do a transform later, so we have the to begin and end effect. We make a which takes another and a urinary function and creates a new area to set up this new beginning to end here. 76 00:16:18.178 --> 00:16:31.379 And what their type is, we're not going to talk about. It's a horrible mess. So, if you construct this by saying auto ashes, so I auto was invented. So, now, begin and end are 2 here. 77 00:16:31.379 --> 00:16:36.479 That you read them out and they're going to read the negation of factor. 78 00:16:36.479 --> 00:16:41.068 And we can do it, and we can compose our functions and do. 79 00:16:41.068 --> 00:16:52.499 Begin Anette do a reduction let's say now by itself this example is quite simple, but there's more complicated ways. Powerful. So, plus counting and constant. 80 00:16:52.499 --> 00:16:55.739 Okay, a separator. 81 00:16:55.739 --> 00:17:05.519 Solves the problem conceptually, if you got 63 dimensional points, the conceptually nice way is each points a structure of 3 floats and you. 82 00:17:05.519 --> 00:17:18.328 And you'd want to say, I have an array of those structures flow threes. The problem with that is it doesn't call the axes and whys and sees and it actually. 83 00:17:18.804 --> 00:17:29.364 It doesn't play nicely with the cash that's accessing the global memory on the GPU. So, instead of having an array of structures with you, it's better to have a structure of a raise for efficiency purposes. 84 00:17:29.364 --> 00:17:35.243 So you have an array of exit array of Y, and Z and you've got a structure as as a race to vectors. 85 00:17:36.628 --> 00:17:44.878 But that's fast, but it's still conceptually better. It'd be nice to have the points as X. Y, Z triples. 86 00:17:44.878 --> 00:17:48.388 So, separate the zip converts. 87 00:17:48.388 --> 00:17:53.098 From the efficient lay out to the nice layout. 88 00:17:53.098 --> 00:17:56.969 And so it takes an array of. 89 00:17:58.138 --> 00:18:03.058 Well, this example, here you got an array of ABC, whatever you have. X Y, Z. 90 00:18:03.058 --> 00:18:16.528 Structure of those 2 arrays and produce as an editor produces an iterate or 2 virtual pair. So these corresponding errors. So now you can access this. 91 00:18:16.528 --> 00:18:21.269 You can access the pairs triples, whatever that are the nice. 92 00:18:21.269 --> 00:18:24.358 Way to think about the nice abstraction for the data. 93 00:18:24.358 --> 00:18:27.929 Seamless the separator is it's free to write. 94 00:18:27.929 --> 00:18:36.388 So, you can take, you can reference it better idea, get elements here, these 2 balls and now you can write into. 95 00:18:36.388 --> 00:18:44.159 Elements of the 2 balls, and this will be writing back and modifying the original factor. So this is I read right. 96 00:18:44.159 --> 00:18:48.689 Generator so powerful an example. 97 00:18:49.828 --> 00:18:59.368 Here we have a vector of managers 1020, 30 a factor of characters X Y, Z and the 2 device factors on care. 98 00:18:59.368 --> 00:19:07.019 And then being factors they have begin an editor, is that you access as a doc begin. 99 00:19:07.019 --> 00:19:11.338 A dot end with parentheses and so on. And now. 100 00:19:11.338 --> 00:19:16.679 So begin points as a start 1st, day, beat up, begin points to the 1st, the. 101 00:19:16.679 --> 00:19:21.749 I can make those into a 2 ball a, to polish this to. 102 00:19:21.749 --> 00:19:25.199 It's, it's to variables. 103 00:19:25.199 --> 00:19:30.538 And then the nice thing was a triple 2 variables can be different types. 104 00:19:30.538 --> 00:19:35.608 And but the size is fixed so this would be a pair. 105 00:19:35.608 --> 00:19:44.159 And you could also use, make pair would be the same thing. We'll make 2. so we have a twofold, which is to begin integrators another 2 bullets begin. 106 00:19:44.159 --> 00:19:53.788 And it ended our writers, and then we trance, we take the triple the 2 begin, and we make it into. Is it better? I personally think this is. 107 00:19:53.788 --> 00:19:58.169 I don't see why this has to be explicit, but I didn't design for us. 108 00:19:58.169 --> 00:20:05.308 But any case begin and and pointing to a vector of pupils. 109 00:20:05.308 --> 00:20:11.219 Vector pairs and you can give reference them begin. 0. 110 00:20:11.219 --> 00:20:17.939 Is a 2 pull off 10 X and so on begin 12 plus 20 and why that sort of thing. 111 00:20:19.138 --> 00:20:22.378 Now, you got this here virtual. 112 00:20:22.378 --> 00:20:26.489 Funding to this virtual vector of pears vector of. 113 00:20:26.489 --> 00:20:30.959 And you can now consult, you can not do things like. 114 00:20:30.959 --> 00:20:36.449 We can, we can do something like, define a maximum. 115 00:20:36.449 --> 00:20:40.558 Function which takes. 116 00:20:40.558 --> 00:20:49.169 We might want to go and find the largest thing. So we defined a new event and car. 117 00:20:49.169 --> 00:20:59.308 Maximum, I'm skipping over details, I believe a binary off, which effectively takes 2 of these 2 calls and it returns. 118 00:20:59.308 --> 00:21:08.759 And create a new tool. Paul, who's who's character basically, is the larger character and larger number and where it is. 119 00:21:08.759 --> 00:21:12.479 And and then a reduce. 120 00:21:12.479 --> 00:21:21.538 Here on the binary off, we'll get the biggest to pull out where we have this custom predicate to say, who's the larger. 121 00:21:22.858 --> 00:21:33.598 And so this is reduces used to find a maximum or instead of summing everything we reduce 2 elements by returning the greater of the 2 elements let's say, and we give. 122 00:21:33.598 --> 00:21:38.159 Custom reduction function, using zip. 123 00:21:38.159 --> 00:21:42.538 So, our best practices, we try to do fusion like. 124 00:21:42.538 --> 00:21:46.739 Combine transform and reduce and zipper something. 125 00:21:46.739 --> 00:21:50.398 Structure of a race and you see simplicity sequences. 126 00:21:52.078 --> 00:22:03.659 Some examples of fusion here, because I think about functional programming is it combined 2 functions and you have a new function, which is applied to g, applied to some argument let's say. 127 00:22:03.659 --> 00:22:14.519 So, you operate on functions, just as a risk you operate on amateurs question here this pairs are a special data type for us. 128 00:22:14.519 --> 00:22:24.088 Well, it doesn't have to have to a pair with it. I would have to a duplicate up 2345 up to 9. they're N. S. T. L. 129 00:22:25.138 --> 00:22:32.098 Are they customized a little for thrust? I don't actually know the answer to that. 130 00:22:32.098 --> 00:22:39.898 The trust designers did minor customizations for things like unitary operators, binary operators for sure. 131 00:22:39.898 --> 00:22:44.669 Not even necessary I think so to answer is, I don't know. 132 00:22:46.709 --> 00:22:53.939 Now, the thing or 1 thing is to talk about is, you cannot index down a triple the zeros 1st and 2nd element. 133 00:22:53.939 --> 00:23:01.199 You can't you kind of a variable and finds a vary the element of a tool, but you can. 134 00:23:01.199 --> 00:23:05.278 Semantically that wouldn't work because each element could be a different data type. 135 00:23:05.278 --> 00:23:10.378 Different types so, what you have to do is use a fixing. If it's a pair you say. 136 00:23:10.378 --> 00:23:17.009 A, Doc 1st 8 a 2nd if it's a tool, it's a dot. 137 00:23:17.009 --> 00:23:20.398 Elements Sarah dot element 1 or something. 138 00:23:20.398 --> 00:23:24.719 And the member thing is no, but at compile time. 139 00:23:24.719 --> 00:23:33.118 The only way you could enter your way down a tube at runtime is by using something like a switch statement. 140 00:23:33.118 --> 00:23:37.528 So, okay. 141 00:23:37.528 --> 00:23:50.699 Okay, so here's another thing example. Okay, so we have a class square where we've host device means it's being compiled twice once to running the host wants to run out of the device. 142 00:23:50.699 --> 00:23:58.828 By the way latest version of the compiler, some of this is done automatically the compiler tells where it's going to be run. 143 00:23:58.828 --> 00:24:06.838 But, okay, and we overload the operator per and we've overloaded it to. 144 00:24:06.838 --> 00:24:11.189 Do a square and so okay. 145 00:24:11.189 --> 00:24:14.669 So, this here is not going to find the length of a vector. 146 00:24:14.669 --> 00:24:18.568 By squaring element, summing them and taking the square root of the sum. 147 00:24:18.568 --> 00:24:32.519 Okay, so and this is a really solid thing. Now, by the way these scratch backwards, you can find the size and so on, just like with scale factors. So this creates a temporary device factor. 148 00:24:32.519 --> 00:24:40.709 Transforms access with the square H, element gets squared and writes the squared elements into the temp factor. 149 00:24:40.709 --> 00:24:50.219 And then we take the 10 factor, we reduce it with default reduction operator. That's edition, sum it up and take this reduce returns the result. And. 150 00:24:50.219 --> 00:24:54.148 Does and square roots so this, this is the. 151 00:24:54.148 --> 00:24:57.358 Inefficient version 1. 152 00:24:57.358 --> 00:25:00.598 To find the link to a vector of arbitrary length. 153 00:25:02.308 --> 00:25:07.259 It's bad because we got this temporary factor we've got to write into and then read from. 154 00:25:07.259 --> 00:25:11.878 And this 1 on this, say on the. 155 00:25:11.878 --> 00:25:16.858 The people that they ran now is 4 times faster. So the difference is. 156 00:25:16.858 --> 00:25:31.709 What we have down here, we do a transform reduce. It's a new function. Haven't showed you yet. Does he? Obviously, it composes a transforming reduction so we input the vector X here that we wish to find a length of. 157 00:25:31.709 --> 00:25:46.019 Begin an end we take the transform the next arguments, the transformation function, which is square. And again, the syntax of this is, this is the square is a variable. 158 00:25:46.019 --> 00:25:50.489 Okay, this is weird. Now, I got it written down on the blog. 159 00:25:50.489 --> 00:26:02.909 Square is a type name, a class and square per end. It constructs a variable and it constructs an unnamed variable of class Square. 160 00:26:02.909 --> 00:26:07.979 With no arguments I mean, the constructor takes no arguments it's the default constructor. 161 00:26:07.979 --> 00:26:12.419 And so this returns a default variable of class Square. 162 00:26:12.419 --> 00:26:16.919 And the argue, so they started argument to transform reduce is a variable. 163 00:26:16.919 --> 00:26:24.239 However, transform reduced, treats it as a function puts prints and calls and so now calls the overloaded. 164 00:26:24.239 --> 00:26:30.628 Print operator. Okay. So this takes that vector X transforms each element. 165 00:26:30.628 --> 00:26:37.439 And then does a reduction adding the elements to 0 and the reduction function is plus. 166 00:26:37.439 --> 00:26:44.999 Which is an overloaded pluses, generic and destination. So the transform reduce does that Square each element. 167 00:26:44.999 --> 00:26:49.169 Some some and returns the resultant square root. So this. 168 00:26:49.169 --> 00:26:53.759 Is doing well transform reduce is a is a fusion. 169 00:26:55.648 --> 00:27:01.078 Structure of a race I mentioned array unstructured doesn't. 170 00:27:01.078 --> 00:27:05.189 It doesn't play nicely with. 171 00:27:05.189 --> 00:27:14.999 Chuck arrays is better, so, and they mentioned here again, this 3 times faster, if they do 3 D vectors as a structure for right the thing is, it's. 172 00:27:14.999 --> 00:27:19.288 From the API, from the user point of view, this is not as intuitive. 173 00:27:19.288 --> 00:27:25.739 Which is why we had the sit better. This is how you would say with the. 174 00:27:25.739 --> 00:27:35.939 1 method to work with the structure of a race it re, Z rotate flow. 3 rotates a 3 D point. 175 00:27:35.939 --> 00:27:43.199 And the angle is actually. 176 00:27:43.199 --> 00:27:47.009 Their arguments are X Y, and Z and. 177 00:27:47.009 --> 00:27:50.519 Nothing complicated there. 178 00:27:50.519 --> 00:27:55.739 So, we could call transfer, just set an example where we would transform a, um. 179 00:27:57.989 --> 00:28:11.548 A vector this is a vector of flow 3 so this is the thing that's easy to think up here, but is not going to parallelize. So well, okay, we have a vector of flow threes and we rotate each load screen. 180 00:28:13.229 --> 00:28:16.888 The 3rd argument here is to begin to the open factor. 181 00:28:16.888 --> 00:28:20.098 It says the same way to the input vector. Okay. 182 00:28:20.098 --> 00:28:23.189 Doing it faster here. 183 00:28:23.453 --> 00:28:38.334 It's a little clubs here, but it's well, the thing is, I've actually had I've actually done stuff like this in my research. What I actually do is I write other functions that takes us and encapsulate as I'm not actually writing this sort of mess. But any case, we got 3 factors X Y, and Z. 184 00:28:38.578 --> 00:28:41.939 Uh, device sector down here you see a. 185 00:28:41.939 --> 00:28:48.479 Okay, what argument's the length and that each factor has beginning and ending it. 186 00:28:48.479 --> 00:28:57.808 So, we make a 2 Paul of the 3 beginning generators, and we convert it to a zip here and again, make to pull the return. 187 00:28:57.808 --> 00:29:11.993 Class of bank, a horrible mess, but we're putting it right in as an argument to separate it. So we don't care. And again, the return argument classes the return from make separate, or is another horrible mess. 188 00:29:12.294 --> 00:29:13.823 Luckily, we don't need to worry about it. 189 00:29:14.038 --> 00:29:17.969 So, we make we make for. 190 00:29:17.969 --> 00:29:26.608 Which, which is to the 1st, virtual X Y, Z point to the end virtual X. Y, Z point. 191 00:29:26.608 --> 00:29:33.628 And and they're making the thing again to the 1st, 1, this is going to do a transform in place. 192 00:29:33.628 --> 00:29:40.288 Which you can do that sort of things, some of the time, the spec to say when you can do that. So, this whole rotate 3 points in place. So. 193 00:29:40.288 --> 00:29:45.568 Using the zip, and then rotate to Paul will take to Paul. 194 00:29:45.568 --> 00:29:55.528 Now, you have to do a little thinking here, so the exhibitor, right? Or the, that makes it better. Here. It's is it better to a triple. 195 00:29:55.528 --> 00:30:00.838 Okay, to a vector of a virtual vector of the of the. 196 00:30:01.888 --> 00:30:10.074 Flow threes. So you do reference dissipate already you get 1 flow 3 you add 1 to it. You get to the next float 3 and so on. 197 00:30:10.673 --> 00:30:18.054 So, now, what transform is going to do is it's going to de reference the, and pass the de reference thing to. 198 00:30:19.019 --> 00:30:22.439 Function to the transformation function. 199 00:30:22.439 --> 00:30:27.538 So, the transformation functions up at the top, it's rotate and it's argument. 200 00:30:27.538 --> 00:30:32.489 Its input is 1 is 1, 2 here. 201 00:30:32.489 --> 00:30:40.648 Okay, and the output, the output is another 2 bowl. So this is also showing you see outputs here can be more complex. 202 00:30:40.648 --> 00:30:54.023 Classes okay. So is this total a trust? I don't actually know and again, I don't know it would depend if the name twofold if we've done a using name space and so on. 203 00:30:54.173 --> 00:30:59.304 So, which is why I do what I'm programming in any case input. Here's the T. V. 204 00:31:00.778 --> 00:31:05.818 And we get the X Y and C here and. 205 00:31:05.818 --> 00:31:09.509 So this is an example to get elements of a, to get. 206 00:31:09.509 --> 00:31:17.068 So, notice get 01, and these are compile time ideas. Here. The get is a template. 207 00:31:17.068 --> 00:31:29.249 Okay, get is a template class there and the argument is a concept if you could not say, get on, that wouldn't compile it's get with a. 208 00:31:29.249 --> 00:31:38.098 01, or 2 or 34 whatever. Now, here I would use a reference actually not an assignment, but okay, so then we compute here. 209 00:31:38.098 --> 00:31:41.578 And then at the end. 210 00:31:41.578 --> 00:31:45.449 We take the 3 outputs and we. 211 00:31:45.449 --> 00:31:48.898 And we make a from them, and we return it. 212 00:31:48.898 --> 00:31:56.548 And C, plus plus is this idea here that if you're returning a more complex class. 213 00:31:56.548 --> 00:32:01.919 And assign immediately using it, it grabs the space, the memory space. 214 00:32:01.919 --> 00:32:07.048 Where the return returning variable is going to get written into edit. 215 00:32:07.048 --> 00:32:11.038 Directly writes into it straight here. There is no useless copies. 216 00:32:11.038 --> 00:32:15.989 So, okay, so that's using structure over rays. 217 00:32:15.989 --> 00:32:20.699 I mentioned the implicit sequences constants in it and so on. 218 00:32:22.259 --> 00:32:26.519 Here's an example where implicit sequence is used. 219 00:32:26.519 --> 00:32:37.528 Okay, what we want here is we have a vector of floats and we want to find this the minimum float and also where it is. 220 00:32:37.528 --> 00:32:43.739 Okay, so we want to know the minimum of value and index. 221 00:32:43.739 --> 00:32:47.308 And Here's how we can do it with functional programming. 222 00:32:47.308 --> 00:32:51.088 You define a new class here smaller 2 ball. 223 00:32:51.088 --> 00:32:55.828 And what it does is, it takes 2, 2 balls. 224 00:32:55.828 --> 00:33:03.209 The tools afloat and floats the value that we want to get the minimum of it and the index. 225 00:33:03.209 --> 00:33:08.219 And it says it returns the. 226 00:33:08.219 --> 00:33:13.048 Those flow value is smaller. Okay. Returns the whole. 227 00:33:15.088 --> 00:33:19.828 And again, because of this writing return values into their target. 228 00:33:19.828 --> 00:33:23.909 There's a nice acronym. C. plus plus I just can't think of the name of it at the moment. 229 00:33:23.909 --> 00:33:26.969 This is actually more efficient than you would think. 230 00:33:26.969 --> 00:33:33.269 Okay, so now we have here on dysfunction. 231 00:33:33.269 --> 00:33:38.999 Find some amendments index, so so this is a. 232 00:33:40.528 --> 00:33:46.618 Not the fastest way to do this, but okay, so that is the factor of floats. 233 00:33:46.618 --> 00:33:51.689 Coming in by reference here you see the ampersand. 234 00:33:51.689 --> 00:33:58.709 And she could identify this as a crappy way to do this. But, um. 235 00:33:58.709 --> 00:34:08.248 It's for it's mod 1 Mark 1. okay. We have a vector of the indices. Using is another new function. The sequence does the obvious thing. It just writes it. 236 00:34:08.248 --> 00:34:11.248 Incriminating sequence into its. 237 00:34:11.248 --> 00:34:16.918 Architecture, we have an initial. 238 00:34:16.918 --> 00:34:24.329 Value for the vector, we just use the zeros element and index and. 239 00:34:24.329 --> 00:34:28.349 And this and this is the output. Okay so we have here. 240 00:34:28.349 --> 00:34:31.978 All the work is done here. Okay. We have a 2 pull up the. 241 00:34:31.978 --> 00:34:38.009 Vector floats and the indices so this is a triple of afloat and it's index. 242 00:34:39.059 --> 00:34:42.088 And the, and the reduction here. 243 00:34:42.088 --> 00:34:47.039 Reduces this to 1 element, but the reduction function is a minimization. 244 00:34:47.039 --> 00:34:52.498 To the result of this is going to be a 2 pull up the smallest center. 245 00:34:52.498 --> 00:34:56.429 Though of the smallest float and its index. 246 00:34:56.429 --> 00:35:03.898 Nice and then we just return the get 1 we'll return and it's index here. 247 00:35:03.898 --> 00:35:06.898 A nice way to do it. 248 00:35:06.898 --> 00:35:14.188 Now, actually, implicit sequences you don't really actually have industries you could use accounting thing. 249 00:35:15.838 --> 00:35:20.278 But and as we do here. 250 00:35:20.278 --> 00:35:27.989 Okay, the solid is the same. The differences here are are indices. 251 00:35:27.989 --> 00:35:31.889 We just do accounting. Okay here. 252 00:35:33.088 --> 00:35:36.568 So, we don't even we just have for our. 253 00:35:36.568 --> 00:35:41.159 Sequence 012 and we just use them up. Here. You see. 254 00:35:41.159 --> 00:35:44.668 Begin and are our. 255 00:35:44.668 --> 00:35:48.719 Are our smart pointing to this? 256 00:35:48.719 --> 00:35:54.418 Outing discounting sequence here, which is never stored unlike the previous slides. So. 257 00:35:54.418 --> 00:35:57.539 An example of using these fancy here. 258 00:35:59.039 --> 00:36:06.838 And again, this and trust this whole thing, it's going to be work efficient and time if they work efficient. 259 00:36:06.838 --> 00:36:11.608 It will do a linear a number of operations in total over all the. 260 00:36:11.608 --> 00:36:15.389 Course, and its actual time is going to be logged in. 261 00:36:15.389 --> 00:36:18.748 So, nice. 262 00:36:18.748 --> 00:36:24.389 There weren't any efficient things we saw, like, 2 days ago. The thing might be really. 263 00:36:24.389 --> 00:36:32.489 Using a lot more course, and it has to it might be doing and log in course or something or it just says uses end. Course total. 264 00:36:32.489 --> 00:36:41.009 If you're using more cores and you need to more, could, of course, are you preventing something else from using those cards? And maybe something else wants to run on the machine. 265 00:36:41.009 --> 00:36:45.418 Okay, so reach, so we're doing fusion and okay. 266 00:36:45.418 --> 00:36:51.059 Now, these are on a 10 year old machine, but so the numbers will be different. But the idea is still works. 267 00:36:51.059 --> 00:36:54.418 Okay. 268 00:36:54.418 --> 00:37:04.739 So that was the last Stanford slide and what we were seeing here was some and some initial paradigms using thrust but the concepts. 269 00:37:04.739 --> 00:37:13.583 Also generalize so now other documentation and videos locations have changed. 270 00:37:13.943 --> 00:37:27.864 These are what I think are some somewhat recent things here and various got these 2 packages. There's 1, there's Coda and there is high performance computing package. 271 00:37:27.893 --> 00:37:32.063 Each of them has code, or they've got different versions of cross. Some of them have trust without the. 272 00:37:34.108 --> 00:37:38.608 Without the examples they now have the get hub thing up here. 273 00:37:38.608 --> 00:37:46.498 This is they changed the fairly recently, but it has thrust itself and it has other things like. 274 00:37:46.498 --> 00:37:52.018 Examples which documentation as I think sort of law actually. 275 00:37:52.018 --> 00:37:56.639 In any case, and they're updating it fairly fast. 276 00:37:56.639 --> 00:37:59.938 I'll show you some examples. Okay. 277 00:38:02.849 --> 00:38:10.559 And so they moved around some of their public facing websites, have updated some of the pointers, but not all of them. 278 00:38:10.559 --> 00:38:18.900 Here okay, this is not in the get how this isn't talks at NVIDIA dot com. So this is, I guess your most detailed. 279 00:38:18.900 --> 00:38:31.260 Description of of trust. So again, it's like the and video documentation. It's a program. It's I used this is authoritative and complete. 280 00:38:31.260 --> 00:38:34.409 It is however dry. 281 00:38:34.409 --> 00:38:40.349 But okay static dispatch. Oh, okay. So you can browse through here all things. 282 00:38:40.349 --> 00:38:46.679 That's theatre a, there's okay additional resources and so on. 283 00:38:46.679 --> 00:38:52.889 Okay, the older thing. 284 00:38:54.090 --> 00:38:58.739 This is generated by Docs. I actually hate OXY Jen, but that's just me. 285 00:38:58.739 --> 00:39:03.000 You can go through here and get. 286 00:39:03.000 --> 00:39:07.050 Browse through and get stuff and so on. 287 00:39:07.050 --> 00:39:10.079 And generate a. 288 00:39:11.880 --> 00:39:26.550 Okay, you can look at that tutorials online. There's a large lots of stuff I showed you, but you don't mention the latest things and they don't use the latest programming styles, like overloading operator per and it's obsolete. I would do a Lambda. 289 00:39:26.550 --> 00:39:33.809 Or even in place notation. Okay. And I have stuff on parallel in down in here. So. 290 00:39:33.809 --> 00:39:41.489 Other other, I mentioned other alternatives, which are lower level, but pasture and harder to use. Trust is current. 291 00:39:41.489 --> 00:39:44.550 Um, and. 292 00:39:44.550 --> 00:39:52.829 Again, it's fast because it does compile time dispatching and because this. 293 00:39:52.829 --> 00:40:00.449 Map produce programming model. Let me just shrink this so I can keep watch on the chat window. 294 00:40:01.045 --> 00:40:15.025 Okay, it's fast now, here's the thing. I mean, this is this took me a while to figure out you got reduced and transform. They take a function, which does the reduction, or the transformation for reduction. It's a function with 1 argument a uterus. 295 00:40:15.054 --> 00:40:20.094 If a transform, it's a function, it's a binary function. Look like this and. 296 00:40:20.875 --> 00:40:33.534 Here's the thing is, what looks like a variable is a class and the definition of the class overloads brand and here, but this is not the overloaded print. In this case. 297 00:40:33.775 --> 00:40:47.184 This is constructing a variable of that class using the default constructor, which takes no arguments. You could give an argument here if you wanted actually like the threshold that in an example. 298 00:40:47.429 --> 00:40:53.880 Okay, and you can create variables of that class and then use food. 299 00:40:53.880 --> 00:40:57.389 And in the argument, you'd say food, not food friends. So. 300 00:40:57.389 --> 00:41:03.630 And the nice thing about this is it gets inserted in place and the optimizing compile or just optimizes it. 301 00:41:04.949 --> 00:41:08.849 And we can see crazy things sometimes. 302 00:41:08.849 --> 00:41:23.670 You'll see something that there will be an argument in the reduction thing and this is so this again, this actually factors a class and this is constructing a variable of that class. But what the constructor takes an argument a. 303 00:41:23.670 --> 00:41:31.949 Now, this is a way to get you, it's the only easy way for you to get information into this. 304 00:41:31.949 --> 00:41:38.909 Reduction function because you don't have control over the arguments when it's called. It's called with the element. 305 00:41:38.909 --> 00:41:45.420 Doing the reduction on, but this is but this store's local information in the variable that gets constructed here. 306 00:41:46.440 --> 00:41:53.429 So, it's a closure, so I've actually found 2 bugs in thrust and. 307 00:41:53.429 --> 00:42:06.840 So, 1 of my, I'm annoyed at because I knew about it, but just hadn't had kept it private. I found NBC. This is crazy. I found a bug that put the compiler into an infinite loop. 308 00:42:06.840 --> 00:42:11.820 The compiler, you know, that's it happen that often, but yeah, well, I break stuff. 309 00:42:11.820 --> 00:42:17.940 There are some videos here you can look at could have cast. 310 00:42:19.585 --> 00:42:26.605 And, yeah, they've got a lot of nice stuff here, so a lot of stuff on could I NVIDIA. 311 00:42:26.605 --> 00:42:38.545 So if you like, watching somebody else talk instead of me, good at tutorials, thrust library and trial, blah, blah, blah, and maybe better speakers and me. So, you can have fun rousing around here if you like. 312 00:42:38.940 --> 00:42:44.340 Okay. 313 00:42:44.340 --> 00:42:56.250 I don't want him to go through some more another part of a slide that I'm talking about trust. And then what I'll do is, I'll walk through some examples. 314 00:42:56.250 --> 00:43:01.500 To show you crazy things. So I'll be teaching, by example 1st. 315 00:43:01.500 --> 00:43:09.929 Trust by GGC is a gpo technology. Com again, it's 10 years old, 11 years old, but this part of thrust hasn't changed. So. 316 00:43:09.929 --> 00:43:14.369 Okay, the new stuff hasn't worked its way into a lot of the tutorials. So that's. 317 00:43:14.369 --> 00:43:23.610 Annoying okay this is this man index thing. Um. 318 00:43:23.610 --> 00:43:28.139 Best practices you're saying, okay. 319 00:43:28.139 --> 00:43:31.769 So here is using. 320 00:43:31.769 --> 00:43:45.840 The trust to do a bucket sort 2 different ways. I think so the bucket sort again as actually his bucket sorts and my research as almost my favorite data structure. 321 00:43:45.840 --> 00:43:57.780 Um, because it's, it's better than people think it is. I've been using this for decades. Okay. So what we have here is that we have points here that the colored points, and. 322 00:43:57.780 --> 00:44:02.250 We want to impose a 3 by 3 grid over the points. 323 00:44:02.250 --> 00:44:07.650 And then we want to, for each point, determine what grid sell it send. 324 00:44:07.650 --> 00:44:19.409 Now, this is nice, for example, for doing intersections of objects because of 2 are are finding close points because of 2 points are very close. Then they're either in the same. 325 00:44:19.409 --> 00:44:22.829 Grids pocket or they're in adjacent pockets. Okay. 326 00:44:24.119 --> 00:44:28.380 In any case, how do we do this? This bucket sort now? 327 00:44:30.474 --> 00:44:41.905 If you were thinking how to do this, and you've been to 1 I being a little unfair, but it's such an easy target. 328 00:44:43.344 --> 00:44:50.034 They would probably teach you to create a square array of, like, lists. 329 00:44:50.309 --> 00:45:00.900 Okay, so you have so here, I would have 9 linked lists and as I read, in each point, I determine what cell it's in, what pocket it's and. 330 00:45:00.900 --> 00:45:04.139 And we append it to the link list. 331 00:45:04.139 --> 00:45:17.070 Instead of link lists you could, if you've learned in vectors, you could create here a square array of 9 vectors. And as you get each point, you do a push back onto the vector. 332 00:45:17.070 --> 00:45:28.559 They're horrible methods. Both of them. Let me tell you why the link list. 1st, the pointers take as much space as the data. You've just doubled the amount of. 333 00:45:28.559 --> 00:45:37.559 Memory and the 2nd thing is that you're allocating dotted pairs as used to be called pairs for the. 334 00:45:37.559 --> 00:45:42.269 Element and pointer to next year. 335 00:45:42.269 --> 00:45:47.429 Stop renewing stuff on the heap and, as I said before. 336 00:45:47.429 --> 00:45:55.260 Lots of small heap operations are bad. It's Super linear time. The more you do the more time each construction takes. 337 00:45:55.260 --> 00:46:00.000 Fragments the memory you're doing, pointer, you got the legacy Porter chasing. 338 00:46:00.000 --> 00:46:03.539 And is horrible. 339 00:46:03.539 --> 00:46:11.099 That's like lists are bad. Also it's English and you cannot find the element in the list. It takes plenty of your time. You got to step your way down the list. 340 00:46:11.099 --> 00:46:21.750 Okay, that's why I don't like link why I don't like the array of vectors is that there's an overhead associated with each factor. 341 00:46:21.750 --> 00:46:31.530 Okay, that there's a header information and the 2nd thing is, you're pushing pushing back onto the factors and you do a push back onto a vector. 342 00:46:31.530 --> 00:46:36.210 Well, when they're constructed, they have a little scratch base at the end just. 343 00:46:36.210 --> 00:46:48.269 To anticipate push backs, they can use up to scratch base. You then have to construct a whole new vector. That's perhaps twice as big or 50% bigger a copy of the thing over. So, again, that's. 344 00:46:48.775 --> 00:47:03.625 That's horrible that with you on the heap, so you're doing it and I don't just have 3 by 3. I call these uniform grades. I might have a 2000 by 2000 uniform grid, which would be 4Million. I might have a 3 D, uniform grid might be a 1000 by a 1000 by a 1000. that's a 1Million cells. 345 00:47:03.625 --> 00:47:07.914 That would be a 2Billion cells. Not a 1Million. That would be a 1Billion factors. 346 00:47:10.494 --> 00:47:23.485 It's horrible way to treat your poor global memory space and also things construct doing modifications of global Mary constructions. They're not parallelizable. That has to be sequential. 347 00:47:23.485 --> 00:47:25.554 So, I don't like either of these. 348 00:47:25.800 --> 00:47:33.059 So, we're going to do here with the output we're going to have is some sort of a ragged array. We're going to have the elements. 349 00:47:34.170 --> 00:47:40.710 Sorted by cell number with a what's called a dope vector pointing to the start of each sales points. 350 00:47:40.710 --> 00:47:46.199 Okay, so there's 2 different ways you can do this. 351 00:47:46.199 --> 00:47:58.019 What we're going to do here is for each point, so this does use extra memory. So it's actually editing the complaint I had about link listen double. So this will also double the memory. 352 00:47:58.019 --> 00:48:06.690 So so that complaint ahead what link? This doesn't not actually valve. This will have the same problem. For each point. I'll create an order pair of. 353 00:48:06.690 --> 00:48:09.840 Point number at the points. 354 00:48:09.840 --> 00:48:20.250 Whatever its coordinates and the cell number it's in, and I'll start those pairs by cell number and that will coalesce that will bring together all the points in each cell. 355 00:48:21.869 --> 00:48:27.239 And then I can do actually some other things to get the starter beach the points in each cell. 356 00:48:27.239 --> 00:48:31.769 That's 1 way to do it. 357 00:48:31.769 --> 00:48:39.300 Okay, now, what they're doing here is, okay, the thing with this random points. 358 00:48:40.559 --> 00:48:47.909 And the random and uniforms, so different buckets, different cells are different numbers of points. And in fact. 359 00:48:47.909 --> 00:49:02.010 It's random and a number of points in each cell. It's a, it's a random variable policy distribution so, 1 of using my probability class I'm combining the 2 classes now so the size of each cell is across all random variable. 360 00:49:02.094 --> 00:49:10.974 And and in practice, it's actually worse than that somewhat. Because there may be some underlying structure on the points. There's a policy correlation between point. 361 00:49:11.155 --> 00:49:23.425 So you'll actually have the biggest sales biggest cells of the most number of points will be worse than the pass on distribution. Would suggest so, what you might do, 1st, is make a pass through all of the points. 362 00:49:23.934 --> 00:49:38.695 And count the number of points in each bucket. So we create a factor of whose length as a number of cells, and we just take our vector points, reach vector transformation for each factor. 363 00:49:38.695 --> 00:49:47.454 We find the salads and the buckets and incremental counter for that bucket. So, at the end of this, we'll have a vector of calendars for the number of points in each bucket. 364 00:49:47.730 --> 00:49:58.800 So those issues here, of course, these calendars have to be cereal, you know, this has to be serialized with a topic operation when you're implementing a calendar, or that'd be a thrust analog to that. Okay. 365 00:49:58.800 --> 00:50:03.090 And now, once we've got the number point in each pocket, we can now. 366 00:50:03.090 --> 00:50:12.719 Treat this as a run as, and we can do a decode run length and now produce a vector, which will be big enough to hold the points for each bucket. 367 00:50:12.719 --> 00:50:22.949 Okay, we're generating random, but I don't much care. It's fast what to tell probably all the classes. If you care about quality of random numbers. 368 00:50:22.949 --> 00:50:29.610 Don't try to cook your own random algorithm. You'll do it wrong. Don't try to use 1 from a very old. 369 00:50:29.610 --> 00:50:33.360 You know, library, because they might have done it wrong. So. 370 00:50:33.360 --> 00:50:40.230 Too long don't read to something called a method, which is actually very good. 371 00:50:40.230 --> 00:50:47.820 Okay, so computing the size of each bucket, I mentioned. 372 00:50:47.820 --> 00:50:52.500 1 method, we'll see to here creating random points. 373 00:50:52.500 --> 00:50:56.639 They're stipulating that the built in around function is good enough. 374 00:50:56.639 --> 00:51:01.590 And it's returning an editor and so. 375 00:51:02.820 --> 00:51:12.659 Makes random parents of points waves that built in random number generators can fail. This 1 used to be popular called linear congruent shell. 376 00:51:12.659 --> 00:51:18.539 And what you do is you take your random, it creates this suit of random. 377 00:51:18.539 --> 00:51:27.239 Sequence of floats you take your old fault multiplied by some concepts and do it. My 1. the problem with it is if you used. Oops. Sorry. 378 00:51:27.239 --> 00:51:31.110 Go back up here where go. 379 00:51:31.110 --> 00:51:35.280 Here we go was the senior consequential method. The problem. 380 00:51:35.280 --> 00:51:44.219 Was that if you took pairs of numbers like this and made them into points, and you plotted them, the points would fall on a small number of parallel lines. 381 00:51:44.219 --> 00:51:47.400 And for many applications, that's not random enough. 382 00:51:47.400 --> 00:51:54.210 But so I don't know what this does, but this is a warning with some built in random number of things, create a factor of. 383 00:51:54.210 --> 00:51:59.519 Numbers and generate just. 384 00:51:59.519 --> 00:52:02.820 Um, basically, it. 385 00:52:02.820 --> 00:52:07.530 Takes a function, which takes no arguments and just. 386 00:52:07.530 --> 00:52:11.880 Calls it end times in parallel and just generate some. 387 00:52:11.880 --> 00:52:16.230 Vector about puts them the 3rd arguments here. 388 00:52:16.230 --> 00:52:20.219 This has been done on the host and copied to the device. So. 389 00:52:20.219 --> 00:52:27.000 We don't care if it's sequential or Rand here. There's an issue that. 390 00:52:27.000 --> 00:52:31.349 Can this be called in parallel, which we're not talking about. 391 00:52:31.349 --> 00:52:35.190 Oh, they're talking about it here so. 392 00:52:36.445 --> 00:52:45.355 And I'm going to skip this a little, but we'll go through it very quickly. So they've got R. N. 393 00:52:45.355 --> 00:52:54.534 g as an engine for generating random numbers and it's a class and you can do things to it and call it and pull out random numbers. 394 00:52:55.019 --> 00:53:01.289 An interesting idea here is we discard a number of elements at the start of it so. 395 00:53:01.289 --> 00:53:05.429 And you can call it whatever. Oh, okay. 396 00:53:06.750 --> 00:53:12.929 That slides not so interesting another way to do it. 397 00:53:15.510 --> 00:53:25.710 Hash index is the I want the ice number hashing just as a pseudo randomization. This is okay here. That's not so interesting. 398 00:53:25.710 --> 00:53:31.590 Or they are mentioned back here, so this device thing, it's doing a hash a hash. 399 00:53:31.590 --> 00:53:39.900 It just a parallel will function so this is pseudo. Right? So if your hash is good enough, then this is a nice way to create sit around and numbers and parallel. 400 00:53:39.900 --> 00:53:47.730 And what this will do is, I'll create the random number, which is nice. You don't have to count up from Sarah, just create the ice random number, right away in constant time. 401 00:53:47.730 --> 00:53:53.550 What I like for hashing function this thing like, this is actually a cryptographic cash, like empty 5. 402 00:53:53.550 --> 00:53:57.119 Okay. 403 00:53:57.119 --> 00:54:02.280 Great random point. That's not so interesting. And skip that. 404 00:54:02.280 --> 00:54:06.719 Um, yeah, I were to generated. 405 00:54:06.719 --> 00:54:12.210 Skip that somewhat. Okay so we have random points. 1 accounts the size of each bucket. 406 00:54:12.210 --> 00:54:25.829 Okay, so they're going to show Here's 1 I showed you 1 where you had a global vector of counts and you just add it into it. Here's a different way. 407 00:54:25.829 --> 00:54:31.380 What we're going to do is we have the random points. 408 00:54:31.380 --> 00:54:43.769 We can fit the bucket index for each point. That's just a map function transform and we sort it by pocket index. So, now what we've done is we've brought all of the indices. 409 00:54:43.769 --> 00:54:47.280 So that for the same cell together. 410 00:54:47.280 --> 00:54:52.170 And then we do a count, which will be a segmented scan. 411 00:54:52.170 --> 00:54:57.389 So the application segmented scan, so this sounds like a crazy way to. 412 00:54:57.389 --> 00:55:03.480 Get the histogram it's only a histogram okay. To sort the indices. 413 00:55:03.480 --> 00:55:10.590 And, yeah, and the sorts going to be the slowest part of this, but. 414 00:55:10.590 --> 00:55:15.659 It's an asset, so and it parallelize is. 415 00:55:15.659 --> 00:55:21.119 Okay, I can't say that, but all of these things came nothing here. 416 00:55:21.119 --> 00:55:27.000 Oh, okay. So how are we going to do this? 417 00:55:28.139 --> 00:55:37.019 I get this forget that. It's not interesting. Not interesting. Not interesting. Not interesting. 418 00:55:37.019 --> 00:55:44.159 Integer hash fast as little quality. If the little quality hash, it could be a good ash, but doing it. Right. Is difficult. 419 00:55:44.159 --> 00:55:50.280 Bucket index of each point that's a mod function. So. 420 00:55:51.900 --> 00:55:55.829 Um. 421 00:55:55.829 --> 00:56:00.449 While we're assuming it's a W by H grid here. 422 00:56:00.449 --> 00:56:05.369 And it's effective and afloat, so. 423 00:56:05.369 --> 00:56:09.690 We're scaling the thing to be the. 424 00:56:09.690 --> 00:56:19.380 So the floats from 0 to 1 so we're scaling it because it's W, by sales ready to. And this gives us the X and Y. 425 00:56:19.380 --> 00:56:24.960 Cell number, and we're converting the X and Y, pair down to a scaler. 426 00:56:24.960 --> 00:56:29.369 So, now this is a 1 dimensional index for. 427 00:56:29.369 --> 00:56:33.599 The bucket from the 2 dimensional point. 428 00:56:33.599 --> 00:56:38.159 Okay. 429 00:56:38.159 --> 00:56:44.909 How we do that, we have our vector of points input and output factor of indices, which is just. 430 00:56:44.909 --> 00:56:52.199 On the device factor here and the point to pocket index thing computes the 1 D index from each point transform. 431 00:56:52.199 --> 00:56:55.590 It's in parallel. Okay. 432 00:56:55.590 --> 00:57:02.070 And this is a small sale side of a 1000 by a 1000 or something, or by a 1000 maybe. 433 00:57:02.070 --> 00:57:09.719 I start scales up that's why I work with 1Billion element itself and so on now, we sort it and sort by key. 434 00:57:09.719 --> 00:57:16.710 Does the obvious saying sort by key, assumes that you have a well. 435 00:57:18.119 --> 00:57:21.659 Um, well, let's say, okay. 436 00:57:21.659 --> 00:57:35.670 Back up, sort by key assumes that you've got a vector of things that you want to sort like points and a 2nd, factor of keys that you want to use as the index to sort by. 437 00:57:35.670 --> 00:57:44.429 So, it's a structure of a raise array of indices and an array of points and it sorts those pair, those virtual pairs. 438 00:57:44.429 --> 00:57:58.530 By the index, but as its moving the indices around, it's also moving the points around. Okay. So this is an in place thing and I think and at the end of this indices have been transformed. So now counting up. 439 00:57:58.530 --> 00:58:03.449 And points have been transformed to match the permitted indices. 440 00:58:03.449 --> 00:58:15.059 Okay, and if it's 3 points are the same index, let's say, then the index will be repeated 3 times. And then at the corresponding point. 441 00:58:15.059 --> 00:58:28.860 In points, there'll be those 3 points there in that bucket sort by key useful function. It will probably be a Radix sort, which takes linear time. Erratic sort is used when the key. 442 00:58:28.860 --> 00:58:33.360 Is something simple like a or float? 443 00:58:33.360 --> 00:58:37.710 If the key was something more complicated, like a character string. 444 00:58:37.710 --> 00:58:44.400 They then they would fall back to a quick sort, which is log in time, but to sort by key is. 445 00:58:44.400 --> 00:58:55.800 Linear time, the only thing is it does need a work array of size equal to the indices array. It basically its workspaces double the space of the data. 446 00:58:55.800 --> 00:58:59.760 But me okay. 447 00:58:59.760 --> 00:59:04.380 This is a reason this thing might actually fail. 448 00:59:04.380 --> 00:59:12.869 Unexpectedly, and you think you've got enough data memory, but you don't because sort biking needs this worker, right? 449 00:59:12.869 --> 00:59:17.670 Which is not well documented, at least in the past was not document it at all. 450 00:59:17.670 --> 00:59:27.150 Okay, so this gets interesting. Now, we're starting. This will be a new idea. May take a 2nd or 2 minute to think about. 451 00:59:27.150 --> 00:59:30.510 So, reduce by key. 452 00:59:30.510 --> 00:59:34.889 So this assumes we've already started it. 453 00:59:37.079 --> 00:59:41.130 And what reduced by key will do. 454 00:59:41.130 --> 00:59:45.119 This is. 455 00:59:45.119 --> 00:59:49.110 What we have down here. 456 00:59:50.849 --> 01:00:00.480 It takes the, it takes the vector of keys reduced by keys essentially doing a run length and coding. 457 01:00:00.480 --> 01:00:11.760 So well, reduced by keys exactly. Doing a run length and coding. So reduce so run length and coding. We have a vector of whatever. 458 01:00:11.760 --> 01:00:20.280 Elements keys and and you and they're in runs. Okay and you want to count the length of each run. 459 01:00:20.280 --> 01:00:32.639 And also what the value is so show, you invites out. So here, look at the comment here keys. This is our input data. 460 01:00:32.639 --> 01:00:36.119 And what we want to do is. 461 01:00:36.119 --> 01:00:40.199 So, look at the runs, we've got to run of 2 zeros. 462 01:00:40.199 --> 01:00:46.650 Right of 1 to run a 3 threes. 1. 463 01:00:46.650 --> 01:00:50.130 And there's no, 1 in here, there's no 5 in here and so on. 464 01:00:50.130 --> 01:00:54.480 So, the output from this will be 2 vectors down here. 465 01:00:54.480 --> 01:01:05.760 Output keys and I'll put so usually I'll put keys just once each. Okay. 02345 and this is how often each 1 occurred. So the 0 wasn't a run up twos arrows. 466 01:01:05.760 --> 01:01:10.949 The 2 was in a run of 1 to the 3 was in a run of 3 threes. 467 01:01:10.949 --> 01:01:14.610 We, the 4 there was 1, 4, 1, 6 and 1 7. 468 01:01:14.610 --> 01:01:20.639 And you notice always not want to no 5 here. So, this so they reduced by key does a run length and coding. 469 01:01:21.809 --> 01:01:24.989 Um, the inputs are the. 470 01:01:27.750 --> 01:01:32.489 And the inputs, the way it does this. 471 01:01:32.489 --> 01:01:37.110 Well, it does more than just being unfair to say it was a. 472 01:01:38.400 --> 01:01:45.000 Run nights and coding it does more than that. The example here it will do that. It's actually more powerful than that. 473 01:01:45.000 --> 01:01:51.840 Because what it's actually doing it is another input to it, which is this. 474 01:01:51.840 --> 01:01:55.500 Value factor, which is a fact constant. A factor of one's. 475 01:01:55.500 --> 01:01:59.280 And what reduced by key actually does. 476 01:01:59.280 --> 01:02:02.849 Is it does a reduction over each run? 477 01:02:02.849 --> 01:02:15.480 So doesn't actually count the number of zeros it sums up the values factor over the run of zeros here. The values are all 1. so if that's doing account, but we could have more complex things in here. 478 01:02:15.480 --> 01:02:20.130 We got the 3 threes. It's not counting a number of these. It's. 479 01:02:20.130 --> 01:02:27.989 It's reducing the values factor over the run of threes, the value sector, being all 1 reduction operator being in addition. 480 01:02:27.989 --> 01:02:37.289 It turns out it's doing accounting, but reduced by keys a more powerful concept that we're just using here to do run nights and coding. So we take the. 481 01:02:37.289 --> 01:02:42.179 Indices those are the keys would that have the runs by the way? The key. 482 01:02:42.179 --> 01:02:45.329 The reason is runs in the keys that we've done, the sort. 483 01:02:45.329 --> 01:02:53.635 So, we brought together all of the points and keys in each pocket. Okay. So that's why there are runs. Otherwise it probably would be. 0. 484 01:02:54.204 --> 01:03:01.135 okay, so keys, these are the bucket indices up here beginning and then there are the values that we want to reduce and these are just. 485 01:03:01.349 --> 01:03:05.730 All 1 example again of these fancy. 486 01:03:05.730 --> 01:03:09.960 And then and then the output. 487 01:03:09.960 --> 01:03:14.340 Are the. 488 01:03:16.469 --> 01:03:23.489 Are going to be basically the. 489 01:03:26.369 --> 01:03:32.820 Is going is going to be the output. I guess I'm getting myself confused now, but. 490 01:03:36.539 --> 01:03:45.329 No, there's 2 outputs here the indices. So these are the values in each run and this is how long each run was the indices and the sizes. 491 01:03:45.329 --> 01:03:48.329 Okay, so you had to construct these. 492 01:03:48.329 --> 01:03:53.550 I'm factors 1st, well, indices is actually being reduced in place. 493 01:03:53.550 --> 01:03:59.849 So, you'd have to check the documentation reduced likely to see that. You could actually overwrite bucket industries. 494 01:03:59.849 --> 01:04:03.659 You can only because reduced by key was because. 495 01:04:03.659 --> 01:04:12.150 Coded carefully so effectively reducing the constant generator over each bucket. 496 01:04:12.150 --> 01:04:19.380 Yes, that is correct. 497 01:04:19.380 --> 01:04:23.699 The comment yes. Okay. 498 01:04:24.989 --> 01:04:31.590 Example of concept. Okay. And again, this will take long time. This will. 499 01:04:34.795 --> 01:04:41.815 It's quite nice actually, how it can do reduce so there's still the reduced by key. It's a variance of this segment and reduce. 500 01:04:41.815 --> 01:04:50.545 We're not reducing the whole vector where what we have is a string of little sub factors where this all strung together. 501 01:04:50.730 --> 01:04:55.980 And they are delivered by the fact we have these keys and we reduce each sub factor. 502 01:04:55.980 --> 01:05:00.030 Which are all variable length of course. Okay. 503 01:05:02.969 --> 01:05:06.000 What we're doing here. 504 01:05:08.190 --> 01:05:14.909 And if I go back a page, okay, so now we want to find to start of each bucket. 505 01:05:14.909 --> 01:05:26.969 Like, the start of pocket for starter bucket 7 or something, but whereas a 4 and here, whereas the 7 notebook piece where to find them what they're doing here is a binary search. 506 01:05:26.969 --> 01:05:31.590 On each possible key value to find where it is and output keys. 507 01:05:31.590 --> 01:05:38.250 So, and so lower bound. 508 01:05:38.250 --> 01:05:43.409 Well, basically, I'm going to skip through a few details here, but lower bound and upper bound. 509 01:05:43.409 --> 01:05:49.889 Will do binary searches on the bucket indices and produce an output vector. 510 01:05:49.889 --> 01:05:57.360 Or the start of each key value is so if I go in here, it will find, whereas the. 511 01:05:57.360 --> 01:06:04.079 Whereas there's no 1 in here, but it will find where each of these things is and you don't know because we're missing some. 512 01:06:04.079 --> 01:06:07.469 The values, so. 513 01:06:07.469 --> 01:06:11.940 And we're doing the search on the bucket indices. 514 01:06:11.940 --> 01:06:18.900 Which, if I scroll back is, this is a parallel buying a lower bound in parallel upper bound. So. 515 01:06:20.250 --> 01:06:26.460 Because it's not just searching for 1 value searching for each 1 of a vector of values. 516 01:06:26.460 --> 01:06:29.969 Okay. 517 01:06:29.969 --> 01:06:35.340 Time here, I guess it'll be log in times a number of things we're searching for us. 518 01:06:35.340 --> 01:06:42.929 Well, it's fine so this is 1 step when I was doing around nights and coding, I'm skipping over a few details, which you get the basic idea here. 519 01:06:42.929 --> 01:06:51.929 So, what we've done is we produced a vector this different way to get the length of each front. We've got a vector of. 520 01:06:51.929 --> 01:07:01.289 The start of each list I just slightly different way to do it. I was getting myself confused. 521 01:07:01.289 --> 01:07:11.070 Or, actually, I was wrong here, we're binary searching in the call last key factor here to find the start of each new. 522 01:07:11.070 --> 01:07:19.829 They can the end and then we subtract the end from the start from the end. And we've got the length is another way to get all the runways. 523 01:07:19.829 --> 01:07:23.280 overground and upper bound and. 524 01:07:23.280 --> 01:07:31.409 Okay, and minus, so we're taking the upper bounds minus the lower bounds as a transformed thing. 525 01:07:31.409 --> 01:07:36.239 And producing another way to that, the bucket of sizes here. 526 01:07:38.340 --> 01:07:42.269 Oh, okay. So 2 ways to get 2 sides of each bucket. 527 01:07:43.739 --> 01:07:48.480 And methodology. 528 01:07:52.440 --> 01:07:58.050 To do a random number of generator was slow. 529 01:07:58.050 --> 01:08:04.920 Okay sorting I said a slow reduction was faster. 530 01:08:04.920 --> 01:08:13.050 Okay, and searching binary search was fast. So the count was fast. So what I said the sword is often the slowest part of it. 531 01:08:13.050 --> 01:08:17.279 Classification is log in time and very fast log in. 532 01:08:17.279 --> 01:08:22.500 Okay, this is like they sit around a number of our energy. 533 01:08:22.500 --> 01:08:28.380 Through, which is basically the inverse of the. 534 01:08:28.380 --> 01:08:38.310 Time and counting was very searching was very fast sorting again with slow reductions. Surprisingly classification. 535 01:08:38.310 --> 01:08:43.380 Get in fast not as fast a search. I'm not certain. Why? But. 536 01:08:44.579 --> 01:08:47.699 Is it's doing more work I guess, at each point. 537 01:08:49.409 --> 01:08:56.520 Yeah, just comparing generating random numbers on the source. 538 01:08:56.520 --> 01:09:02.729 On the host is nice, but you've got a copy to the device on the device is. 539 01:09:04.529 --> 01:09:09.300 Doing it right is tricky. That's what they mean avoiding correlation. So. 540 01:09:10.619 --> 01:09:17.069 Oh, okay. You want good quality right? And over January don't try to do it. Yourself. 541 01:09:17.069 --> 01:09:20.640 There is a fast parallel thing in video has. 542 01:09:20.640 --> 01:09:25.890 Okay, but I keeps the sort is slow even if it is a fast sort. 543 01:09:28.020 --> 01:09:33.659 Yeah, what what they mean here is very fast means. So, using the Radix sort. 544 01:09:33.659 --> 01:09:42.779 Yeah, so reduced by key, it's a surprisingly powerful concept and it's only useful for parallel programming. 545 01:09:42.779 --> 01:09:54.840 So so you bring, like, items together, this will be with a sort and then do a reduce of all the, like, items. So histogram on links and coding word counts. So, on. 546 01:09:58.680 --> 01:10:04.710 And they say the binary search they said was much better then. 547 01:10:04.710 --> 01:10:14.520 Over over reduced by key, but it's more complicated. So, yeah, you had to do a soft at the start. This case happened to be sorted. You need storage. 548 01:10:14.520 --> 01:10:19.260 Lots of other examples here. I'll show you some of them. So. 549 01:10:19.260 --> 01:10:29.310 Okay, best practice few things together like transform reduce structure of a raise. 550 01:10:31.050 --> 01:10:45.359 Memories actually less important. Remember, the card I've got on parallel is 48 gigabytes on the GPU and parallel has South has a, a quarter terabyte on the host. So if your dad is smaller than that, which. 551 01:10:45.359 --> 01:10:51.329 I think a 100 probability 1, it is then you've got the memory, but still does the bandwidth issues. 552 01:10:51.329 --> 01:10:56.010 And cashing issues. Okay. Um. 553 01:10:58.350 --> 01:11:02.340 Big application a sword is to bring, like, items together. So then you. 554 01:11:02.340 --> 01:11:07.409 Reduce the set of, like, items or something generalization a map reduce. 555 01:11:07.409 --> 01:11:11.489 Yeah, it was on Google code at 1 point. 556 01:11:11.489 --> 01:11:17.189 That web site is down I think now that's obsolete. Okay. And. 557 01:11:17.189 --> 01:11:21.840 Over 1 of the guys that did thrust okay. 558 01:11:25.680 --> 01:11:30.659 Want to show you some examples now. 559 01:11:30.659 --> 01:11:35.489 They take me awhile to think about. 560 01:11:38.970 --> 01:11:44.189 That we can look at some of them, and I've got my blurbs on some of this. 561 01:11:45.239 --> 01:11:49.170 Let me look at some of them here. Tiled range dots. 562 01:11:50.609 --> 01:11:56.970 So, I am looking at a local version of it. 563 01:11:56.970 --> 01:12:00.239 But it's they're on. 564 01:12:03.119 --> 01:12:07.739 Silence. 565 01:12:07.739 --> 01:12:11.310 Yeah, um. 566 01:12:15.060 --> 01:12:22.739 Yeah, so if we look at some of them here. 567 01:12:24.210 --> 01:12:29.640 It's just a little bigger. Okay. I'm tiled range. 568 01:12:34.979 --> 01:12:38.789 Well, the example shows what we want to do here. 569 01:12:39.869 --> 01:12:43.319 We have we have an input vector. 570 01:12:45.000 --> 01:12:49.500 Of elements doesn't have to be consecutive 1, 2, 3. 571 01:12:49.500 --> 01:12:52.649 And we want to repeat. 572 01:12:52.649 --> 01:12:57.479 This factor K times, like repeated 3 times. 573 01:12:57.479 --> 01:13:03.329 They went to 0 and 20123. okay. Calling it. Tiling a range. 574 01:13:03.329 --> 01:13:07.289 A, very simple thing. Um. 575 01:13:08.460 --> 01:13:14.550 Ways he would, and I've got different ways where I'm actually improving on theirs, but okay, so. 576 01:13:14.550 --> 01:13:18.390 Let me just show, it actually works. 577 01:13:19.529 --> 01:13:24.239 Silence. 578 01:13:26.250 --> 01:13:30.840 I actually don't want to do that. 579 01:13:35.609 --> 01:13:42.510 Yeah, this is the compiler. 580 01:13:46.140 --> 01:13:51.060 Okay, that's what it does. Okay. Repeats the range several times. 581 01:13:51.060 --> 01:13:58.859 Fun I'm going to play with make file. 582 01:14:03.000 --> 01:14:06.300 We don't need that. 583 01:14:06.300 --> 01:14:13.680 Okay, actually yeah. Okay. Good. 584 01:14:13.680 --> 01:14:22.140 For people not wildly familiar with make file. What it does is that you give it a target. 585 01:14:22.140 --> 01:14:27.175 It does compilations and what this is doing, 586 01:14:27.204 --> 01:14:29.064 is it looks if I say, 587 01:14:29.064 --> 01:14:32.545 make food and will look for food food, 588 01:14:33.295 --> 01:14:33.534 food, 589 01:14:33.534 --> 01:14:38.755 dot f or whatever and this is the command that it will use to generate to compile food. 590 01:14:40.949 --> 01:14:48.510 I just deleted the definition, a dollar, a dollar variable name expense expands variable. 591 01:14:48.510 --> 01:14:58.020 She XX, slag's not even been used here dollar less than is the input file in this case with. 592 01:14:58.020 --> 01:15:06.720 And dollar ad sign is the output variable, which will be full, and this will compile it, but it will only compile it. 593 01:15:06.720 --> 01:15:10.979 If the source is newer than the output. 594 01:15:10.979 --> 01:15:16.649 Or the output doesn't exist, so it doesn't do unnecessary compiles. Cool. 595 01:15:16.649 --> 01:15:19.739 So, if I said, make this again. 596 01:15:21.420 --> 01:15:24.930 It's up to date. Okay. In any case. So. 597 01:15:24.930 --> 01:15:27.960 What's happening here? 598 01:15:29.609 --> 01:15:33.539 You do. 599 01:15:42.329 --> 01:15:47.699 So tiled range is a class it overloads per. 600 01:15:47.699 --> 01:15:55.680 To return takes an argument I and does a mod operator. Okay. I. 601 01:15:55.680 --> 01:16:00.930 Tiled size tiled size is a local variable. 602 01:16:00.930 --> 01:16:07.140 In here, it's a member of the class, and when you construct a variable of this class. 603 01:16:07.140 --> 01:16:15.270 It takes the, it's argument as the, that variable to store. 604 01:16:15.270 --> 01:16:18.420 Now, the. 605 01:16:18.420 --> 01:16:25.229 The syntax here for people that are little touch week on C. plus plus. 606 01:16:25.229 --> 01:16:29.189 Okay, so this is a class the class contains. 607 01:16:29.189 --> 01:16:43.109 It contains variables and it contains function. So there's a variable variable member of the classes. His name is tile size and its type difference type difference type was defined. 608 01:16:47.425 --> 01:17:00.954 It's it's an here it's a, it's a type deaf just ignore that at the moment. Okay. So this is a constructor here because the function name is the same as the class instructor takes 1 argument. 609 01:17:01.260 --> 01:17:06.180 A tile size, and by the way, if you call the constructor with a different. 610 01:17:06.180 --> 01:17:11.850 Type of argument, the compiler may fail because there may be no constructor because this will work only for. 611 01:17:11.850 --> 01:17:16.020 1, are you into this? Okay so what the syntax here says. 612 01:17:16.020 --> 01:17:22.409 At initial I, so we have. 613 01:17:22.409 --> 01:17:25.710 You know, the name, the argument list colon. 614 01:17:25.710 --> 01:17:30.359 And what we have here is this is. 615 01:17:30.359 --> 01:17:36.449 The name this is a member named tile size. This initializes a member. 616 01:17:36.449 --> 01:17:44.220 For this new variable, this is instructing new variable of this class. This is and this is initializing a member of the variable. 617 01:17:44.220 --> 01:17:48.569 To this value tile size, which was the argument here. 618 01:17:48.569 --> 01:17:54.720 And then there's nothing else happens. This is the body of the constructor function. It's empty just a braces. 619 01:17:54.720 --> 01:18:04.829 Now, we could put tile size equals tile size in the body here, but this, this thing's a little better. So perhaps a little more efficient. 620 01:18:04.829 --> 01:18:10.229 And if we have a number of members, we're initializing, they're all done in parallel. 621 01:18:10.229 --> 01:18:17.069 Touch odds and tax, and we're defining the apprentice operator and. 622 01:18:17.069 --> 01:18:22.739 This takes 1 argument, and all it does is a model on tile size, which was. 623 01:18:22.739 --> 01:18:25.829 A local variable inside. 624 01:18:25.829 --> 01:18:33.119 The variable of class tiled range go back a little here. Okay. 625 01:18:34.800 --> 01:18:38.699 Can you the end of the class. 626 01:18:38.699 --> 01:18:42.300 So, we'll finish it next time. 627 01:18:42.300 --> 01:18:47.789 You can look at this and look at other. You're going to walk your way down the blog. If you want. 628 01:18:47.789 --> 01:18:54.510 What makes things weird. 629 01:18:55.829 --> 01:19:04.920 Yeah, this is too complicated to even give a summary in a minute or 2, but that's so, what we're doing now and for the next day is we're going to look at these examples. 630 01:19:04.920 --> 01:19:09.569 And look at ways I've taken some of them, and I've actually made them faster. 631 01:19:09.569 --> 01:19:18.510 So, I've got various versions of tiled range, like, Tile range 2. 632 01:19:18.510 --> 01:19:24.060 And where I use place holders, and so on. 633 01:19:24.060 --> 01:19:27.180 And I'd like my method here. I think it's nice. 634 01:19:27.654 --> 01:19:30.204 Okay, so that's enough for today. 635 01:19:30.204 --> 01:19:32.515 So we finished off the Stanford slides, 636 01:19:32.904 --> 01:19:44.185 showing you some pro parallel programming paradigms and then the GTC part little chunk of the GTC tutorial and now we're looking at actual examples of trust code, 637 01:19:44.185 --> 01:19:45.805 which has some weird things. 638 01:19:46.380 --> 01:19:52.260 It will, it will be surprisingly hard to understand how these small programs operate. 639 01:19:52.260 --> 01:19:57.329 So, it's a thinking style, but when they but they're very fast. 640 01:19:57.329 --> 01:20:05.159 On the gpo and that that's what's nice. And and I've got my notes that I wrote on a lot of these here that you can. 641 01:20:05.159 --> 01:20:08.520 Look at, and some of them. 642 01:20:08.520 --> 01:20:15.840 Like, expand does run lanes decoding and it's surprisingly sophisticated what it does. 643 01:20:15.840 --> 01:20:23.399 Sparse factor going to add 2 sparse factors and got some comments here and how we sort. 644 01:20:23.399 --> 01:20:27.930 A lot of race of arrays and so on. 645 01:20:29.520 --> 01:20:33.510 Okay, and some other lots of other paradigms. Okay. 646 01:20:35.159 --> 01:20:44.220 Count unique keys to subset of a histogram basically repeated range repeat each element do it passed past in parallel? 647 01:20:44.220 --> 01:20:53.789 Okay, and then so this will be another class at least 1 and then 2, and then we'll morph into. 648 01:20:53.789 --> 01:20:56.970 Quantum computing for the rest of the semester. 649 01:20:56.970 --> 01:21:00.989 So, if there's any questions. 650 01:21:00.989 --> 01:21:11.279 If not the optional day off so they don't want a day off. I'm quite glad to give you more material you're paying for the material, but if you'd like to have a break. 651 01:21:11.279 --> 01:21:18.239 You can comment on that, if not, um, if there's any questions or anything. 652 01:21:20.640 --> 01:21:24.359 No, okay. Have a good week. See you Thursday then. 653 01:21:26.220 --> 01:21:30.210 No reason to start a chat as well.