Right, right? Okay, good afternoon, class. The universal question, because I can actually never tell: can you hear me? Thanks, Isaac. Great. So what is happening today? Parallel Computing, class 17, March 25th, 2021. I've got a few more student talks; I really want to hear what you have to tell me. Then after that, to finish off the time, I've got a general blurb about C++ programming and the several ways you can do functions in it. This is actually why I love C++: it has these weird, powerful features with implications you wouldn't expect, and they let you do powerful, efficient programming. We'll also see more about Thrust; I've got lots of information, and I copied some material over from last time since we didn't cover it, plus other information on Thrust and a blurb about heterogeneous memory management in Linux. But first, our guests: Connor, Isaac, and Jack, depending on who would like to go first. Connor? See if you can take over the screen and so on.

For sure, let me see if I can share my screen real quick. All right, let me know if you can see my screen.

I think it's coming in now. Good.

Perfect. Okay, so I'm just going to quickly talk about cloud-based computing and briefly how it can apply to parallel computing.
So what is cloud-based computing? Cloud computing, simply put, is the delivery of computing resources and services over the Internet. This enables faster innovation and more flexible access to the computing resources required for large-scale or demanding applications, making them easily accessible to all sorts of groups of people. It essentially creates an Internet-based distributed computing platform where users can easily introduce new nodes and resources and scale their computing network as their application's demands grow and shrink. This lets you pay only for the resources that you actually need and use, which can largely reduce the operating costs of hosting and running your application, and again makes it more efficient and easier to scale and expand. So, what are some of the biggest benefits of cloud computing? First and foremost, it eliminates the capital requirement to buy the expensive hardware and software that can be needed to develop and operate your program. It also eliminates the overhead of setting up an on-site computing system, which can actually be quite hard at large scale.
There is also the cost to run those computing resources, and the staffing and IT infrastructure required to maintain and manage them. With the cloud, large amounts of computing resources can be allocated and provisioned in just a matter of a few minutes and a few mouse clicks, which really gives you on-demand computing resources. You no longer have the overhead time to purchase and set up those resources; they're just available to you. I've already mentioned the benefit of it being very scalable, so you can provision what you need. Cloud computing also makes reliability, data backup, and disaster recovery increasingly seamless and less expensive, since data can easily be mirrored to different points on the provider's network instead of you having to purchase additional resources and an additional computing cluster to host your backups. When it comes to security, using cloud computing lets you take advantage of the service provider's security practices and features, which can be really helpful given that most of the large-scale cloud providers are tech giants with tons of money and resources to devote to security. However, there are definitely plenty of security risks any time you introduce something to the Internet, so that's something you have to keep in mind
when you look at utilizing cloud computing for your application. There are, however, ways around that. You can see this with Microsoft and Amazon, to name two, working closely on classified contracts and projects that require cloud computing; so there are definitely ways to separate your cloud resources from the open Internet. So how does this apply to parallel computing? As we've seen up to this point, parallel computing typically requires a lot of computing power and resources, especially for large-scale industry applications. This previously created a barrier between parallelization and your average consumer, as well as anyone unable to acquire and maintain the resources those applications require, or who simply doesn't have the space to house large-scale computing clusters or a supercomputer. With cloud-based computing, though, almost anyone can get access to huge amounts of computing resources by renting them instead of buying them, and they're also able to access them quickly, on demand. As a result, I think we're going to start seeing lots more development in the parallel computing space because of how cheap and easy it is to access the resources required to perform this kind of research and development. You no longer need to be in person at a cutting-edge computing facility, or even associated with an institution that has traditionally had access to these resources.
As we often see with technologies, advancements come as resources become increasingly available and more and more people begin to work with them and build applications on them. This is without a doubt the case with computing, as we've seen with the introduction of the personal computer; making previously unattainable resources available to the masses can only help to further explore the possibilities of computing power, parallelization, and technological advancement. That's pretty much everything I have. Are there any questions?

Well, thank you. I'll start with a question if no one else has one. Of course, universities run internal clouds for faculty and research labs that want computers. So instead of buying your own computer, they would strongly urge you to basically rent virtual computers on the university infrastructure. What do you think of that idea?

I think it's a really smart idea. Going back to the security aspect, you can essentially operate a closed network that is itself a cloud computing network, instead of actually having to go through the Internet or a third party, and do sensitive work and things like that on your own resources instead. So while you still have the overhead of having to purchase those resources to make available to your staff and students and whoever else,
the convenience of it being cloud-based definitely helps and makes it worth it, in my opinion.

I'm wondering what limitations there might be. For example, let's take parallel: that machine is basically 10,000 dollars, but the thing is, I could configure it as I wanted. So there's a trade-off: I have to be my own systems programmer, and if it breaks, it's not going to get fixed. So, you know, that's the trade-off. Would anyone else like to contribute on the topic? No? Okay. So next would be Isaac. Would you like to tell us about aerospace applications of NVIDIA?

Oh, sure. Just give me one moment. All right. I actually ended up doing this a bit more generally, but I am going to cover some NVIDIA graphics cards. I'll be focusing on space-based applications, since I think that'll be a bit more interesting than spending five minutes talking about engineers spending centuries of processor time modeling airflow around landing gear. For spacecraft, most of the time you look to parallelism for strong scaling rather than weak scaling, since any really big problem that you'd want to solve, you would have taken care of on the ground with a supercomputer. So you're interested in smaller problems that you want to solve really quickly on board. You might have things like graphics, image processing, or computer vision programs that you want to speed up. An example of this would be star trackers.
These are one of the best ways to orient a spacecraft: you just take an image of a set of stars and try to match them to a precomputed catalog; you can see the basic idea at the bottom right here. This involves going through lots of different permutations of different star patterns, reading in image data, processing it, trying to match it to the precomputed catalog, and solving lots and lots of linear solver problems. Another common set of problems you'd want to solve relates to guidance, navigation, and control. If you have a control system that's supposed to predict the state of your spacecraft 100 milliseconds in the future, you'd probably want to get that answer in less than 100 milliseconds. For the guidance and navigation part, there can be really complicated algorithms; a good example is the landing vision system that the Perseverance rover just used to land on Mars. The process is illustrated here. It's a similar idea to the star tracker, in that it's trying to match hundreds of terrain patterns and features against an onboard data set to figure out where it is and how it's moving. The latter stage involves computing over 150 different patches on a map and trying to match them to the images it's taking as it moves. Another thing to keep in mind for computing on spacecraft is that, compared to most workstations or computer clusters, there's a much more
fundamental connection between the software it's running and the hardware it's running on, at least for these kinds of big flagship spacecraft. You can see a schematic of the lander vision system here; as you can see, there's hardware on board that's made to take in very specific inputs and return very specific outputs. The vision compute system here is actually made up of three components that were built specifically for spacecraft, and one of those components was made just for this mission; they could actually get away with calling it the computer vision accelerator card, because it was designed with exactly this algorithm in mind. This is one reason why field-programmable gate arrays are really popular in space-based applications. Another reason is that they're easier to radiation-harden. Radiation is actually a really big concern for computing in space. Even consumer computer setups on the ground regularly have to deal with cosmic rays coming in through the atmosphere and flipping random bits, or causing voltage spikes in random bits of circuitry. They can handle this with hardening techniques like error-correcting codes or algorithm-based fault tolerance, but this does introduce overhead. And that isn't even limited to software overhead: one solution is to put a big shield over your hardware, but that's extra mass, and that's literally feeding into an exponential function for how much bigger your rocket needs to be if you want to get it into space.
And if you do end up taking the space vehicle design capstone course, your teammates will fight you over who needs that extra 100 grams of mass. And that's before we even get into the problems that parallel processing has in general with these kinds of errors. A lot of these error-correcting codes are algorithms designed to detect these faults, and they work much better in serial; but since some programs run in parallel, they are much more sensitive to these kinds of cache errors. An example: if a bit flips somewhere in the L2 cache, it could propagate to almost every single process running on the GPU, and it would be difficult to detect that in less than a single cycle. Another thing to consider is the effect of different thread distributions, because of the physical locations of the memory you're actually using. If you use more threads, you use a literally wider area of memory on the card; they literally call it a device cross section when they're shooting a neutron beam at it. So if you use fewer threads and fewer blocks, you can reduce the cross section, but you're also going to take more time. That's a trade-off you have to consider. Recently, things have gotten better at the high-performance computing level,
129 00:18:28.019 --> 00:18:42.479 Since manufacturers aren't including, uh, reliabilities features and cheap use, but these are only in the major storage structures, like the cash and the shared shared memory and registers but things like, um, the. 130 00:18:42.479 --> 00:18:47.848 Logic Gates queues Fred Flock, schedulers, work, schedulers and, um. 131 00:18:47.848 --> 00:18:55.648 Different interconnect networks aren't actually protected by this. It's also worth noting that, um, from embedded systems. 132 00:18:55.648 --> 00:19:08.939 Uh, don't typically don't have these kind of liability systems on them. So, from a programming perspective, if we're going to look at what these errors actually look like, Here's a few fingers. So. 133 00:19:09.534 --> 00:19:24.263 The sheet that we saw earlier here, it was a K20 so it's a little dated, but it's still in use and to get this kind of data. They made a do Matrix, multiple patients and fast for transforms as they shot at it with a specific particle being. 134 00:19:24.263 --> 00:19:30.324 So, you can see the distributions for the different kinds of errors that you get for matrix multiplications. So. 135 00:19:31.588 --> 00:19:36.449 You might have a singularity there, Matrix steroids or a single row errors and single column and. 136 00:19:36.449 --> 00:19:40.409 Other types of errors, and for a fast for transform. 137 00:19:40.409 --> 00:19:49.439 You can kind of see how a single error in a Fred can kind of propagate fruit into the entire. 138 00:19:49.439 --> 00:19:58.679 Output, so this kind of integration of a single output factor seems like it comes up a lot in different scans and reductions. We've looked at class. 139 00:19:58.679 --> 00:20:04.229 So, I imagine the problem isn't really limited to fast forward transforms. So, um. 140 00:20:04.229 --> 00:20:09.118 Yeah, so that kind of brings up be glad to take any questions. 141 00:20:12.778 --> 00:20:17.189 So, thank you. 
So, thank you. Could you summarize what the hardware is like on the recent Mars missions? What sort of computer do they have on them?

The main compute unit they use is called a RAD750, and it's based around a field-programmable gate array. It's designed to be protected against radiation, both physically and in terms of the circuitry, and to do sequential tasks in a very specific manner. I think it keeps multiple copies of the things that are supposed to be in its memory, and it regularly checks them, so on and so forth. I think this exact card has been used since the eighties, but there is also specific hardware that they build for individual missions.

Okay, thank you. Anyone else have a question? No? Okay. So, Jack, tell us about autonomous vehicles and why they're relevant to the course.

Right, just a moment. Hello. Oops. Can you see?

We can see you and hear you. Great.

All right, cool. So, here are some recent advancements from NVIDIA with regard to autonomous machines.
Some of the notable features I want to point out: the first one being PhysX 4.0. Previously, PhysX has traditionally been used in video games to compute physics simulations, for example in game engines like Unreal or Unity. If anyone's a video game player out there, you may recognize some of the titles that have used it: Fallout 4, or Borderlands 2. But in December of 2018, the PhysX SDK was released to open source and was also upgraded to version 4.0. Among the features in 4.0, the first is a temporal Gauss-Seidel method for solving nonlinear systems. The way I understood it, compared to older equation solvers, rather than updating all variables at once, as the method solves these systems it immediately updates each variable as updates become available, allowing for better convergence and better estimation of things like the position and velocity of a robot, say. Going forward, combining this with a reduced-coordinate articulation feature has led to applications of PhysX in robots. In the picture on the left there, you can see
single-arm robots with the reduced-coordinate articulation. With the better accuracy of the Gauss-Seidel system solvers, these machines can better track their position and velocity, and make better contact with the objects they're interacting with. This can also help humanoid-type machines conduct their motion with more fluidity. Also, now that PhysX is open source, we may see Team Red try to incorporate it somehow; just a thought, but now that it's open source, who knows what they might plan to do with it. Going forward: this was a five-minute video that I had found. I'm not going to play the entire thing, and I don't think there's any sound, but I'm going to mute it anyway and just let it run while I talk. NVIDIA also published to open source a trained deep neural network framework which, as you can see in the video, allows a machine to take in sensory input from its surroundings, its environment, and use the output of the DNN to control the robot, which is exactly what this drone is doing. Something I had found interesting about this was that they were
testing this drone in a forest. Being in a forest, there are a lot of complex objects like trees and leaves, and a lot of different distances and ranges, so it would be a very hard place to navigate. That's why I thought it would be a great place to test something like a drone, and that's probably why they did it. This is all running on an NVIDIA Jetson single-board computer; it's an embedded system with a CPU and a GPU, and I included some of the specs of the various Jetson versions here. This is strikingly fascinating, because if we can navigate something like a forest, we're not very far off from navigating, say, a cleared, wide-open highway in a car. Lastly, I just wanted to talk about the NVIDIA and Mercedes-Benz partnership. Starting in 2024, Mercedes-Benz will include high-performance NVIDIA computers in their cars, using the NVIDIA DRIVE embedded system. This will primarily feature the ability to drive regular routes from one address to another, and will also feature Level 4, which translates to high driving automation; so, Level 4 automated parking. So, yeah, I guess
I'll show off the article. Or not; it won't open. But yeah, that's basically all I got. Any questions?

Yeah, thank you. What would be the limitations of this? Perhaps, like, I read skeptics who say that Tesla is going to kill people with their autonomous vehicles, and that it also doesn't really exist yet. I don't know; what's your opinion?

Well, reading about this Level 4, for example, this parking automation: what Level 4 can account for is errors or obstructions, if something were to get in the way. But I think it's objectively impossible to prepare for everything that could happen. So if some error were to happen, I don't know, like a major accident or something like that; I just don't think I'm ready to trust these systems to that extent, if that makes sense.

Okay. Anyone else like to chime in? Okay, great, thank you. I learned a lot from your presentations; that's one reason I have you do them. So let me go back to my screen. Good. So, I'm talking more about NVIDIA-related things now, and I've got the chat window open on the side, I think, if you have questions. The broader topic for the last couple of classes was Thrust, which is a parallel
API. Now, it has some functional programming concepts, where you work with functions as objects that you can transform and combine. It's a powerful theoretical idea, but in C++ what it leads to is also simpler code: a theoretically interesting idea, working with functions as objects, that can also lead to more compact code, which is nice. And in C++ it is compiled to run very fast. This is a reason I like C++ compared to some other languages, where there's overhead in calling functions; in fact, in Python, I believe they warn you that too many function calls slow you down. Not the case in C++: you've got classes and types, you've got these little member functions called billions of times, but the thing is, they get inserted inline and then optimized out of existence. Okay, so there are several ways you can do functions in C++ that are relevant; some are familiar and some are not. The traditional one goes back to C: you just have a function add, and so on. Now, something in recent versions of C++: you can declare its type as auto, and this means the compiler in many cases can infer the type. In a case like the function I've got up there, that really is easier, because the type of add includes the fact that it takes two arguments,
each of them an integer passed by value, and that it returns an integer; so the full type of add is actually quite complicated, but if you say auto, it's inferred. Okay, so you've got add there, and you can pass it as a pointer to a function that does a reduction, let's say. This is an old idea; it goes back decades, back to C. The problem is that you're passing a pointer to a function, and it can't be optimized: the function cannot be inserted inline, and so it doesn't optimize. There's a second way, which is totally crazy. In C++ you can overload functions (let me make this a touch bigger for people). In C++ you can overload functions, and you can overload operators, and the parentheses are an operator, so you can overload them. I'll have to show you examples so you understand it, but it's really crazy. You can define a new class in C++ with a member function for operator(), and then a variable of this class can be called like a function, and that calls this member function. It's really crazy; it's also very powerful. It allows you to do something called a closure, where you can make a version of the function with certain environmental information wrapped up inside it. This optimizes well; we'll see examples of this. And then a third method: again, this goes back a few years in C++. You can have lambdas. The two oldest currently used programming languages are Lisp and Fortran; and when do those go back to?
276 00:33:10.644 --> 00:33:15.743 Approximately 1957, although they've been updated since then. 277 00:33:16.558 --> 00:33:27.239 So, lambdas. This is a theoretical idea going back to the 1930s, I believe, the lambda notation for reasoning about functions. In any case, it's now part of the language 278 00:33:27.239 --> 00:33:31.618 in C++, and that's what it looks like up here. Um. 279 00:33:31.618 --> 00:33:41.038 You define it, and the big thing is it's local to the containing block, unlike normal functions, which are global; in C++ functions are global concepts. 280 00:33:41.038 --> 00:33:53.784 But this lets you write a function which exists only inside the containing block, which is nice. You've got some containment, which is a good idea in software engineering. So you define the function. 281 00:33:55.104 --> 00:34:01.733 It's a variable, add again. Its type is auto, because the actual type is a horrible mess. 282 00:34:01.979 --> 00:34:10.018 And it's a variable you can assign to other variables and so on. That's what the syntax is. 283 00:34:10.018 --> 00:34:13.798 Now, here's something new too: 284 00:34:13.798 --> 00:34:27.898 this being local. If you have a global function in C++, it can access all the other functions, because they're all global, and it can also access global variables. Well, this local variable add, 285 00:34:27.898 --> 00:34:31.259 local and anonymous, can access variables 286 00:34:31.259 --> 00:34:34.949 that are inside the containing block, just like anything else in that block. 287 00:34:34.949 --> 00:34:38.099 And the square brackets tell how 288 00:34:38.099 --> 00:34:50.398 it should access those variables. The square brackets here are empty, so we're not saying anything in particular; this function is not using its environment. But this is also a powerful thing, because 289 00:34:50.398 --> 00:34:54.028 you may want a local function. Again,
290 00:34:54.028 --> 00:35:05.304 you may have a repeated section of code in a big block, and you'd want it embedded in a function, but it's using its environment, and you can't do that as a global function, because the global function 291 00:35:05.483 --> 00:35:19.434 doesn't have access to the local environment unless you pass everything in as lots of arguments to the function, which gets to be a mess. So here, this local function can access the global environment, and it can access other variables that exist in the containing block, 292 00:35:19.554 --> 00:35:20.963 if you wish it to. 293 00:35:21.509 --> 00:35:30.208 It's not forced to, and inside the square brackets you define how it inherits variables from the containing block. Here we're not inheriting anything. 294 00:35:30.208 --> 00:35:35.668 The 2nd thing is that inside the parentheses it's just like any other function: 295 00:35:35.668 --> 00:35:41.699 you give it arguments, and there are 2 arguments that are integers, and they're called by value, 296 00:35:42.503 --> 00:35:56.003 which is relevant. And then inside the braces you have the body of the function, and it could be as long as you want. Typically these lambdas have very short bodies, but it could be 100 lines, could be 1000 lines of code 297 00:35:56.003 --> 00:36:04.164 if you wanted, but then your program starts looking funny. So typically these lambdas are used for very short things, something like that, let's say. 298 00:36:05.454 --> 00:36:19.583 And so that's your lambda. It's called a lambda because, going back to the notation of the 1930s, they used the Greek letter lambda to talk about this. Okay, so it's an anonymous function: the function itself doesn't have a name, and it's assigned to a variable. 299 00:36:20.034 --> 00:36:27.923 Now, the nice thing about this is that again, if you're doing a reduction, let's say, and we'll see examples, you can give the lambda as the
300 00:36:28.199 --> 00:36:33.568 function that does the reducing, and the secret sauce is the compiler: 301 00:36:33.568 --> 00:36:46.858 when it sees this, it will insert the body of the lambda in place, substituting in the actual arguments. Now the function, as a function, doesn't exist anymore; it's inline, and it can be optimized. So that's good. 302 00:36:48.778 --> 00:37:03.623 Here's a 4th thing, called placeholder notation, and this is again crazy stuff in C++. I don't think Bjarne Stroustrup actually anticipated this when he laid down the original rules. So transform is one of these things. 303 00:37:03.653 --> 00:37:18.443 It does a map, as in the map-reduce idea: transform will take a vector and will create a new vector which has every element transformed. Or it can take a pair of vectors, like the transform with the dots there: it would take 2 input vectors, 304 00:37:18.443 --> 00:37:26.994 would add them element by element, and the last argument to transform would be the combining function. So it could be the add that I described before. 305 00:37:27.239 --> 00:37:33.748 Well, here's an even briefer way: I just say underscore 1 plus underscore 2, 306 00:37:33.748 --> 00:37:38.309 and that's an inline anonymous function. Really, really short. 307 00:37:38.309 --> 00:37:42.358 Now, what makes that work? 308 00:37:43.679 --> 00:37:48.719 Underscore 1 is defined in a header file as a 309 00:37:48.719 --> 00:38:02.759 variable of a new class, and plus is overloaded, so when it sees variables of that class it doesn't do an immediate addition. What it creates is a function. 310 00:38:02.759 --> 00:38:06.028 So this is functional: 311 00:38:06.028 --> 00:38:12.900 functions on functions, sort of thing. So this makes it so underscore 1 and underscore 2,
312 00:38:12.900 --> 00:38:23.574 they're variables that already exist in this class; they've already been predefined, and plus has been overloaded to operate on those variables and return a function. 313 00:38:23.574 --> 00:38:28.074 I mean, these overloaded operators can return any class that you wish; 314 00:38:28.349 --> 00:38:32.280 you define what they return, and this returns that. 315 00:38:32.934 --> 00:38:47.034 Now, the thing is, this works only for predefined operators, so plus, minus, times, divide and so on. The operators have to be predefined to be overloaded with this placeholder notation. But that's as brief as possible. 316 00:38:47.309 --> 00:38:52.500 Okay, and again it compiles efficiently. Well, 317 00:38:52.500 --> 00:38:59.010 the compilation may be slow, but the code that the compiler produces runs really 318 00:38:59.010 --> 00:39:05.519 fast. That's what I like. Okay. So how are these things used? So we've got the thrust 319 00:39:05.605 --> 00:39:08.364 API. Just to remind you, the back end could be sequential 320 00:39:08.364 --> 00:39:22.704 C++ code, could be OpenMP, could be Intel's Threading Building Blocks. Threading Building Blocks is an Intel product which is a competitor to OpenMP, and it might sometimes be faster than OpenMP, though people say it runs only on Intel, 321 00:39:23.010 --> 00:39:36.750 because Intel sort of paces some of the OpenMP products. Sometimes they're better, sometimes they're worse; they copy from each other. And the back end could be CUDA. So this is the functional programming philosophy, and I'll show you 322 00:39:36.750 --> 00:39:41.309 examples, and it uses these overloading things. 323 00:39:41.309 --> 00:39:48.449 Okay, and I copied some of this stuff over from last time, since we didn't cover it last time. 324 00:39:49.554 --> 00:40:02.545 I did mention last time that there are like 3 locations for this. Okay, so we'll do Stanford's set now, which has a lot of content. This will actually be an introduction.
We saw a little quick introduction to thrust before. 325 00:40:02.934 --> 00:40:06.054 This will be a really big introduction. 326 00:40:06.389 --> 00:40:12.539 Okay, just a second here, I have to share. 327 00:40:12.539 --> 00:40:15.840 I'm experimenting with sharing 328 00:40:15.840 --> 00:40:18.900 window by window instead of the whole thing. 329 00:40:21.510 --> 00:40:25.829 Hello. 330 00:40:28.949 --> 00:40:39.150 One second here. 331 00:40:41.099 --> 00:40:46.320 That looks good. 332 00:40:46.320 --> 00:40:51.269 Oh, great. I may have, let's see, 333 00:40:51.269 --> 00:40:55.170 messed something up here. 334 00:40:57.630 --> 00:41:04.079 Silence. 335 00:41:04.079 --> 00:41:18.809 Silence. 336 00:41:40.110 --> 00:41:43.199 Silence. 337 00:41:44.369 --> 00:41:49.170 I'm having difficulty sharing the screen. That's what's slowing me down for a minute here. 338 00:41:49.170 --> 00:42:09.840 Silence. 339 00:42:33.090 --> 00:42:37.889 Wait a minute here. 340 00:42:49.559 --> 00:42:53.130 Okay, this is getting weird. 341 00:42:53.130 --> 00:42:59.969 I'm thinking the only way I'll be able to share my screen is to fire up Webex 342 00:42:59.969 --> 00:43:03.840 in another window. This is totally weird, but 343 00:43:05.400 --> 00:43:06.655 let's see here, 344 00:43:21.175 --> 00:43:22.405 give me a minute here. 345 00:44:26.550 --> 00:44:30.510 Just a second. 346 00:44:37.409 --> 00:44:48.630 Okay. 347 00:44:48.630 --> 00:44:52.860 Don't go away. 348 00:45:29.369 --> 00:45:32.730 And. 349 00:45:42.570 --> 00:45:48.000 Okay, hoping you can still hear me. 350 00:45:51.780 --> 00:45:54.989 Let me just check: 351 00:45:56.369 --> 00:46:02.250 can you still hear me, anyone? 352 00:46:10.704 --> 00:46:18.985 Thank you, Dan. The reason I asked is I had to kill my Webex session and start a new session, and it wasn't clear to me that the audio 353 00:46:19.230 --> 00:46:30.389 would still work after I did that. In any case.
So now the theory is that you might be able to see the 354 00:46:34.019 --> 00:46:43.710 screen. You might be able to see the screen. Yeah, you can? Good. Okay. So if it has problems, let me know. This is 355 00:46:43.710 --> 00:46:55.559 why you hate computers; that's just a joke. Okay, so we saw this before, just a quick rerun. You've got the header code. You have only 356 00:46:56.215 --> 00:47:07.135 2 data types, a host vector and a device vector. They're templated, a host vector of integers, and the obvious ways to initialize it. 357 00:47:07.644 --> 00:47:12.474 You have map functions like generate, which apply: 358 00:47:13.105 --> 00:47:23.244 so you give it the vector as the begin and the end, and you give the function that it applies, and it generates, obviously, each element. 359 00:47:23.519 --> 00:47:27.420 You can have a device vector; you can initialize the host from the device. 360 00:47:27.420 --> 00:47:31.860 And I can't see anything. 361 00:47:31.860 --> 00:47:35.280 Interesting. Okay. Thank you. 362 00:47:35.280 --> 00:47:38.429 It says the share started, that I started to share. 363 00:47:38.429 --> 00:47:42.630 Okay, yeah, but it's not displaying anything yet. 364 00:47:45.480 --> 00:47:50.099 Let me try again. 365 00:47:51.750 --> 00:48:03.329 Eva, can you see that now? 366 00:48:05.219 --> 00:48:15.960 Yeah, now we can. Okay. Thank you. I don't know what changed; I stopped sharing and started sharing again. Okay, in any case, we saw this before. 367 00:48:15.960 --> 00:48:19.530 Just a quick rerun, and there you go. Okay, so, 368 00:48:19.530 --> 00:48:24.659 um, 369 00:48:24.659 --> 00:48:27.780 and so this is just 370 00:48:27.780 --> 00:48:37.440 motherhood stuff, um, why it's good. We've seen this before: generic programming means you can use different classes and so on. 371 00:48:37.440 --> 00:48:43.559 Okay, so what thrust is, now in more detail: it's a template library.
372 00:48:43.559 --> 00:48:49.800 Um, it's got the containers, the host and device vector, and it's got algorithms, so sort, reduce and so on. 373 00:48:49.800 --> 00:48:58.050 The containers are, as I just said, host and device vectors, and 374 00:48:58.050 --> 00:49:12.239 you basically want to make the template argument some plain old data type, ints, floats and so on, and you can assign individual elements as well as the whole things. So if you access a device vector element on the host, it's going to do some background copy. 375 00:49:12.239 --> 00:49:17.880 And when you go outside the containing block, the things get freed. 376 00:49:17.880 --> 00:49:20.940 So, like, 377 00:49:20.940 --> 00:49:27.780 STL has lists and so on, but thrust just has the 2 vectors. So nothing new there. 378 00:49:29.965 --> 00:49:41.545 Okay, so the big thing is iterators: you have a vector, and you've got an iterator to the start and an iterator to 1 past the end, and you dereference an iterator to get at the element. 379 00:49:41.844 --> 00:49:49.105 And you add 1 to the iterator, which goes to the next element, regardless of how big the element is. So this is probably pretty familiar to people. 380 00:49:49.380 --> 00:49:58.710 Okay, now, in this slide we're seeing the iterators pointing to actual elements in memory, but in fact 381 00:49:58.710 --> 00:50:07.019 the iterator could be some fancy type: you dereference it and you get something; it doesn't have to be pointing to actual real memory. 382 00:50:07.019 --> 00:50:20.460 An iterator is a pointer with stuff added to it. So you can subtract 2 of them and you get the number of elements, not the size in bytes: the number of elements, the length here. 383 00:50:20.460 --> 00:50:27.150 And you can add, so if you add 3 to an iterator, you get 3 elements down the line, regardless of how big the elements are. 384 00:50:27.150 --> 00:50:36.809 But the thing is, you can dereference iterators and you get elements. So you dereference begin,
385 00:50:36.809 --> 00:50:45.269 you get the 1st element of the vector, and that sort of thing. You add 1 to begin, it's now pointing to the 2nd element, and you can 386 00:50:45.269 --> 00:50:48.599 assign to it, okay, 387 00:50:48.599 --> 00:50:52.500 if the underlying class is writable; it may not be. 388 00:50:52.500 --> 00:50:56.280 So iterators are quite similar to pointers. 389 00:51:02.155 --> 00:51:12.744 Well, what's happening here is we have a host vector of 1000 elements. We load it with random elements using a function rand that we're not talking about. You can copy it to the device. 390 00:51:13.105 --> 00:51:20.364 You can do a reduction on the device, or you can do a reduction on the host, and now here's this dispatch idea here. 391 00:51:21.114 --> 00:51:35.695 Look at the last 2 lines of code: the reduce function is overloaded, and at compile time it dispatches to different versions of reduce depending on the class, depending on the type of its arguments. 392 00:51:36.085 --> 00:51:48.625 So if it's a host vector, it dispatches to and compiles a host version of reduce; if it's a device vector, it dispatches to and compiles a device version of reduce. And the decision is made by the compiler, 393 00:51:48.929 --> 00:51:57.420 so there's no overhead at runtime, because host vector and device vector are different. Now, unified memory of course is sort of 394 00:51:57.420 --> 00:52:03.539 slightly different. And now the thing is, if you want to convert one of these 395 00:52:03.539 --> 00:52:09.690 iterators to a raw pointer, this is actually the official way to do it, because it's a device vector you want to convert. 396 00:52:09.690 --> 00:52:21.480 So sort of what's happening here is maybe you want to have traditional CUDA code, like we've seen for the last few weeks, and you want to combine it with thrust.
397 00:52:21.480 --> 00:52:32.130 So you want to access a thrust vector, say a device vector, inside your normal CUDA code, inside a kernel. So basically, the 398 00:52:32.130 --> 00:52:46.885 device vector has extra stuff wrapped around it, bookkeeping stuff that you have to get rid of, and so raw_pointer_cast takes the device vector, casts it to a simple pointer, removes this bookkeeping stuff. Now you can pass it into a kernel 399 00:52:47.159 --> 00:52:55.409 on the device, and you can go back and forth with CUDA code and so on if you want. 400 00:52:55.409 --> 00:53:02.699 Skip the namespace stuff; everyone uses namespaces. Nothing interesting there. 401 00:53:03.960 --> 00:53:07.739 Nothing interesting in containers either; nothing interesting here. 402 00:53:07.739 --> 00:53:11.909 It starts getting interesting now, here. 403 00:53:11.909 --> 00:53:21.059 Okay, what's happening on this slide is we're starting to talk about overloading the parenthesis operator. 404 00:53:21.059 --> 00:53:29.250 So we have a class T; well, T is a variable in the template, 405 00:53:29.250 --> 00:53:32.639 and we're defining add here. 406 00:53:32.639 --> 00:53:37.199 So add is defined for class T, 407 00:53:37.199 --> 00:53:48.420 and again, just a recap, what happens with the template mechanism in C++ is the compiler will generate a different version of add for each T. 408 00:53:48.420 --> 00:53:52.769 So, with this code, 409 00:53:52.769 --> 00:54:07.375 if you do add<int>, you'll get an int version of add, code generated and compiled. If you say add, angle brackets, float, you get a float version of add, code generated and compiled. 2 separate blocks of code are compiled, as it shows down here. 410 00:54:07.375 --> 00:54:10.224 So now, with add down here,
411 00:54:11.454 --> 00:54:15.144 it's, 412 00:54:15.144 --> 00:54:20.905 you're explicitly putting the template in, so that says the int version of add now; 413 00:54:20.905 --> 00:54:25.644 normally the compiler might try to infer it. As you see, 414 00:54:25.644 --> 00:54:26.664 add of X and Y: 415 00:54:26.664 --> 00:54:30.085 X and Y are ints, and then that's the best match, and C++ 416 00:54:30.085 --> 00:54:35.425 has fairly complicated rules as to what would be the best match to infer 417 00:54:35.820 --> 00:54:39.000 the template value, which 418 00:54:39.000 --> 00:54:52.650 type of add to compile into the code. Or you can be explicit there: you say add<int>, and as it shows down there, you get that. What makes this inference complicated is, suppose you call add with 1 int and 1 float argument, let's say. 419 00:54:52.650 --> 00:54:58.199 Okay, here is 420 00:54:58.199 --> 00:55:02.159 overloading the parentheses operator for 421 00:55:02.159 --> 00:55:13.530 class add. And again, class add is parameterized: it's got a template T. So we say operator, and you can have operator plus, operator minus, et cetera, 422 00:55:13.530 --> 00:55:19.829 operator parentheses; it's a public member of add, and 423 00:55:19.829 --> 00:55:34.590 so it's a function inside add, a public member of add. It's a function, and the function's name is operator parentheses. It takes 2 arguments, a and b, they're both Ts, and it returns a T. It could return anything it wants; it doesn't have to return a T. 424 00:55:34.590 --> 00:55:42.809 If it returned something different, then your code starts looking odd, but okay. So it just returns a plus b. So now, 425 00:55:44.309 --> 00:55:57.204 what you can do, and this gets very interesting, there's a lot of content on this slide. So we have the class add, it's a template, 426 00:55:57.414 --> 00:56:03.025 and in the class add we've defined the overloaded operator parentheses. Now, down here,
427 00:56:03.300 --> 00:56:11.280 I don't know how well my mouse cursor is visible, but we say add<int> func. Func 428 00:56:11.280 --> 00:56:16.920 is a variable, a variable of type 429 00:56:16.920 --> 00:56:20.849 add<int>. So add<int> 430 00:56:20.849 --> 00:56:28.650 is a type name, and func, it's important to understand, func is not a function. Func is a variable, 431 00:56:28.650 --> 00:56:35.039 and it's a variable of type add<int>. And now, 432 00:56:35.815 --> 00:56:47.275 this variable has the parentheses operator. So if add here had local members that were variables, we could assign values to them. In this simple case 433 00:56:47.275 --> 00:56:57.534 it does not, but we could have a local member of add, say an integer foo, and then we could say func.foo = 5, and we would assign to a member of 434 00:56:57.780 --> 00:57:05.579 func. But the only member of func now is that overloaded parentheses. So now we can say func of X and Y. 435 00:57:05.579 --> 00:57:08.730 Now, the syntax of that is complicated: 436 00:57:08.730 --> 00:57:21.300 func is a variable, but the variable's class has overloaded parentheses. So when we say func of X and Y, it calls that overloaded version of parentheses, 437 00:57:21.300 --> 00:57:34.679 so it adds X and Y and returns it. If I go down to another block of code, and it would have to be a different block, we say add<float> func. So func in this case is a variable of type add<float>, 438 00:57:34.679 --> 00:57:39.750 and if we say func with parentheses, that calls the overloaded 439 00:57:39.750 --> 00:57:45.150 function, the operator parentheses, on func and 440 00:57:45.150 --> 00:57:59.039 does the add. So the thing is, the syntax is surprising here: func is a variable which happens to have parentheses overloaded. 441 00:58:00.210 --> 00:58:10.289 Now, what makes this interesting is we can now use func as an argument inside reduction function calls, and
442 00:58:10.289 --> 00:58:18.780 when this is compiled, the compiler will insert the body of the overloaded function inline and optimize. 443 00:58:18.780 --> 00:58:28.554 So there is no overhead for this function call. So you can really use this idea of overloading operators to write code, 444 00:58:28.974 --> 00:58:40.315 and the code is very easy to read, because you've overloaded operators, hopefully with natural concepts, to make your code very readable. And then it compiles to run fast. Okay, so we overloaded parentheses here. 445 00:58:40.619 --> 00:58:44.460 Now, how can we start using some of this? 446 00:58:45.780 --> 00:58:52.079 Um, let's see, what do we have happening here? 447 00:58:52.079 --> 00:59:04.710 Here we're defining a function transform. So we're not calling the function; we're defining what the transform function would be if we had to invent it. 448 00:59:04.710 --> 00:59:07.889 It's a function; it has template 449 00:59:07.889 --> 00:59:11.099 arguments, 2 of them: the type 450 00:59:11.099 --> 00:59:14.760 of the vector that we're going to transform, 451 00:59:14.760 --> 00:59:18.179 and the function that we're going to transform it with. 452 00:59:18.179 --> 00:59:21.750 And what transform is going to do, 453 00:59:21.750 --> 00:59:31.920 this particular version: it's going to take the size of the vectors, it's going to take 2 input vectors, X and Y, produce an output vector Z, and it's going to use function f. 454 00:59:32.969 --> 00:59:41.909 And this is what the code will look like; you could write the code. We just iterate over X and Y, we apply the function, and we store into Z. 455 00:59:41.909 --> 00:59:45.690 Nice and simple. Okay. 456 00:59:46.920 --> 00:59:53.460 So this is inside, 457 00:59:53.460 --> 00:59:57.420 well, this is defined locally; it's actually inside, 458 00:59:58.829 --> 01:00:03.269 transform actually has the template arguments. So now what we can do is down here:
459 01:00:03.269 --> 01:00:10.199 there's a typo in the code; we should have a func here somewhere, but 460 01:00:11.369 --> 01:00:17.550 yeah, the code won't execute as I've written it, it's missing a little bit. But ignoring what it's missing, we can now 461 01:00:17.550 --> 01:00:22.980 create a new variable func; it's a local variable of class add, and 462 01:00:22.980 --> 01:00:30.659 this is all using the class from the previous slide, actually add<int>. 463 01:00:30.659 --> 01:00:34.650 So now it has that member function, 464 01:00:34.650 --> 01:00:39.599 and now we can call, 465 01:00:39.599 --> 01:00:44.219 oh, that's the old one, now we can call transform up here, 466 01:00:44.219 --> 01:00:47.820 give it the arguments, with func as an argument. 467 01:00:47.820 --> 01:00:53.460 And again, func was the variable that had the parentheses overloaded, 468 01:00:53.460 --> 01:00:59.429 and this will call the version of transform for 469 01:00:59.429 --> 01:01:02.909 this function, and this will do the transformation. 470 01:01:03.960 --> 01:01:07.949 Or we could do the version down here. Now, this is sort of crazy. 471 01:01:07.949 --> 01:01:13.530 Okay, the syntax of the last line of code here is weird. 472 01:01:16.380 --> 01:01:26.010 Okay, well, let me go to the 2nd-last line. So the 2nd-last line of code is: okay, we had the local variable func, 473 01:01:26.010 --> 01:01:32.400 and transform gets called with that local variable, and then 474 01:01:33.510 --> 01:01:36.809 transform does this, and it uses it 475 01:01:36.809 --> 01:01:44.039 very simply to execute stuff. Okay, now down to the last line. Okay, what's happening down here 476 01:01:46.170 --> 01:01:51.750 is add<int> with parentheses is not a function call, exactly. 477 01:01:51.750 --> 01:01:56.130 add<int> is a type, and 478 01:01:57.840 --> 01:02:06.030 add<int> with parentheses is constructing a default variable of class add<int>.
479 01:02:06.030 --> 01:02:14.489 You can construct a variable by giving a type name with some arguments, and the arguments go into the constructor for that type. 480 01:02:14.489 --> 01:02:23.010 There is no explicit constructor for this type, so it falls back to a default, which takes no arguments. 481 01:02:23.010 --> 01:02:27.809 So add<int> with parentheses returns a variable, 482 01:02:27.809 --> 01:02:31.829 returns a new variable of type add<int>. 483 01:02:31.829 --> 01:02:41.519 It doesn't have a name, and that anonymous variable of type add<int> goes into the transform function 484 01:02:41.519 --> 01:02:54.269 to be used to do the transformations, because what the transform function does is it'll put parentheses after it: it'll call the parentheses operator on that variable of class add<int>. 485 01:02:54.269 --> 01:03:01.139 I'll write this down, actually; let me put it on the blog for Monday. So this syntax here 486 01:03:01.585 --> 01:03:09.715 is surprisingly complicated, but you have to understand what's happening. It's not calling add<int> as a function, really. 487 01:03:09.715 --> 01:03:17.364 It's constructing, well, in a sense it is, but really it's constructing a new variable of class add<int>, and the constructor takes no arguments. 488 01:03:17.639 --> 01:03:22.380 That's what's happening there. Okay. Nice and murky. 489 01:03:22.380 --> 01:03:25.949 Okay, um, 490 01:03:25.949 --> 01:03:33.989 so that was some syntax stuff. Now, more generally, what thrust is doing: we've got lots of 491 01:03:33.989 --> 01:03:47.010 functions: transformations, reductions, prefix sums, those are the scans, sorting, generic types for reductions and scans. 492 01:03:47.010 --> 01:03:54.360 And you can provide the operator that does the scanning and reduction, provide your own operator if you wish; it has to be associative and commutative. 493 01:03:54.360 --> 01:03:58.469 Okay, um.
494 01:03:58.469 --> 01:04:06.719 In the examples here, a device vector, a host vector: we can do a reduction, you give it the begin and end, 495 01:04:06.719 --> 01:04:09.929 and by default it does an addition. 496 01:04:09.929 --> 01:04:21.389 You can get explicit down here with the reduction: you give an initial value and you give the operator for the reduction. In this case it's thrust::plus; this is predefined. 497 01:04:22.469 --> 01:04:29.250 Same thing with floats: if you just say reduce, it will compile the float version, because 498 01:04:29.875 --> 01:04:42.474 the vector is a vector of floats. So this is compile-time dispatching; you can be implicit or explicit if you want. You could use other reduction operators, such as maximum. 499 01:04:42.715 --> 01:04:45.324 This works because maximum is another built-in 500 01:04:45.630 --> 01:04:48.900 function in thrust. So you can 501 01:04:48.900 --> 01:04:53.159 have fun with reduction. 502 01:04:53.159 --> 01:05:00.659 Here's a more complicated example of transform. 503 01:05:01.920 --> 01:05:06.150 What this will do is it will negate a, um, 504 01:05:07.260 --> 01:05:12.030 negate a 2-vector; essentially negate a complex number, you might say. 505 01:05:12.030 --> 01:05:20.820 And what's happening here is that float2 is a built-in data type, which is 506 01:05:20.820 --> 01:05:28.889 2 floats. It's not a vector, because you see you're addressing it by component, .x and .y. So it's an array 507 01:05:28.889 --> 01:05:39.449 of 2 floats, or a structure of 2 floats, that you access by components named x and y. And what this is doing is overloading the parentheses operator to 508 01:05:39.449 --> 01:05:52.170 negate the 2 components. And make_float2: there are a lot of functions named make_ and then a type name, and this will construct a new float2 with those arguments. 509 01:05:52.170 --> 01:06:00.659 I don't honestly know why these functions exist.
You could just, you know, make them part of the constructors. I don't know why this exists, but oh well. 510 01:06:00.659 --> 01:06:03.989 In any case, so we have a structure, 511 01:06:03.989 --> 01:06:17.369 and the structure overloads the parentheses operator. The host device annotation means that 2 versions of this are compiled, 1 for the host and 1 for the device; you have to say that. 512 01:06:17.369 --> 01:06:21.869 Then we have a host and a device vector of float2, 513 01:06:21.869 --> 01:06:29.519 and now we create a function, we create a variable of class negate_float2. 514 01:06:29.519 --> 01:06:34.320 By the way, the only difference between a struct and a class 515 01:06:34.320 --> 01:06:44.789 in C++ is that in a struct the members by default are public; in a class the members by default are private. And if you want to override that, you have to say it. 516 01:06:45.864 --> 01:07:00.025 Okay, so we create a variable of class negate_float2, and this variable now has parentheses overloaded, which will return the negation. And now we do a transform, and this transform takes only 4 arguments: 517 01:07:00.295 --> 01:07:06.414 it takes the begin and end for the input vector, and it takes the beginning of the output vector, 518 01:07:06.840 --> 01:07:10.260 and it assumes that the output is the same length as the input. 519 01:07:10.585 --> 01:07:22.045 And then we have the function. So this is showing nice, simple notation, this functional notation. Now, before, we talked about a transform that combined 2 input vectors to make an output vector, but that had more arguments. 520 01:07:22.045 --> 01:07:34.135 So for transform, the compiler does some fairly complicated dispatching based on the number and class of the arguments. So this one has 4 arguments: 3 iterators and a simple variable. 521 01:07:34.409 --> 01:07:39.449 If it had 4 iterators and a simple variable, then it would be 522 01:07:39.449 --> 01:07:43.230 combining 2 input vectors to make an output vector.
523 01:07:43.230 --> 01:07:46.980 And all the different combinations are listed in the 524 01:07:46.980 --> 01:07:57.510 detailed documentation. So what's new on this slide is we're overloading the parentheses operator to do something a little more complicated, in this case negate a 525 01:07:57.510 --> 01:08:07.230 pair of 2 elements. Now, by the way, the reason you want to have a structure with 2 elements, instead of say a vector of 2 elements, is a vector has overhead, 526 01:08:07.230 --> 01:08:10.559 and vectors of vectors start getting horrible. So 527 01:08:10.559 --> 01:08:16.590 if you're going to have a lot of short vectors, don't use short vectors; make them structures like this with explicit elements. 528 01:08:16.590 --> 01:08:26.760 Okay, so here's another example of overloading the parentheses operator: it's overloading it to do a compare. 529 01:08:26.760 --> 01:08:33.420 So we have a structure compare_float2; again, a struct is just like a class or a type, except 530 01:08:33.420 --> 01:08:41.760 the members are public. And the only thing in this struct is the overload; it's got no members with data in them, just the local code. 531 01:08:41.760 --> 01:08:53.520 It overloads the parentheses operator to take 2 float2s and return a bool. Like I said, the operators can return whatever type you choose. So this returns a bool and does a comparison on the x component. 532 01:08:54.600 --> 01:09:09.475 And now you can do a sort down here. So we create a variable of this class compare_float2, and we call sort with the begin, the end and a function which does the comparison. And the sort assumes that this function will 533 01:09:10.439 --> 01:09:21.210 take 2 float2s and return a bool. But again, this is a variable, okay, not a function; it's a variable with parentheses overloaded, and so inside sort it will take this 534 01:09:21.210 --> 01:09:29.250 and put parentheses after it and execute it, and it will execute it as a function; it will, okay, call it.
535 01:09:29.250 --> 01:09:38.520 And again, it all runs fast. So this is showing why you want to work with short 536 01:09:38.520 --> 01:09:50.789 functors like this. Now, these two examples here could be done much better with placeholder notation. So here you'd say _1.x < _2.x, 537 01:09:50.789 --> 01:09:57.630 for example, but these slides were actually written before placeholder notation became common. 538 01:09:57.630 --> 01:10:09.989 If we go back a little here, the negation: this could also be written with a lambda. The quicker way to do this is either with a lambda or with placeholder notation. 539 01:10:11.670 --> 01:10:18.359 Okay, now here we're taking this overloaded parentheses and we're adding another twist to it. 540 01:10:18.359 --> 01:10:28.319 And this is another cool thing. This slide is illustrating the concept of a closure. So a closure, in theoretical computer science, is where we take 541 01:10:28.319 --> 01:10:40.109 a function, and we combine it with some local information, and we produce a new object, which is the function and the local information combined. 542 01:10:40.109 --> 01:10:44.010 And the idea is that 543 01:10:44.010 --> 01:10:51.779 we have some extra state that the function wants to use, but we don't want to 544 01:10:51.779 --> 01:10:56.909 have to put it as an explicit argument every time. Oops, I'm sorry about that. 545 01:10:56.909 --> 01:11:04.529 So now we have operators with 546 01:11:04.529 --> 01:11:10.289 state. Okay, what's happening here? We've got a structure is_greater_than. 547 01:11:10.289 --> 01:11:14.909 And it's got the overloaded parentheses. It's, um, 548 01:11:14.909 --> 01:11:19.409 it's got a couple of things in here. It overloads parentheses. 549 01:11:19.409 --> 01:11:25.920 It's got two other things. It actually has a local member variable, threshold. It's an integer. 550 01:11:25.920 --> 01:11:29.489 So, if we create variables of this class,
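The lambda form mentioned above can be sketched in plain C++. Placeholder notation (`_1.x < _2.x`) is specific to Thrust and Boost, so only the lambda equivalent is shown; the `float2h` stand-in and function name are assumptions:

```cpp
#include <algorithm>
#include <vector>

struct float2h { float x, y; };  // stand-in for CUDA's float2

// The two named-functor examples, rewritten inline as lambdas:
void negate_then_sort(std::vector<float2h>& v) {
    // Negation, with no named struct needed:
    std::transform(v.begin(), v.end(), v.begin(),
                   [](const float2h& p) { return float2h{-p.x, -p.y}; });
    // The lambda equivalent of placeholder notation's _1.x < _2.x:
    std::sort(v.begin(), v.end(),
              [](const float2h& a, const float2h& b) { return a.x < b.x; });
}
```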
551 01:11:29.489 --> 01:11:36.420 we're going to have the parentheses overloaded. We also have a local component you can access by saying .threshold. 552 01:11:36.420 --> 01:11:39.420 It's got a third thing. 553 01:11:39.420 --> 01:11:42.869 It's got a, um. 554 01:11:46.350 --> 01:11:49.979 Oh, the third thing: it's got a constructor. 555 01:11:49.979 --> 01:11:58.560 So, again, there's always a question here in the class about host device. What is the host device in the struct? 556 01:11:58.560 --> 01:12:04.859 The host device says that that code will get compiled twice, once to run on the host, once to run on the device. 557 01:12:04.859 --> 01:12:09.539 The problem is that, 558 01:12:11.159 --> 01:12:18.960 well, if you're doing the reduction on the host, you need a host version of that; if you're doing a reduction on the device, you need a device version of that. 559 01:12:18.960 --> 01:12:24.029 And it's weird, basically. 560 01:12:24.029 --> 01:12:27.960 My executive summary is just: it's magic, do it. 561 01:12:27.960 --> 01:12:31.380 Is that an answer? Okay. 562 01:12:32.579 --> 01:12:36.960 Um, but as I said, the detailed thing is it compiles it twice. 563 01:12:36.960 --> 01:12:43.890 Because whenever you've got code, you have to say in your program, 564 01:12:45.060 --> 01:12:50.369 the compiler has to know where the code's going to get executed. Now, by the way, the latest version of 565 01:12:50.369 --> 01:12:55.229 CUDA and the NVIDIA C++ compiler, 566 01:12:55.229 --> 01:13:00.449 what's it called? I can't remember the name now. 567 01:13:00.449 --> 01:13:06.779 nvc++, I guess, does some of this automatically. But that's newer than these slides. 568 01:13:06.779 --> 01:13:10.350 Okay, so. 569 01:13:11.430 --> 01:13:17.399 In any case, so now, okay, what we have, this line here, is_greater_than pred, that's 570 01:13:17.399 --> 01:13:23.159 that's a non-default constructor. So this constructor takes an argument,
571 01:13:23.159 --> 01:13:29.460 an int, and what it does is it constructs a variable, but it stores 572 01:13:29.460 --> 01:13:34.439 the argument of the constructor in the local variable threshold. So. 573 01:13:34.439 --> 01:13:40.020 So now, if we construct variables of class is_greater_than, 574 01:13:40.020 --> 01:13:54.000 they'll actually have a value, threshold. They not only have the parentheses operator overloaded, they have a value. So now this means that you can construct multiple is_greater_than variables, all with their own values for threshold. 575 01:13:54.000 --> 01:13:58.020 So. 576 01:13:58.020 --> 01:14:01.949 So, now, what's happening is that 577 01:14:01.949 --> 01:14:10.050 we do the operator, the parentheses operator, and it's going to compare the argument against this stored threshold. So. 578 01:14:10.050 --> 01:14:21.204 So this is a powerful idea here: we can create many variables of class is_greater_than; each variable has a local threshold stored inside it that was stored inside when the variable was constructed. 579 01:14:21.715 --> 01:14:29.635 And now, when we use the variable as a function, it will use the threshold that was stored when the variable was created. This is a powerful idea. 580 01:14:29.939 --> 01:14:37.170 And so we've got it down here: is_greater_than pred of 10. 581 01:14:37.170 --> 01:14:46.109 That is not a function call on pred; that is a construction. That is constructing a local variable named pred 582 01:14:46.109 --> 01:14:54.149 of class is_greater_than, and the constructor had an argument 10, so threshold will be 10. 583 01:14:55.439 --> 01:15:04.350 By the way, that line there illustrates why writing C++ compilers, or writing C++ parsers, 584 01:15:04.350 --> 01:15:07.829 is a little tricky. Um, 585 01:15:07.829 --> 01:15:12.210 you know, how much information do you have to know, so that that's not a function call on pred?
586 01:15:12.210 --> 01:15:22.859 So, and in fact, C++ considered as a formal grammar is actually context sensitive, 587 01:15:22.859 --> 01:15:27.000 and actually in one place is ambiguous. So. 588 01:15:27.000 --> 01:15:32.250 Because the lexical category of various tokens, 589 01:15:32.250 --> 01:15:44.010 the semantic class of various tokens, depends on the context. Okay, in any case. So pred is a local variable of class is_greater_than, and pred.threshold is 10. 590 01:15:44.574 --> 01:15:59.064 So now, if we call pred as a function, it's going to test if its argument is greater than 10. You see, we have to use this information. As I said, one point about closures is we can wrap some information inside the function, so we don't have to 591 01:15:59.369 --> 01:16:03.000 give it as extra arguments. 592 01:16:03.000 --> 01:16:13.199 A reason that that's important is that the function may be used in a place where we cannot add extra arguments. Like, we have things like count_if; that's another 593 01:16:13.199 --> 01:16:17.430 Thrust function. The, 594 01:16:18.479 --> 01:16:23.640 you know, the interface to count_if is predefined in the definition of Thrust. 595 01:16:23.640 --> 01:16:33.750 And the definition of count_if says that our third argument here is going to be called as a function with one argument, an element of the vector. 596 01:16:33.750 --> 01:16:37.560 We don't get to say we want to call pred with an extra argument, which is the 597 01:16:37.560 --> 01:16:42.689 threshold. So we have to use this idea of a closure, where we're storing 598 01:16:42.689 --> 01:16:47.399 internal state inside the local variable, at the time we constructed the local variable. 599 01:16:47.399 --> 01:16:56.579 I mean, you could change it there; you could say pred.threshold = 13 if you wanted to. But the point is that local state inside the operator is a powerful idea.
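The closure pattern just described can be sketched end to end in host C++. `std::count_if` stands in for `thrust::count_if`, which has the same interface constraint (the predicate is called with exactly one argument, an element); the struct and function names follow the lecture:

```cpp
#include <algorithm>
#include <vector>

// A functor with state: the closure idea from the lecture.
struct is_greater_than {
    int threshold;                            // local member variable
    is_greater_than(int t) : threshold(t) {}  // non-default constructor
    bool operator()(int x) const { return x > threshold; }
};

// count_if's interface only passes one argument (the element) to the
// predicate, so the threshold has to travel along inside the variable.
long count_above(const std::vector<int>& v, int limit) {
    is_greater_than pred(limit);  // a construction, not a function call
    return std::count_if(v.begin(), v.end(), pred);
}
```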
600 01:16:56.579 --> 01:17:03.000 And again, count_if: you can guess what it does; it counts the number of elements of the vector that are bigger than 10. 601 01:17:04.050 --> 01:17:08.310 Returns an int. Okay, so. 602 01:17:08.310 --> 01:17:12.329 We've got assorted algorithms, 603 01:17:12.329 --> 01:17:19.710 like reduce, and they're statically dispatched: the compiler decides which version to compile in. 604 01:17:19.710 --> 01:17:26.220 And the final point here is you have simple default versions of reduce. 605 01:17:26.220 --> 01:17:29.489 By default, if you reduce a vector, it adds the elements. 606 01:17:29.489 --> 01:17:34.319 Or you can give it a binary operator, whatever you like, user defined. 607 01:17:35.215 --> 01:17:44.965 You see, if you remember OpenMP: in OpenMP you couldn't have user-defined reduction operators; they had to be one of a built-in set. Here they can be user defined. 608 01:17:44.965 --> 01:17:51.234 And by the way, all this goes fast, in parallel, in log n time, and you can give the initial value if you want. Okay. 609 01:17:51.539 --> 01:17:54.810 Now. 610 01:17:54.810 --> 01:17:58.079 You thought you understood? Ha, ha. 611 01:17:58.079 --> 01:18:02.760 We've got iterators. I was careful to say 612 01:18:02.760 --> 01:18:06.329 you dereference an iterator and you get a value. 613 01:18:06.329 --> 01:18:15.300 Now, this is where these iterators are more powerful than pointers. A pointer is an address, okay; it's the address of a value. 614 01:18:16.885 --> 01:18:23.395 It points to 10; it's the address of the four bytes in memory that contain the value 10. 615 01:18:23.395 --> 01:18:31.975 Let's say these are fancy iterators that don't actually point to real memory, but when you dereference them, you get something useful. 616 01:18:32.279 --> 01:18:38.279 So, the way this works is, these are classes, 617 01:18:38.279 --> 01:18:42.720 and the dereferencing is overloaded
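The user-defined reduction operator mentioned above can be sketched with `std::accumulate` standing in for `thrust::reduce` (same shape: range, initial value, binary operator); the function name is an assumption:

```cpp
#include <numeric>
#include <vector>

// A reduction with a user-defined binary operator: by default a
// reduction adds the elements, but any associative combiner can be
// swapped in, here max, which OpenMP's built-in reduction list
// originally did not allow.
int reduce_max(const std::vector<int>& v, int init) {
    return std::accumulate(v.begin(), v.end(), init,
                           [](int a, int b) { return a > b ? a : b; });
}
```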
618 01:18:42.720 --> 01:18:49.529 in a sense such that, well, we'll go through these, but the meaning is somewhat obvious. Um. 619 01:18:49.529 --> 01:18:53.399 A constant iterator points to a constant, 620 01:18:53.399 --> 01:18:57.869 3, let's say, and you dereference it. So you might say, 621 01:18:58.255 --> 01:19:13.194 why not just use a 3? Why do I have an iterator pointing to a 3? Well, the point is that you can increment the iterator and it still points to 3. So maybe you want a vector that's all threes. Okay, you could create a vector 622 01:19:13.194 --> 01:19:27.654 that's all threes, but that takes space, it takes time; why would you want a vector of all threes? Say you want to do a transform: maybe you want to add two vectors, you want to add 3 to each element of a vector. One way is to have another vector that's all threes 623 01:19:27.685 --> 01:19:35.154 and add it. So how do you get a vector of all threes? Use the constant iterator. So the vector of all threes never exists. 624 01:19:35.399 --> 01:19:39.329 All that exists is an iterator such that every time you 625 01:19:39.329 --> 01:19:42.449 dereference it you get a 3, and every time you increment it, 626 01:19:42.449 --> 01:19:45.810 you still get an iterator that dereferences to a 3. 627 01:19:46.859 --> 01:19:58.619 The counting iterator counts up, so 1, 2, 3, 4, 5 and so on. So you dereference it, you get a number; you increment and dereference it, you get the next number. So, 628 01:19:58.619 --> 01:20:13.555 if you want a vector of indices, let's say, you create a counting iterator, and again the vector never exists. As you dereference the counting iterator and increment it, you get elements of the vector. 629 01:20:13.555 --> 01:20:19.704 It's lazy evaluation. So you get elements of the vector; the vector never exists. Its elements are returned as needed. 630 01:20:19.949 --> 01:20:23.609 The transform iterator 631 01:20:23.609 --> 01:20:28.470 takes a vector and returns transformations of the elements.
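A toy sketch of the "fancy iterator" idea: these hand-rolled classes (simplified far below Thrust's `constant_iterator` and `counting_iterator`, which are full iterator types) just show the key trick, that dereference is overloaded and no backing vector ever exists:

```cpp
#include <vector>

// Always yields the same constant, however often it is advanced:
// the all-threes vector never exists in memory.
struct constant_iterator {
    int value;
    int operator*() const { return value; }
    constant_iterator& operator++() { return *this; }  // nothing to advance
};

// Lazily yields n, n+1, n+2, ... : a vector of indices that never exists.
struct counting_iterator {
    int n;
    int operator*() const { return n; }
    counting_iterator& operator++() { ++n; return *this; }
};

// Add 3 to each element by pairing the input with a virtual
// all-threes vector, as thrust::constant_iterator lets transform do.
std::vector<int> add_three(const std::vector<int>& a) {
    std::vector<int> out;
    constant_iterator threes{3};
    for (int x : a) { out.push_back(x + *threes); ++threes; }
    return out;
}
```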
632 01:20:28.470 --> 01:20:34.560 And again, it doesn't construct a new vector; it just returns transformed elements as you need them. 633 01:20:34.560 --> 01:20:43.590 So we don't have the overhead of the second vector being stored. The permutation iterator does what you might think: it permutes 634 01:20:43.590 --> 01:20:53.310 a vector. You want to do random access, let's say, so you randomly access the whole vector in parallel. In CUDA, that's the sort of thing where, if you're inside one warp, 635 01:20:53.310 --> 01:20:59.220 permutations inside a warp are actually quite fast. 636 01:20:59.220 --> 01:21:06.239 It's got hardware to do that. Permutations across warps are another matter, but you may need them. 637 01:21:06.239 --> 01:21:14.310 Um, and the zip iterator: what that will do is it will take a couple of separate 638 01:21:14.310 --> 01:21:24.659 vectors and return pairs of corresponding elements. You may have a vector of x's, another vector of y's, another vector of z's, 639 01:21:24.659 --> 01:21:28.140 and you want to consider them as three-dimensional points. 640 01:21:28.140 --> 01:21:33.869 So the zip iterator will march in step down the x, y and z vectors and return triples. 641 01:21:33.869 --> 01:21:46.710 The reason you have the three separate vectors, instead of a vector of triples directly, is that the vector of triples is not as efficient. In CUDA, remember, you've got your threads; you want 642 01:21:46.710 --> 01:21:55.920 consecutive threads to be accessing consecutive elements from the global memory. So you do not want to have the vector of triples, because with consecutive 643 01:21:55.920 --> 01:22:00.359 elements it messes with the cache. The cache is 644 01:22:00.359 --> 01:22:05.670 not efficient. If you're storing 3D points in CUDA, you want to have 645 01:22:05.670 --> 01:22:15.600 what's called a structure of arrays: an array of x's, an array of y's, an array of z's, and a structure of that. And the zip iterator helps you work with that. Okay.
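The structure-of-arrays layout and the zip-style access it enables can be sketched as follows; the `Points` struct and `point_at` are illustrative names (Thrust's real `zip_iterator` wraps a tuple of iterators, which this simplifies to indexed access):

```cpp
#include <cstddef>
#include <tuple>
#include <vector>

// Structure of arrays: three separate coordinate vectors, so
// consecutive threads (or loop iterations) read consecutive
// global-memory words from each array.
struct Points {
    std::vector<float> x, y, z;
};

// Zip-style access: march in step down the three arrays and hand
// back a triple, without ever materializing a vector of structs
// (the array-of-structures layout the lecture warns against).
std::tuple<float, float, float> point_at(const Points& p, std::size_t i) {
    return {p.x[i], p.y[i], p.z[i]};
}
```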
646 01:22:15.600 --> 01:22:30.029 This is a good point to stop now, and we all want to get lunch. I'm up through slide 24; on Monday, I'll start with 25. Oh, another question: "Are we calling it on the host here?" 647 01:22:30.029 --> 01:22:34.319 Yes, um, in this case. 648 01:22:34.319 --> 01:22:39.359 Well, you may be doing it on the device. Let me scroll back 649 01:22:39.359 --> 01:22:43.470 several pages; I just saw Isaac's question. 650 01:22:47.039 --> 01:22:57.210 No, we're calling it on the device here, because, you see, the vector is a device vector, so the count_if is executing on the device. 651 01:22:58.350 --> 01:23:03.869 Well, I may have not been explicit enough: these built-in functions in Thrust, 652 01:23:03.869 --> 01:23:06.869 if the data is on the device, they run on the device. 653 01:23:07.979 --> 01:23:14.609 They're dispatched; so this is part of the power of Thrust. The code looks the same: you give it 654 01:23:14.609 --> 01:23:18.239 host data, it runs on the host; you give it 655 01:23:18.239 --> 01:23:25.140 device data, it runs on the device. That's what's cool about it. And running on the device is very fast. 656 01:23:25.140 --> 01:23:31.500 If you give it a mixture of host and device, don't ask me; you can try it and tell me what happens. So, no, this runs on the device here, 657 01:23:33.060 --> 01:23:46.439 because the vector is a device vector. If you're way, way, way ahead of me: you could force it to run on the host, and it copies the data back. But let's not talk about that. 658 01:23:46.439 --> 01:23:50.039 Okay, so. 659 01:23:52.914 --> 01:24:07.255 That's enough new stuff for today. So what we're seeing today, apart from three really nice talks, is more of Thrust. Well, first we saw different ways you can do functions in C++: you can overload the parentheses operator, or you can
660 01:24:07.770 --> 01:24:16.890 have, well, the old way, just having a pointer to a function. You can have lambdas, which are nice because they're local variables. 661 01:24:17.635 --> 01:24:31.494 And you can have, now, a lambda running on a device, which we'll have to talk about later. And you can use placeholder notation, which is a cool thing. And it's also in the STL, by the way. I don't know if this... yeah, it's in Boost, actually. 662 01:24:31.494 --> 01:24:33.265 "What, Boost is like a, 663 01:24:33.630 --> 01:24:40.015 an extension to STL?" I don't think so. Okay, your Standard Template Library. 664 01:24:40.284 --> 01:24:54.055 There's something called Boost, which is an extension of it, and its things are basically being tested out to see if they're worthy of being promoted to the STL. So, a lot of cool stuff in Boost: boost.org. Okay, so we saw that. 665 01:24:54.055 --> 01:24:56.215 Then we saw more detailed stuff with 666 01:24:56.520 --> 01:25:05.100 iterators in Thrust, and we'll start seeing next about some non-trivial things you can do in Thrust with this basic set of tools. 667 01:25:05.100 --> 01:25:11.760 Okay, there is the RPI spring town meeting at 3. 668 01:25:11.760 --> 01:25:20.460 I guess, other than that, enjoy the good weather. I think skiing is a little sparse perhaps this weekend. 669 01:25:20.460 --> 01:25:26.579 And see you Monday, if there are no other questions. 670 01:25:26.579 --> 01:25:32.010 It's not worth storing the chat window, actually; I don't think so. 671 01:25:33.750 --> 01:25:37.409 "Just a quick question." Sure, ask away. 672 01:25:38.939 --> 01:25:46.409 "I don't see the homework 5, or it hasn't been posted?" I don't think... 673 01:25:46.409 --> 01:25:54.479 "The submission thing." Oh, okay, thank you. I'll update Gradescope then. "Cool, thank you." Okay. 674 01:25:54.479 --> 01:25:58.409 Anything else? Okay. 675 01:25:58.409 --> 01:26:04.050 Goodbye.