Okay, can people hear me now? Great. I give up on using the computer for audio. Thank you.

So, as before, I have the chat window open, and on your screen you should be seeing the browser window. Feel free to speak up; there are only ten people in class. Type questions in the chat or just talk, and check that everyone else can hear you, because the hardware is a little iffy.

Some bookkeeping first. I'm using Webex Meetings instead of Webex Teams (which has been renamed to just Webex). The reason I'm not using Webex Teams for the class is that I can't figure out how to display the chat window simultaneously with part of my screen, and I've already spent too much time trying. If anyone can figure that out, let me know; otherwise we'll use the Meetings version for the classes, which works fine. We'll also have a Webex Teams space set up for people to talk and post questions.

I'm posting the videos in the usual place, Mediasite; I can put up the link if you want. For things like transcripts, I have a Files directory. It's in the top bar now; I just added it this morning, and it has the chats and the transcripts, to the extent that they exist.

What I'm going to do today is finish off the Lawrence Livermore tutorial quickly.
It's fairly low level, but it's a good introduction. Then there are some questions we can think about, then I'll talk about my research for a second or two, and then more about the parallel machine; I've given you all accounts on it. During this class you're welcome to share your screen and I can help you log in, so the class might actually turn into a lab for a bit. Then we'll get into OpenMP a little.

One thing about the homework: on Monday, remember, I get a chance to shut up and listen to you talk. I've given a large set of possible topics. About half of you have emailed me with what you'd like to talk about, and everyone who did got their first choice. So, those of you who haven't yet emailed me: pick a couple of topics, rank-order them, and email me, and I'll give you the first topic on your list that someone else hasn't already taken.

Now, let me back up and quickly go through this again. This slide on programming models is somewhat obvious, but there's a lot of meat in it. At what level do you have sharing? In parallel computing, you may share the memory or you may not. You may have full separate processes, or you may have lighter-weight things called threads. It's a sort of motherhood slide, but it's worth talking about. Again, shared memory makes programming nice, but the idea doesn't scale up to thousands of processors.
With thousands of processors, which model do you use? The problem with shared memory is keeping things consistent.

Now, the thread model. A thread is a local context plus a program counter; it's something that executes. The difference between threads and processes in something like Linux is that a process has a separate memory space, while threads typically share the memory space of whatever they were cloned from. All a thread has of its own are some separate registers, a separate program counter, and a separate thread ID; that's how the threads keep themselves separate from each other. If they want to write into different memory, they index with the thread ID; other than that, they share the memory. The nice thing about a thread is that it's cheaper to fire one up and to stop it.

We'll see this by example. The slide mentions some standards, including OpenMP, which I'm going to get into next; it's a good introduction to what OpenMP is. It's been going on for a number of years — overnight successes don't happen overnight — so this actually started give or take 20 years ago, and it's been widely used and extended based on past experience. I like packages that have been around for a while; I like to joke that I don't want to use a package until it's at least 10 years old.
I want to see if it's going to stay around a while and that it's actively being used. OpenMP's pitch is that it's more for shared memory — an Intel Xeon, say. It's not so strong yet on working with GPUs, and that's why I'll be talking about other things later in the course. But it is multi-platform, for C, C++, and Fortran. We'll see: it adds some extra directives to your C or C++ program. This part of the tutorial is fairly low level, so I'm going to go through it fast. For MPI, take Chris's course, the Computer Science course. My view of MPI, as I mentioned last time, is that most problems will fit into shared memory, so you don't have to go distributed. Nothing much interesting here, so that's enough of that; you can browse it a little.

Okay, just to review some things here again: machine cycle speeds stopped increasing because of physics, and you can think about that. Various parallel hardware technologies have come and gone. Intel had something called the Xeon Phi for a few years. It was an accelerator card that plugged into your machine alongside the CPU. I have one — I don't have it plugged in now — and it had something like 16 cores, each core running an embedded version of Linux, so it was effectively a separate processor plugged in next to the CPU.
The Xeon Phi was a stripped-down form of hardware. Each core ran sort of like an Intel Xeon, but with a lot of the superscalar features that make a full Xeon run faster stripped out. That made each core smaller, so for straight-line code without conditionals the Phi was a way to parallelize across a couple of hundred threads. But Intel dropped the project about two years ago, and they're using what they learned in their other hardware. One of the department's external advisors actually worked on this at Intel.

IBM's Blue Gene project: those were supercomputers. IBM has killed that one also; they took it as far as they could. I think the limiting problem with Blue Gene was that while it had many processors, each processor didn't have very much memory. The current thing IBM is pushing is working together with NVIDIA: they have their workstations, and they plug NVIDIA cards into them. That's the current IBM parallel push.

What I do for parallel computing: I do parallel algorithms for geometry, for CAD and GIS. There's more information on my home page if you want. I've been playing with parallel computing for a while — I got interested, got disinterested, and got interested again.

Okay. Now, I have a machine here for parallel work for you, and I created accounts for you on it. What I'm going to do now is —
Actually, give me a minute; I'm going to share my whole screen instead of just this window. Second here.

So, I can't tell — can you see my screen? Can anyone hear me? The reason I'm asking is I'm getting no audio feedback with the headset setup I'm using. [Student:] All we see right now is a message that says W. Randolph Franklin is starting to share content. Okay, thank you, I can hear you fine, but the share didn't start. I've got a lot of hardware here. This might involve me killing and restarting Firefox; that might be the quickest way to do it.

[Students, while the professor reconnects:] Oh, I'm in control now. I guess you should start kicking people for not having mics. We're toxic here; we need to show we're better than everyone else. All right, guys, let me just share for you here. Okay, wait, it says someone is waiting in the lobby. Where do you see that? You should let them in — it's probably the professor.

Can you hear me? Yes. Can anyone hear me now? Yes. Great, thank you.
And again, I get no feedback on these things. Okay, now let me try sharing the whole screen.

[Long pause while the share locks up; students chat:] Professor Lee... I guess... Looks like you're in charge again, Jack. I'm the captain now. Mike, or Chad, if you're ready... So, did he come back in last time? He came back last time. This is just comical; this happened last semester too. Who would have thought a Quadro 8000 couldn't handle screen share? So many features. Well, I think he's using the web version of Webex for some reason, and the web version is complete and utter dog s***. Finally — here we go.

Okay, so the theory is, you see my screen. [Students:] Yes, yes, we do. Okay, so we've got that. What happened was I tried to share the screen and it locked up the window manager on my laptop, so I had to go to my second laptop sitting beside me, log in to the first laptop, and kill the window manager.
Okay, good. So, what you should be able to see is this. What I've created for everyone are accounts: the username on your account is your RCS user ID, and your initial password is your RIN. The way to connect is to use ssh, and you're welcome to try connecting to it now. So, what I would do here is ssh to the machine, and so on, and then I get connected like that.

Which brings up the point: you are welcome to try right now, in class. Start the VPN if you're not already using it and try logging in to parallel, and I'll try to debug things in class right now. So, again: your username is your RCS ID — in my case it's f-r-a-n-k-w-r — and your password is your nine-digit RIN, 660-something-something-something. You can try it now.

Okay, great. While you're working on that, I'll just show you some stuff on parallel, to show you what it's like. Let me — actually, second here — get the chat window visible. This is parallel, to show you what sort of machine it is: 256 gigabytes of memory, and lots of cores — it's a dual 14-core Xeon, so that's 28 cores.
Each core has two hyperthreads, so that's 56 threads. The flip side is that the cores are not incredibly fast.

And hey, good, I see several people on here — Janet, great — so a couple of people are getting it to work.

[Student:] I'm having issues with ssh. Well, you have to use the VPN. You have to run the VPN, and the details depend on your operating system. If you're using Windows, I think if you browse to the right web site it starts up the VPN automatically. If you're using Linux, install a VPN package — basically OpenVPN and so on.

Great. Okay, so we've got several people on it. Some things that you can look at: I've got a directory for our class — it's a link — with user accounts and some files. Oops, sorry about that. A few of them, if you're curious about the machine: the invoice, that PDF file, talks about the machine itself; deviceQuery goes into the NVIDIA side; and nvidia-smi talks about what's happening on the video cards. It's got two cards here: a Quadro RTX 8000
with 48 gigabytes of memory, and an older GTX 1080 with 8 gigabytes of memory, and neither of them is doing anything at the moment. Okay. I've got various files here.

Let me start right off quickly, by example, before I talk about OpenMP more formally; I'm going to throw you right in. This is a Hello World program — you might say one of your first programs — and it's a functioning OpenMP program. [A student posts the VPN address in chat.] Oh, okay — thank you, yeah.

Okay, so here's your Hello World program in C++. I have some common stuff I like to include that does some fun things for me; you don't have to use it. OpenMP adds a couple of extensions to C++ programs. C++ has an extension mechanism called pragmas — `#pragma` — and this is the extension: `omp` marks it as an OpenMP pragma, and this is a particular OpenMP directive. So that's one thing OpenMP adds. A second thing is that OpenMP has some library functions. What this program does is run a block on parallel threads and get the number of each thread. The print macro here is my own, defined up in my common file. So this is showing you two of the things that OpenMP adds: some pragmas and some library functions. And then you compile it.
Well, I'm just going to remove the hello binary and rebuild. Oops — so this shows the third thing: using g++, you tell the compiler that you're going to be compiling with OpenMP. Different compilers use different ways to say that. I'm also adding math optimization flags here.

Now, what does the program do? Let me look at the source code to show you. What's happening here is that this machine can run 56 hyperthreads. `#pragma omp parallel` says to run the following block on all the threads, so this block that I've highlighted will run on all 56 threads. What it runs in parallel is this: it gets the thread number — the 56 threads are numbered 0, 1, 2, 3, up to 55 — and then it prints the thread number. There's also one thread that's called the master thread; it's thread 0. Here we're doing a conditional, so this part runs only on thread 0, and it prints the number of threads. So what this program will do is run the block on all 56 threads in parallel, and then finally print the number of threads, and we see that up here.

A couple of things to notice: the output is scrambled. And this is the first lesson about parallel computing. Go back to the program here.
So: this block runs on all 56 threads in parallel, and there are no guarantees about the ordering. It might happen that thread 0 runs to completion, then thread 1 starts and runs to completion, and so on up to thread 55. Or thread 0 might run one line, then thread 1, and they'll be totally interleaved — even fractions of statements can be interleaved, so thread 0 might run part of a statement and then thread 1 runs part of a statement. It doesn't have to start with thread 0, either: it could be that thread 25 runs for a few microseconds, then thread 37 runs for a few microseconds, then thread 3, and so on. So there are no guarantees about ordering, and we see that up here: these numbers are not printed in order. And it's different every time — if I go up and run this thing again...

Okay, so what we have is "hello world" printed for many of the threads, and then the thread IDs printed, so it's a total mess. Oops, sorry. And then the number of threads gets printed at the end — I don't even know where that's getting printed. And every time I run it, it's totally different.

So, here's another feature of OpenMP: you can set environment variables, and they will control the program. This one, OMP_NUM_THREADS, says let's have only three threads — 0, 1, and 2 — and you can still see them mixed up. And this line here, the one that was run last, isn't even printed last.
Let me try this again — every time I run it, something different happens. Okay, so there's one of your lessons about parallelism, and I'll keep mentioning it. So that was the first Hello World example.

Now, I can edit the program and so on if I want to, to show you that it's actually working... With three threads here, it actually printed the starting and ending lines at the start and the end, but inside that, every time I do it, it's different. Okay. Any questions about that?

Oh, by the way — are people familiar with make, makefiles and so on? What happens with the make program? Good, okay, people are familiar. So it takes a target, looks for source files, and compiles only what needs to be recompiled by looking at the dates. Here, hello is newer than hello.c, so if I make hello and try it again, nothing will happen. Okay, great.

Let me show you another program. In parallel, we're going to sum up the numbers from 0 to 999,999. And by the way, this syntax here — I like this syntax; is it familiar? It has been added to C++ in the last few years. In this context, the apostrophes are digit separators that you can use to break up big numbers. I like them.
In this case they are not delimiting a string or a character; they're just separators to make the number readable. Cool.

Okay, so we've got our main program here. This is the closed-form formula for the sum, and here we have a loop where we're just adding up the numbers. What I'm doing is printing the correct number and the computed number, and we can watch what happens. Oh, by the way, my print macro: what it does is print the variable's name and then its value — it's just something to help you see what's going on.

So I've started the sum program; let me remake it. Okay, so this is what we see. Oh, the threads printout is wrong — I've got to fix that, because it's outside the parallel block — but look here. I sum the numbers from 0 to one million minus one; this is what I should get, and this is what I actually get.

So: what is going wrong here? Any ideas? What might be going wrong? Any ideas?

Yes — right. Oh, but first, let me run it with one thread (I type ./sum here, just to run it), and we see we get the right number. Okay. If I put in two threads, or three threads, the answer comes out somewhat less than it should — much less, in fact.

The problem is this line right here.
How is this implemented? You read the old value of computed, you add to it, and you store it back: a read-modify-write. And remember, I told you there are no guarantees for how the threads interleave. So what happens is that multiple threads might read computed before the increment and write-back. That's the problem: several of the threads will do the read, then in parallel do the increment, and then do the write-back, so updates get lost.

A second problem with this: I've run the thing several times, and even with only ten threads, every time I run it I get a different answer.

And then of course, if I ask for more — yeah, a hundred. I have only 56 hardware threads, but I can specify more than 56 software threads; the extras queue up and wait. Okay, so there's a lesson here about synchronization.

Now, you might think that this is a pretty bad thing: every time I run the program, I get a different answer. Let me tell you the worst possible thing that can happen. Sometimes, if you're not synchronizing inside the program, you might get the same answer every time — but it's wrong. That can happen. So just because the answer is consistent doesn't mean it's right.

In any case, we have to do something about that. And what do we have to do? I'll bring in a version from last year.
Atomic. Okay, what I'm doing here — the extension is that I've got a new pragma, `#pragma omp atomic`. What this says is that the following statement will be executed by one thread without interruption by any other. So the increment of the variable computed will be atomic: that whole statement will get executed by one thread without being interrupted by any other thread.

Oh, by the way, what I forgot to show you was this up here: `parallel for`. It says: take the following for loop and stripe its iterations across the available threads. Each iteration of the loop will get executed by only one thread — which thread is another matter, but each iteration is executed exactly once. I may come back to that in a second. Compare with hello: there I had `#pragma omp parallel`, and the block was executed by all the threads. If I say `#pragma omp parallel for`, then each iteration of the for loop goes onto only one thread and is executed only once. But again, the order is not guaranteed, and the different iterations might be interleaved.

What we have here now is a pragma that says this statement is executed by — not only one thread — one thread at a time. Now, if I run this...
We get the right answer. So now, let me show you the flip side.

Okay, so a lot of new things today. You have to wrap critical sections — with the atomic pragma, or something else I'll come back to later — and then you'll get the right answer. But the time is somewhat more: the unguarded version took this much time, and this one took that much time. When you mark off critical sections, point one is that that section runs on only one thread at a time, so that part is now serial instead of parallel — you come back to Amdahl's law; your program is not going to be as parallelizable. Point two is that there's an overhead in starting up and tearing down these guarded blocks, so even if there were nothing inside the block, there'd be an overhead.

Now, another thing here. The real time was 0.15 seconds, but the user time was about 5 seconds, because the user time is summed over all of the threads, and we have 56 hyperthreads. In fact, what it shows here is that the user (CPU) time was 38 times as much as the real time. Which leads to another point: when you measure parallel speedup, you look at real time. CPU time is not so interesting; real time — real speedup — is the point.
325 00:49:31.739 --> 00:49:36.869 And one reason this is large. 326 00:49:36.869 --> 00:49:46.650 Is the way the system may implement a thread waiting for another thread: sometimes the thread that's waiting just spins the CPU. 327 00:49:46.650 --> 00:49:51.989 It just burns time waiting for the other thread to finish. 328 00:49:51.989 --> 00:49:58.469 This may actually lead to better real-time performance than some explicit semaphore or something. I.e. 329 00:49:58.469 --> 00:50:07.260 If a thread's waiting for another thread, one way to wait, we could implement semaphores, which you've probably seen in various classes. 330 00:50:07.260 --> 00:50:16.019 But there's an overhead with implementing semaphores; an easier way is the first thread just continually checks to see if the second thread is finished. 331 00:50:16.019 --> 00:50:19.139 So, it's just wasting time, but. 332 00:50:19.139 --> 00:50:25.380 There's less overhead. Talking about parallel here also, let's look at the one without any. 333 00:50:25.380 --> 00:50:28.619 Interaction, without any atomic. 334 00:50:28.619 --> 00:50:37.619 No parallelization: we got the right answer. 335 00:50:39.090 --> 00:50:43.769 One thread. 336 00:50:43.769 --> 00:50:50.969 Okay, we already got the wrong answer with two threads. 337 00:50:50.969 --> 00:50:57.000 Okay, look what happened here with ten threads. 338 00:50:57.000 --> 00:51:01.920 Well, first, we got the wrong answer. Second, the real time grew. 339 00:51:01.920 --> 00:51:05.940 So, another lesson from this example. 340 00:51:06.114 --> 00:51:15.594 Is that increased parallelization does not necessarily decrease real clock time. 341 00:51:15.985 --> 00:51:27.114 There's an overhead with starting multiple threads and so on, starting them up and taking them down, and excessive parallelism can run slower in real clock time. So. 342 00:51:27.420 --> 00:51:30.630 Um.
343 00:51:30.630 --> 00:51:37.380 And if we try this on, let's see, ten threads here, let's try. 344 00:51:37.380 --> 00:51:43.590 From scratch. So, ten threads was slower than two threads. 345 00:51:43.590 --> 00:51:49.679 And so one thread here took 0.03 seconds. 346 00:51:49.679 --> 00:51:54.059 Two threads took 0.8 seconds, ten... 347 00:51:55.710 --> 00:52:00.570 Yeah. 348 00:52:00.570 --> 00:52:07.050 So, again, there's an overhead in starting up parallelization here. 349 00:52:07.050 --> 00:52:15.719 Okay, other lessons from this little program. Let me go back to the little program. 350 00:52:15.719 --> 00:52:19.349 So, this here. 351 00:52:21.505 --> 00:52:29.364 This takes only one statement. So this here is used, it's a low-overhead pragma. 352 00:52:29.695 --> 00:52:37.045 It's used when you want to lock something very simple, like incrementing a counter. 353 00:52:37.320 --> 00:52:49.230 So, with an atomic, the next thing is one statement, not a block, and it has to be something really simple, like plus-equals, but. 354 00:52:49.230 --> 00:52:55.139 The flip side is atomic is the least expensive way to serialize something. 355 00:52:57.150 --> 00:53:02.280 So, other ways. 356 00:53:06.420 --> 00:53:12.210 There's other things: there is a critical. 357 00:53:12.210 --> 00:53:15.510 Let's see here. 358 00:53:15.510 --> 00:53:27.630 You could have a critical block here, and if that were a critical block, you're less limited in what you can put after it. You can put more general things that have to be serialized, but the overhead will be more. So. 359 00:53:38.340 --> 00:53:41.489 It took much more time. 360 00:53:42.960 --> 00:53:48.389 With the critical thing, because the critical. 361 00:53:48.389 --> 00:53:53.429 A critical block can be more general than an atomic block, but it's more expensive too. 362 00:54:03.570 --> 00:54:07.559 Hello. 363 00:54:07.559 --> 00:54:13.829 Yeah, so once again it gets slower. Okay.
364 00:54:13.829 --> 00:54:19.469 So, various lessons. What should I bring in here to show you? 365 00:54:19.469 --> 00:54:24.239 I can bring in hello critical and. 366 00:54:31.590 --> 00:54:46.469 So, here I put a critical around the hello world here, so we're not going to get this scrambled mess; the hello world and printing the thread ID are separate from each other. 367 00:54:50.760 --> 00:54:54.960 You see everything. 368 00:54:54.960 --> 00:55:05.670 The things are not scrambled up so much, except num_threads still gets scrambled in with the other stuff. But still, every time I run it the order is different. 369 00:55:05.670 --> 00:55:09.269 Okay, okay, so that's. 370 00:55:09.269 --> 00:55:12.329 Quick introductions to. 371 00:55:14.849 --> 00:55:22.230 To OpenMP. 372 00:55:24.989 --> 00:55:30.960 Let me, um, now start talking in a more formal way about it. 373 00:55:40.769 --> 00:55:47.010 Stuff here. 374 00:55:51.570 --> 00:55:57.480 Where did I put it? 375 00:55:59.519 --> 00:56:07.619 Going to go here. Okay, so I'm going to go through this fast and let you read it on your own. 376 00:56:07.619 --> 00:56:11.309 Um. 377 00:56:11.309 --> 00:56:14.760 Just a second here. 378 00:56:17.880 --> 00:56:21.449 Okay, I've got the chat window open, and again. 379 00:56:23.369 --> 00:56:33.750 Okay, the concepts are that it's portable; it abstracts away many details of the hardware. 380 00:56:33.750 --> 00:56:39.000 This tutorial talks about an older version of OpenMP, but at the level we're talking about it. 381 00:56:39.000 --> 00:56:42.684 It doesn't matter; we'll get to the later stuff later. Okay. 382 00:56:42.684 --> 00:56:54.925 So the buzzwords are multi-threaded and shared memory, and it's explicit: you explicitly say what you want to parallelize; there's no compiler trying to infer it. 383 00:56:55.289 --> 00:57:07.050 So, in that sense, it's low level.
We've got our three things: compiler directives, the pragmas; library routines; and the environment variables that control things like the number of threads to use. 384 00:57:07.050 --> 00:57:10.619 It doesn't do distributed memory. 385 00:57:10.619 --> 00:57:16.289 Yeah, you're going to lose some efficiency because of the abstraction level. 386 00:57:16.289 --> 00:57:22.500 And look at this: it does not check for this stuff. That's your job. 387 00:57:23.519 --> 00:57:33.510 And it ignores I/O. Okay. But it is fairly simple. It's been used for a number of years, so they've gotten the, um. 388 00:57:33.510 --> 00:57:42.059 Things that cause problems, and they've tried to fix them. That's why I like a tool that's been in use for a while, because it's been iterated and improved. 389 00:57:42.059 --> 00:57:45.150 They've been around, they've been thinking about it for a while. 390 00:57:45.150 --> 00:57:50.280 It's been effectively going for 20 years, getting better. 391 00:57:50.280 --> 00:57:54.659 New versions. 392 00:57:54.659 --> 00:57:58.619 Going to do... oh, yeah. The, uh. 393 00:58:00.570 --> 00:58:08.400 Let me do Wikipedia, which can be good; because people love slamming Wikipedia, I like to say it's actually quite good, often. 394 00:58:10.349 --> 00:58:14.460 So actually. 395 00:58:16.860 --> 00:58:23.250 Let me actually hit this thing here. Okay. 396 00:58:23.250 --> 00:58:29.639 This is the abstract machine model for OpenMP. We have a master thread. 397 00:58:29.639 --> 00:58:33.300 You do a pragma for parallel or something. 398 00:58:33.300 --> 00:58:45.449 And it splits into these parallel tasks, or whatever; task is a loaded word, but these parallel things do things in parallel. Then it rejoins, and it splits again, and so on. This is. 399 00:58:45.449 --> 00:58:50.639 How the thing works; you control the splitting and joining. 400 00:58:50.639 --> 00:58:59.099 Okay. Oh, the other thing is the big website here.
401 00:58:59.099 --> 00:59:04.199 This is where you go to find everything: compilers, books. 402 00:59:04.199 --> 00:59:12.119 You can have fun browsing around here, so on your own, browse around there. Let me hit some highlights of the model here. 403 00:59:12.119 --> 00:59:19.349 Shared memory, multicore, uniform and non-uniform. 404 00:59:19.349 --> 00:59:22.980 Doesn't matter, as long as it's one address space. 405 00:59:22.980 --> 00:59:31.559 And again, thread based. 406 00:59:31.559 --> 00:59:44.010 The thing about threads is that, again, a thread is lighter weight than a separate process. The threads by default share the memory space. So a variable. 407 00:59:44.010 --> 00:59:51.449 Is common to all the threads, unless you want to make it private, which you can, but by default. 408 00:59:51.449 --> 00:59:56.070 Things are shared within the threads, um. 409 00:59:57.300 --> 01:00:04.679 So they're within a single process. You can vary the number of threads. 410 01:00:04.679 --> 01:00:08.010 As I demonstrated to you. 411 01:00:08.010 --> 01:00:12.900 Explicit parallelism. 412 01:00:14.190 --> 01:00:29.039 And so you insert directives. So here's the thing: the hard work is, you have to write your program so that it's parallelizable. Again, we don't have some machine-learning compiler that determines what can be parallelized. So. 413 01:00:30.239 --> 01:00:34.079 Fork-join model; Wikipedia also has this figure. 414 01:00:34.079 --> 01:00:40.440 You've got the master thread, fork and join. 415 01:00:43.800 --> 01:00:47.969 Again, data is shared within the regions by default. 416 01:00:47.969 --> 01:00:51.300 All threads can access this shared data. 417 01:00:52.440 --> 01:00:57.690 But if that's not desired, you can control that. 418 01:00:57.690 --> 01:01:10.829 Okay, you can nest parallel regions, although I don't actually see the point of it. You're not going to get more parallelism, probably, but you can nest to some extent.
419 01:01:12.090 --> 01:01:18.329 Um, you can get dynamic with this thing. 420 01:01:18.329 --> 01:01:22.739 If you want. 421 01:01:22.739 --> 01:01:27.599 I'll give an example of that, actually. Yeah. 422 01:01:27.599 --> 01:01:36.059 Oh, you see, it gets all scrambled together, and you probably want to. 423 01:01:36.059 --> 01:01:41.880 You know, it's entirely up to the programmer. 424 01:01:44.039 --> 01:01:48.269 Let me show you some dynamic threads, and then I'll come back to this. 425 01:01:51.840 --> 01:02:03.690 Where are we going? Again, you see what I'm doing here is I'm copying files into the current year's directory only as I use them, so you're not confused. 426 01:02:03.690 --> 01:02:08.670 Okay, come on in. 427 01:02:10.800 --> 01:02:14.849 Let me, I'm going to show you one other thing here. 428 01:02:14.849 --> 01:02:20.130 The problem. 429 01:02:29.369 --> 01:02:39.809 Okay, this is not a parallel program at all; it has no pragmas in here. What I'm doing is I'm looping from 1 to a billion. 430 01:02:43.380 --> 01:02:55.710 And I'm summing into a float; I also have the subtotal as a double. And here I've got the subtotal as a float, but I'm counting down instead of up. 431 01:03:02.400 --> 01:03:09.269 Really? Okay. Now. 432 01:03:10.469 --> 01:03:14.010 The correct answer is 5 times 10 to the 17th. 433 01:03:15.030 --> 01:03:21.659 If I compute with single precision, I get a totally wrong answer. 434 01:03:21.659 --> 01:03:32.340 If I compute with double precision, I get the right answer, and counting down did not make any difference, almost. So. 435 01:03:39.420 --> 01:03:43.320 Question for you. 436 01:03:43.320 --> 01:03:47.639 Why is single precision giving me the wrong answer? 437 01:04:04.199 --> 01:04:08.369 Any idea? 438 01:04:27.840 --> 01:04:33.690 Any ideas? 439 01:04:37.679 --> 01:04:47.010 Um, Eva. 440 01:04:52.230 --> 01:04:59.639 Any idea? Going up to 10 to the 37th? 441 01:04:59.639 --> 01:05:05.880 So, no, that's.
442 01:05:06.989 --> 01:05:11.670 It's something like 10 to the 38th, I think. 443 01:05:13.320 --> 01:05:16.710 So, that's fine, because the number is 10 to the 17th. 444 01:05:16.710 --> 01:05:20.639 That's not going to be the problem. Um. 445 01:05:21.929 --> 01:05:27.269 10 to the 17th is less than 10 to the 38th. 446 01:05:27.269 --> 01:05:40.980 Um, no, it's a different problem. Um. 447 01:05:43.769 --> 01:05:52.500 Right. 448 01:05:52.500 --> 01:05:57.329 Well, not precisely. You see, here's your problem: computed. 449 01:05:58.409 --> 01:06:04.349 So, single precision, once you get. 450 01:06:04.349 --> 01:06:14.519 You know, it's only got about 6, 7 digits of significance. So once computed gets more than about 10 to the 7th. 451 01:06:14.519 --> 01:06:20.820 When you add to it, um, it doesn't increase computed any more. 452 01:06:22.320 --> 01:06:31.559 So, in effect, it rounds to 0. 453 01:06:33.059 --> 01:06:38.760 Greater than about, you know, say. 454 01:06:38.760 --> 01:06:43.860 10 to the. 455 01:06:43.860 --> 01:06:50.250 Anything, it won't change it. 456 01:06:50.250 --> 01:06:54.239 So. 457 01:06:57.150 --> 01:07:02.429 So, there's the problem, but with double precision, it does work. So. 458 01:07:05.610 --> 01:07:08.820 Maybe a little bigger, 10 to the 10th or something. So. 459 01:07:14.280 --> 01:07:19.800 So, well, we got up to 10 to the 16th or something. 460 01:07:21.360 --> 01:07:29.940 No, not quite, because adding a 1 into 10 to the 16th, it starts falling off the end. Okay, so the problem here is round-off. 461 01:07:29.940 --> 01:07:44.159 Um, I'll show you another one, dynamic, since I just had up the page here that said dynamic. Let me copy it in. 462 01:07:52.889 --> 01:07:57.150 And let me show you what task is doing. 463 01:08:05.340 --> 01:08:16.470 We're doing Fibonacci now. The correct way to find Fibonacci of n is to use the closed formula, which gets you the answer in constant time.
464 01:08:16.470 --> 01:08:23.430 But here, we're going to do Fibonacci the recursive way that they like showing in computer science classes. 465 01:08:23.430 --> 01:08:29.039 Something like this. Okay. 466 01:08:31.260 --> 01:08:41.609 But what we're going to do is for these two, we're going to fire off, we're going to start a parallel thread. 467 01:08:44.970 --> 01:08:54.239 So here, what's happening up here: Fibonacci of n; if n is less than 2, we just return, and we've got a minimum level here of something like. 468 01:08:55.770 --> 01:09:01.140 Some number, and if it's very small, we do it sequentially. 469 01:09:01.140 --> 01:09:09.750 But otherwise, and these numbers are probably way too big here, but you get the idea, when n is very large, we fire off. 470 01:09:09.750 --> 01:09:16.529 Two parallel threads here. So this is showing another thing that you can do here. 471 01:09:16.529 --> 01:09:19.529 The task here. 472 01:09:19.529 --> 01:09:25.680 What that's doing is it's dynamically starting another thread at this point in the code here. 473 01:09:27.390 --> 01:09:33.689 So shared(i) means it will share the variable i. 474 01:09:35.039 --> 01:09:41.010 Because it's going to return and use that to return it. So, what happens here is that. 475 01:09:42.630 --> 01:09:57.600 We run this in parallel, and it calls things recursively, starts other threads in parallel, and so on. There's an atomic thing incrementing a number-of-tasks counter, and it starts another parallel thread down here. 476 01:09:57.600 --> 01:10:01.710 So, we can dynamically start parallel threads in OpenMP. 477 01:10:03.390 --> 01:10:06.840 And. 478 01:10:08.340 --> 01:10:12.600 So, I'll play around with that and. 479 01:10:12.600 --> 01:10:19.109 Actually move up here, and now we're in the block here. 480 01:10:19.109 --> 01:10:26.760 The taskwait says, wait till the tasks, the threads that I started, finish, and then there.
481 01:10:26.760 --> 01:10:33.840 Returning the values in globals, we just return the sum of them, and main just, and. 482 01:10:44.430 --> 01:10:48.600 And we can see everything was all scrambled here. 483 01:10:48.600 --> 01:10:54.659 It fired up this many parallel tasks and. 484 01:10:54.659 --> 01:11:01.979 2400% of the machine on the average, but it was using a lot of time, and we could. 485 01:11:06.300 --> 01:11:21.180 And see what happens now: while that is running, we can look inside the program. 486 01:11:23.670 --> 01:11:26.760 Okay, we got the same answer. 487 01:11:28.649 --> 01:11:39.539 But, so either it was correct or it was consistently wrong. Well, if it was one thread, the answer is more likely to be correct, but the elapsed time. 488 01:11:39.539 --> 01:11:45.600 Was not, you know, 15-20 times as much, which. 489 01:11:45.600 --> 01:11:52.560 Is reasonable here. Okay, so again, what this program was showing. 490 01:11:53.939 --> 01:12:03.180 Was dynamic tasks. So inside your program, you can fire up a task, which will run as a parallel thread, and you can. 491 01:12:03.180 --> 01:12:17.100 Say explicitly what variables to share, and the task you fired up can itself start other threads and so on. But there are no guarantees about the ordering. So, at the end, it may be a good idea to wait till the. 492 01:12:17.100 --> 01:12:22.680 Tasks that you started have finished. And again, just to show you up here. 493 01:12:22.680 --> 01:12:26.399 Um. 494 01:12:26.399 --> 01:12:37.229 Every time I run it, it's different. There are no guarantees about the ordering. Everything is totally scrambled, but the answer should be correct, because I waited. 495 01:12:37.229 --> 01:12:42.479 Okay, memory. 496 01:12:43.560 --> 01:12:47.460 So, this is dynamically altering threads and so on. 497 01:12:49.050 --> 01:12:56.250 It's always your job; this is a biggie here. 498 01:12:58.560 --> 01:13:03.149 Even if the threads.
499 01:13:03.149 --> 01:13:08.699 Are accessing the same global variables. 500 01:13:08.699 --> 01:13:20.460 Um, a variable may be cached in the thread and may not be visible to the other threads. This is relaxed consistency, the buzzword here. So. 501 01:13:22.229 --> 01:13:25.829 You know, the data is not propagated immediately. 502 01:13:25.829 --> 01:13:30.449 Because this gives you more efficiency. So. 503 01:13:32.819 --> 01:13:41.909 And if you want the new variable value to be immediately visible, well, then you have to ensure that. By the way. 504 01:13:41.909 --> 01:13:46.289 C++ itself has this now. 505 01:13:46.289 --> 01:13:53.340 C++, without going to OpenMP, has some parallel ideas in it, including relaxed consistency. 506 01:13:54.510 --> 01:13:59.010 Okay: directives, library routines, environment variables. 507 01:14:02.250 --> 01:14:12.689 Okay, so things you can do with a directive: start a parallel region, we saw that. 508 01:14:12.689 --> 01:14:21.479 Divide code among threads, like in the loop, distributing iterations; we can serialize things, atomic operations. 509 01:14:21.479 --> 01:14:25.260 You can synchronize stuff with waits and so on. 510 01:14:27.270 --> 01:14:30.810 Here's a parallel thing. 511 01:14:30.810 --> 01:14:40.710 That I showed you in the initial program; it runs the following block in parallel on all your threads. 512 01:14:40.710 --> 01:14:47.760 This talks about what variables are shared and what are private; you can guess what that means. 513 01:14:47.760 --> 01:14:52.140 These variables will be private to the thread, and everything else is shared. 514 01:14:54.899 --> 01:15:03.930 And there's a lot of those. Okay. Library routines: you can query the total state of the system, the number of threads, your thread. 515 01:15:03.930 --> 01:15:08.579 You can tell, are you in a parallel region? 516 01:15:08.579 --> 01:15:18.210 And so on. You can set locks for serialization, all that; wall clock time.
And so again, you want to minimize wall clock time. 517 01:15:20.609 --> 01:15:27.210 An example of a library routine: you want to include this; I included this in my common file. 518 01:15:27.210 --> 01:15:32.399 And get the number of threads. So, okay. 519 01:15:33.750 --> 01:15:42.960 Environment variables: they can set policy for the program, so you can change policy for the program without having to recompile it. 520 01:15:42.960 --> 01:15:50.579 Set the number of threads; there's the loop iterations and the number of threads, and you can specify how they get divided up. 521 01:15:50.579 --> 01:15:57.000 And various other things, so stacks and stuff. 522 01:15:57.000 --> 01:16:05.430 Oh, each thread has a separate stack for local variables, and you can set the size. I'd make it big. 523 01:16:07.109 --> 01:16:15.180 You can do things like that. You can set them; as I just did, set it before the command, applying to that one command. 524 01:16:16.229 --> 01:16:24.090 Okay, general structure: here you've got the pragma, which does things. 525 01:16:24.090 --> 01:16:30.539 By all threads and whatever; then at the end of this block, it resumes serial code again. 526 01:16:30.539 --> 01:16:38.970 Okay. 527 01:16:42.479 --> 01:16:45.840 All of your. 528 01:16:45.840 --> 01:16:55.590 Compilers, they are all slightly different. I'm showing you with GNU g++. It's not the best, actually; it. 529 01:16:55.590 --> 01:17:03.420 Doesn't support the latest version. The PGI compiler is free for academics; that's. 530 01:17:04.500 --> 01:17:10.859 Reasonable, and NVIDIA has their compiler, which is an extension of that one. 531 01:17:13.229 --> 01:17:20.399 Why am I using Linux? Well, you look at the supercomputers, the Top500; they're all running versions of Linux. 532 01:17:22.470 --> 01:17:28.890 Compiling: should you be using different compilers? The flags are all different. 533 01:17:28.890 --> 01:17:35.609 This versus that; you need me to create a cheat sheet. Clang is nice.
534 01:17:37.140 --> 01:17:41.460 Okay, um, the directives. 535 01:17:42.779 --> 01:17:50.340 I'll skip the Fortran. 536 01:17:50.340 --> 01:17:53.609 And then some name, like parallel. 537 01:17:53.609 --> 01:17:56.670 And then finally clauses. So. 538 01:17:58.170 --> 01:18:06.090 Static. 539 01:18:06.090 --> 01:18:10.529 Static extents. So. 540 01:18:36.569 --> 01:18:40.229 Huh. 541 01:18:42.029 --> 01:18:51.930 The thing with things running on top of Windows is you're never sure how the low-level stuff is implemented. So. 542 01:18:51.930 --> 01:18:58.770 In any case, scoping: you can read this, but it gets to be weird. Um. 543 01:18:58.770 --> 01:19:07.680 The problem with scoping is, if you call out from one of these extents to another function outside the extent, does it run in parallel? 544 01:19:08.970 --> 01:19:16.470 There's the orphan stuff. So. 545 01:19:18.869 --> 01:19:29.699 Yeah, okay. And I showed you the parallel thing; it's got piles of options, of which the interesting one is reduction. 546 01:19:29.699 --> 01:19:33.840 I'll show you, and. 547 01:19:33.840 --> 01:19:39.989 Okay, you can browse on through this. So, what I basically. 548 01:19:42.180 --> 01:19:52.739 So, to review what I showed you: well, we finished off the quick introductory Lawrence Livermore tutorial, and. 549 01:19:52.739 --> 01:19:59.250 Also, I introduced you to parallel. 550 01:19:59.250 --> 01:20:07.739 Change your passwords, because I just made them so simple. And I introduced you to OpenMP by showing you some examples. 551 01:20:07.739 --> 01:20:11.430 And showing you the Lawrence Livermore OpenMP tutorial. 552 01:20:11.430 --> 01:20:17.340 And no guarantees. 553 01:20:17.340 --> 01:20:22.229 Okay, and things can be unpredictable. 554 01:20:22.229 --> 01:20:29.100 Okay, so what half the class owes me is here. 555 01:20:29.100 --> 01:20:33.090 Preferred talk topics. 556 01:20:40.199 --> 01:20:46.140 Okay, and I'll stay around for questions. 557 01:20:46.140 --> 01:20:50.340 And.
558 01:20:55.710 --> 01:20:59.579 Oh, you're welcome. 559 01:20:59.579 --> 01:21:04.350 So. 560 01:21:04.350 --> 01:21:11.220 See, myself, yeah. Oh, okay. Other than that, have a good weekend. 561 01:21:46.680 --> 01:21:55.529 Hello.