Okay, good afternoon. Okay, good afternoon, class. I can't tell if you can hear me; if anyone can hear me, could you post?

Thank you. Great. Okay, so what's happening today is, I get a chance to shut up and you get a chance to tell me about a pile of interesting topics, a dozen interesting topics in parallel programming, and we'll continue with them next class on Monday. Then I'll finish off the Lawrence Livermore tutorial and some of the references in it. Okay, so I put on the website a list of the order for the talks, and I'm having trouble sharing it, actually; I've got two different machines in front of me. So the easiest thing is for you to browse to the website, and I'll just read the first few: Reagan and Allen talk about the Blue Gene, Lucy talks about physics, Tom talks about the top two computers on the Top500 list, Kevin the fifth and sixth computers, Zach on Python.

Well, thank you, Allen. Great. I tried to paste it into the chat window and something didn't work. Beautiful. So, divide up the time reasonably; I'm not going to bug you about it, and if we run over today, we'll continue on Monday. Now, I think you can probably unmute yourselves; if you can't, I'll unmute you. So, Reagan and Allen, let me just unmute you also.

Okay, right now you have the stage.

All right, Reagan, do you want to share the screen?

Sure, I would, but I might end up dropping out partway; I just have the battery to worry about. Okay, I'll share mine then. Let's see.

Beautiful, it's coming in.

All right, so our topic was the IBM Blue Gene. So, what is the Blue Gene? It's a massively parallel computer designed by IBM, originally released in 2004, and at the time it was the fastest computer in the world. The design goal was a low-power supercomputer that could still reach high operating speeds. It cost around 100 million dollars to develop, was installed at the Lawrence Livermore National Laboratory in California, and was originally developed to study protein folding and genetics, hence the name Blue Gene. It was also fairly influential in that it helped map the human genome.

So, there were three iterations of the Blue Gene. The first was the Blue Gene/L, released in 2004. The second generation was the Blue Gene/P, released in 2007. And the third generation was the Blue Gene/Q, released in 2011.

So, here's the general structure of the supercomputer. At the smallest level, there were these compute chips, which had two processors on them with four megabytes of embedded memory.
This was different from traditional computers, which were very large; this one you could scale to whatever size you wanted. So you had these compute cards, which held two compute chips, and you could take these cards and slot them into a board called a node card. Then you could put these node cards in a cabinet; imagine a server rack with all these different layers of node cards. And then you could assemble the cabinets into a system, as seen here on the right; in this case it has 64 cabinets. The benefit is that this is scalable: imagine you had a smaller laboratory, you might want 32 cabinets, or if you have a giant one, you might want 128. So you could adjust it however you wanted. It was also more robust, because if any compute chips fail, you can reroute the processing to other nodes. So imagine one of these 64 cabinets failed: you could keep 63 cabinets running while you pull out the broken cabinet and the broken cards and put in ones that work. This was considered innovative at the time.

Here you can see the architecture for the second generation. The main change was that these small compute chips increased from two cores to four cores, and they also increased the bandwidth so the cores could talk to each other more easily, faster actually. It had the same general structure, with the compute cards, the node cards, and the big server racks, and they also increased the total number of racks.

The third generation was a pretty big improvement: they jumped from four cores per chip to 18, and they changed to a 64-bit computing system so it could have more RAM. And I'll pass it off to Reagan to talk about the applications.

Sure, yeah. So, one important thing everyone wonders about this supercomputer is, the hardware is interesting, but what exactly has Blue Gene done over its lifetime? Well, as we discussed a few slides ago, the first intended application of Blue Gene was to observe protein folding in real time, hence the name Blue Gene, but many other applications have been tackled, and there's a list of a few here. Eventually Blue Gene mapped the human genome, investigated medical therapies, simulated radioactive decay, replicated brainpower, flew airplanes, pinpointed tumors, predicted climate trends, and identified fossil fuels. Blue Gene was very popular in the health sector in terms of research.
And the reason it was so important is the amount of processing power it had. To put this in perspective: a single scientist with a calculator, like you or I, would have to work nonstop for 177,000 years to perform what the Blue Gene could do in a second. So it has processing power nothing like what you and I could manage in our lifetimes.

As for the applications of the second generation, the Blue Gene/P, there are a few things it was used for. One that I'm sure some of you may have heard of: the world chess champion at the time used Blue Gene to train for competitions. Additionally, it was used to simulate about 1% of the human cerebral cortex; this contained 1.6 billion neurons and 9 trillion connections. And eventually it was used to win the SCALE 2011 challenge, which was an oil-reservoir optimization application. So it was used in high-performance computing scenarios like those.

Excellent. And then for the third and final generation of Blue Gene, the Blue Gene/Q, or Mira: it was the first supercomputer to cross 10 petaflops of sustained performance, which at the time was a huge deal, and it was the fastest computer at the time. This computer also modeled the electrophysiology of the human heart, achieving nearly 12 petaflops in a near-real-time simulation of the beating heart.

In conclusion, Blue Gene was eventually replaced by supercomputers many of us have heard of, like Watson, and the current second-fastest computer, Summit, which reaches 200 petaflops; that's much grander than Blue Gene, but Blue Gene itself was still insane at the time. IBM's researchers, in creating Blue Gene, were among the first to take on parallel calculation at this scale, and IBM continues to make large contributions in the field with Watson and Summit.

And that's all we have, plus our sources.

Okay, thank you very much, then. Does anyone have any questions? I've unmuted everyone, so anyone who would like to can ask questions. So, which one was RPI using, which generation or generations?

You know, I'm not sure, but I do know that the last generation was available for commercial use, for things like business analytics and other sorts of uses. So I'm not sure, but I would assume the last generation. Okay, well, thank you. Anyone else?

Yeah, I have a question. This seems like a pretty powerful computer, but it only simulated 1% of the human brain, and I guess the question is kind of twofold. I see the reason for a lot of the other applications you mentioned, if you could go back to that slide.
But do you know why simulating the human brain is of interest to people? Why is that a challenge that the Blue Gene was important in solving? And why only 1%? Could they have done more, or are we just really complex?

Well, the human brain is extremely complex; it has, I think, billions of neurons and trillions of connections. That makes it extremely difficult, because when they're all interconnected it raises the complexity exponentially. I'm not exactly sure why they were simulating it; I'm guessing it's to better understand how the brain works. But I think even simulating 1% of the brain is an extremely difficult task computationally.

Yeah, thank you. I was wondering, since you mentioned that Blue Gene was used for a lot of biomedical tasks, whether there was a medical reason behind simulating the brain.

Well, I think a general answer is that they're trying to understand the body, because then you can, for example, design better drugs. I mean, the current process, where you watch neurons firing: that's like trying to understand computers by watching transmission lines, basically, and you don't even have a concept of modulation. So they're trying to get a little higher up the protocol stack; that's my guess. And then another goal would perhaps be better psychoactive drugs; again, the current state of the art there, unfortunately, is very primitive. I have no medical competence, but that's my guess.

Anything else? Okay. Lucy, then: tell us about physics applications.

Oh, sure. ... Let's see. Can you see my screen? Yes. ... Did I get your name wrong on something? Oh, wow. I mean, my preferred name is... Okay, I pulled it off the list. Okay. Let me mute everyone and unmute you; just a sec. Okay, see if you can go now.

Oh, okay. So yeah, I'm going to talk about an application of parallel computing in physics. The problem I want to talk about is the N-body problem. This is a classic problem in physics: predicting the motion of N bodies that interact gravitationally. So basically, for each particle, the problem needs to calculate the force between that particle and the other n minus 1 particles.
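The formula on the slide isn't captured in the captions; presumably it is the standard Newtonian sum over the other particles, something like:

```latex
\mathbf{F}_i \;=\; \sum_{\substack{j=1 \\ j \neq i}}^{n}
\frac{G\, m_i m_j \,(\mathbf{r}_j - \mathbf{r}_i)}{\lVert \mathbf{r}_j - \mathbf{r}_i \rVert^{3}}
```

Each force depends on all n positions from the previous time step, which is what drives both the O(n²) cost and the communication pattern discussed next.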
So, one probably familiar example would be the two-body problem that we learn in physics, Physics 2 I think; and one of my favorite science-fiction books talks about the three-body problem. Since the forces are computed simultaneously for each particle at each time step, it is intuitive that we could improve the calculation of N-body problems using parallel programming.

To analyze the application of parallel computing to this problem, we can break its properties down into four categories. First, how we can partition the problem: since we need to compute a force and a position for each particle at each time step, it is natural to partition the tasks by particle. For communication: if we look back at the formula, we see that we will need the values computed at the previous time step to generate the new forces at the new time step. For aggregation, we can combine the tasks associated with a single particle, because the communication occurs on a per-particle basis: the current values depend on the previous values of that same particle. And for mapping, we can assume the work per particle at each step is roughly equal, so if we assign a balanced number of particles to each processor, we will have a well-balanced workload distributed among the different processors, making the parallelization efficient.

So, there are real benefits and improvements from applying parallel computing to this problem. For a typical serial algorithm computing the N-body problem, the running time will be on the order of n squared: we need to compute the forces between each particle and the other n minus 1 particles, which means n minus 1 force computations per particle, repeated for all n particles, so it will be O(n²). By using parallel computing, we can distribute the work equally among p processors, and therefore improve the running time to O(n²/p).
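As an illustration of that decomposition, here is a minimal sketch, not the speaker's code, assuming plain Python with the standard multiprocessing module (all the names below are made up): each worker computes the forces for its own balanced block of particles from the previous step's positions.

```python
import math
import random
from multiprocessing import Pool

G = 6.674e-11  # gravitational constant

def forces_on_block(args):
    """Net force on each particle in [start, end), summed over all others."""
    start, end, pos, mass = args            # pos/mass: previous-step values
    block = []
    for i in range(start, end):
        fx = fy = 0.0
        xi, yi = pos[i]
        for j in range(len(pos)):           # the O(n) inner loop per particle
            if j == i:
                continue
            dx, dy = pos[j][0] - xi, pos[j][1] - yi
            r2 = dx * dx + dy * dy + 1e-12  # tiny softening avoids divide-by-zero
            f = G * mass[i] * mass[j] / r2
            r = math.sqrt(r2)
            fx += f * dx / r
            fy += f * dy / r
        block.append((fx, fy))
    return block

if __name__ == "__main__":
    n, p = 2000, 4
    pos = [(random.random(), random.random()) for _ in range(n)]
    mass = [1.0e3] * n
    # Mapping step: partition the n particles into p balanced blocks.
    size = (n + p - 1) // p
    tasks = [(s, min(s + size, n), pos, mass) for s in range(0, n, size)]
    with Pool(p) as pool:                   # each block runs on its own processor
        forces = [f for blk in pool.map(forces_on_block, tasks) for f in blk]
    print(forces[0])
```

Each worker still does O(n) work per particle, but only for about n/p particles, matching the n²/p running time just described; shipping the full position list to every worker is the per-step communication identified above.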
And we can improve the parallel computation even further by finding better data structures, such as Barnes-Hut trees, which bring it down to n log n. So, using parallel computing is definitely more time-efficient than a serial program.

Besides the improvement in time that parallel computing gives this problem, there is also an improvement in precision that benefits N-body simulations. One paper stated that it achieved 1.0 petaflops with an efficiency of 74% using 4,096 GPUs. So you can see that the application of parallel computing to the N-body problem significantly improves both the time and the precision of the simulations.

Thank you, and these are my references.

Professor, if you're talking, we can't hear you. ... Yeah, it looks like your headset's muted, but not here. Okay, there. I don't know why it says I was sitting on mute so many times. Okay. Yeah, I was asking: why do we want to simulate N bodies?

Because it is a classic physical problem, and also because simulating the interaction between bodies can be applied to other things, like cells; beyond just gravitation, it can be applied to, I think, chemical particles and their interactions.

Okay, thank you. Okay, no one else has a question? Okay, so the third talk is Ian and Tom telling us about the first two computers on the Top500 list. To prevent background noise, let's see, I'll mute everyone and unmute you two. ... Okay, you can also unmute yourself.

Okay, can you see that? Ah, yes. Okay.

Okay, so hi, my name's Tom. And my name's Ian. Today our topic is to describe the two fastest computers on the Top500 list.

So, starting with the number one computer: that's Fugaku. It was jointly developed by RIKEN and Fujitsu, and it is located at the RIKEN Center for Computational Science in Kobe, Japan. Unlike a lot of modern supercomputers, it's a fully ARM-based, CPU-only supercomputer. In practice, it has achieved over 442 petaflops of double-precision 64-bit performance in what they call standard mode, which is just its base clock frequency of 2 gigahertz on all the CPUs. Past that, it's theoretically capable of 488 petaflops double precision, and as high as 537 petaflops in what they call boost mode, where all the CPUs boost to 2.2 gigahertz instead of the standard 2 gigahertz. It uses a custom Linux kernel, what they call McKernel, which, as far as I know from my research, is based around Red Hat, which is IBM's. It's composed of over 150,000 nodes, 158,976 to be exact, and it consumes around 30 to 40 megawatts depending on standard mode versus boost mode and on how much it's doing, which is anywhere from 3,600 to 4,800 US dollars an hour at 12 cents per kilowatt-hour.
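That cost figure is easy to sanity-check, taking the quoted 12-cents-per-kilowatt-hour rate at face value:

```latex
30\,\mathrm{MW} = 30{,}000\,\mathrm{kW}
\;\Rightarrow\; 30{,}000\,\mathrm{kWh/h} \times \$0.12/\mathrm{kWh} = \$3{,}600/\mathrm{hour},
\qquad
40\,\mathrm{MW} \;\Rightarrow\; \$4{,}800/\mathrm{hour}.
```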
And then going over the composition more: each node, as I mentioned, is an ARM CPU. Fujitsu developed these ARM chips specifically for this supercomputer; it's the A64FX, which has 48 computational cores plus 4 non-computational cores, and it's built on a 7-nanometer process. These ARM processors were used mainly because they are very power-efficient.

Going over memory: each node, or CPU, has 32 gigabytes of HBM2 memory with 1,024 gigabytes per second of bandwidth. In total, combining all the nodes, it's over 4.5 pebibytes of memory, where a pebibyte is 2 to the 50th, powers of 2 rather than powers of 10, and over 163 petabytes per second of total memory bandwidth. And the entire machine itself contains over 893 kilometers of fiber-optic and electrical cable.

So, a little more history: it started development in 2014 as the successor to the K computer, which was RIKEN's previous machine, and it officially started operation in 2021. It was sponsored by MEXT, which is the Ministry of Education, Culture, Sports, Science and Technology in Japan. It occupies a two-story computing center, or at least the actual parts that make up the computer are two stories: the first floor is all the AC units that cool everything, and the second floor is the computer room and all the server access. You can see some pictures there, and you can also see the basic way they structured their shelves and their system in the bottom-right picture.
As far as what it's been used for: they actually started using it a little earlier than it was scheduled, for COVID-19-specific projects. Obviously that was done to try to improve our knowledge of COVID-19, which helps with issues like how effective masks are, and various other topics like that. Japan also wants to use it for what they're calling Society 5.0, which is really just that they want to look into how to give people safer and more comfortable lives. In more recent times it's been used for light-matter interaction simulations, tsunami-prediction simulations for flooding, and computing atomic forces from Monte Carlo theory. And it's currently three times faster than the number two computer, which we'll be moving on to now.

Thank you, Tom. So, the second-fastest computer, as you maybe know, is the Summit. It is located at the Oak Ridge National Laboratory in Tennessee, United States, and it was developed by both IBM and NVIDIA. A kind of interesting fact is that the Department of Energy actually commissioned them to build two different computers: the Summit and the Sierra, which is the third-best computer in operation. I saw earlier it was mentioned that the Summit could reach 200 petaflops, but that's just a theoretical limit in boost mode; the highest it has actually achieved is just about 148.6. It has a pretty big storage system, I'd say, at 250 petabytes, but only about 10 petabytes of that can actually be used as system memory. Similarly to Fugaku, this also runs on Red Hat, by IBM. And it has, I believe, far fewer nodes, which makes sense because it's smaller: only 4,608. Also, since it's a fourth as fast, it consumes about a fourth as much power, from 10 to 13 megawatts.

So, next we can look a little more into the composition of the computer. As I mentioned, it has 4,608 nodes, and unlike Fugaku, this one actually has 2 CPUs per node instead of 1, and it also has 6 GPUs per node. The CPUs are IBM POWER9s and the GPUs are NVIDIA Tesla V100s. Counting cores was actually really complicated for me; I didn't quite see what was going on there at first, mainly because of the GPUs. Each GPU has 80 streaming multiprocessors, which they count as cores; otherwise that number of 2.4 million down at the bottom would be much larger. That is, they did not count the CUDA cores, while the 22 cores per CPU were included in the calculation. So if we did count the CUDA cores, the real number would be more like 141 million, or 159 million if you include the tensor cores.
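Those totals line up with the published per-part counts (2 POWER9 CPUs at 22 cores and 6 V100 GPUs per node; a V100 has 80 streaming multiprocessors, 5,120 CUDA cores, and 640 tensor cores):

```latex
\underbrace{4608 \times 2 \times 22}_{\text{CPU cores}}
+ \underbrace{4608 \times 6 \times 80}_{\text{GPU SMs}}
= 202{,}752 + 2{,}211{,}840 \approx 2.4\ \text{million}
\\[4pt]
4608 \times 6 \times 5120 \approx 141.6\ \text{million CUDA cores},
\qquad
4608 \times 6 \times (5120 + 640) \approx 159.3\ \text{million with tensor cores}
```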
Next, a little bit of history. Just like Fugaku, development started in 2014; this computer was commissioned by the Department of Energy as an upgrade from Titan, and it was first used in 2018. In 2018 it was the world's best computer, from June, when it started operation, through November 2019, and now, since Fugaku is in operation, it's the second-best computer. One thing I thought was really cool is that they kept adding onto it continually: even in 2018, when it was the fastest, it only had about 2.3 million cores, and now it's at the 2.4 million cores shown before. Another crazy fact is how big this thing is: over 340 tons and 5,600 square feet. That's like three of my houses. It's kind of hard to fathom how big these machines are, and I think it would be really interesting if they got scaled down in the future while keeping the same speed.

Lastly, what is it being used for? A lot has been done on cancer research, among other things; the specific program is the Cancer Distributed Learning Environment, or CANDLE. Fusion is also being researched so that we can better utilize it; fusion, by the way, is just what powers the sun, basically. It's working on medicine as well: like I mentioned, cancer, but also disease and addiction, and a couple of other things. They're trying to predict weather better. And, I forget who is working on it, but someone is working on earthquakes, as you can see in the bottom right; the San Andreas there is just an example of what a simulation may look like. Lastly, it's working on identifying new generations of materials; that's what the materials-science people are working on, trying to predict what something is going to do before they actually make it, which can have huge applications and save lots of money for research companies. That's it.

Here are some references, and does anyone have any questions for us?

Yeah, thank you very much. Questions, questions, anyone?

So, one from me, then: why do you think they commissioned two computers simultaneously?

That is a good question. Maybe they didn't want to put all their money into one bigger one? Yeah, and I think they had different processor mixes; like, there were the 2 CPUs to 6 GPUs in the Summit, so there's a 3-to-1 ratio.
I think they were also playing with other ratios, or maybe other GPU systems instead of the Tesla V100.

Okay, thanks. Just to comment on the thing about the CUDA cores: we'll get into that in more detail later. NVIDIA themselves were backing away, a while back, from even using the term CUDA core, because you've got all these separate cores doing different things inside the GPU, and they're really grouped into CUDA cores, floating-point units, tensor units, ray-tracing units, and NVIDIA keeps varying the mix of one to the other. And you're right, it does make counting the number of cores in a computer an almost obsolete question. And you hit on a big point by saying they're varying the mix of CPUs to GPUs and so on to find what's optimal.

Any other questions? Okay. So, let's see: Kevin will tell us about the fifth and sixth computers. And again, I just made this list in the order that you replied to my email.

I've unmuted you; go ahead and share your application. Thank you. All right, so I'll be going over the fifth and sixth fastest computers as of November of last year.

So, the fifth-fastest computer is new in 2021: it's Perlmutter, at the National Energy Research Scientific Computing Center in Berkeley, and it's made by HPE. As of 2021 it has only completed Phase 1, which is all the file systems and networks, the GPU nodes with their tensor cores, and the platform-integrated storage. It has 761,000 cores and 420,000 gigabytes of memory, and it uses 2.45-gigahertz AMD processors. Its interconnect is Slingshot-10, which was custom-built for the supercomputer. Based on the standard LINPACK benchmark it maxed out at about 70,000 teraflops, but it has a theoretical max of nearly 94,000, and on a separate benchmark, HPCG, it gets up to about 1,935. As of right now it draws about 2,600 kilowatts of power. It runs the HPE Cray OS with the vendor compiler and math library, with OpenMPI.

And the plan for this supercomputer is to use it for extreme-scale science: because it was commissioned by the National Energy Research Center, they obviously want to use it to compute huge data sets, try to find new energy sources, improve energy efficiency, and discover new materials. But as of right now only Phase 1 of this computer has actually been built; Phase 2 is not done, so it isn't 100% in use yet. You can see here, this is the general plan.
The last update on this construction was in November, when it was ranked. As of right now, only Phase 1 is complete; I didn't see anything online about Phase 2, so I'm assuming none of that has actually been done, and this machine isn't fully in use right now. But as I said, it was planned to be completed in 2021, so I think it's nearing completion and should be in use soon.

And then the sixth-fastest computer: this is Selene, made by NVIDIA. It has 555,000 cores with a million gigabytes of memory, and it also uses an AMD processor. Its interconnect is the Mellanox HDR InfiniBand. Using the same LINPACK standard as before, this one is obviously slightly slower, since it's ranked farther down: it can only hit about 63 petaflops. With the other benchmark it's 6,800; actually, I think I wrote this number down wrong, and it should just be 2,600 kilowatts of power consumption. So yeah, this one actually runs Ubuntu 20.04, and it uses the NVIDIA nvcc compiler and CUDA math libraries with OpenMPI. NVIDIA uses it mostly to do AI work; it does a lot of processing for self-driving and robotics, that kind of stuff. And also, because this computer was completed in 2020, they used it to do a lot of research on COVID and things like that.

And here are the sources that I used for this presentation.

Well, thank you. Does anyone have any questions? I have one: so, we actually have a machine here that isn't running Red Hat; it's running Ubuntu. Okay. And, this probably didn't come up, but I'm kind of curious. We've seen a lot of these machines used for different things, right? One was mostly just for AI, others are used for physics problems, and so on. Is there a difference in the build? Like, it was mentioned previously that there are various GPU-core-to-CPU-core ratios. Are those choices made knowing in advance what kind of problems they want to solve on the machine? Does it make a difference what parallel problems you want to solve on it?

No one else wants to take it? Yes, I think somewhat. For example, NVIDIA a few years ago added the tensor cores, and what a tensor core does is take vectors and do, basically, vector multiplication, matrix multiplication, and matrix addition with low precision, like two-byte floats and so on, because for machine learning there are not a lot of significant bits in the data.
Now, I'm not an expert in machine learning, so correct me if I'm wrong, but it does not have much use for, say, double-precision floats. So if your target is machine learning, you want more units that work with shorter data; that would be the best example. And obviously, if it's working with shorter data, then you can push twice as much data through the same bandwidth.

So, does that also affect the operating system they chose? Like you mentioned, Unix versus et cetera, et cetera? That part I was joking about, actually. I don't know that much about Cray, but, well: Seymour Cray was building some of the fastest supercomputers decades ago, the Cray computers, and he was famous for not doing standard floating point. His float calculations were not IEEE standard, which means they were not as accurate, but they also took fewer gates to implement. So he figured, for the applications on his machines, he'd rather provide more floating-point performance with less accuracy. That was a design decision he made.

Using NVIDIA for a third example: every generation they adjust the balance. A few generations back they had the Maxwell; the Maxwell had fewer double-precision units in it, so they had more space for other stuff, but the double-precision performance was awful, and in the version after that, Pascal, they reverted that decision. So yeah, you look at the application and you keep varying the mix. Another big trade-off is cores and processing versus storage: an error that IBM made in the first Blue Gene was that they had too little memory, so some of the users decided they would just idle some of the CPUs in order to use their memory for the others. That was, you know, them re-trading the balance.

Anyone else have any comments on that? Okay, thank you. So now we will hear Zach tell us about Python and what you can do parallel-wise in it.

All right, can everybody see my screen? Yes. Great.

So, hello everyone, my name's Zach, and thank you all for being here virtually today for my presentation, which will cover the fifth topic on the topic list: that being Python's parallel capabilities. I've broken my presentation up into four sections. We'll start off by talking about the feasibility of parallel programming in Python, then talk a little more about some of the important parallel Python modules and libraries out there, and finally wrap it up with a conclusion at the very end.
387 00:50:27.684 --> 00:50:40.224 Um, so let's start by talking about feasibility of parallel pro game and Python for this whole implement for, for this whole presentation I should say, will only be considering the C Python implementation. 388 00:50:40.530 --> 00:50:47.010 Which is the, um, the official reference implementation of Python written in the C programming language. 389 00:50:47.010 --> 00:50:55.139 But the question we really need to ask ourselves right off the bat about Python is does it even support multi threading at all? 390 00:50:55.139 --> 00:50:59.219 The answer might surprise you. Um, not exactly. 391 00:50:59.219 --> 00:51:03.719 Um, Python is something called the, uh, global interpreter lock. 392 00:51:03.719 --> 00:51:09.719 Also notice the, which is basically a new text that protects access to shared Python memory. 393 00:51:09.719 --> 00:51:15.179 Basically ensures that 1 thread accesses, um, shared memory at a time. 394 00:51:15.179 --> 00:51:18.630 Um, and in doing this, it basically inhibits multithreading. 395 00:51:18.630 --> 00:51:26.340 And obviously, this isn't great for performance reasons. There have been, you know, several attempts to remove it over the years. 396 00:51:26.340 --> 00:51:30.179 Yeah, none of which have been completely successful. 397 00:51:30.179 --> 00:51:37.650 Um, because the other things have come to depend on the, and it's sort of made it more difficult with its dependencies. 398 00:51:37.650 --> 00:51:42.960 But all, it's not lost whoever there are still some workarounds for this issue. 399 00:51:42.960 --> 00:51:46.260 Um, for 1, you can use some different extra libraries. 400 00:51:46.260 --> 00:51:53.760 Like, non pie in pandas, um, you know, these libraries are sort of able to release the lock and get around it to. 401 00:51:53.760 --> 00:52:00.719 And performance in depth, different operations, and 2, you can also choose different Python with limitation. 402 00:52:00.719 --> 00:52:08.070 So there are many out there see, Python is really, you know, the most popular, but you can pick anyone you want. 403 00:52:08.070 --> 00:52:14.309 Um, that doesn't have the, um, yeah, such as iron Python and Python, um, et cetera, et cetera. 404 00:52:14.309 --> 00:52:18.269 So, now let's move on to, uh, talking about some. 405 00:52:18.269 --> 00:52:21.480 Important parallel Python modules. 406 00:52:21.480 --> 00:52:30.750 Probably the most famous and most useful, the multi processing module, um, which has the name implies uses processes instead of threads. 407 00:52:30.750 --> 00:52:35.730 Process referring to, um, sort of separate programs and execution. 408 00:52:35.730 --> 00:52:39.750 Um, whereas threads are more sub segments of the process. 409 00:52:39.750 --> 00:52:45.360 As you can see on the right so, 1 of the big difference here, being separate memory versus shared memory. 410 00:52:45.360 --> 00:52:51.630 And there are 2 big classes in this module. 1 is the process class. 411 00:52:51.630 --> 00:52:57.030 This would be more used for your function based parallelism applications. 412 00:52:57.030 --> 00:53:07.590 You've got different functions, you just sort of doing their own thing for some set of data. Um, whereas the pool class, it's more used for database parallelism. 413 00:53:07.590 --> 00:53:14.550 Where you've got the same function running over and over again, um, for some data that you pass into it. 414 00:53:14.550 --> 00:53:23.280 And we'll see an example on this slide. 
The other big module is the threading module, which, like we talked about earlier, isn't quite as good, because it is fairly limited by the GIL. But this module is really as simple as you would expect: the Thread class is very similar to the Process class. You make a Thread object to run a separate thread of control, give it a target function and arguments, and call those same start and join functions that you would for the Process class. The diagram on the right is actually taken from the Lawrence Livermore National Lab tutorial, and it basically demonstrates that threads are concurrent sub-paths of control within a process, and they can all share the same program memory.
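Again as a sketch rather than the slide's exact code, the Thread class mirrors the Process API almost line for line:

```python
import threading

def worker(name, steps):
    # A concurrent sub-path of control that shares the program's memory.
    for i in range(steps):
        print(name, "step", i)

t1 = threading.Thread(target=worker, args=("thread-1", 3))
t2 = threading.Thread(target=worker, args=("thread-2", 3))
t1.start()
t2.start()
t1.join()   # same start/join pattern as the Process class
t2.join()
```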
So, here's a nice little summary comparison of the two modules. Multiprocessing is really nice because it avoids the GIL: you can take advantage of the different CPUs and cores, and it's a lot more straightforward because you don't have to deal with things like race conditions, which can be a headache. But it does make inter-process communication more difficult, slower I should say, because it has more overhead and a much larger memory footprint. The threading module is still kind of nice, because you do get that shared memory space, faster inter-thread communication, and a smaller memory footprint; but in reality, the limitations posed by the GIL really make multiprocessing the better option for most applications where you're trying to get the best performance.

So, let's briefly talk about some of the other parallel Python libraries out there too. You have symmetric multiprocessing (SMP) libraries, which use concurrent parallel-programming techniques with shared resources; examples here are Dispy, which uses asynchronous sockets and different polling mechanisms, an OpenMP-style framework, and torcpy, which uses multithreading. The cluster-computing libraries, on the other hand, unlike SMP libraries, do not use shared resources; examples here would be a distributed task manager and mpi4py, which is an MPI implementation. You also have libraries for cloud computing, interacting with resources that are managed by a third-party company; examples would be Google's Cloud App Engine and Amazon Web Services, and an example of a specific library would be PyCOMPSs, which is able to interact with different clusters and clouds. And I didn't list any specific examples here, but you also have grid-computing libraries, which interact with multiple computers that might be connected across some big networks.

But in conclusion, we've really seen that Python may not be the best choice for your parallel-computing applications, mostly because of that global interpreter lock we talked about and the fact that it makes running multithreaded applications pretty difficult. Despite these downsides, though, Python really is still one of the easiest languages to use and pick up; it's very simple, and in applications where you are working with processes, you can get some really helpful functionality from modules like multiprocessing and the other modules and libraries out there.

But that's all I had to say for today. I just wanted to thank everybody for listening, and open it up for questions here at the end of my presentation.

Well, thank you. Questions, anyone? Okay: so the fact that it's easy to use trumps, in practice, the fact that that global interpreter lock may slow you down; that's what I'm guessing.

Exactly, yeah, because your time is more important than the machine's time in lots of cases.

I'm being serious. Oh, the question about what it was used for: did I say? I did, but I just can't remember the application now. It's in the video somewhere.

Okay, so Mark has a comment. Yeah, I've done some work in this space.
The GIL is a problem, but like you mentioned, the developer's time is more important, and a lot of the libraries built for common parallel problems like AI, like TensorFlow or PyTorch, extract a lot of the processing into separate, optimized, compiled, non-GIL code. So writing your own is tricky, but a lot of libraries do that, which makes Python a better choice, though still not the best choice.

Right. Okay, well, thank you. Yeah: I personally use C++, not Python, for my programs, so I don't hit that. But we also have serializing locks in C++, for example for memory allocations on the heap, so you get comparable difficulties on those other platforms also.

Do those locks serve to, like, prevent race conditions when multiple things are running at the same time? Like, do they prevent memory from being read and written to at the same time?

Yes. Say several threads are computing and dumping results into a common array, so each needs a unique index: you have a counter for the number of items allocated so far, so you read the counter, add 1 to it, record that number, and write it back. Now let two processes interleave those read-modify-writes, and you can guess what will happen. I've got an example in C, and it actually will happen, if you write a parallel program that way, that different threads step on each other's toes, you might say. So that's what they avoid with the serialization.

And it's made worse because, when you write back to memory, it may not be reflected immediately in the memory as seen by the other processes; it has to go through a cache, perhaps. And if you try to force everything to always be consistent immediately, the cost will just be too great. So that's the problem you're fighting, and that comes into the embarrassing-parallelism question I put up in a homework today: if you can decompose the task so that the different processes, the different threads, are not writing to the same memory, then it gets much more parallelizable. But that depends on the application.
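The C example mentioned isn't in the transcript, but the same lost-update race is easy to reproduce; here is a sketch in Python, using multiprocessing's shared Value, whose documentation itself warns that an increment is not atomic:

```python
from multiprocessing import Process, Value

def bump(counter, n):
    for _ in range(n):
        counter.value += 1   # read, add 1, write back: three steps, not atomic

if __name__ == "__main__":
    total = Value("i", 0)    # shared integer
    workers = [Process(target=bump, args=(total, 100_000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(total.value)       # expected 400000; typically far less, varying per run
```

Guarding the increment with `with counter.get_lock(): counter.value += 1` serializes the read-modify-write and restores the expected total, at exactly the cost the professor describes.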
Other comments? Okay. So we will hear about Pthreads now.

Okay, I'll share my screen. ... Can we get a confirmation that you can see my screen? Yes, we can see your POSIX threads slide. Okay, thanks; sometimes I'm doubled. Okay, so my partner and I will be presenting Pthreads to the class. I'll just start off with a basic overview.

In a UNIX/Linux operating system, the C and C++ languages provide the Pthreads standard for all thread-related functions; basically, it lets us create multiple threads of process flow. It's defined as a set of C-language types and procedure calls, exposed through a header file, in a thread library; so yes, it specifies a set of interfaces for threaded programming. What I found is that it's most effective on multi-processor, multi-core systems, where threads can be implemented at the kernel level to achieve real speed of execution. And, yeah, basically you have to include the pthread.h header file at the beginning of your source file to use all the functions the library gives you, and then to build the file you usually pass -pthread, or -lpthread, one of the two, on the command line.

And in case you didn't know: a thread is a procedure running independently from its main program.

For synchronization, Pthreads uses mutexes. And, like fork, thread creation on Linux uses the clone system call, but for threads both sides share the same memory: the fork call uses clone with minimal sharing, while the pthread_create function calls clone sharing as much as possible.

For example... awesome, thanks. So, over here we have example C code of how we can set up a Pthread. The main thing is you include pthread.h. We've defined the four different threads that we're going to create, and then in the main function we iterate through those four numbers and create four different threads. Really quick, really simple; it's not too difficult a setup.

Next slide. So, a great way to understand what Pthreads does is to compare threads versus processes, as we've touched upon in the last slide show. Forking is a way to create a new process; threading is a way to create more threads within a process. So fork will create a new process, each with its own separate memory, and due to this there's a lot of overhead. Threading creates new threads within a process, and within each process, each thread shares the same memory.
540 01:05:50.849 --> 01:06:00.960 Each thread shares the same memory — the same data, code, and files; what differentiates them is that each thread has its own registers and stack. 541 01:06:00.960 --> 01:06:10.559 Excellent. And a great analogy is that forking will create an entire new lamp, if you think about it. 542 01:06:10.559 --> 01:06:14.400 These 2 lamps will draw from different 543 01:06:14.400 --> 01:06:18.750 power sources in your wall outlets; they hold different pieces of 544 01:06:18.750 --> 01:06:25.320 data. When you want to turn them on, you've got to turn each one on separately: turn light 1 on, then light 2 on. 545 01:06:25.320 --> 01:06:36.449 Threading is like creating multiple bulbs on one lamp: each light bulb in the 2nd picture shares the same stem, the same power outlet. 546 01:06:36.449 --> 01:06:40.320 You're able to turn them all on at once and turn them all off at once. 547 01:06:40.320 --> 01:06:45.719 And it's a much better way to run things 548 01:06:45.719 --> 01:06:48.900 in parallel. 549 01:06:51.329 --> 01:06:55.199 Yep, these are our resources. Are there any questions? 550 01:07:12.539 --> 01:07:19.949 So, I was just having trouble unmuting myself. 551 01:07:21.420 --> 01:07:26.820 And so — anyone, questions? 552 01:07:26.820 --> 01:07:30.239 Do you see pthreads as widely used or not? 553 01:07:30.239 --> 01:07:35.760 Any comments on whether I should start teaching it more in class, perhaps? 554 01:07:37.980 --> 01:07:41.400 I think it'd be interesting. I don't have any experience with 555 01:07:41.400 --> 01:07:45.420 threads or anything, so it'd be neat to hear about it. 556 01:07:45.420 --> 01:07:49.769 Okay, anyone else have other questions? 557 01:07:49.769 --> 01:07:55.199 I have 1 little question: you showed us the example using pthreads in C. 558 01:07:55.199 --> 01:08:01.110 Can it also be used in other languages, like Python? Yeah. 559 01:08:01.110 --> 01:08:12.659 Yes — I don't have too much experience with that. I know when I code in Python, I use a library which 560 01:08:12.659 --> 01:08:16.289 doesn't really create threads. It just makes 561 01:08:16.289 --> 01:08:26.279 functions wait on each other or something, but you tend to achieve the same thing. The P doesn't mean Python, though. It means 562 01:08:27.479 --> 01:08:33.960 POSIX. 563 01:08:33.960 --> 01:08:43.979 Yeah, the thing is that it's a standard layer on top of the hardware, so you could have different back ends to it, I believe, and 564 01:08:43.979 --> 01:08:47.189 I guess you could have the standard layer in Python. 565 01:08:48.329 --> 01:08:57.149 So, other questions or comments? 566 01:08:58.260 --> 01:09:08.520 Okay, so let's hear Adrian James tell us about what cool things the astronomers are doing with parallel computers. 567 01:09:33.630 --> 01:09:38.310 Hello, can you guys see my screen? No, we can't. 568 01:09:38.310 --> 01:09:46.020 Okay — it came in for a second and then vanished. 569 01:09:47.369 --> 01:09:51.359 All right, let me try this again. 570 01:09:53.189 --> 01:09:56.189 How about now? Yes. 571 01:09:56.189 --> 01:10:01.140 All right, cool. 572 01:10:03.925 --> 01:10:16.135 All right, so I'm going to be talking about parallel computing in astronomy. I'll be specifically focusing on a NASA project called High Performance Spaceflight Computing, or HPSC. 573 01:10:17.550 --> 01:10:27.149 Yeah, so this is a project undertaken by NASA.
574 01:10:27.149 --> 01:10:35.395 It was formulated by some engineers at the Jet Propulsion Laboratory, or JPL, in recent years. 575 01:10:35.395 --> 01:10:46.314 I believe the talks 1st started around 2015, and it was kind of a result of the stagnation of spaceflight computing, specifically in the software. 576 01:10:48.204 --> 01:11:00.654 So the project has a hardware and a software aspect to it. In terms of hardware, they're designing, in house, some new multi-core computing chips, with multiple processing cores on each chip. 577 01:11:01.465 --> 01:11:02.604 But more importantly, 578 01:11:02.755 --> 01:11:07.555 NASA is looking into developing new operating software, and 1 579 01:11:07.555 --> 01:11:14.814 such surrogate for this new development of parallel computing is the descent and landing computer, 580 01:11:14.845 --> 01:11:17.244 which is maintained at NASA Johnson Space Center. 581 01:11:18.960 --> 01:11:28.770 So, spaceflight computing as it stands — looking into what currently exists out there — is overwhelmingly serial. 582 01:11:28.770 --> 01:11:42.539 So NASA sees the landscape of parallel computing, and they think to themselves: this seems like a very interesting field that we could pioneer for the future. 583 01:11:42.539 --> 01:11:56.640 So, yeah, a lot of NASA projects are pushing the boundaries of what can be achieved through hardware, but the software is something that they are just now starting to push forward in order to catch up with everything else. 584 01:11:58.375 --> 01:12:10.524 The descent and landing computer is part of a larger NASA project known as SPLICE, and this project is dedicated to implementing advanced technologies on spacecraft. 585 01:12:11.244 --> 01:12:16.225 The descent and landing computer has to perform massively resource-intensive algorithms, 586 01:12:16.225 --> 01:12:16.404 like 587 01:12:17.125 --> 01:12:26.904 terrain-relative navigation, as well as compute things like video processing and graphics, in order to manage 588 01:12:26.935 --> 01:12:27.385 all of 589 01:12:27.659 --> 01:12:37.680 the data that's coming in through the sensors. So parallelization of these algorithms seems like a very beneficial endeavor for NASA to pursue. 590 01:12:39.960 --> 01:12:44.220 So, speaking specifically about the descent and landing computer: 591 01:12:44.220 --> 01:12:49.524 this was really the 1st real test for parallelization of spaceflight algorithms. 592 01:12:49.885 --> 01:13:00.324 It's still in its early stages, and, as previously mentioned, it's part of the Safe and Precise Landing Integrated Capabilities Evolution, or SPLICE, project. 593 01:13:01.045 --> 01:13:07.375 And as it stands, it's acting as the surrogate for the high performance spaceflight computer. 594 01:13:40.585 --> 01:13:54.324 Ah, so I worked specifically with the 2nd iteration of the descent and landing computer, and the 1st iteration was recently tested on a Blue Origin New Shepard flight. 595 01:13:54.534 --> 01:13:59.515 So they've been running tests since about 2020, and I think they've run 4 or 5 tests by now. 596 01:14:39.420 --> 01:14:51.359 So, parallel computing in flight is a very novel and very exciting proposition. But because it's mission critical, you can't have faulty software — you know,
597 01:14:52.949 --> 01:15:07.734 the price of faulty software being an entire spaceship failing in flight — so the development for it is going relatively slowly. But conversely, there are still a lot of exciting applications that can be improved using parallelization that are focused mainly on the ground. 598 01:15:08.039 --> 01:15:21.029 For example, it was discussed, I believe, in an earlier lecture that parallel computing is being used to keep track of stars and various star systems that we are observing, as well as 599 01:15:21.029 --> 01:15:28.680 performing the calculations for tracking celestial objects that are more local — specifically, different 600 01:15:28.680 --> 01:15:30.505 planets within our solar system, 601 01:15:30.774 --> 01:15:35.574 or any meteors or asteroids that we need to be on the lookout for. Parallel computing 602 01:15:35.574 --> 01:15:44.215 provides us the resources and computational ability to keep track of all these dynamic and moving parts. 603 01:15:46.350 --> 01:15:52.710 And that's the end of my presentation. Thank you all for listening, and here are some of the works cited. 604 01:15:52.710 --> 01:16:02.159 Cool, thank you. Great to hear from somebody involved in the project. Questions? 605 01:16:02.159 --> 01:16:08.159 Anyone? 606 01:16:08.159 --> 01:16:13.590 Oh, it's a silly question, but what do they hope to gain from parallelization? 607 01:16:13.590 --> 01:16:16.890 Is it better performance? 608 01:16:16.890 --> 01:16:21.090 Yeah, so I believe the 609 01:16:21.204 --> 01:16:28.314 rationale behind it was that NASA is implementing a lot of these new complex algorithms, 610 01:16:28.734 --> 01:16:37.914 and I feel like they find themselves more and more balancing a lot of resources that they don't have much access to. So. 611 01:17:05.220 --> 01:17:09.720 Right, sounds sensible. Other comments? 612 01:17:09.720 --> 01:17:17.760 Okay, thank you. So I think we'll do 1 more talk today and then finish off on 613 01:17:17.760 --> 01:17:25.409 Monday. So, Colin, would you like to tell us about the 3rd and 4th computers? 614 01:17:25.409 --> 01:17:29.970 Sure, let me share my screen here. 615 01:17:29.970 --> 01:17:33.359 I was trying to see something. 616 01:17:33.359 --> 01:17:36.960 Okay. 617 01:17:44.579 --> 01:17:48.750 All right, does that show up okay? Oh, yes, it does. 618 01:17:48.750 --> 01:17:57.420 All right, so I'm going to talk a bit about computers 3 and 4 on the top 500 list. 619 01:17:57.420 --> 01:18:04.199 Starting with number 3: Sierra, which was previously mentioned by 1 of the other groups. 620 01:18:04.199 --> 01:18:13.380 So, just to give a little bit of background, Sierra 1st went online back in 2018. 621 01:18:13.380 --> 01:18:25.380 And it was actually used a bit before it was fully completed — users were able to use portions of the system in early 2018, and then it was completed later in the year 622 01:18:25.380 --> 01:18:29.819 as a replacement for Sequoia, 623 01:18:29.819 --> 01:18:37.739 which was at 1 point the number 1 computer on the top 500 list — I suppose a decade ago at this point. 624 01:18:37.739 --> 01:18:49.319 This computer was commissioned by the National Nuclear Security Administration, and it's hosted at the Lawrence Livermore lab over in California. 625 01:18:49.319 --> 01:19:03.300 And this isn't a computer that's really open for a wide variety of public use, like for research, or by students, or anything like that.
It's specifically for their Advanced Simulation and Computing program. 626 01:19:03.300 --> 01:19:13.470 As part of that program, instead of running underground nuclear tests on new weapon designs, they're looking to 627 01:19:13.470 --> 01:19:25.380 test these weapons via simulation on this computer, and then run, of course, any other necessary or relevant engineering and nuclear-science calculations. 628 01:19:25.380 --> 01:19:33.899 And as part of this, they invested 150 million dollars to get the system built. 629 01:19:35.399 --> 01:19:43.380 So, getting into specifications: this is a heterogeneous system leveraging both CPUs and GPUs. 630 01:19:43.380 --> 01:19:46.470 You'll see in some of the 631 01:19:46.470 --> 01:19:56.100 press releases regarding the system that they highlight the use of GPUs quite a bit, as its predecessor didn't utilize them. 632 01:19:56.100 --> 01:20:04.739 So, they were particularly excited about what the GPUs could bring to the nuclear-science calculations that they were performing. 633 01:20:04.739 --> 01:20:15.899 On the CPU side of things, they used the IBM POWER9 architecture, with 4,320 compute nodes. 634 01:20:16.585 --> 01:20:30.145 And within each compute node, you had 2 CPUs, with each CPU having 22 cores, giving us a total of over 190,000 CPU cores for the compute portion of the system (see the arithmetic below). 635 01:20:31.800 --> 01:20:38.489 And as you can see in that snippet over on the right there — that was from an 636 01:20:38.489 --> 01:20:52.199 in-progress update on the system that the Lawrence Livermore laboratory had online — it lists how the racks, like we saw a 637 01:20:52.199 --> 01:21:01.229 couple slides ago, were allocated between the different portions of the system, from the compute portion to the network and storage portions. 638 01:21:01.229 --> 01:21:11.670 Besides the CPUs, on each of those compute nodes you had 4 NVIDIA GPUs. 639 01:21:11.670 --> 01:21:23.039 So a bit of a downgrade from Summit, which, as I think was mentioned, had 6 per node; this has 4 per node, for over 17,000 total in the system. 640 01:21:23.039 --> 01:21:28.350 On the memory side of things, they were able to use DDR4, 641 01:21:28.350 --> 01:21:39.479 and they had just under 1.3 petabytes of it; and POWER9, being a newer architecture, allowed them to use PCIe 4.0 642 01:21:39.479 --> 01:21:49.529 as an interconnect within each node, and then, of course, Volta allowed them to use NVLink to facilitate communication between the GPUs. 643 01:21:50.635 --> 01:22:04.164 And then, as far as communication between the nodes themselves, they use a fairly standard — an industry-standard — solution from Mellanox called InfiniBand, which, I believe, permitted communication up to 100 gigabits 644 01:22:05.399 --> 01:22:08.460 per second. 645 01:22:08.460 --> 01:22:13.380 And then, in terms of storage, 646 01:22:13.380 --> 01:22:27.119 they equipped each node with 1.6 terabytes of storage; and on the software side of things, they kept things fairly standard, using Red Hat Linux on the system. 647 01:22:29.875 --> 01:22:43.494 As far as performance, I won't go through every number here, but just for reference: when its predecessor was disassembled, that system was, I believe, ranked number 22 on the top 500 list.
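For reference, the totals quoted above follow directly from the per-node counts given in the talk; the multiplication is just a check:

    4,320 nodes × 2 CPUs per node × 22 cores per CPU = 190,080 CPU cores
    4,320 nodes × 4 GPUs per node = 17,280 GPUs

which matches the "over 190,000 cores" and "over 17,000 GPUs" figures above.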
648 01:22:43.494 --> 01:22:50.635 This, of course, is now at number 3 — so a significantly better performer, and at 5 times the power efficiency. 649 01:22:52.555 --> 01:23:06.805 Of course, that only means so much when you're talking about an 11-megawatt peak system. So quite a bit of power, but significantly more efficient than the number 4 on the top 500 list, which I'll talk about 650 01:23:07.619 --> 01:23:17.430 here in just a moment: that being the Sunway TaihuLight, which is located in Wuxi, China. 651 01:23:17.430 --> 01:23:21.899 Now, this system is a little bit older. 652 01:23:21.899 --> 01:23:34.350 It went online back in 2016 to replace the Tianhe-2, which was another number 1 computer on the top 500 list. 653 01:23:34.350 --> 01:23:45.300 And this computer was developed by China's National Research Center of Parallel Computer Engineering and Technology. 654 01:23:45.300 --> 01:23:55.500 Working with some universities in the area, they opted to host the computer at the National Supercomputing Center in Wuxi. 655 01:23:55.500 --> 01:24:09.239 Now, unlike Sierra, which was constructed for a specific research purpose, this is very much a multipurpose computer, 656 01:24:09.239 --> 01:24:13.470 used by a variety of groups and universities in the area. 657 01:24:13.470 --> 01:24:24.000 You can see a list there, from a quoted snippet from 1 of the universities, of what they use the computer for — from weather and aerospace to biomedicine, 658 01:24:24.000 --> 01:24:29.159 et cetera. So it's used for quite a number of things over there in China. 659 01:24:29.159 --> 01:24:39.479 But, in terms of US dollars, this was over a 273-million-dollar investment, which 660 01:24:39.479 --> 01:24:48.180 is to be expected, considering the proprietary nature of much of the solution. 661 01:24:48.180 --> 01:24:53.670 Now, unlike the Tianhe-2, they 662 01:24:53.670 --> 01:24:59.520 did not go with a 663 01:24:59.520 --> 01:25:13.680 predesigned Intel chip or anything like that. They went with a custom reduced-instruction-set processor that they designed themselves. You can see a picture of it and its heat spreader over on the right there. 664 01:25:14.784 --> 01:25:25.765 This reduced-instruction-set chip was a many-core processor — a 260-core many-core processor — which is similar to multi-core, except each core is a bit simplified. 665 01:25:26.005 --> 01:25:32.574 And the chip as a whole is optimized for parallel computing purposes, and, 666 01:25:34.409 --> 01:25:39.479 as part of that, very much optimized for SIMD instructions. 667 01:25:39.479 --> 01:25:44.220 And when you look at how many 668 01:25:44.220 --> 01:25:53.399 processors they included in the system and how many cores those processors have, you have over 10 million total processing cores in the system (see the arithmetic below). 669 01:25:53.399 --> 01:25:57.930 And, of course, those processors are the only processors used 670 01:25:57.930 --> 01:26:05.010 as part of this computer; they don't use any discrete GPUs or anything like that. 671 01:26:05.010 --> 01:26:15.869 On the memory side of things, it's quite close to Sierra, at just above 1.3 petabytes. 672 01:26:15.869 --> 01:26:20.819 And unfortunately, a lot of information about this system 673 01:26:20.819 --> 01:26:27.300 has been held pretty close to the chest by China.
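For scale, the 10-million-core figure checks out against the commonly reported TaihuLight configuration; the 40,960-processor count below comes from public top 500 reporting, not from the talk itself:

    40,960 processors × 260 cores per processor = 10,649,600 processing cores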
674 01:26:27.300 --> 01:26:32.819 So I couldn't find too much on how some of these custom solutions 675 01:26:32.819 --> 01:26:45.779 are designed. For example, their interconnect solution, which they call the Sunway network — I couldn't find too much info about that, but from what I could gather, it seemed to be PCIe 3.0 based. 676 01:26:45.779 --> 01:26:58.260 And then, as far as storage: unlike Sierra, which has storage on each compute node, this system doesn't have any non-volatile memory or storage 677 01:26:58.260 --> 01:27:08.399 within the system at all; it relies totally on contacting and pulling from external storage servers for whatever it needs. 678 01:27:08.399 --> 01:27:18.270 And then, as far as the operating system: it's a little bit custom — they call it Sunway RaiseOS — but still very much Linux-based. 679 01:27:20.039 --> 01:27:31.529 As for performance, its theoretical peak and its performance as listed on the top 500 are actually quite close to Sierra's. 680 01:27:49.680 --> 01:27:53.729 So, I'm sure that pulled it down a bit in the ranking. 681 01:27:53.729 --> 01:28:00.989 As far as power usage, it's very efficient relative to its predecessor, 682 01:28:00.989 --> 01:28:15.210 using 14% less energy than the Tianhe-2 while being almost 3 times as fast — but still running at over 15 megawatts at peak. So not quite as efficient as 683 01:28:15.210 --> 01:28:20.130 the Sierra supercomputer. And then 684 01:28:20.130 --> 01:28:25.439 1 last interesting piece of information that I found: 685 01:28:25.439 --> 01:28:39.869 there were actually rumors about a potential successor to this system, called OceanLight, and from what I saw, OceanLight is rumored to be a 1.3-exaflop computer 686 01:28:39.869 --> 01:28:46.140 that went online as of last year. 687 01:28:46.140 --> 01:29:01.020 And these rumors were substantiated by some industry professionals fairly close to the top 500, so they're not completely baseless; but unfortunately, China hasn't gone public with anything about this potential system. 688 01:29:01.020 --> 01:29:14.609 But it'll be interesting to see, here in the next couple of years, if that turns out to be true, as other countries are obviously currently developing and working on their own comparably capable supercomputers. So 689 01:29:14.609 --> 01:29:19.229 that'll be interesting to see going forward. 690 01:29:19.229 --> 01:29:23.489 I believe that is all I have. 691 01:29:23.489 --> 01:29:27.479 And there are my references, with all those links. 692 01:29:32.159 --> 01:29:40.409 Well, thank you very much. And, yeah, I think there was an announcement or some story 693 01:29:40.409 --> 01:29:44.939 in the last month or 2 694 01:29:44.939 --> 01:29:55.500 about some new Chinese supercomputers that just came online. 695 01:29:55.500 --> 01:29:59.010 Hmm, yeah. 696 01:29:59.010 --> 01:30:03.630 Yeah, 1 of those was this OceanLight, which 697 01:30:03.630 --> 01:30:06.630 I believe is the 1 that 698 01:30:06.630 --> 01:30:12.659 we know a little bit more about, at least as far as the rumors go. But we 699 01:30:12.659 --> 01:30:18.119 still don't know anything for sure, as, again, China hasn't gone public with any of that information. 700 01:30:20.369 --> 01:30:24.569 Okay, thank you. Well, even in the United States there is 701 01:30:24.569 --> 01:30:31.289 a comment somewhere that in Houston, Texas, there might even be a supercomputer or 2 that's not on the list — ones
owned by oil companies. 702 01:30:32.520 --> 01:30:40.350 Anyone else have any questions, comments, et cetera? 703 01:30:40.350 --> 01:30:49.859 Okay, thanks, everyone. So that was 8 talks today. We'll have the remaining 4 talks on Monday, and then I'll continue on, 704 01:30:49.859 --> 01:30:53.699 finish off that Lawrence Livermore tutorial, and then get into 705 01:30:53.699 --> 01:31:01.140 other parallel topics. So, unless anyone has any questions, 706 01:31:01.140 --> 01:31:05.850 have a good weekend. 707 01:31:06.265 --> 01:31:20.005 What should we do to set up for class in terms of software? Well, what I'll do is, when I get to talking about stuff on my parallel computer, I'll give everyone an account on it and I'll walk you through it. 708 01:31:20.335 --> 01:31:22.494 You'll access it with SSH, 709 01:31:22.770 --> 01:31:26.909 preferably — most easily — from another Linux system, although you could do it from 710 01:31:26.909 --> 01:31:31.229 a Windows system. And so 711 01:31:31.229 --> 01:31:35.310 that will be your parallel access for the next part of the course. 712 01:31:35.310 --> 01:31:40.619 Also, I put a homework online, and that'll just be asking 713 01:31:40.619 --> 01:31:44.819 some questions, due in a week. 714 01:31:44.819 --> 01:31:49.380 Let's just see what else came up in the chat here. 715 01:31:52.350 --> 01:31:58.588 Yeah, so — other questions? 716 01:32:00.238 --> 01:32:08.069 If people wondered about the weird setup I've got: I've got 2 machines in front of me, a ThinkPad and an iPad, and I'm using the ThinkPad. 717 01:32:08.069 --> 01:32:13.708 The audio is going through the ThinkPad, the video's going over the iPad — just crazy things. 718 01:32:13.708 --> 01:32:18.179 If there are no other questions or anything, 719 01:32:18.179 --> 01:32:24.029 then — for the homework, 720 01:32:26.849 --> 01:32:30.628 what do we set up for the class in terms of software? 721 01:32:34.679 --> 01:32:40.019 How do we solve the problems? Yeah, I'm confused. 722 01:32:41.069 --> 01:32:46.859 You're talking about the homework? Oh, there is no programming on the homework, unless I screwed something up badly. 723 01:32:49.349 --> 01:32:54.238 Let me look at the homework there. Okay, just a second here. Let me, um — 724 01:32:54.238 --> 01:32:57.509 hold on — homework 2. 725 01:32:58.649 --> 01:33:05.759 There's no programming; they're all questions based on stuff we've talked about and stuff that's in the Lawrence Livermore tutorial. 726 01:33:05.759 --> 01:33:12.689 So — and I would have trouble sharing that. Yeah. 727 01:33:17.573 --> 01:33:27.323 You see, my strategy for the class is: I wanted to have you guys talk 1st, so you can see a wide range of parallel applications, but we can't do everything simultaneously. 728 01:33:27.323 --> 01:33:36.774 So if we have your presentations 1st, then we start the programming after that. That was a decision I made about what to do 1st. So — 729 01:33:39.029 --> 01:33:44.519 other questions? If not, then — 730 01:33:47.099 --> 01:33:49.019 see you Monday.