Okay, good afternoon. Okay, good afternoon, class. I can't tell if you can hear me; if anyone can hear me, could you post?

Thank you. Great. Okay, so what's happening today is, I get a chance to shut up and you get a chance to tell me about a pile of interesting topics, a dozen interesting topics in parallel programming, and we'll continue with them next class on Monday. Then I'll finish off the Lawrence Livermore tutorial and some of the references in it. Okay, so I put on the website a list of the order for the talks, and I'm having trouble sharing it, actually; I've got two different machines in front of me. So the easiest thing is for you to browse to the website, and I'll just read the first few: Reagan and Allen talk about the Blue Gene, Lucy talks about physics, Tom talks about the top two computers on the Top500 list, Kevin the fifth and sixth computers, Zach on Python.

Well, thank you, Allen. Great. I tried to paste it into the chat window and something didn't work. Beautiful. So, divide up the time reasonably; I'm not going to bug you about it, and if we run over today, we'll continue on Monday. Now, I think you can probably unmute yourselves; if you can't, I'll unmute you. So, Reagan and Allen, let me just unmute you also.

Okay, right now you have the stage.

All right, Reagan, do you want to share the screen?

Sure, I would, but I might end up dropping out partway; I just have the battery to worry about. Okay, I'll share mine then. Let's see.

Beautiful, it's coming in.

All right, so our topic was the IBM Blue Gene. So, what is the Blue Gene? It's a massively parallel computer designed by IBM, originally released in 2004, and at the time it was the fastest computer in the world. The design goal was a low-power supercomputer that could still reach high operating speeds. It cost around 100 million dollars to develop, was installed at the Lawrence Livermore National Laboratory in California, and was originally developed to study protein folding and genetics, hence the name Blue Gene. It was also fairly influential in that it helped map the human genome.

So, there were three iterations of the Blue Gene. The first was the Blue Gene/L, released in 2004. The second generation was the Blue Gene/P, released in 2007. And the third generation was the Blue Gene/Q, released in 2011.

So, here's the general structure of the supercomputer. At the smallest level, there were these compute chips, which had two processors on them with four megabytes of embedded memory.
This was different from traditional computers, which were very large; this one you could scale to whatever size you wanted. So you had these compute cards, which held two compute chips, and you could take these cards and slot them into a board called a node card. Then you could put these node cards in a cabinet; imagine a server rack with all these different layers of node cards. And then you could assemble the cabinets into a system, as seen here on the right; in this case it has 64 cabinets. The benefit is that this is scalable: imagine you had a smaller laboratory, you might want 32 cabinets, or if you have a giant one, you might want 128. So you could adjust it however you wanted. It was also more robust, because if any compute chips fail, you can reroute the processing to other nodes. So imagine one of these 64 cabinets failed: you could keep 63 cabinets running while you pull out the broken cabinet and the broken cards and put in ones that work. This was considered innovative at the time.

Here you can see the architecture for the second generation. The main change was that these small compute chips increased from two cores to four cores, and they also increased the bandwidth so the cores could talk to each other more easily, faster actually. It had the same general structure, with the compute cards, the node cards, and the big server racks, and they also increased the total number of racks.

The third generation was a pretty big improvement: they jumped from four cores per chip to 18, and they changed to a 64-bit computing system so it could have more RAM. And I'll pass it off to Reagan to talk about the applications.

Sure, yeah. So, one important thing everyone wonders about this supercomputer is, the hardware is interesting, but what exactly has Blue Gene done over its lifetime? Well, as we discussed a few slides ago, the first intended application of Blue Gene was to observe protein folding in real time, hence the name Blue Gene, but many other applications have been tackled, and there's a list of a few here. Eventually Blue Gene mapped the human genome, investigated medical therapies, simulated radioactive decay, replicated brainpower, flew airplanes, pinpointed tumors, predicted climate trends, and identified fossil fuels. Blue Gene was very popular in the health sector in terms of research.
And the reason it was so important is the amount of processing power it had. To put this in perspective: a single scientist with a calculator, like you or I, would have to work nonstop for 177,000 years to perform what the Blue Gene could do in a second. So it has processing power nothing like what you and I could manage in our lifetimes.

As for the applications of the second generation, the Blue Gene/P, there are a few things it was used for. One that I'm sure some of you may have heard of: the world chess champion at the time used Blue Gene to train for competitions. Additionally, it was used to simulate about 1% of the human cerebral cortex; this contained 1.6 billion neurons and 9 trillion connections. And eventually it was used to win the SCALE 2011 challenge, which was an oil-reservoir optimization application. So it was used in high-performance computing scenarios like those.

Excellent. And then for the third and final generation of Blue Gene, the Blue Gene/Q, or Mira: it was the first supercomputer to cross 10 petaflops of sustained performance, which at the time was a huge deal, and it was the fastest computer at the time. This computer also modeled the electrophysiology of the human heart, achieving nearly 12 petaflops in a near-real-time simulation of the beating heart.

In conclusion, Blue Gene was eventually replaced by supercomputers many of us have heard of, like Watson, and the current second-fastest computer, Summit, which reaches 200 petaflops; that's much grander than Blue Gene, but Blue Gene itself was still insane at the time. IBM's researchers, in creating Blue Gene, were among the first to take on parallel calculation at this scale, and IBM continues to make large contributions in the field with Watson and Summit.

And that's all we have, plus our sources.

Okay, thank you very much, then. Does anyone have any questions? I've unmuted everyone, so anyone who would like to can ask questions. So, which one was RPI using, which generation or generations?

You know, I'm not sure, but I do know that the last generation was available for commercial use, for things like business analytics and other sorts of uses. So I'm not sure, but I would assume the last generation. Okay, well, thank you. Anyone else?

Yeah, I have a question. This seems like a pretty powerful computer, but it only simulated 1% of the human brain, and I guess the question is kind of twofold. I see the reason for a lot of the other applications you mentioned, if you could go back to that slide.
But do you know why simulating the human brain is of interest to people? Why is that a challenge that the Blue Gene was important in solving? And why only 1%? Could they have done more, or are we just really complex?

Well, the human brain is extremely complex; it has, I think, billions of neurons and trillions of connections. That makes it extremely difficult, because when they're all interconnected it raises the complexity exponentially. I'm not exactly sure why they were simulating it; I'm guessing it's to better understand how the brain works. But I think even simulating 1% of the brain is an extremely difficult task computationally.

Yeah, thank you. I was wondering, since you mentioned that Blue Gene was used for a lot of biomedical tasks, whether there was a medical reason behind simulating the brain.

Well, I think a general answer is that they're trying to understand the body, because then you can, for example, design better drugs. I mean, the current process, where you watch neurons firing: that's like trying to understand computers by watching transmission lines, basically, and you don't even have a concept of modulation. So they're trying to get a little higher up the protocol stack; that's my guess. And then another goal would perhaps be better psychoactive drugs; again, the current state of the art there, unfortunately, is very primitive. I have no medical competence, but that's my guess.

Anything else? Okay. Lucy, then: tell us about physics applications.

Oh, sure. ... Let's see. Can you see my screen? Yes. ... Did I get your name wrong on something? Oh, wow. I mean, my preferred name is... Okay, I pulled it off the list. Okay. Let me mute everyone and unmute you; just a sec. Okay, see if you can go now.

Oh, okay. So yeah, I'm going to talk about an application of parallel computing in physics. The problem I want to talk about is the N-body problem. This is a classic problem in physics: predicting the motion of N bodies that interact gravitationally. So basically, for each particle, the problem needs to calculate the force between that particle and the other n minus 1 particles.
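The formula on the slide isn't captured in the captions; presumably it is the standard Newtonian sum over the other particles, something like:

```latex
\mathbf{F}_i \;=\; \sum_{\substack{j=1 \\ j \neq i}}^{n}
\frac{G\, m_i m_j \,(\mathbf{r}_j - \mathbf{r}_i)}{\lVert \mathbf{r}_j - \mathbf{r}_i \rVert^{3}}
```

Each force depends on all n positions from the previous time step, which is what drives both the O(n²) cost and the communication pattern discussed next.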
So, one probably familiar example would be the two-body problem that we learn in physics, Physics 2 I think; and one of my favorite science-fiction books talks about the three-body problem. Since the forces are computed simultaneously for each particle at each time step, it is intuitive that we could improve the calculation of N-body problems using parallel programming.

To analyze the application of parallel computing to this problem, we can break its properties down into four categories. First, how we can partition the problem: since we need to compute a force and a position for each particle at each time step, it is natural to partition the tasks by particle. For communication: if we look back at the formula, we see that we will need the values computed at the previous time step to generate the new forces at the new time step. For aggregation, we can combine the tasks associated with a single particle, because the communication occurs on a per-particle basis: the current values depend on the previous values of that same particle. And for mapping, we can assume the work per particle at each step is roughly equal, so if we assign a balanced number of particles to each processor, we will have a well-balanced workload distributed among the different processors, making the parallelization efficient.

So, there are real benefits and improvements from applying parallel computing to this problem. For a typical serial algorithm computing the N-body problem, the running time will be on the order of n squared: we need to compute the forces between each particle and the other n minus 1 particles, which means n minus 1 force computations per particle, repeated for all n particles, so it will be O(n²). By using parallel computing, we can distribute the work equally among p processors, and therefore improve the running time to O(n²/p).
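As an illustration of that decomposition, here is a minimal sketch, not the speaker's code, assuming plain Python with the standard multiprocessing module (all the names below are made up): each worker computes the forces for its own balanced block of particles from the previous step's positions.

```python
import math
import random
from multiprocessing import Pool

G = 6.674e-11  # gravitational constant

def forces_on_block(args):
    """Net force on each particle in [start, end), summed over all others."""
    start, end, pos, mass = args            # pos/mass: previous-step values
    block = []
    for i in range(start, end):
        fx = fy = 0.0
        xi, yi = pos[i]
        for j in range(len(pos)):           # the O(n) inner loop per particle
            if j == i:
                continue
            dx, dy = pos[j][0] - xi, pos[j][1] - yi
            r2 = dx * dx + dy * dy + 1e-12  # tiny softening avoids divide-by-zero
            f = G * mass[i] * mass[j] / r2
            r = math.sqrt(r2)
            fx += f * dx / r
            fy += f * dy / r
        block.append((fx, fy))
    return block

if __name__ == "__main__":
    n, p = 2000, 4
    pos = [(random.random(), random.random()) for _ in range(n)]
    mass = [1.0e3] * n
    # Mapping step: partition the n particles into p balanced blocks.
    size = (n + p - 1) // p
    tasks = [(s, min(s + size, n), pos, mass) for s in range(0, n, size)]
    with Pool(p) as pool:                   # each block runs on its own processor
        forces = [f for blk in pool.map(forces_on_block, tasks) for f in blk]
    print(forces[0])
```

Each worker still does O(n) work per particle, but only for about n/p particles, matching the n²/p running time just described; shipping the full position list to every worker is the per-step communication identified above.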
And we can improve the parallel computation even further by finding better data structures, such as Barnes-Hut trees, which bring it down to n log n. So, using parallel computing is definitely more time-efficient than a serial program.

Besides the improvement in time that parallel computing gives this problem, there is also an improvement in precision that benefits N-body simulations. One paper stated that it achieved 1.0 petaflops with an efficiency of 74% using 4,096 GPUs. So you can see that the application of parallel computing to the N-body problem significantly improves both the time and the precision of the simulations.

Thank you, and these are my references.

Professor, if you're talking, we can't hear you. ... Yeah, it looks like your headset's muted, but not here. Okay, there. I don't know why it says I was sitting on mute so many times. Okay. Yeah, I was asking: why do we want to simulate N bodies?

Because it is a classic physical problem, and also because simulating the interaction between bodies can be applied to other things, like cells; beyond just gravitation, it can be applied to, I think, chemical particles and their interactions.

Okay, thank you. Okay, no one else has a question? Okay, so the third talk is Ian and Tom telling us about the first two computers on the Top500 list. To prevent background noise, let's see, I'll mute everyone and unmute you two. ... Okay, you can also unmute yourself.

Okay, can you see that? Ah, yes. Okay.

Okay, so hi, my name's Tom. And my name's Ian. Today our topic is to describe the two fastest computers on the Top500 list.

So, starting with the number one computer: that's Fugaku. It was jointly developed by RIKEN and Fujitsu, and it is located at the RIKEN Center for Computational Science in Kobe, Japan. Unlike a lot of modern supercomputers, it's a fully ARM-based, CPU-only supercomputer. In practice, it has achieved over 442 petaflops of double-precision 64-bit performance in what they call standard mode, which is just its base clock frequency of 2 gigahertz on all the CPUs. Past that, it's theoretically capable of 488 petaflops double precision, and as high as 537 petaflops in what they call boost mode, where all the CPUs boost to 2.2 gigahertz instead of the standard 2 gigahertz. It uses a custom Linux kernel, what they call McKernel, which, as far as I know from my research, is based around Red Hat, which is IBM's. It's composed of over 150,000 nodes, 158,976 to be exact, and it consumes around 30 to 40 megawatts depending on standard mode versus boost mode and on how much it's doing, which is anywhere from 3,600 to 4,800 US dollars an hour at 12 cents per kilowatt-hour.
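That cost figure is easy to sanity-check, taking the quoted 12-cents-per-kilowatt-hour rate at face value:

```latex
30\,\mathrm{MW} = 30{,}000\,\mathrm{kW}
\;\Rightarrow\; 30{,}000\,\mathrm{kWh/h} \times \$0.12/\mathrm{kWh} = \$3{,}600/\mathrm{hour},
\qquad
40\,\mathrm{MW} \;\Rightarrow\; \$4{,}800/\mathrm{hour}.
```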
And then going over the composition more: each node, as I mentioned, is an ARM CPU. Fujitsu developed these ARM chips specifically for this supercomputer; it's the A64FX, which has 48 computational cores plus 4 non-computational cores, and it's built on a 7-nanometer process. These ARM processors were used mainly because they are very power-efficient.

Going over memory: each node, or CPU, has 32 gigabytes of HBM2 memory with 1,024 gigabytes per second of bandwidth. In total, combining all the nodes, it's over 4.5 pebibytes of memory, where a pebibyte is 2 to the 50th, powers of 2 rather than powers of 10, and over 163 petabytes per second of total memory bandwidth. And the entire machine itself contains over 893 kilometers of fiber-optic and electrical cable.

So, a little more history: it started development in 2014 as the successor to the K computer, which was RIKEN's previous machine, and it officially started operation in 2021. It was sponsored by MEXT, which is the Ministry of Education, Culture, Sports, Science and Technology in Japan. It occupies a two-story computing center, or at least the actual parts that make up the computer are two stories: the first floor is all the AC units that cool everything, and the second floor is the computer room and all the server access. You can see some pictures there, and you can also see the basic way they structured their shelves and their system in the bottom-right picture.
As far as what it's been used for: they actually started using it a little earlier than it was scheduled, for COVID-19-specific projects. Obviously that was done to try to improve our knowledge of COVID-19, which helps with issues like how effective masks are, and various other topics like that. Japan also wants to use it for what they're calling Society 5.0, which is really just that they want to look into how to give people safer and more comfortable lives. In more recent times it's been used for light-matter interaction simulations, tsunami-prediction simulations for flooding, and computing atomic forces from Monte Carlo theory. And it's currently three times faster than the number two computer, which we'll be moving on to now.

Thank you, Tom. So, the second-fastest computer, as you maybe know, is the Summit. It is located at the Oak Ridge National Laboratory in Tennessee, United States, and it was developed by both IBM and NVIDIA. A kind of interesting fact is that the Department of Energy actually commissioned them to build two different computers: the Summit and the Sierra, which is the third-best computer in operation. I saw earlier it was mentioned that the Summit could reach 200 petaflops, but that's just a theoretical limit in boost mode; the highest it has actually achieved is just about 148.6. It has a pretty big storage system, I'd say, at 250 petabytes, but only about 10 petabytes of that can actually be used as system memory. Similarly to Fugaku, this also runs on Red Hat, by IBM. And it has, I believe, far fewer nodes, which makes sense because it's smaller: only 4,608. Also, since it's a fourth as fast, it consumes about a fourth as much power, from 10 to 13 megawatts.

So, next we can look a little more into the composition of the computer. As I mentioned, it has 4,608 nodes, and unlike Fugaku, this one actually has 2 CPUs per node instead of 1, and it also has 6 GPUs per node. The CPUs are IBM POWER9s and the GPUs are NVIDIA Tesla V100s. Counting cores was actually really complicated for me; I didn't quite see what was going on there at first, mainly because of the GPUs. Each GPU has 80 streaming multiprocessors, which they count as cores; otherwise that number of 2.4 million down at the bottom would be much larger. That is, they did not count the CUDA cores, while the 22 cores per CPU were included in the calculation. So if we did count the CUDA cores, the real number would be more like 141 million, or 159 million if you include the tensor cores.
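Those totals line up with the published per-part counts (2 POWER9 CPUs at 22 cores and 6 V100 GPUs per node; a V100 has 80 streaming multiprocessors, 5,120 CUDA cores, and 640 tensor cores):

```latex
\underbrace{4608 \times 2 \times 22}_{\text{CPU cores}}
+ \underbrace{4608 \times 6 \times 80}_{\text{GPU SMs}}
= 202{,}752 + 2{,}211{,}840 \approx 2.4\ \text{million}
\\[4pt]
4608 \times 6 \times 5120 \approx 141.6\ \text{million CUDA cores},
\qquad
4608 \times 6 \times (5120 + 640) \approx 159.3\ \text{million with tensor cores}
```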
Next, a little bit of history. Just like Fugaku, development started in 2014; this computer was commissioned by the Department of Energy as an upgrade from Titan, and it was first used in 2018. In 2018 it was the world's best computer, from June, when it started operation, through November 2019, and now, since Fugaku is in operation, it's the second-best computer. One thing I thought was really cool is that they kept adding onto it continually: even in 2018, when it was the fastest, it only had about 2.3 million cores, and now it's at the 2.4 million cores shown before. Another crazy fact is how big this thing is: over 340 tons and 5,600 square feet. That's like three of my houses. It's kind of hard to fathom how big these machines are, and I think it would be really interesting if they got scaled down in the future while keeping the same speed.

Lastly, what is it being used for? A lot has been done on cancer research, among other things; the specific program is the Cancer Distributed Learning Environment, or CANDLE. Fusion is also being researched so that we can better utilize it; fusion, by the way, is just what powers the sun, basically. It's working on medicine as well: like I mentioned, cancer, but also disease and addiction, and a couple of other things. They're trying to predict weather better. And, I forget who is working on it, but someone is working on earthquakes, as you can see in the bottom right; the San Andreas there is just an example of what a simulation may look like. Lastly, it's working on identifying new generations of materials; that's what the materials-science people are working on, trying to predict what something is going to do before they actually make it, which can have huge applications and save lots of money for research companies. That's it.

Here are some references, and does anyone have any questions for us?

Yeah, thank you very much. Questions, questions, anyone?

So, one from me, then: why do you think they commissioned two computers simultaneously?

That is a good question. Maybe they didn't want to put all their money into one bigger one? Yeah, and I think they had different processor mixes; like, there were the 2 CPUs to 6 GPUs in the Summit, so there's a 3-to-1 ratio.
I think they were also playing with other ratios, or maybe other GPU systems instead of the Tesla V100.

Okay, thanks. Just to comment on the thing about the CUDA cores: we'll get into that in more detail later. NVIDIA themselves were backing away, a while back, from even using the term CUDA core, because you've got all these separate cores doing different things inside the GPU, and they're really grouped into CUDA cores, floating-point units, tensor units, ray-tracing units, and NVIDIA keeps varying the mix of one to the other. And you're right, it does make counting the number of cores in a computer an almost obsolete question. And you hit on a big point by saying they're varying the mix of CPUs to GPUs and so on to find what's optimal.

Any other questions? Okay. So, let's see: Kevin will tell us about the fifth and sixth computers. And again, I just made this list in the order that you replied to my email.

I've unmuted you; go ahead and share your application. Thank you. All right, so I'll be going over the fifth and sixth fastest computers as of November of last year.

So, the fifth-fastest computer is new in 2021: it's Perlmutter, at the National Energy Research Scientific Computing Center in Berkeley, and it's made by HPE. As of 2021 it has only completed Phase 1, which is all the file systems and networks, the GPU nodes with their tensor cores, and the platform-integrated storage. It has 761,000 cores and 420,000 gigabytes of memory, and it uses 2.45-gigahertz AMD processors. Its interconnect is Slingshot-10, which was custom-built for the supercomputer. Based on the standard LINPACK benchmark it maxed out at about 70,000 teraflops, but it has a theoretical max of nearly 94,000, and on a separate benchmark, HPCG, it gets up to about 1,935. As of right now it draws about 2,600 kilowatts of power. It runs the HPE Cray OS with the vendor compiler and math library, with OpenMPI.

And the plan for this supercomputer is to use it for extreme-scale science: because it was commissioned by the National Energy Research Center, they obviously want to use it to compute huge data sets, try to find new energy sources, improve energy efficiency, and discover new materials. But as of right now only Phase 1 of this computer has actually been built; Phase 2 is not done, so it isn't 100% in use yet. You can see here, this is the general plan.
The last update on this construction was in November, when it was ranked. As of right now, only Phase 1 is complete; I didn't see anything online about Phase 2, so I'm assuming none of that has actually been done, and this machine isn't fully in use right now. But as I said, it was planned to be completed in 2021, so I think it's nearing completion and should be in use soon.

And then the sixth-fastest computer: this is Selene, made by NVIDIA. It has 555,000 cores with a million gigabytes of memory, and it also uses an AMD processor. Its interconnect is the Mellanox HDR InfiniBand. Using the same LINPACK standard as before, this one is obviously slightly slower, since it's ranked farther down: it can only hit about 63 petaflops. With the other benchmark it's 6,800; actually, I think I wrote this number down wrong, and it should just be 2,600 kilowatts of power consumption. So yeah, this one actually runs Ubuntu 20.04, and it uses the NVIDIA nvcc compiler and CUDA math libraries with OpenMPI. NVIDIA uses it mostly to do AI work; it does a lot of processing for self-driving and robotics, that kind of stuff. And also, because this computer was completed in 2020, they used it to do a lot of research on COVID and things like that.

And here are the sources that I used for this presentation.

Well, thank you. Does anyone have any questions? I have one: so, we actually have a machine here that isn't running Red Hat; it's running Ubuntu. Okay. And, this probably didn't come up, but I'm kind of curious. We've seen a lot of these machines used for different things, right? One was mostly just for AI, others are used for physics problems, and so on. Is there a difference in the build? Like, it was mentioned previously that there are various GPU-core-to-CPU-core ratios. Are those choices made knowing in advance what kind of problems they want to solve on the machine? Does it make a difference what parallel problems you want to solve on it?

No one else wants to take it? Yes, I think somewhat. For example, NVIDIA a few years ago added the tensor cores, and what a tensor core does is take vectors and do, basically, vector multiplication, matrix multiplication, and matrix addition with low precision, like two-byte floats and so on, because for machine learning there are not a lot of significant bits in the data.
Now, I'm not an expert in machine learning, so correct me if I'm wrong, but it does not have much use for, say, double-precision floats. So if your target is machine learning, you want more units that work with shorter data; that would be the best example. And obviously, if it's working with shorter data, then you can push twice as much data through the same bandwidth.

So, does that also affect the operating system they chose? Like you mentioned, Unix versus et cetera, et cetera? That part I was joking about, actually. I don't know that much about Cray, but, well: Seymour Cray was building some of the fastest supercomputers decades ago, the Cray computers, and he was famous for not doing standard floating point. His float calculations were not IEEE standard, which means they were not as accurate, but they also took fewer gates to implement. So he figured, for the applications on his machines, he'd rather provide more floating-point performance with less accuracy. That was a design decision he made.

Using NVIDIA for a third example: every generation they adjust the balance. A few generations back they had the Maxwell; the Maxwell had fewer double-precision units in it, so they had more space for other stuff, but the double-precision performance was awful, and in the version after that, Pascal, they reverted that decision. So yeah, you look at the application and you keep varying the mix. Another big trade-off is cores and processing versus storage: an error that IBM made in the first Blue Gene was that they had too little memory, so some of the users decided they would just idle some of the CPUs in order to use their memory for the others. That was, you know, them re-trading the balance.

Anyone else have any comments on that? Okay, thank you. So now we will hear Zach tell us about Python and what you can do parallel-wise in it.

All right, can everybody see my screen? Yes. Great.

So, hello everyone, my name's Zach, and thank you all for being here virtually today for my presentation, which will cover the fifth topic on the topic list: that being Python's parallel capabilities. I've broken my presentation up into four sections. We'll start off by talking about the feasibility of parallel programming in Python, then talk a little more about some of the important parallel Python modules and libraries out there, and finally wrap it up with a conclusion at the very end.
387 00:50:27.684 --> 00:50:40.224 Um, so let's start by talking about feasibility of parallel pro game and Python for this whole implement for, for this whole presentation I should say, will only be considering the C Python implementation. 388 00:50:40.530 --> 00:50:47.010 Which is the, um, the official reference implementation of Python written in the C programming language. 389 00:50:47.010 --> 00:50:55.139 But the question we really need to ask ourselves right off the bat about Python is does it even support multi threading at all? 390 00:50:55.139 --> 00:50:59.219 The answer might surprise you. Um, not exactly. 391 00:50:59.219 --> 00:51:03.719 Um, Python is something called the, uh, global interpreter lock. 392 00:51:03.719 --> 00:51:09.719 Also notice the, which is basically a new text that protects access to shared Python memory. 393 00:51:09.719 --> 00:51:15.179 Basically ensures that 1 thread accesses, um, shared memory at a time. 394 00:51:15.179 --> 00:51:18.630 Um, and in doing this, it basically inhibits multithreading. 395 00:51:18.630 --> 00:51:26.340 And obviously, this isn't great for performance reasons. There have been, you know, several attempts to remove it over the years. 396 00:51:26.340 --> 00:51:30.179 Yeah, none of which have been completely successful. 397 00:51:30.179 --> 00:51:37.650 Um, because the other things have come to depend on the, and it's sort of made it more difficult with its dependencies. 398 00:51:37.650 --> 00:51:42.960 But all, it's not lost whoever there are still some workarounds for this issue. 399 00:51:42.960 --> 00:51:46.260 Um, for 1, you can use some different extra libraries. 400 00:51:46.260 --> 00:51:53.760 Like, non pie in pandas, um, you know, these libraries are sort of able to release the lock and get around it to. 401 00:51:53.760 --> 00:52:00.719 And performance in depth, different operations, and 2, you can also choose different Python with limitation. 402 00:52:00.719 --> 00:52:08.070 So there are many out there see, Python is really, you know, the most popular, but you can pick anyone you want. 403 00:52:08.070 --> 00:52:14.309 Um, that doesn't have the, um, yeah, such as iron Python and Python, um, et cetera, et cetera. 404 00:52:14.309 --> 00:52:18.269 So, now let's move on to, uh, talking about some. 405 00:52:18.269 --> 00:52:21.480 Important parallel Python modules. 406 00:52:21.480 --> 00:52:30.750 Probably the most famous and most useful, the multi processing module, um, which has the name implies uses processes instead of threads. 407 00:52:30.750 --> 00:52:35.730 Process referring to, um, sort of separate programs and execution. 408 00:52:35.730 --> 00:52:39.750 Um, whereas threads are more sub segments of the process. 409 00:52:39.750 --> 00:52:45.360 As you can see on the right so, 1 of the big difference here, being separate memory versus shared memory. 410 00:52:45.360 --> 00:52:51.630 And there are 2 big classes in this module. 1 is the process class. 411 00:52:51.630 --> 00:52:57.030 This would be more used for your function based parallelism applications. 412 00:52:57.030 --> 00:53:07.590 You've got different functions, you just sort of doing their own thing for some set of data. Um, whereas the pool class, it's more used for database parallelism. 413 00:53:07.590 --> 00:53:14.550 Where you've got the same function running over and over again, um, for some data that you pass into it. 414 00:53:14.550 --> 00:53:23.280 And we'll see an example on this slide. 
The other big module is the threading module, which, like we talked about earlier, isn't quite as good, because it is fairly limited by the GIL. But this module is really as simple as you would expect: the Thread class is very similar to the Process class. You make a Thread object to run a separate thread of control, give it a target function and arguments, and call those same start and join functions that you would for the Process class. The diagram on the right is actually taken from the Lawrence Livermore National Lab tutorial, and it basically demonstrates that threads are concurrent sub-paths of control within a process, and they can all share the same program memory.
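Again as a sketch rather than the slide's exact code, the Thread class mirrors the Process API almost line for line:

```python
import threading

def worker(name, steps):
    # A concurrent sub-path of control that shares the program's memory.
    for i in range(steps):
        print(name, "step", i)

t1 = threading.Thread(target=worker, args=("thread-1", 3))
t2 = threading.Thread(target=worker, args=("thread-2", 3))
t1.start()
t2.start()
t1.join()   # same start/join pattern as the Process class
t2.join()
```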
So, here's a nice little summary comparison of the two modules. Multiprocessing is really nice because it avoids the GIL: you can take advantage of the different CPUs and cores, and it's a lot more straightforward because you don't have to deal with things like race conditions, which can be a headache. But it does make inter-process communication more difficult, slower I should say, because it has more overhead and a much larger memory footprint. The threading module is still kind of nice, because you do get that shared memory space, faster inter-thread communication, and a smaller memory footprint; but in reality, the limitations posed by the GIL really make multiprocessing the better option for most applications where you're trying to get the best performance.

So, let's briefly talk about some of the other parallel Python libraries out there too. You have symmetric multiprocessing (SMP) libraries, which use concurrent parallel-programming techniques with shared resources; examples here are Dispy, which uses asynchronous sockets and different polling mechanisms, an OpenMP-style framework, and torcpy, which uses multithreading. The cluster-computing libraries, on the other hand, unlike SMP libraries, do not use shared resources; examples here would be a distributed task manager and mpi4py, which is an MPI implementation. You also have libraries for cloud computing, interacting with resources that are managed by a third-party company; examples would be Google's Cloud App Engine and Amazon Web Services, and an example of a specific library would be PyCOMPSs, which is able to interact with different clusters and clouds. And I didn't list any specific examples here, but you also have grid-computing libraries, which interact with multiple computers that might be connected across some big networks.

But in conclusion, we've really seen that Python may not be the best choice for your parallel-computing applications, mostly because of that global interpreter lock we talked about and the fact that it makes running multithreaded applications pretty difficult. Despite these downsides, though, Python really is still one of the easiest languages to use and pick up; it's very simple, and in applications where you are working with processes, you can get some really helpful functionality from modules like multiprocessing and the other modules and libraries out there.

But that's all I had to say for today. I just wanted to thank everybody for listening, and open it up for questions here at the end of my presentation.

Well, thank you. Questions, anyone? Okay: so the fact that it's easy to use trumps, in practice, the fact that that global interpreter lock may slow you down; that's what I'm guessing.

Exactly, yeah, because your time is more important than the machine's time in lots of cases.

I'm being serious. Oh, the question about what it was used for: did I say? I did, but I just can't remember the application now. It's in the video somewhere.

Okay, so Mark has a comment. Yeah, I've done some work in this space.
The GIL is a problem, but like you mentioned, the developer's time is more important, and a lot of the libraries built for common parallel problems like AI, like TensorFlow or PyTorch, extract a lot of the processing into separate, optimized, compiled, non-GIL code. So writing your own is tricky, but a lot of libraries do that, which makes Python a better choice, though still not the best choice.

Right. Okay, well, thank you. Yeah: I personally use C++, not Python, for my programs, so I don't hit that. But we also have serializing locks in C++, for example for memory allocations on the heap, so you get comparable difficulties on those other platforms also.

Do those locks serve to, like, prevent race conditions when multiple things are running at the same time? Like, do they prevent memory from being read and written to at the same time?

Yes. Say several threads are computing and dumping results into a common array, so each needs a unique index: you have a counter for the number of items allocated so far, so you read the counter, add 1 to it, record that number, and write it back. Now let two processes interleave those read-modify-writes, and you can guess what will happen. I've got an example in C, and it actually will happen, if you write a parallel program that way, that different threads step on each other's toes, you might say. So that's what they avoid with the serialization.

And it's made worse because, when you write back to memory, it may not be reflected immediately in the memory as seen by the other processes; it has to go through a cache, perhaps. And if you try to force everything to always be consistent immediately, the cost will just be too great. So that's the problem you're fighting, and that comes into the embarrassing-parallelism question I put up in a homework today: if you can decompose the task so that the different processes, the different threads, are not writing to the same memory, then it gets much more parallelizable. But that depends on the application.
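The C example mentioned isn't in the transcript, but the same lost-update race is easy to reproduce; here is a sketch in Python, using multiprocessing's shared Value, whose documentation itself warns that an increment is not atomic:

```python
from multiprocessing import Process, Value

def bump(counter, n):
    for _ in range(n):
        counter.value += 1   # read, add 1, write back: three steps, not atomic

if __name__ == "__main__":
    total = Value("i", 0)    # shared integer
    workers = [Process(target=bump, args=(total, 100_000)) for _ in range(4)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    print(total.value)       # expected 400000; typically far less, varying per run
```

Guarding the increment with `with counter.get_lock(): counter.value += 1` serializes the read-modify-write and restores the expected total, at exactly the cost the professor describes.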
Other comments? Okay. So we will hear about Pthreads now.

Okay, I'll share my screen. ... Can we get a confirmation that you can see my screen? Yes, we can see your POSIX threads slide. Okay, thanks; sometimes I'm doubled. Okay, so my partner and I will be presenting Pthreads to the class. I'll just start off with a basic overview.

In a UNIX/Linux operating system, the C and C++ languages provide the Pthreads standard for all thread-related functions; basically, it lets us create multiple threads of process flow. It's defined as a set of C-language types and procedure calls, exposed through a header file, in a thread library; so yes, it specifies a set of interfaces for threaded programming. What I found is that it's most effective on multi-processor, multi-core systems, where threads can be implemented at the kernel level to achieve real speed of execution. And, yeah, basically you have to include the pthread.h header file at the beginning of your source file to use all the functions the library gives you, and then to build the file you usually pass -pthread, or -lpthread, one of the two, on the command line.

And in case you didn't know: a thread is a procedure running independently from its main program.

For synchronization, Pthreads uses mutexes. And, like fork, thread creation on Linux uses the clone system call, but for threads both sides share the same memory: the fork call uses clone with minimal sharing, while the pthread_create function calls clone sharing as much as possible.

For example... awesome, thanks. So, over here we have example C code of how we can set up a Pthread. The main thing is you include pthread.h. We've defined the four different threads that we're going to create, and then in the main function we iterate through those four numbers and create four different threads. Really quick, really simple; it's not too difficult a setup.

Next slide. So, a great way to understand what Pthreads does is to compare threads versus processes, as we've touched upon in the last slide show. Forking is a way to create a new process; threading is a way to create more threads within a process. So fork will create a new process, each with its own separate memory, and due to this there's a lot of overhead. Threading creates new threads within a process, and within each process, each thread shares the same memory.
540 01:05:50.849 --> 01:06:00.960 Each thread shares the same memory — the same data, code, and files; what differentiates them is that each thread has its own registers and stack. 541 01:06:00.960 --> 01:06:10.559 Excellent. And a great analogy is that forking will create an entire new lamp, if you think about it. 542 01:06:10.559 --> 01:06:14.400 These 2 lamps will draw from different 543 01:06:14.400 --> 01:06:18.750 power sources in your wall outlets; they hold different pieces of 544 01:06:18.750 --> 01:06:25.320 data. When you want to turn them on, you've got to turn each one on separately: turn light 1 on, then light 2 on. 545 01:06:25.320 --> 01:06:36.449 Threading is like creating multiple bulbs on one lamp: each light bulb in the 2nd picture shares the same stem, the same power outlet. 546 01:06:36.449 --> 01:06:40.320 You're able to turn them all on at once and turn them all off at once. 547 01:06:40.320 --> 01:06:45.719 And it's a much better way to run things 548 01:06:45.719 --> 01:06:48.900 in parallel. 549 01:06:51.329 --> 01:06:55.199 Yep, these are our resources. Are there any questions? 550 01:07:12.539 --> 01:07:19.949 So, I was just having trouble unmuting myself. 551 01:07:21.420 --> 01:07:26.820 And so — anyone, questions? 552 01:07:26.820 --> 01:07:30.239 Do you see pthreads as widely used or not? 553 01:07:30.239 --> 01:07:35.760 Any comments on whether I should start teaching it more in class, perhaps? 554 01:07:37.980 --> 01:07:41.400 I think it'd be interesting. I don't have any experience with 555 01:07:41.400 --> 01:07:45.420 threads or anything, so it'd be neat to hear about it. 556 01:07:45.420 --> 01:07:49.769 Okay, anyone else have other questions? 557 01:07:49.769 --> 01:07:55.199 I have 1 little question: you showed us the example using pthreads in C. 558 01:07:55.199 --> 01:08:01.110 Can it also be used in other languages, like Python? Yeah. 559 01:08:01.110 --> 01:08:12.659 Yes — I don't have too much experience with that. I know when I code in Python, I use a library which 560 01:08:12.659 --> 01:08:16.289 doesn't really create threads. It just makes 561 01:08:16.289 --> 01:08:26.279 functions wait on each other or something, but you tend to achieve the same thing. The P doesn't mean Python, though. It means 562 01:08:27.479 --> 01:08:33.960 POSIX. 563 01:08:33.960 --> 01:08:43.979 Yeah, the thing is that it's a standard layer on top of the hardware, so you could have different back ends to it, I believe, and 564 01:08:43.979 --> 01:08:47.189 I guess you could have the standard layer in Python. 565 01:08:48.329 --> 01:08:57.149 So, other questions or comments? 566 01:08:58.260 --> 01:09:08.520 Okay, so let's hear Adrian James tell us about what cool things the astronomers are doing with parallel computers. 567 01:09:33.630 --> 01:09:38.310 Hello, can you guys see my screen? No, we can't. 568 01:09:38.310 --> 01:09:46.020 Okay — it came in for a second and then vanished. 569 01:09:47.369 --> 01:09:51.359 All right, let me try this again. 570 01:09:53.189 --> 01:09:56.189 How about now? Yes. 571 01:09:56.189 --> 01:10:01.140 All right, cool. 572 01:10:03.925 --> 01:10:16.135 All right, so I'm going to be talking about parallel computing in astronomy. I'll be specifically focusing on a NASA project called High Performance Spaceflight Computing, or HPSC. 573 01:10:17.550 --> 01:10:27.149 Yeah, so this is a project undertaken by NASA.
574 01:10:27.149 --> 01:10:35.395 It was formulated by some engineers at the Jet Propulsion Laboratory, or JPL, in recent years. 575 01:10:35.395 --> 01:10:46.314 I believe the talks 1st started around 2015, and it was kind of a result of the stagnation of spaceflight computing, specifically in the software. 576 01:10:48.204 --> 01:11:00.654 So the project has a hardware and a software aspect to it. In terms of hardware, they're designing, in house, some new multi-core computing chips, with multiple processing cores on each chip. 577 01:11:01.465 --> 01:11:02.604 But more importantly, 578 01:11:02.755 --> 01:11:07.555 NASA is looking into developing new operating software, and 1 579 01:11:07.555 --> 01:11:14.814 such surrogate for this new development of parallel computing is the descent and landing computer, 580 01:11:14.845 --> 01:11:17.244 which is maintained at NASA Johnson Space Center. 581 01:11:18.960 --> 01:11:28.770 So, spaceflight computing as it stands — looking into what currently exists out there — is overwhelmingly serial. 582 01:11:28.770 --> 01:11:42.539 So NASA sees the landscape of parallel computing, and they think to themselves: this seems like a very interesting field that we could pioneer for the future. 583 01:11:42.539 --> 01:11:56.640 So, yeah, a lot of NASA projects are pushing the boundaries of what can be achieved through hardware, but the software is something that they are just now starting to push forward in order to catch up with everything else. 584 01:11:58.375 --> 01:12:10.524 The descent and landing computer is part of a larger NASA project known as SPLICE, and this project is dedicated to implementing advanced technologies on spacecraft. 585 01:12:11.244 --> 01:12:16.225 The descent and landing computer has to perform massively resource-intensive algorithms, 586 01:12:16.225 --> 01:12:16.404 like 587 01:12:17.125 --> 01:12:26.904 terrain-relative navigation, as well as compute things like video processing and graphics, in order to manage 588 01:12:26.935 --> 01:12:27.385 all of 589 01:12:27.659 --> 01:12:37.680 the data that's coming in through the sensors. So parallelization of these algorithms seems like a very beneficial endeavor for NASA to pursue. 590 01:12:39.960 --> 01:12:44.220 So, speaking specifically about the descent and landing computer: 591 01:12:44.220 --> 01:12:49.524 this was really the 1st real test for parallelization of spaceflight algorithms. 592 01:12:49.885 --> 01:13:00.324 It's still in its early stages, and, as previously mentioned, it's part of the Safe and Precise Landing Integrated Capabilities Evolution, or SPLICE, project. 593 01:13:01.045 --> 01:13:07.375 And as it stands, it's acting as the surrogate for the high performance spaceflight computer. 594 01:13:40.585 --> 01:13:54.324 Ah, so I worked specifically with the 2nd iteration of the descent and landing computer, and the 1st iteration was recently tested on a Blue Origin New Shepard flight. 595 01:13:54.534 --> 01:13:59.515 So they've been running tests since about 2020, and I think they've run 4 or 5 tests by now. 596 01:14:39.420 --> 01:14:51.359 So, parallel computing in flight is a very novel and very exciting proposition. But because it's mission critical, you can't have faulty software — you know,
597 01:14:52.949 --> 01:15:07.734 the price of faulty software being an entire spaceship failing in flight — so the development for it is going relatively slowly. But conversely, there are still a lot of exciting applications that can be improved using parallelization that are focused mainly on the ground. 598 01:15:08.039 --> 01:15:21.029 For example, it was discussed, I believe, in an earlier lecture that parallel computing is being used to keep track of stars and various star systems that we are observing, as well as 599 01:15:21.029 --> 01:15:28.680 performing the calculations for tracking celestial objects that are more local — specifically, different 600 01:15:28.680 --> 01:15:30.505 planets within our solar system, 601 01:15:30.774 --> 01:15:35.574 or any meteors or asteroids that we need to be on the lookout for. Parallel computing 602 01:15:35.574 --> 01:15:44.215 provides us the resources and computational ability to keep track of all these dynamic and moving parts. 603 01:15:46.350 --> 01:15:52.710 And that's the end of my presentation. Thank you all for listening, and here are some of the works cited. 604 01:15:52.710 --> 01:16:02.159 Cool, thank you. Great to hear from somebody involved in the project. Questions? 605 01:16:02.159 --> 01:16:08.159 Anyone? 606 01:16:08.159 --> 01:16:13.590 Oh, it's a silly question, but what do they hope to gain from parallelization? 607 01:16:13.590 --> 01:16:16.890 Is it better performance? 608 01:16:16.890 --> 01:16:21.090 Yeah, so I believe the 609 01:16:21.204 --> 01:16:28.314 rationale behind it was that NASA is implementing a lot of these new complex algorithms, 610 01:16:28.734 --> 01:16:37.914 and I feel like they find themselves more and more balancing a lot of resources that they don't have much access to. So. 611 01:17:05.220 --> 01:17:09.720 Right, sounds sensible. Other comments? 612 01:17:09.720 --> 01:17:17.760 Okay, thank you. So I think we'll do 1 more talk today and then finish off on 613 01:17:17.760 --> 01:17:25.409 Monday. So, Colin, would you like to tell us about the 3rd and 4th computers? 614 01:17:25.409 --> 01:17:29.970 Sure, let me share my screen here. 615 01:17:29.970 --> 01:17:33.359 I was trying to see something. 616 01:17:33.359 --> 01:17:36.960 Okay. 617 01:17:44.579 --> 01:17:48.750 All right, does that show up okay? Oh, yes, it does. 618 01:17:48.750 --> 01:17:57.420 All right, so I'm going to talk a bit about computers 3 and 4 on the top 500 list. 619 01:17:57.420 --> 01:18:04.199 Starting with number 3: Sierra, which was previously mentioned by 1 of the other groups. 620 01:18:04.199 --> 01:18:13.380 So, just to give a little bit of background, Sierra 1st went online back in 2018. 621 01:18:13.380 --> 01:18:25.380 And it was actually used a bit before it was fully completed — users were able to use portions of the system in early 2018, and then it was completed later in the year 622 01:18:25.380 --> 01:18:29.819 as a replacement for Sequoia, 623 01:18:29.819 --> 01:18:37.739 which was at 1 point the number 1 computer on the top 500 list — I suppose a decade ago at this point. 624 01:18:37.739 --> 01:18:49.319 This computer was commissioned by the National Nuclear Security Administration, and it's hosted at the Lawrence Livermore lab over in California. 625 01:18:49.319 --> 01:19:03.300 And this isn't a computer that's really open for a wide variety of public use, like for research, or by students, or anything like that.
It's specifically for their Advanced Simulation and Computing program. 626 01:19:03.300 --> 01:19:13.470 As part of that program, instead of running underground nuclear tests on new weapon designs, they're looking to 627 01:19:13.470 --> 01:19:25.380 test these weapons via simulation on this computer, and then run, of course, any other necessary or relevant engineering and nuclear-science calculations. 628 01:19:25.380 --> 01:19:33.899 And as part of this, they invested 150 million dollars to get the system built. 629 01:19:35.399 --> 01:19:43.380 So, getting into specifications: this is a heterogeneous system leveraging both CPUs and GPUs. 630 01:19:43.380 --> 01:19:46.470 You'll see in some of the 631 01:19:46.470 --> 01:19:56.100 press releases regarding the system that they highlight the use of GPUs quite a bit, as its predecessor didn't utilize them. 632 01:19:56.100 --> 01:20:04.739 So, they were particularly excited about what the GPUs could bring to the nuclear-science calculations that they were performing. 633 01:20:04.739 --> 01:20:15.899 On the CPU side of things, they used the IBM POWER9 architecture, with 4,320 compute nodes. 634 01:20:16.585 --> 01:20:30.145 And within each compute node, you had 2 CPUs, with each CPU having 22 cores, giving us a total of over 190,000 CPU cores for the compute portion of the system (see the arithmetic below). 635 01:20:31.800 --> 01:20:38.489 And as you can see in that snippet over on the right there — that was from an 636 01:20:38.489 --> 01:20:52.199 in-progress update on the system that the Lawrence Livermore laboratory had online — it lists how the racks, like we saw a 637 01:20:52.199 --> 01:21:01.229 couple slides ago, were allocated between the different portions of the system, from the compute portion to the network and storage portions. 638 01:21:01.229 --> 01:21:11.670 Besides the CPUs, on each of those compute nodes you had 4 NVIDIA GPUs. 639 01:21:11.670 --> 01:21:23.039 So a bit of a downgrade from Summit, which, as I think was mentioned, had 6 per node; this has 4 per node, for over 17,000 total in the system. 640 01:21:23.039 --> 01:21:28.350 On the memory side of things, they were able to use DDR4, 641 01:21:28.350 --> 01:21:39.479 and they had just under 1.3 petabytes of it; and POWER9, being a newer architecture, allowed them to use PCIe 4.0 642 01:21:39.479 --> 01:21:49.529 as an interconnect within each node, and then, of course, Volta allowed them to use NVLink to facilitate communication between the GPUs. 643 01:21:50.635 --> 01:22:04.164 And then, as far as communication between the nodes themselves, they use a fairly standard — an industry-standard — solution from Mellanox called InfiniBand, which, I believe, permitted communication up to 100 gigabits 644 01:22:05.399 --> 01:22:08.460 per second. 645 01:22:08.460 --> 01:22:13.380 And then, in terms of storage, 646 01:22:13.380 --> 01:22:27.119 they equipped each node with 1.6 terabytes of storage; and on the software side of things, they kept things fairly standard, using Red Hat Linux on the system. 647 01:22:29.875 --> 01:22:43.494 As far as performance, I won't go through every number here, but just for reference: when its predecessor was disassembled, that system was, I believe, ranked number 22 on the top 500 list.
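For reference, the totals quoted above follow directly from the per-node counts given in the talk; the multiplication is just a check:

    4,320 nodes × 2 CPUs per node × 22 cores per CPU = 190,080 CPU cores
    4,320 nodes × 4 GPUs per node = 17,280 GPUs

which matches the "over 190,000 cores" and "over 17,000 GPUs" figures above.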
648 01:22:43.494 --> 01:22:50.635 This, of course, is now at number 3 — so a significantly better performer, and at 5 times the power efficiency. 649 01:22:52.555 --> 01:23:06.805 Of course, that only means so much when you're talking about an 11-megawatt peak system. So quite a bit of power, but significantly more efficient than the number 4 on the top 500 list, which I'll talk about 650 01:23:07.619 --> 01:23:17.430 here in just a moment: that being the Sunway TaihuLight, which is located in Wuxi, China. 651 01:23:17.430 --> 01:23:21.899 Now, this system is a little bit older. 652 01:23:21.899 --> 01:23:34.350 It went online back in 2016 to replace the Tianhe-2, which was another number 1 computer on the top 500 list. 653 01:23:34.350 --> 01:23:45.300 And this computer was developed by China's National Research Center of Parallel Computer Engineering and Technology. 654 01:23:45.300 --> 01:23:55.500 Working with some universities in the area, they opted to host the computer at the National Supercomputing Center in Wuxi. 655 01:23:55.500 --> 01:24:09.239 Now, unlike Sierra, which was constructed for a specific research purpose, this is very much a multipurpose computer, 656 01:24:09.239 --> 01:24:13.470 used by a variety of groups and universities in the area. 657 01:24:13.470 --> 01:24:24.000 You can see a list there, from a quoted snippet from 1 of the universities, of what they use the computer for — from weather and aerospace to biomedicine, 658 01:24:24.000 --> 01:24:29.159 et cetera. So it's used for quite a number of things over there in China. 659 01:24:29.159 --> 01:24:39.479 But, in terms of US dollars, this was over a 273-million-dollar investment, which 660 01:24:39.479 --> 01:24:48.180 is to be expected, considering the proprietary nature of much of the solution. 661 01:24:48.180 --> 01:24:53.670 Now, unlike the Tianhe-2, they 662 01:24:53.670 --> 01:24:59.520 did not go with a 663 01:24:59.520 --> 01:25:13.680 predesigned Intel chip or anything like that. They went with a custom reduced-instruction-set processor that they designed themselves. You can see a picture of it and its heat spreader over on the right there. 664 01:25:14.784 --> 01:25:25.765 This reduced-instruction-set chip was a many-core processor — a 260-core many-core processor — which is similar to multi-core, except each core is a bit simplified. 665 01:25:26.005 --> 01:25:32.574 And the chip as a whole is optimized for parallel computing purposes, and, 666 01:25:34.409 --> 01:25:39.479 as part of that, very much optimized for SIMD instructions. 667 01:25:39.479 --> 01:25:44.220 And when you look at how many 668 01:25:44.220 --> 01:25:53.399 processors they included in the system and how many cores those processors have, you have over 10 million total processing cores in the system (see the arithmetic below). 669 01:25:53.399 --> 01:25:57.930 And, of course, those processors are the only processors used 670 01:25:57.930 --> 01:26:05.010 as part of this computer; they don't use any discrete GPUs or anything like that. 671 01:26:05.010 --> 01:26:15.869 On the memory side of things, it's quite close to Sierra, at just above 1.3 petabytes. 672 01:26:15.869 --> 01:26:20.819 And unfortunately, a lot of information about this system 673 01:26:20.819 --> 01:26:27.300 has been held pretty close to the chest by China.
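For scale, the 10-million-core figure checks out against the commonly reported TaihuLight configuration; the 40,960-processor count below comes from public top 500 reporting, not from the talk itself:

    40,960 processors × 260 cores per processor = 10,649,600 processing cores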
674 01:26:27.300 --> 01:26:32.819 So I couldn't find too much on how some of these custom solutions 675 01:26:32.819 --> 01:26:45.779 are designed. For example, their interconnect solution, which they call the Sunway network — I couldn't find too much info about that, but from what I could gather, it seemed to be PCIe 3.0 based. 676 01:26:45.779 --> 01:26:58.260 And then, as far as storage: unlike Sierra, which has storage on each compute node, this system doesn't have any non-volatile memory or storage 677 01:26:58.260 --> 01:27:08.399 within the system at all; it relies totally on contacting and pulling from external storage servers for whatever it needs. 678 01:27:08.399 --> 01:27:18.270 And then, as far as the operating system: it's a little bit custom — they call it Sunway RaiseOS — but still very much Linux-based. 679 01:27:20.039 --> 01:27:31.529 As for performance, its theoretical peak and its performance as listed on the top 500 are actually quite close to Sierra's. 680 01:27:49.680 --> 01:27:53.729 So, I'm sure that pulled it down a bit in the ranking. 681 01:27:53.729 --> 01:28:00.989 As far as power usage, it's very efficient relative to its predecessor, 682 01:28:00.989 --> 01:28:15.210 using 14% less energy than the Tianhe-2 while being almost 3 times as fast — but still running at over 15 megawatts at peak. So not quite as efficient as 683 01:28:15.210 --> 01:28:20.130 the Sierra supercomputer. And then 684 01:28:20.130 --> 01:28:25.439 1 last interesting piece of information that I found: 685 01:28:25.439 --> 01:28:39.869 there were actually rumors about a potential successor to this system, called OceanLight, and from what I saw, OceanLight is rumored to be a 1.3-exaflop computer 686 01:28:39.869 --> 01:28:46.140 that went online as of last year. 687 01:28:46.140 --> 01:29:01.020 And these rumors were substantiated by some industry professionals fairly close to the top 500, so they're not completely baseless; but unfortunately, China hasn't gone public with anything about this potential system. 688 01:29:01.020 --> 01:29:14.609 But it'll be interesting to see, here in the next couple of years, if that turns out to be true, as other countries are obviously currently developing and working on their own comparably capable supercomputers. So 689 01:29:14.609 --> 01:29:19.229 that'll be interesting to see going forward. 690 01:29:19.229 --> 01:29:23.489 I believe that is all I have. 691 01:29:23.489 --> 01:29:27.479 And there are my references, with all those links. 692 01:29:32.159 --> 01:29:40.409 Well, thank you very much. And, yeah, I think there was an announcement or some story 693 01:29:40.409 --> 01:29:44.939 in the last month or 2 694 01:29:44.939 --> 01:29:55.500 about some new Chinese supercomputers that just came online. 695 01:29:55.500 --> 01:29:59.010 Hmm, yeah. 696 01:29:59.010 --> 01:30:03.630 Yeah, 1 of those was this OceanLight, which 697 01:30:03.630 --> 01:30:06.630 I believe is the 1 that 698 01:30:06.630 --> 01:30:12.659 we know a little bit more about, at least as far as the rumors go. But we 699 01:30:12.659 --> 01:30:18.119 still don't know anything for sure, as, again, China hasn't gone public with any of that information. 700 01:30:20.369 --> 01:30:24.569 Okay, thank you. Well, even in the United States there is 701 01:30:24.569 --> 01:30:31.289 a comment somewhere that in Houston, Texas, there might even be a supercomputer or 2 that's not on the list — ones
owned by oil companies. 702 01:30:32.520 --> 01:30:40.350 Anyone else have any questions, comments, et cetera? 703 01:30:40.350 --> 01:30:49.859 Okay, thanks, everyone. So that was 8 talks today. We'll have the remaining 4 talks on Monday, and then I'll continue on, 704 01:30:49.859 --> 01:30:53.699 finish off that Lawrence Livermore tutorial, and then get into 705 01:30:53.699 --> 01:31:01.140 other parallel topics. So, unless anyone has any questions, 706 01:31:01.140 --> 01:31:05.850 have a good weekend. 707 01:31:06.265 --> 01:31:20.005 What should we do to set up for class in terms of software? Well, what I'll do is, when I get to talking about stuff on my parallel computer, I'll give everyone an account on it and I'll walk you through it. 708 01:31:20.335 --> 01:31:22.494 You'll access it with SSH, 709 01:31:22.770 --> 01:31:26.909 preferably — most easily — from another Linux system, although you could do it from 710 01:31:26.909 --> 01:31:31.229 a Windows system. And so 711 01:31:31.229 --> 01:31:35.310 that will be your parallel access for the next part of the course. 712 01:31:35.310 --> 01:31:40.619 Also, I put a homework online, and that'll just be asking 713 01:31:40.619 --> 01:31:44.819 some questions, due in a week. 714 01:31:44.819 --> 01:31:49.380 Let's just see what else came up in the chat here. 715 01:31:52.350 --> 01:31:58.588 Yeah, so — other questions? 716 01:32:00.238 --> 01:32:08.069 If people wondered about the weird setup I've got: I've got 2 machines in front of me, a ThinkPad and an iPad, and I'm using the ThinkPad. 717 01:32:08.069 --> 01:32:13.708 The audio is going through the ThinkPad, the video's going over the iPad — just crazy things. 718 01:32:13.708 --> 01:32:18.179 If there are no other questions or anything, 719 01:32:18.179 --> 01:32:24.029 then — for the homework, 720 01:32:26.849 --> 01:32:30.628 what do we set up for the class in terms of software? 721 01:32:34.679 --> 01:32:40.019 How do we solve the problems? Yeah, I'm confused. 722 01:32:41.069 --> 01:32:46.859 You're talking about the homework? Oh, there is no programming on the homework, unless I screwed something up badly. 723 01:32:49.349 --> 01:32:54.238 Let me look at the homework there. Okay, just a second here. Let me, um — 724 01:32:54.238 --> 01:32:57.509 hold on — homework 2. 725 01:32:58.649 --> 01:33:05.759 There's no programming; they're all questions based on stuff we've talked about and stuff that's in the Lawrence Livermore tutorial. 726 01:33:05.759 --> 01:33:12.689 So — and I would have trouble sharing that. Yeah. 727 01:33:17.573 --> 01:33:27.323 You see, my strategy for the class is: I wanted to have you guys talk 1st, so you can see a wide range of parallel applications, but we can't do everything simultaneously. 728 01:33:27.323 --> 01:33:36.774 So if we have your presentations 1st, then we start the programming after that. That was a decision I made about what to do 1st. So — 729 01:33:39.029 --> 01:33:44.519 other questions? If not, then — 730 01:33:47.099 --> 01:33:49.019 see you Monday.