Comet to Expanse Transition Tutorial

Presented on Thursday, October 29, 2020 by Mary Thomas, Mahidhar Tatineni, Marty Kandes, Nicole Wolter, and Subha Sivagnanam.

SDSC’s new supercomputer, Expanse, goes into full production on November 1, 2020 and replaces Comet, which will be retired on March 31, 2021. This webinar covers the basics of making a successful transition from Comet to Expanse with both a broad overview of the most important issues and a deep dive into specifics to help you get started.

Slides for this tutorial can be found on the github repo linked below.

Github Repo | Chat Text | Mobile Version


Scrollable/Clickable Transcript

And we can go ahead and get started.
Hello, everyone.
Thanks very much for joining.
I'm Jeff Sale.
Today, we will be having a three and a half hour tutorial on the Comet to Expanse transition.
We have several presenters.
And I'm sure it's going to be a great presentation.
Before we get started, I would like to point out the XSEDE code of conduct.
Everyone is expected to be on their best behavior but if anyone observes any inappropriate conduct.
The links are here and you can reach out to xsede.
org/codeofconduct for that.
we would also like to emphasize that
We're committed to providing training events that foster inclusion and show respect for all.
This is a very important effort within the XSEDE training community to minimize the instances of inappropriate documentation.
And with that, I will go ahead and hand it off to Mary Thomas, who will be sort of the manager of proceedings take away, Mary.
.
.
.
Mary you're muted.
I am muted.
I am now unmuted.
So Jeff, you have those slides right
If not, I will share.
There we go.
Let's go to the next slide.
So the goal of this tutorial or this workshop is to
Show you guys what you are going to have to do a bit differently on Expanse in there are some differences.
The architecture is different the operating system is
we're now using CentOS, but we're using SLURM.
But the module environments, a little bit different.
So today we'll just try and show you the new features on Expanse and get you familiar and and
Move on to the next slide, please.
Jeff So the next slide just shows our schedule.
And I'm not sure that it got posted to the website yet, but we'll start with Mahidhar he'll talk about an overview of Expanse to talk about SLURM.
And then we'll have a short break and then we'll talk about modules and another break and then job charging and using the Expanse portal, a new a new application.
Running on Expanse and interactive computing.
And then we'll wrap up with data transfer mechanisms.
If you have questions, type them into the chat.
We'll have people that are working and supporting the speakers to answer any questions you might have.
And with that, let's start with Mahidhar to do overview of systems and applications.
Start by sharing and hopefully the right desktop.
Can everyone see the slide.
Okay.
It looks good.
So yeah, I'll start off with a Expand system overview and Then go into in the second presentation, I'll go into how to run jobs on Expanse So the outline of the talk, essentially.
Just introducing the system to look at the system architecture specifically look at the AMD EPYC processor architecture of the hardware details the NUMA options and Basically how how to use them and then Look at some of the innovative features in Expanse itself.
And then talk about allocations and Summarize what we look at so Expanse basically is a result of an NSF award.
It's basically
A category one capacity system.
The PI of the project is Mike Norman and a lot of us co-PIs there.
And then Basically, the primary vendor for the system is Dell, Aeon for the storage.
And then we have a lot of other components processors from AMD.
The Intel processors on the GPU node.
The host processor, the NVIDIA GPUs and then Mellanox interconnect.
So it's a System that's made up of 13 scalable units and I'll go into a little detail on each scalable unit in the next few slides.
There are 728 standard compute nodes, 52 GPU nodes, and each GPU node has four GPUs.
So you have a total of 208 GPUs.
We have four live memory nodes.
So if you think about it, this is basically similar to our setup on Comet.
So it's best you can almost see the evolution coming from Comet to Expanse.
In terms of the data we have a 12 petabyte Lustre file system, which can do about 140 gigabytes per second on Rights and 200 kilo 200 K io operations per second on the metric data side.
Actually, it's much bigger than that and then As we've heard on a lot of the SDSC systems.
We have Node local NVME storage.
For fast IO local storage.
We are going to have a seven petabytes Seph-based objects store.
This is not going to be available right at the beginning because
We're essentially going to restructure some existing storage.
So that's going to show up sometime next year, but that will be an additional storage option that will be available on Expanse.
So quick overview of the compute side and the system itself.
We looked at all the numbers.
So overall, it's 93,000 compute cores so you almost double the Comet Core count.
And based on the benchmarks.
We run we expect
Close to I mean more than 2X throughput over Comet.
Most of the applications will see a per core improvement or a Haswell, we see anywhere between Even performance with per core going out to almost 1.
8X on some applications.
So it really depends on the application.
And since the nodes have 128 cores.
So on a per node basis, you're going to see a pretty large improvement in performance.
And then given the 2X core counts.
The throughputs easily want to go 2X, we expect a smooth transition going from Intel to the AMD processors, we've compiled and run most of the common software packages.
Lot of benchmarking and it seems good so far.
And whatever feedback we've gotten from the early user testing kind of confirms that people seem to be happy with the system in terms of
Running and compiling and Right now, the system is in early user period, as I mentioned, we expect product production in November.
Which I mean, should be pretty early November.
And the operations scheduled for five years.
So let's go into the rack-based scalable compute unit so As I mentioned, basically 13 scalable compute units over all we're looking So let me go to the next slide.
So each SSCU you is essentially designed To have full bisection bandwidth within the rack.
So you have 56 compute nodes, which amounts to a lot of course it's 7168 cores compared to Comet which had 1700 cores in a rack.
We also will have for GPU nodes in on SSCU and all of its connected through with HDR switch HDR switch which has It's a 200 gig HDR switch.
So, what we are doing is putting the cables to get hundred HDR 100 to each of the compute and GPU nodes.
So what that does is gives you the 60 connections down to the nodes and then 20 and then 200 gigs going up.
So it's effectively works, or two, or 321 over-subscription.
Each of these standard compute nodes is to AMD EPYC 7742 processors 2.
25 gigahertz.
They are 128 Zen 2 CPU cores essentially And It features PCI Jen for 256 gigs of RAM on each node and That's DDR for 3.
2 gigahertz, I think, and a terabyte of NVME so you got a lot more space per node, but you also have a lot more cores per node so
Something to keep in mind when you're running is to use the local scratch if you if you have an AI ops heavy workload.
On the GPU nodes.
We have four NVIDIA 132 big GDR on each V100 GPU.
1.
6 terabytes of NVME, as I mentioned, their Intel CPUs on the GPU node.
And then going out to the network diagram you can see basically you have 56 compute nodes, four GPU nodes, and then the uplink of 10X 200 gigs.
Overall the network plugs into a lot of the external fast networks listed out there so Which is similar to what we had on Expanse but we have a 25 gig connection on each compute node, so you can get quite a bit of external connectivity.
So next let's look at the AMD EPYC processor architecture.
So it has eight core complex dies.
So if you look at the picture on the right you have each of these CCDs has eight cores essentially
And there is a IO die in the center, which is at 40 nanometers and CCDs or seven nanometer.
So this kind of innovative design lets them.
Pack in a lot more cores and And I through the IO die there are multiple ways of kind of booting up this node in terms of NUMA domains.
And I'll talk about that slides coming up.
So each of those core complex dies.
So, this
Picture on the next slide is one of these green CCD CCDs in the diagram.
So there are two core complexes on each of them, and each core complex has four Zen2 cores which L3 cache of 16 Meg.
There's a lot of L3 cache on this machine insights 256 megabytes of memory cache total Each core also has private L2 cache of 512 kilobytes, and So, so now you can kind of see that there is a hierarchy setup.
They're basically you have the IO die you have the
CPUs split off into the CCDs and then each CCD has two core complex dies with the I mean with the shared L3 caches.
So in terms of how these can be booted up there are several options where
You can essentially do NUMA domains on each socket.
So multiple NUMA domains and for the HPC workloads, the best option is to do an NPS4 which is based four NUMA domains per socket.
And the way this works is the processor is partition into for NUMA domain.
So you have two CCDs and to memory channels in each of the new model makes
This lets you if you do the binding right you can kind of get scaling out to all for like the full socket.
Because of the memory channel has been split this way.
The PCI devices will be local to one of these NUMA domains and So the network card could be on one of them.
For example, so
So the main thing to keep in mind is that there is some complexity in the NUMA domains.
So this is just on one socket.
Remember, so you then you have two sockets.
That gives you another level of memory for NUMA memory, right.
So you have to be really careful when you're running jobs on the system to be binding and using the layouts, the right way.
So there are several compilers available on the system.
So we've tested applications with AOCC, gnu, and Intel compilers and with MPI versions.
We have got MVAPICH2, OpenMPI, and Intel MPI
Right now, we don't have any defaults because what we found is depending on the applications different sets of compilers work better for some and worse for others, kind of, so you know you should like test out things For the applications we are installing ourselves we primarily stuck to the GCC compilers and then used AOCC, where appropriate, where it was better performing basically This is a moving target as what I would say because there's a lot of optimizations going on these processes and we are involved with a lot of different sites which are running AMD processors and also with the AMD themselves developing application recipes.
So things might change but you have a lot of options right now.
So as I mentioned, at runtime.
It's very important to bind correctly so you can for MPI jobs basically use open domain if it's Intel MPI it's OpenMPI just do the mapping.
You can also use SLURM directives to SLURM options to essentially bind the right way.
And we are actually developing some have some tools to help out.
Especially for pthreads codes and for hybrid MPI open and being cases.
So, we will have an ibrun and an affinity script available.
So the type of applications we run on Expanse, we've gone through quite a few benchmarking efforts basically run ChromeX, NAMD, Neuron, Openform.
Quantum espresso RAXML.
And so you can see there's a broad range of applications that we have tested and kind of different application areas, all kinds of combinations.
Overall, what we've seen as performance engine matching on a per core basis to almost 1.
8X faster on a per core basis.
On the GPU side we have Benchmarks on the AMD.
MD code codes like NAMD and Amber and also a lot of the machine learning kind of codes.
What we've seen is greater than 1.
5 X per GPU improvement over the Comet P100 nodes and for some quick codes, it's closer to 2X at 1.
852 to export Amber, for example.
So that actually brings me to the GPU node architecture.
So it's four V100s as I mentioned 32 gigs on each V100 GPU, the
The host node itself as basically 384 gigabytes of RAM and 1.
6 terabytes of PCI NVME and the processers on the nodes are Intel XEON 6248 which are basically cascade lakes.
And if you look at the topology.
You can see that the four GPUs are split between the two sockets and
The numbering is already even in terms of all the even numbers are on one socket and the odd numbers on others.
Okay, so you can see the
Numbering there and then You have NVLink between the GPUs.
So that is a difference from Comet on Comet if you had to talk between two GPUs on different sockets, you would have to go through the host
CPU and through across, but now with NVLink should have much better performance.
So in terms of software stack will support a pretty broad application based we are trying to replicate what's on Comet with a lot of the packages, we Commonly used packages like bioinformatics, molecular dynamics, machine learning.
Quantum Chemistry, structural mechanics, visualization and try to get things on
You may not have your favorite package on day one, but we can definitely work to install as people come online.
We have a pretty big chunk of it replicated already so
Today's talk, basically we are aimed at informing you about like basically the difference is going between common to the Expanse application environment Marty's going to talk a lot about Modules, and the setup.
For those who are curious.
The main mechanism for installs.
We are using is Spack.
This lets us leverage a lot of the work that's been going on in the community, spack community to
Port and develop a lot of HPC software stack and AMD has its own Fork of the Spack base and they are putting in performance.
Application.
You know packages into that.
So that's an ongoing effort.
So as things become available.
We will update them with AMD version so that we get the best performance.
We will continue to support singularity based containerization on Expanse, just like on Comet.
So if so if you've been using containers on Comet, you can do the same on Expanse.
So some other Expanse, in addition to the standard way of You know, standard compute nodes and GPU nodes and the SLURM based scheduling also has a few innovative features.
So we have an option to integrate with public clouds.
It of course needs a lot of work.
So this is something that would be done essentially on a project basis and will require some interaction with SDSC So it's not a push button kind of thing.
But we have demonstrated this with work on Comet and we're going to replicate that on On Expanse basically you will be able to use SLURM to submit your job and then there's a whole infrastructure to make that work seamlessly so that you end up running the job on a cloud resource.
Early work was kind of used for Running CIPRES jobs from the gateway.
This approach is cloud agnostic and kind of aimed to support most of the cloud providers.
So the So there are options behind the back end.
Basically, you have to look at data movement data and cloud and
Lots of different things to look at.
So if you're interested.
This is something that you would have to request additional support for
Similarly, we are also looking at composible systems.
This is so that we can expand beyond the boundaries of what's available on Expanse hardware.
So, for example,
You may have some pieces that are available on Expanse some pieces that are available on a public cloud, or a different resource.
And what we what we are trying to do on Expanse is to be able to carve out some of the nodes using and set up a kubernetes cluster on that and and Basically federate with other communities.
Resources outside of the system.
Again, this is going to be
needing like a little more advanced support.
So you would need to request that
As part of an XRAC request and also make sure That there is a case to do it.
So this is not something you'd want to do lightly.
It's more if you if you have a workflow that actually needs something like this, then
we can definitely make this happen.
So then in terms of the rest of the support mechanism.
So we have, Expanse going to be integrated as an XSEDE level one resource so
You can have the support coming through the XSEDE ticketing system, as you had on Comet In terms of transitioning, we have an overlap of six months between Comet and Expanse or rather it's five months, but quite a bit of overlap.
So we can definitely work on the transition with a lot of users.
And today's workshop is essentially the first in a set that basically will help you transition from Comet to Expanse As I mentioned, you have advanced support available for cloud integration and composible systems.
We're also starting a new program by HPC@MSI targeted at minority serving institutions, so we can award.
Some discretionary time.
But let's let's people get on the system in a rapid fashion.
And we'll support.
such projects.
So in terms of allocations.
Expanse will show up as a resource right now and XSEDE XRAC window.
So if you're looking to submit proposals, that's one of the options.
So there are three resources that are related to Expanse
There's the Expanse CPU, basically, which is the allocations on the compute part of the system Expanse GPU, which is basically allocations on the GPU we 100 parts of the system.
And then we have allocated project storage.
These are locations on the Lustre file system essentially and the file system is available on all the nodes of the system.
Not that it's a single file system that split between scratch and projects.
So the total allocated space.
That we can do on the project side is At least at this point is around 5 petabytes and note that we will have a per project limit of 50 terabytes.
To begin with, so that
You know, So in terms of,.
.
.
I should add that any storage allocations have to be justified in terms of why you need that for the duration of a project.
So it's meant to essentially keep data
for basically files that you can expect to need for the duration of a project.
So any scratch space should just go into the scratch section of the file system.
I have a few snapshots of the allocations page basically how you can log in and where the Expanse options are basically they will see the Expanse Dell cluster with AMD Rome and then the GPU and the storage piece.
So to summarize, basically Expanse will provide a huge increase in performance and throughput compared to the Comet machine and it's essentially an evolution of the common design.
Some innovations in cloud integration and composible systems and some of that work was started earlier and kind of continued to evolve into the production side of Expanse essentially.
We will have integration with open science grid.
So that's another piece that would allow jobs on the machine.
And we looked at the, the architecture of the system basically the 728 compute nodes and 52 GPU nodes and the HDR100 interconnect on the nodes.
As I mentioned, it's going to go production in a couple of weeks, essentially, and you can follow all things Expanse at this website.
So I think that puts me right are
Almost 11"30 so we can probably do a few minutes of questions.
Let me see if I can pick out the chat.
I'm going to stop sharing and look at the chat.
And Marty and Trevor have been handling it except for the last question.
There's a question specifically about the project storage.
So Ron you had the thing that your projects, but the directory is different and another Yeah, I think the question about the project storage is it's possible.
I mean, these directories are created manually right now.
So, okay, it's possible if you're if you're already on the system.
It's part
Of the early user Access Program.
And maybe we have not created it for so if you want to send us a ticket.
That would be the best way to go about that.
And I think Paul.
There was a question on your question on Globus yes we have an XSEDE#Expanse endpoint setup so you can use that for globus transfers.
And we have enabled sharing on that now in on specific directories.
So you can also do Globus sharing on that end point.
So Mahidhar.
Can I ask a question.
Sure.
Yeah.
So could you give me a little bit more explicit sense of how
a Comet research allocation right now transfers to Expanse, I mean, I would just assume that like within a couple of weeks.
If we have a research allocation on Comet will be able to run on Expanse is that
Is that right.
No.
Actually, that's not true.
So, what we are doing is
They're people have actually asked for Expanse explicitly in their allocations and because this has been available in an allocation cycle for a while.
I think it's been available.
The last two allocation cycle.
So there are actually people with Expanse allocation specifically
Comet users who who have an allocation that goes beyond the end date of Comet which is true for a lot of people.
We will transfer them to Expanse when we get closer to the end date of Comet.
Now, you can always
Request a transfer from Comet to Expanse, if you like to use Expanse and that is totally fine.
Or you could ask for a supplement on Expanse on that is also fine.
I see in it.
But we are not automatic transmission in every Comet allocation too expensive.
That's the question.
Okay, good, good.
And so that that kind of transfer is how you, how you do that is found under
So it's like a regular XSEDE transfer between resources.
Okay, great.
Thank you.
So I think there was a question of what whether there's a purge cycle for scratch.
Yes, we have a 90 day purge from create time so
It's and the project spaces where you would keep something that you need for the longer term that is for the duration of Allocation I should say both the locations are not backed up.
So if you have critical data.
We always urge you to make off site backups of anything that's critical.
Alright so I think I will transition to the next talk which is running jobs on Expanse.
Go to the front of this thing and start sharing again.
So again, for completeness.
I have a system overview and
again in this slide deck, just in case somebody picks up this slide deck offline.
So I have a system overview and then we'll just look at the login info and after that what I'm essentially focusing on in this talk is looking at the SLURM scheduler.
The partition info and how things are a little different from Comet and how things are similar in some cases.
Then we'll look at just running jobs examples for MPI openMP and hybrid cases and GPU jobs.
I'll go over the file systems, a little bit and also give you some example script.
To use local scratch usage.
So that was architecture thing I was talking about.
So since we just went through that I can
quickly go over it in case somebody just joined.
So basically Expanse has 13 scalable units, 728 compute nodes with about 93,000 plus cores
And then 52 GPU nodes with four V100s connected with NVLink.
It's for a total of 208 V100 GPUs there is HDR 100 connectivity into the node and then the switches on the top of the rack for each of these racks rights that are basically HDR 200 switches.
So overall, we get a three to one subscription between the nodes.
So let me skip through that.
So logging into Expanse, this is basically very similar to what you're doing on Comet
The allocations are different for CPU and GPU resources, just like on Comet so like when you get an XSEDE allocation, you would have a separate application for each however the login is the same so you can log in to login.
expanse.
sdsc.
edu
So if you have an SSH client.
You can just directly SSH using basically your username.
I actually, I think I missed one question on the previous talk where somebody asked about if the allocation IDs will be the same.
If your project is the same.
Yes, it will it will be the same, but there is a tool and I think Nicole's going to talk about it, which basically lets you
Get information on What your allocations are so anyway and your user.
And the reason this came up to me is basically the username is also basically going to be the same as what you heard on comma
So you can Directly log in and use your XSEDE portal password or And this will basically put you on one of the two login nodes login01-expanse.
sdsc.
edu and login02-expanse.
sdsc.
edu both nodes are identical
You can also log in via the single XSEDE single sign on hosts or login.
sec.
org so you can basically SSH with your XSEDE portal username to the login.
xsede.
org which has two factor.
So if you've got your two factors that up with Duo there.
You should be able to log in there and then GSA SSH to login.
expanse.
sdsc.
edu.
One other new thing on On Expanse is we will have an Expanse user portal.
And so we're going to talk about that later today.
So that's another mechanism of accessing Expanse, you can
Basically submit jobs from that user portal.
You can also use some of the interactive
Programs through that and there's more details coming in Subha's talk there.
You can use SSH keys.
To do essentially enable access from Authorized hosts without having to enter your password but Make sure that you have a strong pass phrase on the private key on your local machine and don't copy that key around keep keep it on your local machine.
And you can always do.
You can use SSH agent to Avoid repeatedly typing the private key password on your end, but Whatever you do, don't do a passwordless thing because that's a huge security problem and And if you are For some reason, like to have a programmatic thing that connects Via SSH very frequently.
We do have a limit on that and you might get blocked for a little bit, but if you do have this use case, please.
Email us so we can give you an option that lets you avoid the problem and there is a security webinar that Scott did a little bit ago that kind of goes into some of these things from a Comet perspective what most of that applies on Expanse to So the first thing is the appropriate use of login nodes.
So login nodes are meant for like file editing simple data management, like, you know, moving files around or
Copying some small files.
Remotely that's okay, but The main thing is you should use using minimal compute resources.
So don't run
Jobs on the login or even if they're meant to be short, because if you do that not enough to slow the login not down so please avoid running on the login node.
And then all Computationally demanding jobs, even if they're interactively run should be going through the queuing system.
So don't use it for anything computationally intensive Don't use them to run some intensive workflow management tool because that's going to cause problems too.
And essentially If you're doing some data transfer.
If it's a small amount.
That's okay.
But if not, please use one of our data more nodes for doing any intensive data transfers and preferably you should use Globus for the big transfers, because it'll give you an efficient way into the system.
And do not run Jupyter notebooks on the login node and Mary's going to talk a little bit about how to Run those appropriately and also don't set up servers on the login node that provide services to the outside world, because that could lead to issues to.
The one other big difference, I think, between Comet and Expanse is The GPU nodes.
As you may remember from the earlier talk basically have different CPUs.
So if you're doing compilations on the GPU nodes, it's best to request an interactive session and do compilation
Since this whole CPU is different.
The other thing that's different is the stack on the software stack for the GPUs is different from the software stack for for the CPUs so
So that's another reason to essentially get a GPU node allocation and then do the compiles there.
So in terms of scheduling so Expanse uses SLURM, just like on Comet.
It's a much newer version of SLURM.
So there are additional features and additional strict enforcement of
limits and so on.
So I'll talk about that a little bit.
But the usual SLURM commands will work.
So, submit jobs using sbatch.
You can check your jobs using squeue.
I encourage people to use the -u option so that you're not querying everyone's jobs at the same time.
That thing slows things down and also causes an additional load on the scheduler, it's easier for it to deal with specific requests.
So
And if you want to cancel a job, you can use the scancel command.
So let's look at what similar to Comet and what's different from Comet in terms of the partitions.
So just, like, Comet, we are going to support both exclusive and shared partitions.
So in terms of what's common with Comet, like you have the compute partition, which is essentially CPU nodes exclusive access One real thing to keep in mind is If you make a mistake and put a shared Like two core job on a compute node there's a huge cost to it.
So please be careful in terms of how many cores you need.
And if you really need the 128 cores from the compute partition because you will go through your SUs real quick if you
Make a mistake and run on the compute partition.
And we've seen this happen on Comet and was the typically catch that but
It can lead to A much bigger penalty in this case because of the higher core count And we obviously can't refund that because the jobs, the jobs are blocking those resources when you do that.
In terms of shared partition.
So these are shared partition has four CPU nodes that now and for jobs that need less than 128 cores.
And similarly, on the GPU side we have the GPU nodes with exclusive access in the GPU partition and then the GPU shared partition.
If you need less than four GPUs.
And just like Comet, we have a large shared partition for the large memory nodes.
And a debug partition for short tests and quick access So what's new on Expanse.
So a few things.
One is there's a GPU debug production, which I know will make a lot of users happy because it was one of the things that was asked for a lot on Comet
So we do have a GPU debug partition.
So this is essentially for a short test and compilation tasks.
We also have preempt partitions, which basically discount Job a discount SUs used and so that he can run on the nodes if they're free.
But the jobs can be preempted by jobs in other partitions like compute, shared, for example.
Similarly, we have a GPU preamp partition.
Minute, can I jump in here for a moment.
Y'all, there's been a lot of activity on the chat and it can be a challenge for some of our own tracking both at the same time.
So if you have a minute or two, can you review briefly what's been going on.
Some of it has already been covered by you.
But you know, the questions.
Let me go back.
I might have to stop sharing and go back to the thing.
Okay.
Let's see.
So I think there was a comment about Switching to node hours on the SU definition.
I think the reason we didn't do that is because of the shared partitions basically
Because then you would end up with fractional SUs which I don't think we have set up to deal with right now.
And then So there's a question about memory usage being enforced for shared nodes as it is, and I'm going to talk about that a little bit further down So there was a question about The transfers basically going between Comet to Expanse So if, if you're intending to transfer basically in the March timeframe.
You don't really have to do that because we're going to do that.
I got transfer Comet was more meant for people who want to immediately use
time on Expanse, since we are not automatically moving the Comet time right away.
So if someone has Comet time today and wants to use Expanse mid-November, they would like they would have to put a transfer in.
So yeah, I'm not sure how much more time, you need to spend on that machine taking a bit of for people to get caught up on.
Oh yeah, I think hopefully everyone's caught up and yeah And Marty and Trevor have been doing a great job.
Thanks.
So back to the partitions so So this slide basically and Nicole is going to cover this again when she does the charging aspect of it.
But basically what we
Have To share your slides.
Oh, Thanks.
Alright, good.
Looks good.
Now,
Yes.
Okay.
So yeah, so this is a quick overview of basically the limits.
So we have a 48 hour limit, just like on Comet on the partitions by by default.
So some of the Limits are big.
One thing I should say is that these limits are what we kind of thought of looking at like the performance numbers and kind of
The size of the jobs that typically run and we kind of put those in But they are flexible and if it turns out that some of these might need to change.
They will be changed, but right now this is these are the limits we are going in with.
So the max number of nodes you can request on a compute partition is 32 so that gives you basically 4096 cores.
And so that's quite a bit more than what you had on Comet in terms of number of cores per job that you can run And so you can have maximum of 64 nNodes running your particular groups jobs and the reason we have these limits is because 64 nodes is essentially nearly 10% of the system.
So we are trying to balance the system out for all the users.
So so similarly on the shared compute side we have the equivalent of 32 nodes essentially if you run once core or So you want single core jobs, essentially.
So you will end up with 4096 as the max running jobs similar things on GPU side, we have a max of four nodes
For a GPU job.
Remember, these are the V100 nodes which are quite a bit faster than what we had on Comet so equal to 8 nodes on the Comet side.
So that's where we came up with the limit.
But this is something that may change it depending on what the utilization looks like and
How the queue at times look like.
One thing I should note is that the GPU nodes have already been fully allocated based on what I've seen of the allocation request so far so
So, these, these will be busy I think.
And then on the debug partition.
We have a max of two nodes per job and the two nodes is so you can test the MPI job essentially
We Deliberately kind of stuck a lower run limit on the number of jobs because We have noticed people kind of trying to game the debug queue, a little bit and us and roll through a lot of jobs will be our lower limit.
And as I mentioned, we have the GPU debug partition now.
So a few other things, basically, on the Expanse Scheduler side.
We have a few required party parameters.
So, first thing is the partition the --p option or --partition option we we would like your portrait in explicitly on Comet.
I think we defaulted to compute.
So this is something you would want to put in
You will need anumber of nodes and the number of tasks per node or the number of N tasks, right, one of the two needs to be specified.
SLURM will pick and tasks.
If you
Specify that And The wall clock time It should be specified Explicitly and then are two other things.
One is the account.
Unlike Comet.
We are not going to pick our default.
So, you will have to explicitly set this account.
On Expanse, and then on the GPU side if you remember on Comet, we were setting --gres So on Expanse it so --gpu, which is the total number of GPUs needed by the job.
One other critical thing to note and this goes to I think it's a question on memory.
So --manda shirt CPU or GPU one of these parameters, they are not required, however, the default is quite low.
It's one gigabyte
per core.
So, it is recommended basically to explicitly request this even on compute and GPU partitions.
So you're not going to get 248 gigs on the compute partition.
If you don't ask for it.
So in this version of SLURM is quite Strict with all the I mean the way of your configured.
It's quite strict with basically enforcing limits, including
Like the topology of your request.
And I'll talk about that in a bit.
But this is a big change.
So you kind of need to get in the habit of requesting some memory parameters.
So it could be, mem, or mem-per-CPU or GPU.
So I'll start off with the interactive jobs and you can do is run just as you did on Comet and that will give you interactive access to the compute nodes and then you can run through the scheduler, whatever you need to So again, one thing that I added in red here is basically you need to specify the account.
So let's go into some sample scripts to see if anything's different essentially compared to Comet, so simple Hello World MPI job here so If you look at the parameters.
Most of them are pretty standard right you have the job name, you have the output file.
Usually we tag that with the job ID and the
Name of the first primary compute node so that it's easier to track down issues later for us, like so, we recommend doing that because if you have a problem with a job.
The first thing we ask you is, what was the job ID so
It's nice.
If it's in the output files name.
And then Yeah, you have the partition.
Number of nodes and the task per node, in this case, we could do ntask if you want And I highlighted the memory per CPU.
So in this case I put in 1800 megabytes.
That's 1.
8 gig essentially roughly per
Per task and per CPU.
In this case, and then the account, as I mentioned, is a required quantity now.
I'm not going to go into the environment piece of it so much, but since Marty is going to talk about it.
But in your job scripts.
I think you should get in the habit of basically starting off clean and loading exactly what you need.
So do this in your job script instead of, instead of your .
bashrc, especially on Expanse because
The GPU nodes and CPU nodes are different stacks and you can't really load both of them in the .
bashrc so do not put any module loads in your .
bashrc unless you're absolutely sure that that's not going to conflict with something else you're trying to do in the batch script.
So, so in this case, basically.
As you can see I'm purging and now one other difference, and I should have made this right and I will, in the final slides is you do have to load the SLURM module.
So SLURM is not in your default path like as it is on Comet, so you will have to load that module.
And that is needed if you want to use a command like srun in your MPI job, for example.
And also for MPI runs because they are integrated with the scheduler.
So you would need module load slurm, and then you the module load cpu here.
What it does is sets the model path to the CPU stack essentially
So as I mentioned, it's different on both sides.
So you have to load the rest and then it's like the usual like you're learning a GCC compiler and an open MPI compiler here.
So we are doing srun here to launch this MPI job.
You could also use MPI and I have a different example for the art.
But one thing I should not.
Is that right now.
We only support.
Actually, we will only support.
MPI launch mechanisms that are integrated with the scheduler.
This is because on a shared node, if we do an SSH based thing it could get real complicated so sorry.
So remember that you have to go through
The scheduler.
Either way, is that if it's an srun or an mpirun
That's integrated So the one other thing is, basically.
So let's look at an openMP job and I will talk about this piece.
So you can do partition equals shared, nodes equals one and task per node one and CPUs 16 know here I should caution.
This is, again, different from Comet so on Comet.
You could have done ntask per node 16 and then launched an openMP job and it would pick up the 16 cores, but on Expanse, you probably will end up bound to one core.
If you do that,
So explicitly ask for what you need.
So, so in this case we saying it says an openMP job.
It's going to have one primary
Task, and then it is 116 threads.
So what you're saying is, do an aspirin or as one ops and CPUs per task is 16
And then I'm basically doing a similar thing.
Setting up the environment and then setting the openMP threads to 16 that matches the CPU task ask and then run the code.
So for hybrid openMP and MPI jobs.
I want to go back to that NPS for thing.
So because
As, as I mentioned, you're basically partitioning into for new model no domains in there.
And the memory is interleaved on the two channels there.
So if you're doing hybrid jobs.
It's important that you try to fit your jobs in a way that basically lays out the openMP tasks in the same NUMA domain and doesn't cross domains all so so it's very important for hybrid jobs, essentially.
And we are developing an ibrun script which be available when machine goes into production.
So you can use ibrun on
With option.
So this happens under the covers for you.
But it's important to remember this, if you're writing your own way of launching jobs or
scripts that You're developing so because they could make a pretty big performance difference if you cross NUMA domains in this architecture.
So I have a couple of hybrid MPI openMP Examples.
Let me take a quick look at the time, we're doing so have till 12:15 right
Correct.
Okay, and you're doing fine.
I think
Okay, I think we're doing okay.
Okay, so this is an example of a hybrid case.
So basically you can see this is a shared partition job with one node and two task per node and then I'd said CPUs per task is 16 so you will have two MPI tasks and 16 threads on each of them.
And so I do the export here and I'm doing a basically a pin domain on Option with the Intel MPI, which lets you say that, hey, you have an open MP set up here with the compact.
Pinning That we want to do so that lets you kind of Lay out the tasks the right way.
Now you can do other things like spreading and all all the options are available and you can look at it.
But what we have seen with codes on
At least the ones we have tested so far is that it's good to align with the L3 cache that's been shared on the four cores.
So if you can Do your openMP threads so that the four of those threads are on one CCX that will work well.
This is another example, basically similar setup, but now I'm using OpenMPI so I'm doing a different thing.
And here you can see I'm doing a map by L3 cache option.
So the next, next thing I want to look at is GPU nodes.
And then compare it to what we were doing.
On, Comet, so the partition is GPU shared, which was similar to what we had on Comet, and I'm doing a nodes equals one and tasks per node equels one, account and all that.
Okay.
The difference here is now I'm doing a --gpu equals one.
Which is different from Comet where you would have done --gres-gpus equals one And I'll talk about the multi node thing where the difference shows up.
So in this case, this was a
OpenSSC job on the GPU node.
So I'm purging everything loading pgi which has the openSSC support and running the job.
So the multi node job is a little different.
Again from Comet.
So what you have is nodes equals four, ntasks per node equals four in this case I'm basically matching the number of tasks with the MPI with the number of GPUs.
Not necessary to do this.
And just as an example with CPUs.
First off, does that to 10 so you have 40 total calls on the node.
And then I'm basically setting the --gpus equals 16 so this is different.
Right, we're not setting.
Something equals four.
Like you would have done on Comet where it was a poor not basis you are.
This is the total number of GPUs.
So you're requesting
And then the rest of it is similar.
Yeah, you can use us around to launch an MPI job.
So, The last piece I want to kind of go into is the file systems so When you log in, you're on your home file system.
Basically $HOME will also get you there.
The home directory is limited in space, like I said, it should only be used for like source code binaries and maybe small input files and You can have your job script here, but make sure that the job script is about that.
It's not writing into the home directory, but writing into Lustre, a local scratch for the actual IO
You have a core of hundred gigs on the home file system.
And can't stress this enough, don't run any intensive IO to or from the home file system.
This is not set up for high performance.
So it's not the place for anything intensive
Like I said, you could have like simple, like, you know, standard out kind of output go there.
That should be okay.
But if you have any parallel IO or
In our file per core IO that does large scale stuff.
You should move to LA state or local scratch.
So talking of Lustre.
So we have basically that should be a singular there.
It's a parallel one file system.
So it's a global parallel file system with 12 petabytes and it's available on all the nodes.
There are two locations on this one file system.
So it's important to remember, it's a single file system.
The /expanse/lustre/scratch/$user, which we will purge for files older than 90 days based on create date and the Lustre projects location, which is the /expanse/lustre/projects/groupid/$user.
So, What we are doing here is, since it's the same file system.
We set a user base, quota for basically the whole file system and then track the group usage in the projects directory and will basically follow up if someone's over the limit.
And The other thing to keep in mind as we are limiting the number and this is true on Comet too or we are limiting the inodes to 2 million per user and This this limit is basically in place.
Because if you're writing
Millions of files, Lustre is probably not the right location for it and you should look into changing the workflow to using the NVME that's sitting on the files on each node.
And if your workflow does need intensive extensive small block IO like lots of lots of small files, you should contact us if you need help with using the local scratch.
I mean, this particular Lustre file system has metadata on NDT and can handle a lot more Iops, so it's it's a little more better equipped to deal with the small file I O, but we still recommend using NVME wherever you can So that brings me to the local NVME best scratch file system, all the Expanse nodes have NVME local scratch storage that this sizes vary based on the nodes.
That's OK.
So the regular compute nodes have a one terabyte disk.
It's about 900 gigs usable.
The large memory, noes have 3.
2 terabytes of local disk and about 2.
9 terabytes usable.
The GPU nodes have 1.
6 terabytes of local storage and 1.
4 terabytes usable.
This is excellent for like Iops intensive workloads, it's also pretty good and performance with large files too as long as they fit and the storage that is there.
So if you have codes are generated a lot of files on a per task basis for example, openform us to do that.
I think the newer versions, don't put the older versions of open form would
Generate five per core per variable.
I think so.
For something like that.
I would say you should modify things so that you use the local storage and only store like large tar files and the Lustre location.
And just like on Comet this Directory gets created when your job starts.
It's in the prologue that job specific directory gets created.
So it's a slash scratch dollar user little different from Comet its job underscore the SLURM job ID instead of just the job ID on Comet
Key thing is this location is purged at the end of a job.
So if you have any important data that has to be copied out and
Make sure if you're copying out and it's a million files.
You're not copying out million files.
You're tar'ing up that million files on the NVME and moving the tar file to the Lustre.
Or you can tar directly into Lustre.
But don't try to move a million files into Lustre that's going to
Probably not finish in the time that you have And will cause problems on those systems, right.
So, This is an example.
Basically, it's a Gaussian application that has a scratch directory settings.
So you can see how we use it.
It's pretty simple.
You just you point.
Your scratch directory /scratch/$USER/job_$SLURM_JOBID .
.
.
The other thing to keep in mind is the /tmp is not that large on these nodes.
So you if you are anticipating if you're using code that just
Keys off the tmp directory and writes to the tmp directory, please set your tmp dir, TMPDIR variable to this location so that you pick up the NVME scratch.
So I think I'm about five minutes out.
So let me summarize.
So basically we are using SLURM just like Comet.
All the jobs have to go through the scheduler don't run on the login nodes.
We support both node exclusive, and shared partition, just like Comet.
So you have a few extra ones to help you on the GPU side and also with the preempt options.
The account information is required and job scripts.
I want to add one more here.
You should Put in memory requirements explicitly also And then we looked at the file system options.
As I mentioned, we are going in production soon so you will all get to get on and try this once, once the machines available in production.
Thanks.
So let me stop and see if there were any
unanswered questions on the I think the mighty.
The last question from dawn is wearing where you can pick up
See Can you say something about dev sham and the size and speed.
Let's see.
Well, the speeds.
Want to be faster because it's 3.
2 gigahertz.
Memory so that message.
I'm sure do better.
Hopefully they're more channels per node.
Also, like so.
Let's see.
The Expanse node does about Marty it correct me if I'm wrong, but it's like 350 gigabytes per second, right on the 325 gigabytes per second on the
memory bandwidth total on the node I think compared to About 110 on Comet or there's a big, big increase.
Now if you're in a shared partition you're sharing that with everybody, so
You could end up with lower bandwidth per course or you really depends on what's going on your field and I shared on but on a regular compute node.
If you take the whole audio.
You have a lot more memory bandwidth
In terms of size, well it's a 256 gigabyte RAM.
So, you will have to work with that constraint in addition to what your code needs.
Let's see, Marty.
Was there anything else missing.
I don't think so.
I don't think I missed any other ones.
Okay, so.
But yeah, if anyone has to as there's there's one from Canada.
So instead of a signal to the job.
I think maybe the question for travel.
I love to check on that.
This shouldn't be.
I'm assuming
I mean, I mean, I think I can encourage you did.
But, so, I mean, Kenneth, I think we we've discussed this.
With with the systems group.
And so it's it's possible.
I think it's just a matter of how much time we can provide to cough allowed the signaling to actually happen in for you to catch it.
So it's sort of an open question.
So, so this is Trevor, I don't see the question to Cuba is the question about a signal for a printing Yeah, kind of like, though.
So there is the ability for the user to submit their job and ask for specific signal to be sent when the job is going to be preempted.
But there are limits to how early that can happen before present time occurs because The preempt grace, time is only five minutes, so you can't ask for a 10 minute notification because we only know INSERM that it's five minutes until preamp time so You know, you got to look at the settings on that and and figure out whether or not you can basically do all your work that you need to do to To check on your job within the time limit that you have, you basically think of it as probably About three minutes of time limit from that signal until you could just have the note disappear on your feet.
So, yeah.
Um, but you can you can definitely register your job when you submit it to receive a signal if it's going to be preempted.
And that will come as soon as firm decides it wants that node and then you'll get some some subsequent signals again later on.
But yeah, that that's totally
Doable I'd say I don't think many people are using preamp right now and probably aren't even using that feature because the users that are using presenting early user don't really care if the know disappears.
They've got workflow managers that deal with all that kind of stuff.
So
Hey, folks.
Thanks, Trevor.
I'm going to step in.
We're starting our 10 minute
Break.
Now although people are free to keep chatting if they want and asking questions and we'll come back at
Where Marty Kandes is going to talk about modules compiling and basic optimization for CPU and then GPU.
So that will go until about 125 so we're on our break.
But like I said, feel free to keep asking questions are answering them if you want to.
Thanks.
So
One quick addition to answer.
Don Jon's question about the mem diverse hmm So apparently
There was a gentleman said to half of the total memory so 128 gigs.
Okay, thanks.
Mahidhar.
I'm going to go ahead and pause the recording and we'll take a 10 minute break, stand up, get some do some stretching and we'll see you back again about 1225
I have a couple comments, then Okay, I started the recording.
Okay everybody, welcome back.
It's about 12:25 and we're going to go into our section on compiling and basic optimization and modules by Marty Kandes to Comet.
We typically post the chats.
So we'll have those online in a few days, along with an interactive some interactive videos and also the talks will be posted on the repository and I posted the link into the chat and will repost it.
At the end of the talks with that out.
Welcome, Marty Kandes, and he can get started.
Alright.
Thanks, Mary.
I'm gonna go ahead and share my screen.
Full screen here.
You Can go ahead and get started figure this out in a second.
So thanks for attending today.
I'm Marty Kandes, I work with under Mahidhar in that user Services Group and today I
wanted to go over what the module environment on Expanse looks like and how it's different from Comet.
apologize in advance that I have not completely fleshed out the, the, all the material for the talk today.
So I'll go over some of the material regarding
What module what the module environment does for you and how it's useful and sort of give you some best practices of how to use it because In my experience some users are a little inexperienced with setting up their environment.
And you can kind of give them a bit of a headache.
And so I kind of want to go over some
Sort of basic material that you can, you know, if you're not familiar with what the module environments doing it'll give you some better context of why we're suggesting you use it in a certain way and how to go about compiling code and then also running code as well so So I'll go over the module environment.
material and then sort of use the second half of the talk to do some interactive demos and show you how, you know, In real time how you can, you know, take say some code that you've compiled on Comet if you have certain, you know, if you've Documented sort of your build process there and how you can, you know, modify it to then compile your code on Expanse, and how the different sets of software stacks that Mahi talked about are presented to you in the module environment itself.
So let me close this chat so I can pay attention to the slides here.
So I want to start with some basic definitions, though, for, for people who who might not be familiar, so When you log into Expanse or Comet right the program that's actually running when you log into your terminal is the shell program right and so This is the fundamental interface that you use to interact with the system.
Right.
So every time you log in this the shell program is running in this is what you use to issue commands to
To the system to get your work done right.
So the common ones common Linux shells are sort of listed here.
The, the new default when you log into Expanse will be
Your bash shell and this is what's used by, I would say, majority of users on the system.
However, if you do have a preference on what your default shell.
Would be, then you need to contact us via the ticketing system with the change request essentially this default needs to be set on the system side for each user So when you log in and your shell starts up, you have a bunch of built in commands that you're probably familiar with.
Right.
So you have the cd command to change directories, mkdir to create directories, and the ls command to list the files and directories in your home directory or whatever directory you're working in.
This is probably pretty basic stuff that everybody's familiar with, but there is a point that I'll get to here in a second.
When you're working on Expanse or Comet, many people often have multiple terminal windows open; for example, I'm running two different terminal windows here on my local desktop in this image.
These different shell sessions are individual instances of the shell program, and each one has its own state and environment. So although these two windows look exactly the same, there might be differences between the two shell environments, depending on what commands you've issued in them.
Now, when you're working with the shell, you can also set variables. Basically, a shell variable is just a name that you can attach a value to, and they can have different types.
I'll show some examples of using shell and environment variables when compiling code.
But most importantly, a shell variable is only bound to that shell session. So if, as in the last image, you have two separate shell windows open and you've defined a variable in one and not in the other, while you're maybe doing some deep interactive debugging or compiling your code, those environments might be different. Shell variables are only local to the specific instance of the shell that's running.
For example, one of the things shell variables are usually used for is very temporary data that might change pretty quickly.
One example of this is the present working directory variable. I'm showing here a bit of code at the bottom where I'm logged into Expanse and echoing the present working directory.
When you log in, your default directory is your home directory on the system. But if I then change to the Expanse Lustre scratch directory for my username, the present working directory changes to that. These variables are used to track where you are on the system, so that when you run an ls command the shell knows what directory you want to run it on.
Right.
So this is pretty basic stuff.
But just to give you some context, these are some of the definitions that you should be aware of.
Now environment variables are sort of the key to what the module environment is going to be controlling for you.
Environment variables are just shell variables that are exported into what we call the shell's environment. This allows other processes or commands that you run through the shell program to see these environment variables.
So unlike straight shell variables, which are only visible to the shell program itself, environment variables are visible to the other applications, binaries, and commands that you might be running from the shell. Effectively, one of the main uses of environment variables is to configure how your shell responds to the commands that you issue.
And in general, unlike regular shell variables, environment variables are used to store more persistent data that isn't going to change much during your shell session.
Okay.
For example, I'm giving a quick example here at the bottom with your home directory. No matter where you are on the system, or whatever program you're running, you're probably going to want to know where your home directory is. So this is a static environment variable that really shouldn't ever change for you.
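To make that distinction concrete, here is a rough bash sketch of the difference between a plain shell variable and an exported environment variable (the variable name is just for illustration):

    MYVAR="hello"
    echo $MYVAR              # prints hello
    bash -c 'echo $MYVAR'    # prints an empty line; a child process cannot see a plain shell variable
    export MYVAR             # promote it to an environment variable
    bash -c 'echo $MYVAR'    # now prints hello
    echo $PWD                # present working directory, updated as you cd around
    echo $HOME               # your home directory, effectively static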
Now, when you log into the system, the shell goes through an initialization phase on startup. It's usually a multi-step process: there are some configuration files, which some of you are probably familiar with, that are read by the shell and used to configure your environment at start. So there's going to be a default set of environment variables and shell variables in your shell's environment when you start up. I probably won't go through all of the different initialization files that you can use, but I'll cover the high-level ones that are most commonly used.
I think I have a list of seven in the next few slides, but there are system-wide shell initialization files, and these are the ones in the /etc directory: /etc/profile and /etc/bashrc.
The shell reads these configuration files first; these are system-wide settings that we control for you, and we may put things in there from time to time to simplify your default environment on the system.
Now, you can also control what your particular shell environment looks like by modifying or creating some of these other configuration files in your home directory on the system.
The two most common ones that people modify are the .bash_profile and the .bashrc. These are obviously for the bash shell; the names are different for other shells, but .bash_profile is one, and .bashrc is probably the most common one that I see users use on Comet and on systems in general.
The reason there are so many different configuration files is that there are different types of shells when you're running on the system. So this .bashrc, you can see, is executed when a non-interactive bash shell starts, for example when running a batch job script on Expanse, and that's why this one is pretty important.
If you modify your .bashrc file to change the configuration of your shell's environment, it is always going to be run for any batch job that you run on Expanse.
And so this is why I wanted to go over this set of definitions and provide the context: if you try to modify your environment through these configuration files, you have to be aware of when they're going to be executed.
We often see people modifying their .bashrc file and getting into trouble when, for example, they're doing one type of workflow for a certain set of calculations or simulations and then switch to a different code or application for a completely different set of work, maybe some post-processing, but they haven't changed how their .bashrc was set up for the first workflow to match what the second workflow needs.
Okay.
And so even though I'm going over this material, the key takeaway is really to not use these configuration files. This is what the module environment on HPC systems is for: it allows you to easily modify your shell environment on the fly for each individual job type you might want to run on the system.
Okay, and this slide is a little more information (I didn't change the title of the slide), but as I mentioned, there are different types of shells, and when the different configuration files are executed depends on whether it's a login or non-login shell and an interactive versus non-interactive shell. I'll clean up these notes for you online, but I don't think we necessarily need to go into it unless there are questions.
The key point is that when you're running a batch job script, or a bash shell script in general, the script is always running in a non-interactive shell, and that's why modifications to your .bashrc will affect any scripts that you run on the system.
Right.
And that's why the key takeaway is: if you're going to run multiple applications that require different environments on the system, you really don't want to modify your .bashrc. Otherwise you're going to have to change it for every single job where you need to slightly change the software environment.
And so that's my overall perspective on the issues I see with users using their .bashrc.
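A minimal sketch of that recommendation (the account name and binary here are placeholders): keep the .bashrc clean and put the module loads each workflow needs directly in its batch script, so the environment travels with the job.

    #!/usr/bin/env bash
    #SBATCH --job-name=my-workflow
    #SBATCH --partition=compute
    #SBATCH --account=abc123        # placeholder allocation name
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=128
    #SBATCH --time=01:00:00

    # load exactly the environment this job needs, every time it runs
    module purge
    module load cpu/1.0 gcc/10.2.0 openmpi/4.0.4

    srun ./my_mpi_app               # placeholder binary name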
Now, the module environment, you'll see, mostly modifies environment variables that are required for a certain piece of software, and one of the two most important things it modifies is your PATH environment variable.
If you're not familiar with the PATH environment variable: when you run a command in your terminal, the operating system needs to know where to look for that command. So if you've compiled your application and the binary is sitting in some directory in your home directory, but you haven't told the shell program where to look for that binary, it won't be found. You can't just run the command and expect the operating system to know where the binary is, and this is the same issue with any command you're going to run from the shell.
One of the module environment's main purposes is to control these PATH changes for you, so that when you load a module, it tells the shell where to look for the application and all of the dependencies that need to be available in the PATH.
The second most modified variable that you'll see the module environment handling for you is LD_LIBRARY_PATH. If you're not familiar with it, this is basically where the system is going to look when you go to run an application that's compiled against shared libraries. Again, just like the operating system needs to know where to look for the binaries, it also needs to know where to look for the libraries that the application is going to call out to, and this is what the module environment is actually doing for you under the hood. It's trying to simplify all of these path changes, setting up the shell environment as easily as possible to give you a simplified interface for running the applications that you want.
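As a concrete illustration of that lookup (the directory and program name here are hypothetical):

    $HOME/software/myapp/bin/myapp    # runs, because the full path was given
    myapp                             # "command not found": the shell has no idea where to look
    export PATH="$HOME/software/myapp/bin:$PATH"
    myapp                             # found, now that its directory is on PATH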
Now, on Comet what we used was the Environment Modules system. This is one of the module systems out there that let you set up a nice command-line-driven interface for making all of these changes to your shell's environment. Many people who are familiar with running on HPC systems have probably used Environment Modules before. If you've run on Comet, this is the module system we've been using there, and it's a somewhat older version; if you're curious about the details, there's the link to the URL for the project.
So the module system essentially gives you a set of command-line tools to navigate the software that's been installed on the system and is managed by the module system itself.
These are the module commands that you can use; if you've used module systems before, you're probably familiar with them, and these are the ones I use the most. module list shows what modules are currently loaded in your session; module avail can be run with or without a package name to look for what's available on the system; and module purge clears any modules that you've loaded.
If you want to actually put a specific package into your shell environment, you do a module load.
If you want to take it out, you do a module unload
And then module show is basically for getting more information about the package itself; for example, it tells you more about how that application was compiled.
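As a quick reference, those commands look like this on the command line (the gcc version is just an example):

    module list                # modules currently loaded in this shell session
    module avail               # everything visible from the currently loaded hierarchy
    module avail gcc           # filter: anything with "gcc" in its name
    module purge               # unload everything and start from a clean environment
    module load gcc/10.2.0     # put a package into your shell environment
    module unload gcc/10.2.0   # take it back out
    module show gcc/10.2.0     # print what the module sets: paths, variables, build details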
So I have examples in here, but I might switch over to doing some interactive work before I get to the new module environment system. Let me go ahead and boot up a terminal here and log into Expanse.
So let's just go over the module commands on Expanse, at least the ones that are going to be exactly the same as on Comet. For example, the first one on my list was module list.
When you log into Expanse, you see here that, like on Comet, there is a default set of modules loaded for you: this shared module, this cpu/1.0 module, and this default-modules module. The only one you really need to think about is the cpu/1.0 module, which we'll talk about in a second.
But let me just go through the other commands.
The next one on my list was module avail. For example, if you want to know what compilers are available on the system and you're interested in the GCC compiler, you can just run module avail gcc, and it'll give you a list of all of the GCC compilers, plus anything else with gcc in the module name. You can see here we have four different GCC modules shown, and two are actually the same version; I'll explain that in a bit. And since you're just searching for gcc in the name, it will also show other packages that have gcc in the name; for example, it looks like we have a netCDF and an OpenMP library that were compiled with GCC on the system.
Okay, the next one on my list was module purge. If I have some default modules loaded, the module purge command will unload all of the modules that have been loaded for you. So now I have a clean environment without any of the environment variables that may have been set by those other modules in this shell session.
Let's go through a quick exercise and show how the module system actually makes changes to some of those environment variables I mentioned, like PATH and LD_LIBRARY_PATH. If I do an echo, I can see my PATH and my LD_LIBRARY_PATH, which has nothing in it right now.
And if I want to, say, compile code for the AMD compute nodes, what we've set up in the module environment is... well, maybe let's do a module show while we're here; I don't know if Trevor's actually put a description on here.
Yes, so it looks like this cpu/1.0 module is basically there to load the software that has been compiled for the AMD compute nodes on Expanse. So this is kind of your meta-module that you'll load to get access to all the compilers, MPIs, and math libraries that you should use for the Expanse AMD compute nodes.
So let's go ahead and do that.
So we'll go ahead and load cpu/1.0. And if you do a module avail, you can then see what packages are available. As Mahidhar mentioned in his last talk, we're using Spack to build and deploy all the packages on Expanse for users.
And so this is probably a good point to explain what's going on here with all of the modules that you can kind of see on my screen now.
So the module load cpu/1.0 loaded all of the packages that we're showing here in this /cm/shared/apps/spack/lmod/... core set of packages. Okay.
And you can see below it, though, there are also a bunch of other modules, which is actually where the cpu/1.0 module lives. These other modules are visible to you right now; they're standard modules that come with the system manager that we're using to deploy all the system software. We're actually going to try to clean this up a bit, because essentially you don't need to look at these cm local and cm shared modules here; they're sort of extraneous.
For example, this is why, when I did the module avail gcc, we saw two versions of the GCC 9.2.0 compiler. One was the Spack-installed version, which is the one you should use if you want GCC 9.2.0, but there's also another GCC 9.2 here in the cm local area that you could use, though that one is really for the system administrators at this point. So hopefully we'll get this cleaned up.
So there's not too much confusion.
However, I will mention that there are some modules you will potentially want to use in those same cm local and cm shared module environments. There are obviously the cpu/1.0 and gpu/1.0 modules, which are the meta-module files that we've created. Also, if you're using Singularity, it is visible here in the cm local module environment. There's also, in the shared area, a Globus module if you're using Globus; that's one that will likely stay visible to you. And then, if you've looked at the Expanse user guide, if you want to check your accounting and allocations, the sdsc/1.0 module file here will load the expanse-client package that you can use to check your allocation usage.
Personally, I think it's still a little confusing for some people, but if you're on the system now, or will be soon, you might still see some of these other modules, and I just want to indicate not to get confused. If you're trying to compile your application for either the AMD compute nodes or the Intel-based NVIDIA GPU nodes, the two packages you really want to focus on loading first are either the cpu/1.0 or the gpu/1.0 module.
Okay, so you can see here that we have the cpu/1.0 module loaded.
Now, as I mentioned to Ron in the chat earlier, if you want to compile for the AMD compute nodes, GCC 10.2.0 is the compiler that you'll want to use; this is the one that has the Zen2 optimizations available. Ron was asking about -march=native: it should hopefully pick up that these are AMD compute nodes and use Zen2 as the architecture to compile against. So let me just clean this up.
So as I said, right now we have just the cpu/1.0 module loaded. What we're going to do now is module load gcc/10.2.0, and hopefully we'll see the path changes that have been made. You can see that, compared to before loading the gcc/10.2.0 module, the PATH and LD_LIBRARY_PATH are now different.
Now if you just run gcc, that version of GCC is the first one found in your PATH, and that's the one available in your shell session to compile code. For example, if I do a module purge and then run which gcc, you'll see that you get the system GCC: GCC 8.3.1 is the default for CentOS 8. And so this is why the module environment exists: to help you make the PATH changes you need to get access to the different versions of software.
So that's what the module environment is doing for you. And this is why, if you have different applications that you're going to compile in different ways and then run, you really don't want to put the module load commands in your .bashrc file. You want to put those commands either in a build script, which we'll look at shortly, that you use to compile the code, or, since the module loads have to be the same when you go to run the code, in your batch job script itself. Otherwise you're probably going to run into some headaches if you forget to change your .bashrc before you run a certain application when you're switching from one workflow to the other.
Right.
Okay, so maybe I'll stop here for a second and ask if there are any questions about the module environment material I've covered so far. I think I just remembered one point that I definitely want to make, but maybe I'll stop here first in case there are any questions.
No.
I guess I don't see the chat, so... yeah.
So Marty, there have been a lot of questions, and Trevor and the others are doing a great job answering them. I don't think they've been stumped, so it's quite active.
Okay, good.
So we've sort of covered these basic module commands that you're probably familiar with on Comet.
Before we switch over to actually showing some examples of building code on Expanse, interacting with the module environment to do that, and the changes I would have to make to a piece of software I've compiled before, the key thing to point out is that the module system used on Expanse is different from Comet. On Comet we use the Environment Modules system that I had a slide on briefly before, which is one module system you can use to manage all these path changes and access to software on an HPC system.
Lmod is a newer system that was actually developed at TACC, and one of the things it does nicely is try to help users avoid mistakes that could be made with the Environment Modules system on Comet. In particular, the way it does this is through a hierarchical module system: if you want to compile a code with GCC 10.2, like we just showed, and it's an MPI code, you then need to load an MPI module. However, on Comet you could load one version of a compiler module and an MPI module that weren't necessarily compatible. Usually on these HPC systems we compile multiple versions of MPI for a particular compiler.
Right.
So if you really do want to have access to OpenMPI, Intel MPI, and MVAPICH2, you really have to compile a version of each of those for each compiler that's available on the system.
Right.
But the one problem with the Environment Modules setup was that if you didn't have all of those different versions available for a particular compiler you wanted to use, you could get caught out by loading a compiler module that did not have a compatible version of, say, the MPI that you wanted to use.
Right.
And you'd probably send us tickets like, why can't I get this particular file, or why isn't this running. Also, one of the common problems we would see is that users would load multiple versions of MPI in their .bashrc, or in their batch job scripts, and that would confuse the application as to which MPI to use. Lmod uses a hierarchical system to avoid these types of problems: you can only load the versions of MPI that are available for the particular compiler you have loaded.
Right.
And that's what's nice about Lmod: it helps avoid these types of problems you might have run into in the past on Comet.
Now, the one thing that is going to be less familiar to you is how that happens in practice. A lot of people have probably used Lmod on other systems, so they may be familiar with it, but I'll just go ahead and show you what this means in practice. Let me open my terminal again.
Okay, so what do I have loaded? I have no modules loaded in the shell. So let's go ahead and do this. If we start with a clear module environment and do a module avail, you'll see that there are no Spack modules anymore; it's just those default modules from the cluster management software that I mentioned before. And that's why these meta-packages exist in the management modules: if you load cpu/1.0, that's what gives you access to the Spack-installed modules.
So let's go ahead and load it. The meta-packages load the correct Spack environment for either the CPU or GPU nodes, depending on what you want to do. Then let's say we want to build an application with GCC 10.2. If we do a module load gcc/10.2.0 and then a module avail, you'll see that there is another sub-module environment here with all the packages that were built with GCC 10.2.
As you can see, we already have a fair number of different applications in here, because, as I mentioned, GCC 10.2 is the one that has the Zen2 optimizations you can take advantage of. But again, we also have multiple versions of MPI, like I mentioned, and what's nice about the Lmod system is that you're only presented with the MPI versions and math libraries that are compatible with GCC 10.2.
So let's say the application I want to build is an MPI application and I want to use, say, OpenMPI. Load it, do a module avail, and then there is another nested environment shown here. Once you load that version of MPI, you'll discover that there are already certain applications that were built with both GCC 10.2 and this version of OpenMPI.
Right.
And so as you progressively load your compiler and then your MPI, more and more applications are revealed that are available to you from that combination of compiler and MPI.
So for example, if your code required FFTW, this one is presumably compiled with the parallel options to use MPI, and it's compatible with GCC 10.2 and OpenMPI 4.0.4.
One of the questions you'll probably have, if you're not familiar with Lmod, is: okay, this is great, it helps keep the module environment a little more organized and prevents conflicting modules from being loaded at the same time, but how am I supposed to know that that version of FFTW is available with GCC 10.2 and OpenMPI 4.0.4? Or your question might be: is it available for MVAPICH2 yet?
How do I know? So one of the new commands that you'll probably get familiar with is module spider. This is one of the new module commands that isn't available in the old Environment Modules system, and it lets you search the hierarchical structure for the packages you need to load to get to the packages you want. So let's go ahead and run that FFTW example I just mentioned.
Right.
If I wanted to use a different version of MPI or a different compiler, what FFTW builds are available for the different combinations I might want to use? If I clear this and do a module spider fftw, you'll see a list of the different FFTW modules that are available and what modules you have to successively load to get to each one.
Right.
You can see here that on the AMD compute nodes we have a serial FFTW version, which I'm guessing is what this one compiled with only GCC 10.2 is, and we also have parallel versions compiled with both MVAPICH2 and OpenMPI.
So this is the one new command that you really need to get familiar with if you're confused about what software is available, because you don't want to just start loading random compilers and MPIs to figure out what math libraries or other dependencies you might need to build your software. The module spider command really simplifies that process: it figures out for you what's available, what's already been compiled, and what combination of other modules needs to be loaded to use it.
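Putting the hierarchy and module spider together, a typical discover-then-load sequence looks something like this (the exact versions reflect what was shown in the demo and may change):

    module spider fftw          # where does FFTW live, and what has to be loaded first?

    module purge
    module load cpu/1.0         # reveal the AMD compute-node (Spack) stack
    module load gcc/10.2.0      # reveal the MPI builds and libraries made with this compiler
    module load openmpi/4.0.4   # reveal packages built for this compiler + MPI combination
    module load fftw            # the parallel FFTW build is now visible and loadable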
Okay.
So that's the only really new command I want to cover in Lmod, because it's going to be the one that gives you the most bang for your buck right off the bat.
Okay, so I'll stop here for a second, and if there are any questions let me know; I will get set up here for doing some compilation examples. Let me know if anybody is asking in the chat.
So Marty, just about now you should be looking at GPU, so when you're ready to move to GPU, that's the time you've allocated for each one.
Yeah, I mean, I didn't really stick to the original plan of separate sections.
I wanted to really cover the module environment in general.
And then I'll show both.
Maybe I'll start with the GPU example.
Yeah, it's up to you.
I'm just like, okay, that's fine.
Yeah, so
Um, OK.
So now you're familiar with the module environment and some of the differences on Expanse versus Comet. One of the things you might want to do is take some code that you've built on Comet and transfer it over to Expanse. I'm going to pick some examples that I have from Comet and show you how I compile code in general, how you might want to think about doing something similar for reproducibility in the future, and also how to use the module environment that we just explored to compile code on Expanse.
The first thing I want to say, and I think Mahi mentioned this during his talk: unlike on Comet, where we didn't monitor this as much as maybe the system administrators would like us to, we really do encourage you to compile code on the type of compute node that you're going to run it on, especially if you're going to be building GPU code. The login nodes are AMD, so they're very similar to the actual compute nodes out on the system.
But even for CPU code, if you want to do some interactive compilation to figure out how to build the piece of software you want, we encourage you to grab an interactive session and develop that build process there. In particular, on the GPU side you really should do that, because depending on how you structure your build process, if you're using, say, the -march=native kind of setting, you need the compiler to pick up on those optimizations, and if you want it to be simpler, you really need to be on the type of architecture that you're trying to compile against.
You can cross compile, but we've run into problems ourselves with that a bit.
So this is why we're really encouraging you to compile on the node type you want to run on, and try not to do it on the login nodes, because sometimes the compilation process can be a bit involved and computationally intensive, and that would affect other users on the login nodes.
Okay, so let's go ahead and grab an interactive session.
I'll do the GPU example.
I'll just grab the commands. I guess I'll do 16 gigabytes and one GPU, and since I'm sure I have less than 30 minutes, I'll just do this. Again, note that SLURM is not loaded in your default environment. Okay. So I've basically started an interactive session on one of Expanse's GPU nodes, and now I need to log in with my credentials correctly so I can pull some files from Comet.
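The exact command scrolled by quickly in the demo, but an interactive GPU request along the lines described (one GPU, 16 GB of memory, about 30 minutes) would look roughly like this; the partition name, account, and flag spellings here are assumptions to check against the Expanse user guide:

    module load slurm     # SLURM is not in the default environment on Expanse

    srun --partition=gpu-debug --account=abc123 \
         --nodes=1 --ntasks-per-node=1 --cpus-per-task=1 \
         --gpus=1 --mem=16G --time=00:30:00 \
         --pty bash -i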
Okay, so we're on an interactive compute node.
And I want to pull some files from Comet.
Actually, I'll just do it here first.
Okay, so I'm gonna install some software in my software directory here.
So I have some software sitting on Comet.
Some build scripts that I've used in the past.
So I'm going to grab these scripts. GROMACS is a molecular dynamics code that's widely used on Comet, and from time to time we've had to build multiple versions for people when we don't have those in the standard module environment. So I have this build script that I've created to compile GROMACS with MPI, OpenMP, and CUDA.
And so this will be a good example because there is a slight difference with CUDA.
Okay, so let me go open up this build script.
So this is what I do when I'm compiling code.
So you might be working interactively for a while to figure out how to build software.
But eventually, once you have the set of modules you're using for your application, the different dependencies that are needed, and how you configure it, what I do is create a build script that codifies all the changes I made to the environment to compile the code. The reason I do this is because usually I'm handing these build scripts to users so they can compile this version of the code in their home directory.
But as I mentioned, you also have to load those same modules when you run the code. By having a build script like this, which could also be run as a batch job, I can compile the code in batch form as well; I don't have to do it interactively, I can just launch it, forget about it, let it compile, and come back later.
But it really codifies what was done to compile the code, because you need to know exactly what modules were loaded when you compiled the code in order to then run it. This is another common problem users get tripped up on: they forget how they compiled the code, and then we have to sort out what modules need to be loaded when they try to run it. That's why codifying things in a build script like this is, I think, a good practice, and it helps us when you run into problems. We can actually see: oh, this is what I did, and this is where it's failing, can you help me figure out why this part of the build is failing; it gives us a more complete view of what you were actually doing.
Okay, so let me resize the text of this build script a bit. What time does this session go to, Mary, can you just remind me?
I think it's 1:25.
Yes.
Yes.
Great.
Okay.
I'll log into Expanse here, just so we can have another screen showing what modules I might want to load.
Right.
So again, by default when you log in, the CPU stack is visible to you. What you'll want to do if you're compiling for the GPU nodes is do a module purge and then a module load gpu... I'm getting ahead of myself. You can see that the path and the Spack module environment are different from the one on the AMD compute nodes, which makes sense, because these are Intel-based CPUs and NVIDIA GPUs.
You'll also notice there isn't, for example, a GCC module loaded by default in the Spack environment up here. The reason for this is that we're actually using the system GCC, GCC 8.3.1, which is already in your path, for most of the builds we're doing on the Intel compute nodes, and that's what I'm going to use right now. So in this case I don't really have a module that I'm loading for the compiler itself, because basically all of the defaults here are built with the GCC 8.3.1 that's available.
For example, this OpenMPI 4.0.4 is built with GCC 8.3.1. This is one caveat to the module environment on the GPU side: you have to keep in your head that these packages in the Spack environment were built with GCC 8.3.1, so if you want to use this OpenMPI 4.0.4, by default you're using GCC 8.3.1.
So I'm going to change my build script here to use that OpenMPI 4.0.4, and let's just double check whether there's also a cmake in there. Yes, it's already there, so I'm going to change my cmake module here to that 3.18.2 version, and then this is where one of the caveats with the module environment on the GPU side comes in.
Right now, at least, the CUDA modules that are available are actually in the system-managed module environment. If we look at this, you'll see that CUDA 10.2 is available down here in these cm shared modules. The default one to load, I believe, is just the toolkit, and I don't think I've needed to load the rest of them. But you might run into issues like that, and if you do, let us know. Everything that I've tested works if I just use the main toolkit module, but that's obviously dependent on the code you might be compiling. Again, remember the modules are just telling the shell where to look for certain things, so if the paths in the toolkit module point to most of the CUDA libraries you're compiling against, it's probably fine, but you might run into some edge cases that we have not come across.
And in this case, maybe I'll change this; I know there's a newer GROMACS version here.
Essentially, this is what my build scripts look like: I declare some environment variables that set what modules I want so I can use them later, to describe where to install things, create paths, and things like that. One caveat here is that there isn't really a compiler module; it'll just be GCC by default. But then I always do a module purge before compiling, and I load the modules that are needed to create the environment I want to compile the program in. Then here I'm just downloading the source code, creating some directories, and running cmake, which is the build program used for GROMACS. There are different settings that you can set, obviously, for your code, and these are just the default ones we've used on Comet forever.
And so, all I've really done is change the modules that have been loaded.
Right.
And then I can either run this interactively and start compiling the code.
There was an error here with my GPU... Oh, yeah. I actually ran into this earlier when I was testing the examples, and I've done it again. Okay, so there's one other module that I need to add, which Mahi just pointed out: the Expanse gpu module.
Right.
Basically, when I do the module purge, I've removed the GPU stack from my environment. So if I do a module load gpu, that should make all the packages I'm expecting to see available again, and I can load those subsequently. So that's what we usually do when we're trying to help you compile software: we construct these build scripts that cement in what was done.
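A stripped-down sketch of what such a build script might look like on the GPU nodes, based on the modules discussed above; the GROMACS version, download URL, install path, cmake options, and the exact cmake and CUDA module names are illustrative placeholders rather than the exact ones used in the demo:

    #!/usr/bin/env bash
    # build-gromacs-gpu.sh : codify the environment used to compile the code

    declare -xr GROMACS_VERSION='2020.4'                        # placeholder version
    declare -xr INSTALL_DIR="${HOME}/software/gromacs/${GROMACS_VERSION}"

    module purge
    module load gpu/1.0            # reveal the GPU-node (Intel + NVIDIA) software stack
    module load openmpi/4.0.4      # built against the system gcc 8.3.1 on this stack
    module load cmake/3.18.2       # placeholder cmake module name
    module load cuda10.2/toolkit   # CUDA lives in the system-managed module tree; name assumed

    wget "http://ftp.gromacs.org/pub/gromacs/gromacs-${GROMACS_VERSION}.tar.gz"
    tar -xzf "gromacs-${GROMACS_VERSION}.tar.gz"
    cd "gromacs-${GROMACS_VERSION}" && mkdir -p build && cd build

    cmake .. -DCMAKE_INSTALL_PREFIX="${INSTALL_DIR}" \
             -DGMX_MPI=ON -DGMX_OPENMP=ON -DGMX_GPU=ON          # options are illustrative
    make -j 8 && make install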
And so I definitely recommend trying to do that.
I mean, obviously, sometimes you want to do this interactively and just get a sense of whether it's going to work, which is fine, but as you work through it, I would definitely recommend having another terminal open and creating that build script on the fly. For example, in this case I was very easily able to switch over from one GROMACS version to another and basically change nothing other than a few variables in the script; I just changed the variables that point to different module names and the version of the package itself.
Right.
Not every code is set up to switch over that easily, but it's a good practice that I would encourage folks to take a look at. In particular (let me just stop the compilation now), you can also set these up as batch job scripts, because, for example, someone asked: well, if I'm doing this in the debug queue and it's going to take longer than that to compile my package, how should I do that? You could either wait for an interactive node on one of the regular compute nodes or the regular GPU nodes, or, if you have a build script like this, once the build is looking like it's going to work, you can just submit it as a job to the scheduler itself.
One thing that's nice about that when I'm compiling software is that you have the complete output file from the build itself. So even if the build fails, you know where it fails, and you can look at the entire history of the build to see what was happening.
Right.
This is what we do when we're compiling code; it's really helpful for debugging purposes, and it's another reason I definitely encourage you to think about building code this way, especially since we're asking people to try to build out on the cluster itself, not on the login nodes. This is another way to do that.
And I think that's about it for time, so if there are any questions, I guess we should take them now.
And I think there's probably a break.
So I can stick around during the break if people have questions.
Hey, Marty.
I was away for a couple minutes in the middle, so I don't know if you already mentioned this, but do you have examples of these SLURM batch compiler build scripts in the examples area, so people can take what you've shown very quickly here and implement it themselves?
Yeah.
That's a good question. No, we do not; I don't have them up yet, but I can put them on today.
We do have some of these sitting on Comet, but they're hard to find. Basically, I will put some of these examples on Expanse in the examples directories that we have. I don't know if anyone from our team has put the examples on there yet, but I think by the time we get into production we'll definitely have some examples there.
And if you're currently on the system as part of the early user program and you want access to some of these examples, ping us with a ticket and I can show you where they are on Comet, and you can play around with them or copy them.
Okay, well, if there's no more questions.
Thanks, Marty. We'll come back at 1:35, when Nicole Wolter will talk about job charging, followed by Subha, who'll go over the Expanse portal, and then we'll do interactive computing. We'll see you in about 10 minutes, at 1:35.
So that we stay on time: we have Nicole Wolter, who will be talking about job charging, followed by Subha, who will talk about the OOD portal, and then I'll talk about interactive computing. This will run till 2:15, and then we'll have a final session on data management.
Okay, thanks.
All right.
So, can everyone see my screen? Okay, it sounds like you can all see my screen. I'm going to talk about managing allocations and charging on Expanse. We've touched on this a little bit in the previous presentations, and there was a lot of discussion in the chat about it, so some of this might be repetitive, but there are some little tidbits in here that are different from Comet and that we haven't touched on yet, so I hope it's helpful. One of the goals of this presentation is that by understanding how we charge, you can better prepare your proposals, because you know how the resources are going to be used and how you can use them, and it will also help when you're thinking about how to set up your workflows.
So I'm going to start with the upbeat note that job charging is super simple, at least on the user's side; if you talk to our system support staff, they'll say they work very hard to make sure it's super simple for you all.
At the top level, as Mahi mentioned before, Expanse and Expanse GPU are completely separate resources, and at SDSC, and in XSEDE for that matter, all resources are allocated in what we call service units, or SUs. But just because they're called the same thing does not mean they are the same at all. With Expanse and Expanse GPU, an Expanse SU is allocated as a core hour, whereas an Expanse GPU SU is a GPU hour. In addition, you need to know that even though Expanse and Comet are both allocated in core hours, an Expanse core hour is not equivalent to a Comet core hour.
Another thing to take into consideration is that when you request a job, you're not just requesting the core hour; the core hour also represents what comes with that core. That includes the memory, or for GPUs it represents the cores and the memory involved.
The high-level point here is that jobs are charged for the resources you request, and when I say the resources you request, I mean the actual cores or GPUs you request. Jobs are charged for the resources you request, not the resources you use. So if you request five cores and run for an hour, but you only run a serial job, you're going to get charged for all five cores regardless.
Another important thing to note is that the minimum charge for all jobs is one SU. So if you run on one core for a couple of seconds, it's going to get charged one SU, but if you run on one core for an entire hour, that will also get charged one SU. So when you're thinking about your workflows, it's something to consider: bundle lots of small jobs together into one job.
At a high level, the charging algorithm is the resources you request, either GPUs or cores, times the job duration, which is the time in hours, times the charge factor, and the charge factor depends on the partition.
So, Mahi talked about the partitions already.
So, as we know.
We should be familiar with most of these.
We have nine partitions; seven of them, compute, shared, gpu, gpu-shared, large-shared, debug, and gpu-debug, all have a charge factor of one. But with Expanse we added preempt and gpu-preempt. These are discounted jobs that you can run on nodes that aren't currently allocated, but as discussed before, if another job comes into the queue and needs to use those nodes,
Your job will be preempted.
A couple of things to keep in mind. First, in the preempt queue you can run for seven days, which is significantly longer than in any of the others. But on the flip side, there are no refunds, regardless. The other thing to note is that preempt is like a compute or gpu partition request: you will be charged for the entire node, because it's allocated in full nodes.
For charging, you can think of it kind of like a Tetris game: when we're scheduling, we're trying to fill in all those little spots.
So there was some discussion before about if we allocate in node hours versus core hours.
One of the reasons we do core hours is that we feel that with the shared queue we can backfill all those little gaps a bit better. Part of the reason for having the preempt queue now is also that we can backfill and fill in the empty spaces.
So one Expanse node has 128 cores and 256 gigs of memory.
This is different from Comet, where a node has 24 cores and 128 gigs of memory. So the smallest unit we can allocate, one Expanse SU or one core hour, is one core and up to two gigabytes of memory. On Comet, because there were fewer cores per node, even though the node had less total memory, you got a little more memory per core: up to five gigs.
Something that's really important to note at this point is that in all partitions, unless you specify memory, you're going to get a default of one gig of memory; that's across all the partitions. So if you're in the shared partition and you just ask for one core, by default it's going to give you one gig.
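For example, a shared-partition job that actually needs more than the one-gigabyte default would spell the memory out explicitly in its batch script (the account name and values here are placeholders):

    #SBATCH --partition=shared
    #SBATCH --account=abc123    # placeholder allocation
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=8G            # without this line the job gets the 1 GB default
    #SBATCH --time=02:00:00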
So the charge here is fairly straightforward: it's the equivalent number of CPUs, which is the larger of the number of cores you request or the number of cores' worth of memory you request. So if you request one core and three gigabytes of memory, the three gigabytes of memory is worth more than one core, which pushes you into two cores, so you're going to be charged the equivalent of two CPUs. The wall clock time or duration is in hours, and then the charge factor, of course, depends on the partition you use.
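Written out as a formula, the CPU-side charging just described works out to the following (memory is rounded up to the equivalent number of cores at 2 GB per core; this is a paraphrase of the rule given in the talk, not an official statement of the accounting algorithm):

\[ \mathrm{SU_{CPU}} = \max\!\left(\text{cores requested},\ \left\lceil \tfrac{\text{memory requested (GB)}}{2} \right\rceil\right) \times \text{wall time (hours)} \times \text{charge factor} \]

So the one-core, three-gigabyte example above is ceil(3/2) = 2 equivalent cores, i.e. 2 SUs per hour in a charge-factor-one partition.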
GPU is very similar: the charging is going to be exactly the same, but again you need to note that one GPU on Expanse is different from what we had on Comet. On Comet we actually had two different types of GPUs available, but on Expanse one GPU node has four GPUs, 40 cores, and 384 gigabytes of memory. So if you take the smallest fraction of that node, you can have one GPU, up to 10 cores, and up to 96 gigabytes of memory.
Again, for GPUs, unless you specify otherwise, if you request one GPU and don't add the other elements to your batch script, by default you're only going to get one core and one gigabyte of memory.
The equation here is similar: equivalent GPUs, times job duration, times charge factor, but the equivalent-GPU count is the max over the GPUs, cores, or gigabytes of memory you request, relative to what comes with one GPU; whichever is largest is what you're charged for.
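The GPU-side version, using the per-GPU slice of a node (1 GPU, 10 cores, 96 GB) described above, is analogous (again a paraphrase rather than the official accounting formula):

\[ \mathrm{SU_{GPU}} = \max\!\left(\text{GPUs},\ \left\lceil \tfrac{\text{cores}}{10} \right\rceil,\ \left\lceil \tfrac{\text{memory (GB)}}{96} \right\rceil\right) \times \text{wall time (hours)} \times \text{charge factor} \]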
Marty actually mentioned the expanse-client tool we have. The expanse-client tool replaces the tools we had on Comet, which were the show accounts and project details commands, and here's what this tool will do for you.
Marty also mentioned that to be able to run this command, you have to load the sdsc module. The syntax is expanse-client followed by a command, and the commands you can use are either user or project. When you use expanse-client project, you also need to specify the project that you want to view. This will show you everyone on the allocation, what has been used by each individual user, and the total available to all the users.
In our account management system, we can actually specify the percentage of the allocation that's available to a user, or cap users, so that might differ from project to project. In the example I have here, everyone in this allocation has the same amount available to them, and the final number is what everyone combined has used on the project.
The other command that you can use is user. expanse-client user will show you specifically what you have access to: all the allocations that you currently have available to you, and your usage.
What's new with this tool is that we have a verbose option, which is really nice. Our accounting updates the database nightly, so if you go into the Expanse portal, you will only see information for the jobs that were run yesterday or before; you won't see what happened today, because the jobs are uploaded at midnight. But with this tool, the two entries at the bottom here, the ones showing what's used in the queue and used by the project in the queue, will show you what is currently queued, what is currently running, and what has been run today.
That actually reconciles quite frequently. So if you request a job to run for 24 hours on 128 nodes, it will show one thing, but if the job pops in, immediately fails, and only runs for a second, so it only gets charged one SU, that number will drop back down quickly for you.
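In practice, the commands look roughly like this (the project name is a placeholder, and the exact flag for the verbose output may differ; check expanse-client --help):

    module load sdsc/1.0            # makes the expanse-client tool available

    expanse-client user             # allocations you belong to and your own usage
    expanse-client project abc123   # per-user usage and the total for project abc123
    # the verbose option additionally reports jobs queued or run today, ahead of the nightly update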
So I went really quickly, and I can cover more, but just as a review: the charges are based on the resources that you reserve and request, not how you actually use them. The default is one gigabyte of memory per core or GPU in all partitions, so it's really important that you designate the memory in your scripts with --mem. The minimum charge for any job is one SU, and one thing to note is that we dropped the minimum time limit from 10 seconds to one second, so any job that runs for even a second will get charged.
I wrote down here again the changes in the cores and what they are, and that we have new partitions. For some of the partitions, like preempt, I think we're still considering how to best manage and tweak them so that they are the most advantageous for users, so I would recommend checking the user guide every once in a while in case we change anything.
That's all I have.
I went very quickly.
And I know there's stuff I forgot.
So why don't I open it up to questions.
Um,
Yeah, I think one of the questions I was trying to answer just now in the chat was: maybe you can just review again the charging for cores versus memory, like when you are charged more for one or the other.
So we actually have a slightly more complicated equation for that, but basically, the smallest unit that you can request, taking the example of CPUs, is one core, and with that core you can use up to two gigabytes of memory. So you can request five cores and two gigabytes of memory, and in that case the number of cores is larger than the memory equivalent, so you're going to be charged for the cores. However, if you ask for one core and, say, six gigabytes of memory, six gigabytes of memory corresponds to three cores, so in that case the memory is higher and we're going to charge you for three cores.
Does that help at all?
Yeah, I think so.
I mean, I think the question was, yeah.
I mean, essentially the rule of thumb I was giving in the chat was: you're going to be charged by the resource that you use the larger share of, relative to what's available on a per-core basis. So as Nicole said, for every core there's effectively two gigabytes of memory available on the system, so if you need more than that for a single-core job, you're going to be charged by the memory instead, because effectively you're taking that memory away from the other cores. That's the general rule of thumb, I think.
Yeah, and I should also add why we chose to set the default at one gigabyte. In a lot of cases, applications don't need that much memory. However, in the case of the example I gave, where you have one core and three gigabytes, it makes more sense to pair that with someone who's using a core and only one gigabyte; again, this is the Tetris example. We want to fill in that little slot instead of taking another core away from someone else who could be using it; we can use what's not being used, if that makes sense.
Okay, if there aren't any more questions...
Yeah.
Thanks Nicole.
I think we're running a little long.
So I think Subha is going to talk about the Expanse portal next
Yes, I can share my slides now.
Yes.
Okay, I can get started.
Can everyone see my slides?
Yeah, looks good.
Okay.
Hi, my name is Subha Sivagnanam. As Mahidhar mentioned in his talk earlier today, we are developing a user portal for Expanse users.
So this is a very short talk, I will just briefly describe the features of the portal and show a quick demo.
The goal of the Expanse user portal is to provide an integrated web-based environment for doing file management and job submission operations on Expanse using an easy-to-use interface. The portal is also being developed to serve as a gateway for launching interactive applications such as MATLAB and RStudio. We're not building this portal from scratch; we're actually leveraging an existing NSF-funded project called Open OnDemand, which some of you may have heard of, and we're just customizing the software to work with the Expanse system.
The portal provides graphical file management capability where users can view and operate on files in their home directory or their Lustre scratch directory. In addition to basic file management operations like copying, moving, renaming, and uploading files, users can also edit files without needing a shell.
There is a job composer area where users can create, edit, submit, and monitor jobs. We have provided some predefined templates that users can customize and submit. We currently have example templates for jobs like MPI and OpenMP, and we'll be adding more; however, if you don't see a job template that you think would be widely used for a specific application or research community, and you want it to be provided, please do let us know.
The portal also provides the ability to submit an interactive job to the cluster
without needing to install a local X server.
As I mentioned, we only have it for MATLAB and RStudio right now, but based on
user requests we may add more interactive applications. And for those users who need terminal access,
there is an integrated shell terminal
within the portal that's very similar to many other tools that provide terminal access.
So who can use the Expanse portal? Any PI or user who has a valid Expanse allocation and XSEDE credentials can access the Expanse portal.
What that means is, once your allocation is approved, you need to have your home directory and everything set up on the back end before you can use the Expanse portal.
And though this is obvious, any policies that have been set on the Expanse resource, such as scheduler, application usage, or security policies,
and any guidelines that have been established, automatically apply when you're using the portal as well.
So the portal is being developed
and will go into production along with the cluster.
However, we did develop a
prototype for Comet that's available if you're interested in checking it out.
You can go to portal.comet.sdsc.edu, which is very similar to what we will be providing for Expanse.
So with that, I will quickly show a demo.
And I only have about 10 minutes, Mary, is that right?
Yes.
Sorry, I was on mute.
So, as I mentioned, you need to have an XSEDE portal username and password to log in, so you'll be automatically redirected to the authentication page,
where you enter your details, and then you will be redirected to our portal.
So once in the portal,
you can see the top navigation bar.
It's pretty self-explanatory: you can access your files or go to the job composer area.
If you want the terminal, it's under the Clusters menu.
And you can use the
Interactive Apps menu to launch your interactive job on the cluster.
So I'll just show my home directory.
This is my home directory on Expanse, and this is the graphical file management tool.
And here I can
view files, edit files, move and rename files, or create a new file or directory.
I can do all those operations here, and I'll just
show you
a basic edit.
So this is my submit script, and maybe I want to print out the date in addition to running this job.
So I can do that.
And if I save it, it gets saved.
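As a rough idea of what that edit might look like, here is a hypothetical minimal submit script with a date line added; the account and partition names are placeholders, not the actual values from the demo.

```bash
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --account=abc123        # placeholder project account
#SBATCH --partition=shared      # placeholder partition name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:05:00

date                            # the line added in the editor during the demo
echo "Hello world"
```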
And then, I'm sorry, the Zoom bar was in the way and I couldn't see where that was.
I have to go back.
So as you can see now,
the edits have been saved, and I can view the changes here and submit this job using the job submission interface.
So under Jobs there are Active Jobs and Job Composer tabs. Active Jobs is where,
once you have submitted a job, you can
come to monitor its status,
for example whether it's waiting in the queue.
Essentially, it gives you the output of scontrol show job
for anything listed there.
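In other words, the Active Jobs tab is showing roughly what you would get from this command yourself; the job ID here is just a placeholder.

```bash
scontrol show job 1234567   # detailed SLURM status for one job
```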
The Job Composer is where you would actually create your job scripts and submit them to the cluster.
If you're interested in using the templates,
there is a Templates tab that you can access at the top or through the
button, and right now we have a basic hello world,
an MPI job,
and an OpenMP job.
If you want to use any of these existing templates to start your work, you can just click on creating a new job, and it's created, and you can see the job script itself over here.
And if you want to make any changes, right,
say you want to add your account information
or modify the script, you can do so by opening the editor and saving it.
In the interest of time, let me just show you a job that has been completed.
So this is a job that I
ran yesterday,
and as you can see, the output is just hello world.
So in addition to creating and submitting jobs,
you can also run interactive jobs. Right now I'll just show a quick example of running a MATLAB job.
Here you can choose your queue, the number of hours, what account you want to use, and where you want to launch the job from.
For this demo I'll just launch it from my
Lustre scratch directory, and I'm just running for one hour.
And once it's queued,
it will show up here, and you can launch multiple such interactive sessions.
There is a tab where you can view all your live interactive sessions, under My Interactive Sessions. Right now
the job is running, and
if for some reason, say, you're debugging with a colleague and you want them to see what you're doing,
you can give them a view-only shareable link so they can also watch what you're doing.
And so once I launch the job, it opens up...
It's failed to connect, so let me see.
I'm sorry, I'm not sure what's going on.
I'll have to debug this one, but once it works it will open up
an interactive session where MATLAB will be launched, and you can plot, display, and open your MATLAB files and use the interactive application there.
If there's time I will try to see what's going on,
and then I can even show how it's working on Comet as well.
So other than that,
if you're interested in terminal shell access, there is an integrated shell which you can use to log in; it's basically like a PuTTY shell, and if you don't want to install PuTTY you can use this one.
So this is the portal in a nutshell. We will be adding more features, such as displaying information about your jobs and your applications here, and we will also be adding more interactive applications as there is demand.
So with that, I will end my talk and see if you have any questions.
So, Subha, there was one question in the chat about whether there's a REST API for OOD.
Um, no, we don't have a REST API for OOD or the portal.
Okay, thanks, Subha.
So I'll go ahead and get started,
and let's see if I can make this work.
There we go.
Can everybody see the talk?
Yep.
Okay.
Go ahead and turn off my video.
Alright, so I'm going to talk about interactive computing, which is a subset of the user portal.
So Subha just talked about the portal and there's a lot of different definitions for interactive computing.
So today I'll just talk about a couple of tools we're working with, and as
Subha was saying, there'll be other interactive tools brought onto Expanse. So what is interactive computing?
Basically, when I type on my keyboard and interact with my computer,
I'm doing interactive computing.
But it's a little bit different, as you've gathered if you've worked on HPC systems,
when you're trying to do this on
a high-performance computer, and in particular on the compute nodes: you have to get them allocated, you have to get your hands on them, and you have to be able to work on them.
There's a lot of
work being done in the field of interactive computing on HPC systems.
There's actually an annual workshop
on interactive HPC
that I think is held in conjunction with Supercomputing.
So with that, here are some scenarios for interactive HPC. There's MATLAB, or parallel MATLAB, like Subha just showed you, and on Expanse
we want you to use it through the portal,
so it can be well maintained and you get better performance.
There are Jupyter notebooks, which I'll talk about today.
And then Amazon has an application for
a scientific use case looking at airflow monitoring. So anytime you're running an application and you're accessing a cluster node, it's interactive HPC computing. To get an interactive node, you have to run a particular command.
Interactive nodes are nodes that, when you get one, you'll actually be logged onto it
and you'll be able to run commands.
You can grab an interactive node and compile and debug code a little bit.
We don't want you running your application
on the login node,
but if you need to do things on the command line, like if your compiling takes a really long time, then you can do it on an interactive node. In the future there will be a full partition dedicated to interactive work, but that's not up and running yet.
So these are some of the arguments.
I think you might have seen them earlier. The big change between Comet and Expanse is that you have to give it the account information, and as Marty pointed out, the arguments
to SLURM are a lot more exacting, so you have to give a little bit more information.
These two commands work for the CPU and the GPU nodes (a rough sketch follows below), and if you need help developing other interactive requests, let us know.
So now I'm going to focus on Jupyter notebooks and launching them on Expanse.
Jupyter is an open-source tool, and there are two really popular applications:
Jupyter Notebook and JupyterLab. They're actually considered services, and I'll show you why.
Understanding that will help you appreciate a key aspect we're trying to focus on, which is the security of these notebooks.
We have a tutorial on running notebooks on Comet that will be ready for Expanse as well.
My talk follows it very loosely,
so for more details,
go to that tutorial
if you have questions.
Normally, if you're going to run a notebook,
you can run it on your laptop, and
notebooks are popular in a lot of different scenarios, especially machine learning. To run parallel notebooks you need to use something like Spark,
and there are a lot of applications that are using them.
So Anaconda is great, but it's big and heavyweight,
so we don't want to run that on an HPC node.
So what we recommend is that you use the lightweight Miniconda distribution, which gives you the conda tool. You can go to the tutorials to
find more detailed explanations, or go to the conda site to install it. The power of installing conda is that it gives you your own local environment,
so you can install the version of Python you want; like Marty was showing you with modules, it's very similar.
Conda was originally created for Python, so you can also develop your own conda environments and have different versions of Python in those environments.
It gives you a lot of flexibility and control over what you're doing.
We intentionally want people to use conda because that way
we're not trying to keep different versions of conda environments maintained for 10,000 users; instead, the tutorials show you how to install conda,
make your own environment,
and then you'll have a lot more control over how your applications are running.
You can also create these virtual environments.
I won't do a demo of that,
and you can look on the conda site,
but this is one way that you do it, as sketched below:
you create the environment, you install the particular modules that you need,
and off you go.
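For example, creating a small environment might look like this; the environment name, Python version, and packages are just illustrative.

```bash
conda create -n myenv python=3.8   # make a new environment with its own Python
conda activate myenv               # switch into it
conda install numpy pandas         # install only the packages you need
```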
To install conda, we have full instructions in the tutorial, but just a warning:
it can take a long time
to install that software.
If you want to use Jupyter notebooks, you need to install the notebook package, and installing JupyterLab is pretty straightforward; a sketch follows below.
But again, it can take 5-10 minutes for these to install.
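Installing the notebook pieces into your environment is typically a one-liner, though as noted it can take several minutes; the conda-forge channel shown here is just one common choice.

```bash
conda install -c conda-forge notebook jupyterlab   # Jupyter Notebook and JupyterLab
```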
And then you want to make sure that the environment is working.
A key issue that we're addressing at SDSC is the security of Jupyter notebooks.
Probably a lot of you have already run Jupyter notebooks and you're thinking, why do I need to be
taught about this? But most of the time when you run a notebook it's not secure; it's plain HTTP.
By default, Jupyter launches without security.
There are some ways to make it more secure,
and that's what we're trying to work on today.
So there are vulnerabilities.
In fact, for the portal that Subha presented to you,
the team spent the last year improving the security model.
We're working on that with our security experts, Scott Sakai and some other people.
So why is this a problem? The connection by default is not secure. You can connect over SSH tunneling, which is secure, and we have a little tutorial and we have no problem with that.
But it's inconvenient, and things happen, like
your connection may time out and you lose your notebook.
So it's not the best way.
We've been trying to find something a little more
optimal, and we came up with this idea of a reverse proxy service.
So I'll show you how that works.
And then Marty's been developing Galileo, which is a remote notebook launcher that would run off your laptop.
Hopefully we can work on that and roll it out next year.
So we allow Jupyter services, that is, Jupyter Notebook and JupyterLab,
to run on interactive nodes, on the compute nodes, and on GPU nodes.
So what's wrong with the default Jupyter notebook? It helps to understand the setup: you're on your laptop and you SSH over to
the server, which could be the login node (which we don't want you to do) or the remote node that you get with an interactive command.
The Jupyter service launches, and it's basically a web server running for you on that node, but it's HTTP.
So you have a secure SSH connection,
and everything you type there is protected,
but the connection between your web browser and Jupyter is insecure, which means it can be hacked.
So we're trying to overcome that by making this a little more secure. SSH tunneling works this way:
you set up the tunneling by making a proxy connection,
using a port on your laptop and a port on the Jupyter notebook, with an SSH connection between that port on your laptop
and that port on the notebook, and everything communicated there is secure (a rough sketch follows below).
But like I said, if you have some interruption, these things have a way of timing out or getting disrupted.
It's not very stable, but it works,
and we're fine with you using that.
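A minimal sketch of that tunneling setup; the login hostname, the compute node name, and the port numbers are placeholders, and the exact node name comes from your interactive job.

```bash
# On your laptop: forward local port 8888 to port 8888 on the compute node
# running Jupyter, tunneling through the login node (names are placeholders).
ssh -L 8888:exp-1-01:8888 username@login.expanse.sdsc.edu

# Then point your browser at http://localhost:8888 and the traffic
# travels through the encrypted SSH connection.
```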
What's more secure is what we've architected, called the reverse proxy service.
You log in to Comet, you run a script, and the script will go and submit a request to the batch
queue,
grab an interactive node for you, launch your Jupyter service (either notebook or lab), and communicate with a reverse proxy service, and what you get back is an HTTPS connection.
So everything you do is encrypted.
Basically repeating what's in that diagram: the start Jupyter script is what you're going to run; you'll check it out and clone it from the repository.
And it has arguments:
you can tell it which partition you want, what directory you want to start in, the project allocation, and the batch script.
Do you want to use the notebook or JupyterLab? Right now
you have to tell it
a little more explicitly, but we're working on improving that part of the user interface.
You can also say how long you want it to run, and you can get extra information about the job
as we build the script and submit it.
To install it, I usually put it in a little working directory; you clone the repository,
it gets installed, and then you come down into the directory and you'll see the script there.
We've generalized the
service so that it can work for systems that have SLURM or Torque, which means it works on Comet, Expanse, TSCC, and Stratus, and
all we have to do to add a system is develop the right notebook
configuration for the particular service.
So you're just going to run that script, and the defaults will get picked up; a loose sketch follows below.
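As a loose sketch of what that looks like; the repository URL, the script name, and the flags here are placeholders, so follow the tutorial and the repository README for the real ones.

```bash
# Clone the launcher into a working directory (URL is a placeholder).
git clone https://github.com/<org>/<reverse-proxy-repo>.git
cd <reverse-proxy-repo>

# Run the start script; any options you leave off fall back to the defaults.
./start_jupyter -p compute -A <project_id> -d ~/notebooks -t 01:00:00
```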
Here's an example of starting it. You'll get some output, and one of the things that comes out is "your notebook is here."
So you want to copy that URL,
paste it into a web browser, and you'll be given a monitoring window. When it's finally launched, and that can take some time because the queuing system might be very busy, you'll get your notebook in your browser. Over here is just a little example of the script; I'm going the wrong way. And this is an example of using the Expanse portal to launch the notebook.
So you log in, and you pick
a terminal window; what you get is a shell, and there's a link on the portal
when you first log in,
and you're in your shell.
What I did is I went into my reverse proxy directory and I typed start Jupyter.
By default, that's going to launch a notebook for me. I got back a URL, pasted it in, monitored it, and then it ran,
eventually; it can take a while for it to run.
So you can do it through the portal or through the command line, whichever you want. And I think, in my time...
Yeah, I think we still have time.
All right, thank you.
That's it.
This is a new application,
so if you discover problems with it,
please submit a ticket to XSEDE consulting so we can make it work better.
That's it.
Thank you.
Thanks, Mary.
That's great.
And last but not least thanks everyone for hanging in there.
Nicole Wolter will share some details on data transfer mechanisms.
I'm sorry, I just saw that. I apologize.
So for this final section we'll talk about data storage and transfer.
I'm going to do a quick overview of the storage we have available and the policies associated with it.
I know that Mahi has discussed this previously, but it is a pretty big change,
and so I think it's important that we drive home to you that it's coming.
And then I'll talk about various options and tools that are going to help you manage the data you have. Alright, so like on Comet, we have the home file system and then we have the parallel global file system.
The home file system is
similar to Comet: all users will get one, and you will have 100 gigs of space there.
I think both Marty and Mahi mentioned that this should only be used for scripts, maybe source trees, maybe binaries,
static stuff that you want to keep forever.
You should never run jobs from the home file system; one reason is that your performance isn't going to be that good.
So instead we recommend you use the parallel file systems.
We have scratch and projects. As scratch indicates, this is supposed to be temporary.
On Comet we recommended that files stay less than three months, or less than 90 days. On Expanse, we're going to help remind you that this is scratch: we're going to automate that purge to help everyone, and it's going to be 90 days from file creation.
The other option we have:
again, any files that need to be in a more permanent place should go into projects.
Projects is actually allocated space, but all projects will, by default, get 500 gigs of storage.
If you need additional space, as Mahi mentioned, you're going to need to justify it, and either put it in your initial proposal or request a supplement. The other really important thing is that the Lustre file system does not have backups.
So for all the tools
I'm going to talk about after this, please take advantage of them so that we never have an
"I've lost all my data" problem.
So, since we know that we're not supposed to use login nodes for
any compute- or IO-intensive work, and IO-intensive of course includes moving large amounts of data,
we have four data mover nodes, and they're there specifically to help you transfer data back and forth. The data mover nodes can be used with all the tools that I'm going to discuss. You can access the data mover nodes at oasis-dm.sdsc.edu; I think that name might change,
but at this time, when you go in there,
it's load balanced so that it will round robin to one of the four nodes.
So these are the tools we have available; this is kind of a list of what's out there.
When you're deciding what tool to use,
it's really going to depend on the amount of data you have, and that includes both
the size of the individual files and the number of files that you're moving.
Other things you want to take into consideration are your workflow
and whether you need to be able to script any of these things;
if it's going to be scripted, you need a command-line interface.
So for smaller moves you can use scp, rsync, or sftp.
For larger transfers, as Mahi mentioned, there's the Globus toolkit;
we have a couple of tools there, and different places and ways you can use it.
In addition, we're adding two new tools into the mix because we're getting into the cloud a little bit more,
and you might need to move stuff off our archival storage. These tools are very robust and can help with other things.
We'll start with small files.
For small files, less than two gigabytes or so, you can use scp, sftp, or rsync.
Let's focus on the top two first. scp, secure copy, is the utility for secure file transfer.
sftp, as Marty gave us a great example of earlier, is the secure file transfer protocol.
It's a utility with a bit more functionality:
not only can you transfer files,
you can also access files and do some file management in there.
With all these tools there are lots of options for how you can use them.
Due to time I'm keeping this very brief, but
I recommend that on the system you look at the help for scp or whichever command you're using (for example, man scp),
and it'll show you all the other options that are available to you.
Another thing to consider with these is that you can copy from
a local site to a remote site, from a remote site back to your local site, or between two remote sites
as well.
The other small-file tool I'm going to talk about is rsync.
rsync is going to be useful if you want to maintain two equal copies on two different systems.
The nice thing about rsync is that it's not going to copy the entire file;
it's just going to copy the changes that have happened,
so it's a little less overhead for you. A few hedged examples of these small-file tools follow below.
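Here are a few sketches of the small-file tools against the data mover nodes mentioned earlier; the usernames and directory paths are illustrative, so substitute your own, and remember the data mover hostname may change.

```bash
# Copy a single file from your laptop to your Expanse scratch space (paths are placeholders).
scp results.tar.gz username@oasis-dm.sdsc.edu:/expanse/lustre/scratch/username/temp_project/

# Interactive transfer plus simple remote file management.
sftp username@oasis-dm.sdsc.edu

# Keep a local and a remote directory in sync; only changed files are transferred.
rsync -avh ./mydata/ username@oasis-dm.sdsc.edu:/expanse/lustre/scratch/username/temp_project/mydata/
```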
The next three tools I'm going to talk about rely on the Globus toolkit. The Globus toolkit is a collection of clients and middleware designed specifically as enabling technology for grid computing;
essentially, it allows secure communication between grid users across organizational boundaries.
It's a command-line
tool that supports standard SSH and grid authentication,
so you can use it with Single Sign-On; that's what gsissh is for, to interface with Globus.
The transfer tools within Globus are gsiscp and gsisftp, and they're essentially the same as scp and sftp. The one thing with all the Globus tools is that you're going to need an active XSEDE allocation; active, that's important.
If you are a local user who only has a local allocation, feel free to contact us.
In addition, if your allocation has expired, I believe at this time
XSEDE cuts you off.
I believe they are talking about changing that
so there is a little bit of leeway,
but in case that's still the case, you should contact us either through the XSEDE ticketing system or the SDSC local ticketing system, and we'll help facilitate it so that you can move your data as needed.
If you're uncomfortable with the command line, you can use Globus via the GUI. You'll log into www.globus.org, and you're going to use your XSEDE credentials to get in there.
The endpoints are important;
the collection is where you will select your endpoints.
If you just start typing XSEDE it'll give you a full list of everything that's in there, and you can pick and choose.
Another thing to keep in mind is that the paths on the data mover nodes are a little different than what is actually on the system itself, so check the user guide to find how those paths correlate.
The next one I'm going to talk about is globus-url-copy.
This can be run on the data mover nodes.
It's a
command-line interface for high-speed data transfers,
and it also uses the GridFTP protocol.
So this is really recommended for large files and large numbers of files.
A little segue here: just because you can do something doesn't mean you should.
If you have a large number of really small files, it is highly recommended that you tar them up and send them as one large file, rather than lots of little files, for example like this:
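For example, bundling a directory full of small files into one archive before transferring; the names here are placeholders.

```bash
tar -czf run_output.tar.gz ./run_output/   # one big file instead of thousands of tiny ones
# ...transfer run_output.tar.gz with your tool of choice, then unpack on the other side:
tar -xzf run_output.tar.gz
```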
To use globus-url-copy, once you're on the data mover nodes you need to load the Globus module, and you need to check to make sure you have an active user proxy.
The command grid-proxy-info can tell you if one is available and how long it's available for. By default,
I think the certificates last for 12 hours.
If you don't have one, you can use
the command myproxy-logon to create a new certificate for yourself.
Again, --help will give you all the parameters
to help you improve performance and make this as efficient as possible. Put together, the workflow might look roughly like the sketch below.
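A hedged sketch of that workflow on a data mover node; the module name, the certificate lifetime flags, and the source and destination URLs are assumptions, so check the user guide for the exact form.

```bash
module load globus            # module name may differ on Expanse
grid-proxy-info               # check whether you already have a valid proxy
myproxy-logon -T -t 12        # request a new certificate (12-hour lifetime shown)

# Transfer one large file from a GridFTP endpoint to local disk (URLs are placeholders).
globus-url-copy -vb -p 4 \
    gsiftp://oasis-dm.sdsc.edu/expanse/lustre/scratch/username/temp_project/big.tar.gz \
    file:///home/username/big.tar.gz
```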
I should take a moment and note that the Globus toolkit can be up to eight times faster
than using scp for large files, like in the terabytes.
However, it should also be noted that if you're moving really small files, it's actually better to use scp, because the grid tools will take longer due to the overhead associated with them.
Now I'm going to talk about two tools that I am by no means
fully versed in all the capabilities of.
These are two extremely robust tools that go way beyond just copying files.
The first one is IBM Aspera,
and the command that we use is ascp. It uses the FASP technology to transfer and share files, and it does a whole lot more.
I mean, it's for big data transfer and syncing;
you can share with this,
it also helps with automation,
and again, this can be used for the cloud stuff as well.
I have a simple example here that's just kind of the high level (a hedged sketch follows below),
but again, there's lots more information and
lots more functionality available with this tool, and you can find that at the link provided below.
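A very small sketch of what an Aspera transfer might look like; the target-rate flag, hostname, and paths are illustrative, so consult the Aspera documentation linked above for the options that fit your transfer.

```bash
# Push a large file to the data mover node with Aspera (paths are placeholders).
ascp -l 100M ./big_dataset.tar.gz \
    username@oasis-dm.sdsc.edu:/expanse/lustre/scratch/username/temp_project/
```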
The IBM Aspera copy tool is currently available on the data mover nodes.
The AWS command line interface is not available yet; you would need to download it to use it.
However, the
instructions are super simple.
It takes a little bit of time to download because it's a larger package, but it's very simple.
It's an open-source tool to interact with AWS services using commands in your command-line shell. It runs on Linux,
and you can use it with Windows as well.
It uses FTP, FTPS, or SFTP to transfer data.
Again, it's a very robust tool;
there are a lot more options available out there, and you can find them by going on the web and checking it out. A couple of common commands are sketched below.
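A couple of the most common AWS CLI commands, as a hedged sketch; the bucket name and file names are placeholders, and you need your own AWS credentials.

```bash
aws configure                                          # one-time setup: access key, secret key, region
aws s3 cp big_results.tar.gz s3://my-bucket/backups/   # upload a file to S3
aws s3 sync ./analysis s3://my-bucket/analysis         # mirror a local directory to S3
```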
So to start with: back up all your data.
I remind you that we don't back up the Lustre file system,
so it's really important that you take it upon yourselves to back up the data.
And then, kind of in review:
there are lots of tools available on Expanse for data transfer. How you choose which tool to use is going to depend on the size of the files, the number of files, and what your workflow is.
You should always avoid moving a lot of small files individually.
There's an automated purge on the Lustre file system,
so again, back up your data.
And finally, we haven't mentioned, well, we have mentioned that Comet is going to go away at the end of March,
which means its file system will go away in mid April.
So we strongly encourage you, if you have a lot of data, to start thinking about where that data is going to go.
Something else I should mention that's a difference between Expanse and Comet and the resources before it: Comet mounted the same home directory, and I think the file systems, from previous systems.
We aren't doing that with Expanse, so anything in your home directory
that you want on Expanse,
if it was on Comet, you're going to need to copy it over; one hedged way to do that is sketched below.
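One possible way to do that copy, run from Expanse; the hostnames and paths are placeholders, and for large volumes of data Globus is the better choice as discussed above.

```bash
# Pull your Comet home directory into a staging directory on Expanse.
rsync -avh username@comet.sdsc.edu:/home/username/ ~/comet-home-backup/
```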
And I think that's all I have.
Are there any questions?
Let me stop sharing and check the chat.
Yes.
Hey everybody, I guess we're done.
And we're going to end on time, which is always good.
Somebody asked about the presentations.
So I'll post this again in the chat.
But we'll have everything uploaded as PDFs in this directory.
So thank you very much for attending.
We really appreciate it and hope you found this helpful.
We will repeat it
again in the spring before Comet shuts down.
If you have questions, send them to help@xsede.org
or submit an XSEDE consulting request.
If anybody has any other questions or comments, go ahead.
I just want to add that this being a three-hour recording means it'll take a day or two at least
to clean up the transcript and get it online.
And we'll make the chat available, and the slides as well.
So thanks very much everyone for hanging in there.
Thanks to the presenters, thanks to the team for helping out in the chat too.
Have a great day.
Yeah.
Have a great day, folks.
Bye.
Thanks.

