Introduction to SLURM scitas.epfl.ch October 9, 2014


Bellatrix

- Frontend at bellatrix.epfl.ch
- 16 x 2.2 GHz cores per node
- 424 nodes with 32GB
- Infiniband QDR network
- The batch system is SLURM
1 / 36
Castor

- Frontend at castor.epfl.ch
- 16 x 2.6 GHz cores per node
- 50 nodes with 64GB
- 2 nodes with 256GB
- For sequential jobs (Matlab etc.)
- The batch system is SLURM
- RedHat 6.5
2 / 36
Deneb (October 2014)

- Frontend at deneb.epfl.ch
- 16 x 2.6 GHz cores per node
- 376 nodes with 64GB
- 8 nodes with 256GB
- 2 nodes with 512GB and 32 cores
- 16 nodes with 4 Nvidia K40 GPUs
- Infiniband QDR network
3 / 36
Storage

/home
- filesystem has per-user quotas
- will be backed up for important things (source code, results and theses)

/scratch
- high-performance "temporary" space
- is not backed up
- is organised by laboratory
4 / 36
Connection

Start the X server (automatic on a Mac)
Open a terminal

ssh -Y <username>@castor.epfl.ch

Try the following commands:
- id
- pwd
- quota
- ls /scratch/<group>/<username>
5 / 36
The batch system

Goal: to take a list of jobs and execute them when appropriate resources become available.

SCITAS uses SLURM on its clusters: http://slurm.schedmd.com

The configuration depends on the purpose of the cluster (serial vs parallel).
6 / 36
sbatch

The fundamental command is sbatch, which submits jobs to the batch system.

Suggested workflow:
- create a short job-script
- submit it to the batch system
7 / 36
sbatch - exercise
Copy the first two examples to your home directory
cp /scratch/examples/ex1.run .
cp /scratch/examples/ex2.run .
Open the file ex1.run with your editor of choice
8 / 36
ex1.run

#!/bin/bash
#SBATCH --workdir /scratch/<group>/<username>
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 1024

sleep 10
echo "hello from $(hostname)"
sleep 10
9 / 36
ex1.run

#SBATCH is a directive to the batch system.

--nodes 1
  the number of nodes to use - on Castor this is limited to 1

--ntasks 1
  the number of tasks (in an MPI sense) to run per job

--cpus-per-task 1
  the number of cores per aforementioned task

--mem 4096
  the memory required per node in MB

--time 12:00:00   # 12 hours
--time 2-6        # two days and six hours
  the time required
10 / 36
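Because #SBATCH lines are ordinary shell comments, a job script can be dry-run locally before submission. A minimal sketch combining the directives above (the resource values are illustrative and the echoed text is my own, not part of the course examples):

```shell
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --mem 1024
#SBATCH --time 00:05:00

# To bash the #SBATCH lines are comments, so the body can be tested
# with "bash myjob.run" before submitting with "sbatch myjob.run".
echo "job running on $(hostname)"
```

Submitted with sbatch, the echoed line ends up in the slurm-<jobid>.out file.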
Running ex1.run
The job is assigned a default runtime of 15 minutes
$ sbatch ex1.run
Submitted batch job 439
$ cat /scratch/<group>/<username>/slurm-439.out
hello from c03
11 / 36
What went on?

sacct -j <JOB_ID>
sacct -l -j <JOB_ID>

Or more usefully:

Sjob <JOB_ID>
12 / 36
Cancelling jobs
To cancel a specific job:
scancel <JOB_ID>
To cancel all your jobs:
scancel -u <username>
13 / 36
ex2.run

#!/bin/bash
#SBATCH --workdir /scratch/<group>/<username>
#SBATCH --nodes 1
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 8
#SBATCH --mem 122880
#SBATCH --time 00:30:00

/scratch/examples/linpack/runme_1_45k
14 / 36
What’s going on?
squeue
squeue -u <username>
Squeue
Sjob <JOB_ID>
scontrol -d show job <JOB_ID>
sinfo
15 / 36
squeue and Squeue

squeue
- Job states: Pending, Resources, Priority, Running

squeue | grep <JOB_ID>
squeue -j <JOB_ID>
Squeue <JOB_ID>
16 / 36
Sjob

$ Sjob <JOB_ID>

       JobID    JobName    Cluster    Account  Partition  Timelimit       User      Group
------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
       31006    ex1.run     castor  scitas-ge     serial   00:15:00      jmenu  scitas-ge
 31006.batch      batch     castor  scitas-ge

             Submit            Eligible               Start                 End
------------------- ------------------- ------------------- -------------------
2014-05-12T15:55:48 2014-05-12T15:55:48 2014-05-12T15:55:48 2014-05-12T15:56:08
2014-05-12T15:55:48 2014-05-12T15:55:48 2014-05-12T15:55:48 2014-05-12T15:56:08

   Elapsed ExitCode      State
---------- -------- ----------
  00:00:20      0:0  COMPLETED
  00:00:20      0:0  COMPLETED

     NCPUS   NTasks        NodeList    UserCPU  SystemCPU     AveCPU  MaxVMSize
---------- -------- --------------- ---------- ---------- ---------- ----------
         1                      c04   00:00:00  00:00.001
         1        1             c04   00:00:00  00:00.001   00:00:00    207016K
17 / 36
scontrol
$ scontrol -d show job <JOB_ID>
$ scontrol -d show job 400
JobId=400 Name=s1.job
UserId=user(123456) GroupId=group(654321)
Priority=111 Account=scitas-ge QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 ExitCode=0:0
DerivedExitCode=0:0
RunTime=00:03:39 TimeLimit=00:15:00 TimeMin=N/A
SubmitTime=2014-03-06T09:45:27 EligibleTime=2014-03-06T09:45:27
StartTime=2014-03-06T09:45:27 EndTime=2014-03-06T10:00:27
PreemptTime=None SuspendTime=None SecsPreSuspend=0
Partition=serial AllocNode:Sid=castor:106310
ReqNodeList=(null) ExcNodeList=(null)
NodeList=c03
BatchHost=c03
NumNodes=1 NumCPUs=1 CPUs/Task=1 ReqS:C:T=*:*:*
Nodes=c03 CPU IDs=0 Mem=1024
MinCPUsNode=1 MinMemoryCPU=1024M MinTmpDiskNode=0
Features=(null) Gres=(null) Reservation=(null)
Shared=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/<user>/jobs/s1.job
WorkDir=/scratch/<group>/<user>
18 / 36
Modules

Modules make your life easier:
- module avail
- module show <take your pick>
- module load <take your pick>
- module list
- module purge
- module list
19 / 36
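A job script that calls module commands can be made to degrade gracefully when run off the cluster. The guard below is an assumption of mine (the "module" command is typically a shell function provided by Environment Modules), not part of the course material:

```shell
#!/bin/bash
# Check for the "module" shell function/command before using it, so the
# same script can be syntax-tested on a machine without modules.
if type module >/dev/null 2>&1; then
  module purge
  module list
else
  echo "module command not available on this machine"
fi
```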
ex3.run - Mathematica
Copy the following files to your chosen directory:
cp /scratch/examples/ex3.run .
cp /scratch/examples/mathematica.in .
Submit ‘ex3.run’ to the batch system and see what happens...
20 / 36
ex3.run

#!/bin/bash
#SBATCH --ntasks 1
#SBATCH --cpus-per-task 1
#SBATCH --nodes 1
#SBATCH --mem 4096
#SBATCH --time 00:05:00

echo STARTING AT `date`
module purge
module load mathematica/9.0.1
math < mathematica.in
echo FINISHED at `date`
21 / 36
Compiling ex4.* source files

Copy the following files to your chosen directory:

/scratch/examples/ex4_README.txt
/scratch/examples/ex4.c
/scratch/examples/ex4.cxx
/scratch/examples/ex4.f90
/scratch/examples/ex4.run

Then compile them with:

module load intelmpi/4.1.3
mpiicc -o ex4_c ex4.c
mpiicpc -o ex4_cxx ex4.cxx
mpiifort -o ex4_f90 ex4.f90
22 / 36
The 3 methods to get interactive access 1/3

To schedule an allocation, use salloc with exactly the same resource options as sbatch.
You then arrive at a new prompt which is still on the submission node, but by using srun you can access the allocated resources:

user@castor:hello > salloc -N 1 -n 2
salloc: Granted job allocation 1234
bash-4.1$ hostname
castor
bash-4.1$ srun hostname
c03
c03
23 / 36
The 3 methods to get interactive access 2/3

To get a prompt on the machine one needs to use the "--pty" option with "srun" and then "bash -i" (or "tcsh -i") to get the shell:

user@castor > salloc -N 1 -n 1
salloc: Granted job allocation 1235
user@castor > srun --pty bash -i
bash-4.1$ hostname
c03
24 / 36
The 3 methods to get interactive access 3/3

This is the least elegant method, but it is the one by which one can run X11 applications:

user@castor > salloc -n 1 -c 16 -N 1
salloc: Granted job allocation 1236
bash-4.1$ srun hostname
c04
bash-4.1$ ssh -Y c04
user@c04 >
25 / 36
Dynamic libs used in an application

"ldd" displays the libraries an executable file depends on:

user@castor:~/COURS > ldd ex4_f90
linux-vdso.so.1 => (0x00007fff4b905000)
libmpigf.so.4 => /opt/software/intel/14.0.1/intel64/lib/libmpigf.so.4 (0x00007f556cf88000)
libmpi.so.4 => /opt/software/intel/14.0.1/intel64/lib/libmpi.so.4 (0x00007f556c91c000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003807e00000)
librt.so.1 => /lib64/librt.so.1 (0x0000003808a00000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003808200000)
libm.so.6 => /lib64/libm.so.6 (0x0000003807600000)
libc.so.6 => /lib64/libc.so.6 (0x0000003807a00000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x000000380c600000)
/lib64/ld-linux-x86-64.so.2 (0x0000003807200000)
26 / 36
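The ex4_f90 binary only exists on the cluster, but ldd works on any dynamically linked executable. Using /bin/ls as a stand-in (an assumption about the local Linux system, not a file from the course):

```shell
# Print the shared libraries a binary is linked against.
ldd /bin/ls
```

Each "=>" line shows where the dynamic linker resolved a dependency; a library shown as "not found" would make the program fail to start.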
ex4.run

#!/bin/bash
...
module purge
module load intelmpi/4.1.3
module list
echo

LAUNCH_DIR=/scratch/scitas-ge/jmenu
EXECUTABLE="./ex4_f90"

echo "--> LAUNCH_DIR = ${LAUNCH_DIR}"
echo "--> EXECUTABLE = ${EXECUTABLE}"
echo
echo "--> ${EXECUTABLE} depends on the following dynamic libraries:"
ldd ${EXECUTABLE}
echo

cd ${LAUNCH_DIR}
srun ${EXECUTABLE}
...
27 / 36
The debug QoS

In order to have priority access for debugging:

sbatch --qos debug ex1.run

Limits on Castor:
- 30 minutes walltime
- 1 job per user
- 16 cores between all users

To display the available QoS's:

sacctmgr show qos
28 / 36
cgroups (Castor)

General:
- cgroups ("control groups") is a Linux kernel feature to limit, account for, and isolate the resource usage (CPU, memory, disk I/O, etc.) of process groups

SLURM:
- Linux cgroups apply constraints to the CPUs and memory that can be used by a job
- They are automatically generated from the resource requests given to SLURM
- They are automatically destroyed at the end of the job, thus releasing all resources used

Even if there is physical memory available, a task will be killed if it tries to exceed the limits of the cgroup!
29 / 36
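Demonstrating a real cgroup requires root, but the effect described above - a process dying when it exceeds its memory allowance even though the machine has free RAM - can be mimicked with ulimit in a subshell. This is an analogy of mine, not SLURM's mechanism; the 20 MB cap and the python3 allocation are arbitrary choices:

```shell
# Cap virtual memory for a subshell, then try to allocate ~1 GB.
# The allocation (or even interpreter start-up) fails, much as a job
# task is killed when it outgrows its cgroup.
(
  ulimit -v 20000                      # ~20 MB virtual memory cap
  python3 -c 'x = "a" * (10**9)' 2>/dev/null
) || echo "process exceeded its memory limit"
```

The cap applies only inside the parentheses, so the rest of the shell session is unaffected - similar to how a cgroup confines one job without touching the node's other jobs.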
System process view

Two tasks running on the same node, seen with "ps auxf":

root  177873  slurmstepd: [1072]
user  177877   \_ /bin/bash /var/spool/slurmd/job01072/slurm_script
user  177908       \_ sleep 10
root  177890  slurmstepd: [1073]
user  177894   \_ /bin/bash /var/spool/slurmd/job01073/slurm_script
user  177970       \_ sleep 10

Check memory, thread and core usage with "htop"
30 / 36
Fair share 1/3

The scheduler is configured to give all groups a share of the computing power.
Within each group the members have an equal share by default:

user@castor:~ > sacctmgr show association where account=lacal format=Account,Cluster,User,GrpNodes,QOS,DefaultQOS,Share tree

 Account    Cluster       User GrpNodes           QOS    Def QOS  Share
-------- ---------- ---------- -------- ------------- ---------- ------
   lacal     castor                            normal                 1
   lacal     castor   aabecker          debug,normal      normal      1
   lacal     castor   kleinjun          debug,normal      normal      1
   lacal     castor   knikitin          debug,normal      normal      1

Priority is based on recent usage:
- this is forgotten with time (half-life)
- fair share comes into play when the resources are heavily used
31 / 36
Fair share 2/3

Job priority is a weighted sum of various factors:

user@castor:~ > sprio -w
  JOBID   PRIORITY        AGE  FAIRSHARE    JOBSIZE        QOS
Weights                  1000      10000        100     100000

To compare jobs' priorities:

user@castor:~ > sprio -j80833,77613
  JOBID   PRIORITY        AGE  FAIRSHARE        QOS
  77613        145        146          0          0
  80833       9204         93       9111          0
32 / 36
Fair share 3/3

FairShare values range from 0.0 to 1.0:

Value    Meaning
≈ 0.0    you used much more resources than you were granted
0.5      you got what you paid for
≈ 1.0    you used nearly no resources

user@castor:~ > sshare -a -A lacal
Accounts requested:
  : lacal

 Account       User  Raw Shares  Norm Shares    Raw Usage  Effectv Usage  FairShare
-------- ---------- ----------- ------------ ------------ -------------- ----------
   lacal                    666     0.097869   1357691548       0.256328   0.162771
   lacal   boissaye           1     0.016312            0       0.042721   0.162771
   lacal     janson           1     0.016312            0       0.042721   0.162771
   lacal    jetchev           1     0.016312            0       0.042721   0.162771
   lacal   kleinjun           1     0.016312   1357691548       0.256328   0.000019
   lacal   pbottine           1     0.016312            0       0.042721   0.162771
   lacal    saltini           1     0.016312            0       0.042721   0.162771

More information at:
http://schedmd.com/slurmdocs/priority_multifactor.html
33 / 36
Helping yourself

man pages are your friend!
- man sbatch
- man sacct
- man gcc
- man ifort (first run: module load intel/14.0.1)
34 / 36
Getting help

If you still have problems then send a message to:
[email protected]

Please start the subject with HPC for automatic routing to the HPC team.

Please give as much information as possible, including:
- the jobid
- the directory location and name of the submission script
- where the "slurm-*.out" file is to be found
- how the "sbatch" command was used to submit it
- the output of the "env" and "module list" commands
35 / 36
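The requested details can be collected in one go and attached to the message. The file name and layout below are my own sketch, not something SCITAS prescribes:

```shell
#!/bin/bash
# Collect diagnostic information for a support request (hypothetical layout).
report=support-info.txt
{
  echo "== jobid, script path and slurm-*.out location: fill in by hand =="
  echo "== env =="
  env | sort
  echo "== module list =="
  module list 2>&1 || echo "(module command not available)"
} > "$report"
echo "wrote $report"
```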
Appendix
Change your shell at:
https://dinfo.epfl.ch/cgi-bin/accountprefs
SCITAS web site:
http://scitas.epfl.ch
36 / 36
