Galahad, Dirac SAFE and IRIS - job submission¶
Job submission Galahad¶
To get access to Galahad, please contact Anthony Holloway (email: anthony.holloway[at]manchester.ac.uk). You will be given a short introduction to Galahad (e.g. the ‘home’ and ‘working’ folders, and how to access and load the available modules).
To submit a job on Galahad:
[<your-user>@galahad ~]$ cat slrascil1.sh
#!/bin/bash
#SBATCH --ntasks 1
#SBATCH --time 5:0
#SBATCH --output=test_%j.log
pwd; hostname; date
module load python37base gcc920
CMD="singularity exec /home/<your-user>/RASCIL-full1.img python3 /rascil/examples/scripts/imaging.py"
eval $CMD
- Submit the job using the command:
[<your-user>@galahad ~]$ sbatch slrascil1.sh
Submitted batch job 3404
- Check the submitted job:
[<your-user>@galahad ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3404 CLUSTER slrascil <your-user> R 0:18 1 compute-0-7
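The `#SBATCH --time 5:0` directive in the script above is read by Slurm as minutes:seconds, i.e. a five-minute limit. As a rough illustration, the sketch below converts the time formats that Slurm accepts into seconds. The function name `slurm_time_to_seconds` is hypothetical and is not part of Slurm or RASCIL.

```python
def slurm_time_to_seconds(spec):
    """Convert a Slurm --time value to seconds.

    Handles the formats accepted by Slurm: "minutes", "minutes:seconds",
    "hours:minutes:seconds", "days-hours", "days-hours:minutes" and
    "days-hours:minutes:seconds".
    """
    days = 0
    if "-" in spec:
        # With a day prefix, the remaining fields start at hours.
        day_part, rest = spec.split("-", 1)
        days = int(day_part)
        parts = [int(p) for p in rest.split(":")]
        if len(parts) > 3:
            raise ValueError("unrecognised time specification: %r" % spec)
        parts += [0] * (3 - len(parts))
        hours, minutes, seconds = parts
    else:
        parts = [int(p) for p in spec.split(":")]
        if len(parts) == 1:
            hours, minutes, seconds = 0, parts[0], 0
        elif len(parts) == 2:
            hours, minutes, seconds = 0, parts[0], parts[1]
        elif len(parts) == 3:
            hours, minutes, seconds = parts
        else:
            raise ValueError("unrecognised time specification: %r" % spec)
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


print(slurm_time_to_seconds("5:0"))
```

Checking the limit this way can help when the imaging script needs longer than the five minutes requested in the example.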
Job submission Dirac SAFE¶
To get access to Dirac SAFE, please follow the documentation under Signing up for Dirac/HPC Resources. A short introduction to Dirac SAFE is sent to the user by email once the account is approved. More details can be found in the Cambridge CSD3 cheat sheet and the Slurm Cheat Sheet.
To submit a job on Dirac SAFE (skylake):
[<your-user>@login-e-13 ~]$ cat slrascil1.sh
#!/bin/bash
#SBATCH -A DIRAC-TP001-CPU
#SBATCH -p skylake
#SBATCH --ntasks 1
#SBATCH --time 5:0
#SBATCH --output=test_%j.log
pwd; hostname; date
CMD="singularity exec /home/<your-user>/RASCIL-full1.img python3 /rascil/examples/scripts/imaging.py"
eval $CMD
- Submit the job using the command:
[<your-user>@login-e-13 ~]$ sbatch slrascil1.sh
Submitted batch job 52726369
- Check the submitted job:
[<your-user>@login-e-13 ~]$ squeue | grep <your-user>
52726369 skylake slrascil <your-user> R 0:04 1 cpu-e-820
- Check the results:
[<your-user>@login-e-13 ~]$ ls
imaging_dirty.fits imaging_restored.fits
imaging_psf.fits
- Check the logfile:
[<your-user>@login-e-13 ~]$ cat test_52726369.log
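The job ID printed by sbatch is what links the submission to the log file, because Slurm replaces `%j` in `--output=test_%j.log` with the job ID. The sketch below shows one way this could be scripted. The helper names `parse_job_id`, `log_file_name` and `submit` are hypothetical, and `submit` assumes that `sbatch` is on the PATH of the login node.

```python
import re
import subprocess


def parse_job_id(sbatch_output):
    """Extract the numeric job ID from the line printed by sbatch."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError("no job ID found in: %r" % sbatch_output)
    return int(match.group(1))


def log_file_name(pattern, job_id):
    """Expand the %j placeholder of an --output pattern with the job ID."""
    return pattern.replace("%j", str(job_id))


def submit(script):
    """Run sbatch on a batch script and return the job ID (needs Slurm)."""
    result = subprocess.run(["sbatch", script], capture_output=True,
                            text=True, check=True)
    return parse_job_id(result.stdout)


job_id = parse_job_id("Submitted batch job 52726369")
print(log_file_name("test_%j.log", job_id))
```

On a login node, `submit("slrascil1.sh")` would replace the hard-coded sbatch output used here.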
Job submission IRIS¶
To get access to IRIS and submit jobs, please follow the documentation under DIRAC install and basic usage. It explains how to get a certificate, be approved by a VO and install DIRAC so that you can submit jobs to IRIS in either .jdl or .py form.
From the server where DIRAC is installed, start a proxy before using any DMS commands:
bash-4.2$ /raid/scratch/<your-user>/dirac_ui > source bashrc
bash-4.2$ /raid/scratch/<your-user>/dirac_ui > dirac-proxy-init -x -N
Add the RASCIL container to the file catalogue using the command “dirac-dms-add-file”:
dirac-dms-add-file LFN:/skatelescope.eu/user/c/<your-user>/rascil/RASCIL-full1.img RASCIL-full1.img UKI-NORTHGRID-MAN-HEP-disk
Check where the file has been uploaded using the command “dirac-dms-filecatalog-cli”.
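The upload command above takes three arguments: the logical file name (LFN), the local file and the storage element. As a minimal sketch, the same call could be assembled from Python as shown below. The function names and the user name `jdoe` are hypothetical, and `run_add_file` assumes the DIRAC client is installed and a proxy is active.

```python
import subprocess


def add_file_command(lfn, local_path, storage_element):
    """Build the argument list for dirac-dms-add-file as used above."""
    return ["dirac-dms-add-file", lfn, local_path, storage_element]


def run_add_file(lfn, local_path, storage_element):
    """Run the upload; requires a DIRAC installation and a valid proxy."""
    subprocess.run(add_file_command(lfn, local_path, storage_element),
                   check=True)


command = add_file_command(
    "LFN:/skatelescope.eu/user/c/jdoe/rascil/RASCIL-full1.img",  # hypothetical user
    "RASCIL-full1.img",
    "UKI-NORTHGRID-MAN-HEP-disk",
)
print(" ".join(command))
```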
Job submission - submit .jdl¶
Create the .jdl and .sh files:
cat simpleR1.jdl
JobName = "InputAndOuputSandbox";
Executable = "testR1.sh";
StdOutput = "StdOut";
StdError = "StdErr";
InputSandbox = {"testR1.sh"};
InputData = {"LFN:/skatelescope.eu/user/c/<your-user>/rascil/RASCIL-full1.img"};
OutputSandbox = {"StdOut","StdErr"};
OutputData = {"imaging_dirty.fits","imaging_psf.fits","imaging_restored.fits"};
OutputSE = "UKI-NORTHGRID-MAN-HEP-disk";
Site = "LCG.UKI-NORTHGRID-MAN-HEP.uk";

cat testR1.sh
#!/bin/bash
singularity exec --cleanenv -H $PWD:/srv --pwd /srv -C RASCIL-full1.img python3 /rascil/examples/scripts/imaging.py;

Submit the job:
bash-4.2$ dirac-wms-job-submit simpleR1.jdl
JobID = 25260750
bash-4.2$ dirac-wms-job-status 25260750
JobID=25260750 Status=Running; MinorStatus=Input Data Resolution; Site=LCG.UKI-NORTHGRID-MAN-HEP.uk;
bash-4.2$ dirac-wms-job-status 25260750
JobID=25260750 Status=Done; MinorStatus=Execution Complete; Site=LCG.UKI-NORTHGRID-MAN-HEP.uk;
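When polling a job from a script, it can be convenient to turn the status line printed by dirac-wms-job-status into a dictionary. The sketch below parses only the line layout shown above; the function name `parse_job_status` is hypothetical, and the layout of the line is an assumption based on this example rather than on a documented format.

```python
import re

# A key, "=", then a value that ends at ";", at whitespace followed by
# the next "key=", or at the end of the line.
_FIELD = re.compile(r"(\w+)=(.*?)(?:;|\s+(?=\w+=)|$)")


def parse_job_status(line):
    """Split a dirac-wms-job-status line into a dict of its fields."""
    return {key: value.strip() for key, value in _FIELD.findall(line.strip())}


status = parse_job_status(
    "JobID=25260750 Status=Running; MinorStatus=Input Data Resolution; "
    "Site=LCG.UKI-NORTHGRID-MAN-HEP.uk;"
)
print(status["Status"], "/", status["MinorStatus"])
```

A polling loop could then wait until `status["Status"]` reaches `Done` before retrieving the output.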
Retrieve the output data and the output sandbox:
bash-4.2$ dirac-wms-job-get-output-data 25336768
Job 25336768 output data retrieved
bash-4.2$ ls
-rw-r--r--. 1 <your-user> users6 2102400 May 14 17:32 imaging_dirty.fits
-rw-r--r--. 1 <your-user> users6 2102400 May 14 17:32 imaging_psf.fits
-rw-r--r--. 1 <your-user> users6 2102400 May 14 17:32 imaging_restored.fits
bash-4.2$ dirac-wms-job-get-output 25336768
Job output sandbox retrieved in /raid/scratch/<your-user>/dirac_ui/tests/rascilTests/25336768/
bash-4.2$ cd 25336768
bash-4.2$ ls
StdErr StdOut
bash-4.2$ cat StdErr
INFO: Convert SIF file to sandbox...
INFO: Cleaning up image...
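If several similar jobs are needed, the .jdl file used in this section could be generated rather than written by hand. The sketch below renders a dictionary in the same `Key = value;` layout as simpleR1.jdl. The function name `render_jdl` is hypothetical, and it covers only quoted strings and lists of quoted strings, which is all that the example file uses.

```python
def render_jdl(fields):
    """Render a dict as JDL text with quoted strings and {...} lists."""
    lines = []
    for key, value in fields.items():
        if isinstance(value, (list, tuple)):
            rendered = "{" + ",".join('"%s"' % item for item in value) + "}"
        else:
            rendered = '"%s"' % value
        lines.append("%s = %s;" % (key, rendered))
    return "\n".join(lines)


jdl_text = render_jdl({
    "JobName": "InputAndOuputSandbox",
    "Executable": "testR1.sh",
    "InputSandbox": ["testR1.sh"],
    "OutputSandbox": ["StdOut", "StdErr"],
    "OutputSE": "UKI-NORTHGRID-MAN-HEP-disk",
})
print(jdl_text)
```

The resulting text can be written to a file and passed to dirac-wms-job-submit in the same way as simpleR1.jdl.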
Job submission - submit .py¶
Set up environment variables:
# Set the path to Python 2.7 in $PATH
# Path to Python 2.7 added, e.g.
bash-4.2$ export PATH=/usr/local/casa/bin/python:$PATH
Create the job script to be submitted and the .sh script:
bash-4.2$ cat jobpy.py
import os
import sys
import time

# setup DIRAC
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=False)

from DIRAC.Interfaces.API.Job import Job
from DIRAC.Interfaces.API.Dirac import Dirac
from DIRAC.Core.Security.ProxyInfo import getProxyInfo

SitesList = ['LCG.UKI-NORTHGRID-MAN-HEP.uk']
SEList = ['UKI-NORTHGRID-MAN-HEP-disk']

dirac = Dirac()
j = Job(stdout='StdOut', stderr='StdErr')
j.setName('TestJob')
j.setInputSandbox(["testR1py.sh"])
j.setInputData(['LFN:/skatelescope.eu/user/c/<your-user>/rascil/RASCIL-full1.img'])
j.setOutputSandbox(['StdOut','StdErr'])
j.setOutputData(['imaging_dirty.fits','imaging_psf.fits','imaging_restored.fits'], outputSE='UKI-NORTHGRID-MAN-HEP-disk')
j.setExecutable('testR1py.sh')
jobID = dirac.submitJob(j)
print 'Submission Result: ', jobID

bash-4.2$ cat testR1py.sh
#!/bin/bash
singularity exec --cleanenv -H $PWD:/srv --pwd /srv -C RASCIL-full1.img python3 /rascil/examples/scripts/imaging.py

Submitting the job:
bash-4.2$ python jobpy.py
Submission Result: {'requireProxyUpload': False, 'OK': True, 'rpcStub': (('WorkloadManagement/JobManager', {'delegatedDN': None, 'timeout': 600, 'skipCACheck': False, 'keepAliveLapse': 150, 'delegatedGroup': None}), 'submitJob', ('[ \n Origin = DIRAC;\n Executable = "$DIRACROOT/scripts/dirac-jobexec";\n StdError = StdErr;\n LogLevel = info;\n OutputSE = UKI-NORTHGRID-MAN-HEP-disk;\n InputSandbox = \n {\n "testR1py.sh",\n "SB:GridPPSandboxSE|/SandBox/i/iulia.c.cimpan.skatelescope.eu_user/cf8/ca6/cf8ca689995e24c01c068eb6f34126b8.tar.bz2"\n };\n JobName = TestJob;\n Priority = 1;\n Arguments = "jobDescription.xml -o LogLevel=info";\n JobGroup = skatelescope.eu;\n OutputSandbox = \n {\n StdOut,\n StdErr,\n Script1_testR1py.sh.log\n };\n StdOutput = StdOut;\n InputData = LFN:/skatelescope.eu/user/c/<your-user>/rascil/RASCIL-full1.img;\n JobType = User;\n OutputData = \n {\n imaging_dirty.fits,\n imaging_psf.fits,\n imaging_restored.fits\n };\n]',)), 'Value': 25344748, 'JobID': 25344748}

Get the results:
bash-4.2$ dirac-wms-job-get-output 25344748
Job output sandbox retrieved in /raid/scratch/<your-user>/dirac_ui/tests/rascilTests/25344748/
bash-4.2$ cd 25344748
bash-4.2$ ls
Script1_testR1py.sh.log StdOut
bash-4.2$ dirac-wms-job-get-output-data 25344748
Job 25344748 output data retrieved
bash-4.2$ ls
imaging_dirty.fits imaging_psf.fits imaging_restored.fits Script1_testR1py.sh.log StdOut
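The submission result printed by jobpy.py is a dictionary whose `OK` and `Value` entries carry the outcome and the job ID. As a small sketch, the job ID needed by the retrieval commands could be extracted as shown below. The function name `job_id_from_result` is hypothetical, and the use of a `Message` entry for failed submissions is an assumption about the DIRAC return structure, not something shown in the output above. The sketch is written for Python 3, whereas jobpy.py itself uses Python 2.7 syntax.

```python
def job_id_from_result(result):
    """Return the job ID from a DIRAC submission result dictionary."""
    if not result.get("OK"):
        # Assumption: failed calls carry a human-readable "Message" entry.
        message = result.get("Message", "unknown error")
        raise RuntimeError("job submission failed: %s" % message)
    return result["Value"]


# Trimmed copy of the result shown above (rpcStub omitted).
example_result = {
    "requireProxyUpload": False,
    "OK": True,
    "Value": 25344748,
    "JobID": 25344748,
}
print(job_id_from_result(example_result))
```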
Appendix¶
To get a graphical desktop, run vncserver on Galahad (it is already installed). On a Windows PC, you can use TightVNC (https://www.tightvnc.com/download-old.php) as your VNC viewer.
The first time you run vncserver you will be asked to set a password.
It will then report that it has created a virtual display galahad.ast.man.ac.uk:X, where X is a number.
Use that address in your VNC viewer.
[<your-user>@galahad ~]$ vncserver
[<your-user>@galahad ~]$ vncserver -kill :3
Killing Xvnc process ID 35841
When using VNC, it is advisable to edit the default .vnc/xstartup file (created the first time you run vncserver) so that its last line runs /usr/bin/icewm as the window manager rather than xinitrc. Then kill your first vncserver and start it again to pick up the change. This avoids a bug where the VNC viewer sometimes displays only a black screen.
[<your-user>@galahad ~]$ cat .vnc/xstartup
#!/bin/sh
unset SESSION_MANAGER
unset DBUS_SESSION_BUS_ADDRESS
#exec /etc/X11/xinit/xinitrc
/usr/bin/icewm
[<your-user>@galahad ~]$ vncserver #restarting the server
How do you find the host for the diagnostics page? It is whichever host started it, so use squeue to see which host is running your job. The address is then, for example, http://compute-0-5:8787
[<your-user>@galahad ~]$ squeue
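The lookup described above can be sketched in a few lines: find the row for the job in the squeue output, take the node name from the last column and build the address from it. The function names `node_for_job` and `diagnostics_url` are hypothetical, the column layout is assumed to be the default one shown earlier on this page, and the user name `jdoe` is a placeholder.

```python
def node_for_job(squeue_output, job_id):
    """Return the node running job_id, or None if it is not running."""
    for line in squeue_output.splitlines():
        columns = line.split()
        # Default layout: JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
        if len(columns) >= 8 and columns[0] == str(job_id):
            if columns[4] != "R":
                return None  # pending jobs show a reason, not a node
            return columns[-1]
    return None


def diagnostics_url(node, port=8787):
    """Build the diagnostics page address for a compute node."""
    return "http://%s:%d" % (node, port)


sample = """JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
3404 CLUSTER slrascil jdoe R 0:18 1 compute-0-7
3405 CLUSTER slrascil jdoe PD 0:00 1 (Resources)"""

print(diagnostics_url(node_for_job(sample, 3404)))
```

The port 8787 is taken from the example address above; pass a different `port` if the diagnostics page was started elsewhere.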