PyTaskFarmer is a program that runs a list of tasks (list of commands) in parallel, keeping track of which ones have finished. It can be used to parallelize MCC software by defining a task as running Marlin on a single input file. The MCCTaskFarmer package also provides advanced tools via the TaskList handler API for defining a task as an event range in a single file.
The pytaskfarmer and mcctaskfarmer packages should be installed in a Python virtual environment outside of the MCC image. This allows you to use a more recent version of Python. The pytaskfarmer program will start an image for you automatically.
python -m venv pymcc
source pymcc/bin/activate
pip install pytaskfarmer
pip install git+https://gitlab.cern.ch/berkeleylab/MuonCollider/mcctaskfarmer.git
Do not forget to activate the venv at the start of every session.
source pymcc/bin/activate
PyTaskFarmer automatically starts a new shifter container for every task being executed. The available images can be configured through Runner configurations.
Add the following entry to ${HOME}/.pytaskfarmer/runners.d/shifter.ini to create a new runner called mcc that will use the official Docker image and setup the software environment.
[mcc]
Runner = taskfarmer.runners.ShifterRunner
image = docker:infnpd/mucoll-ilc-framework:1.6-centos8
setup = source /opt/ilcsoft/muonc/init_ilcsoft.sh
volumes = /global/cscratch1/sd/kkrizka/MuonCollider/casarsa-data:/data
If you have custom packages that need to be loaded (e.g. TrackPerfWorkspace), then use the workspace setup script as the setup command. In this example, the runner is called mcc_tperf.
[mcc_tperf]
Runner = taskfarmer.runners.ShifterRunner
image = docker:infnpd/mucoll-ilc-framework:1.6-centos8
setup = source ${HOME}/MuonCollider/TrackPerfWorkspace/setup.sh ${CFS}/atlas/${USER}/MCC/build/TrackPerfWorkspace-1.6
volumes = /global/cscratch1/sd/kkrizka/MuonCollider/casarsa-data:/data
The mcctaskfarmer package contains a TaskList handler called MarlinTaskList. It takes a list of SLCIO files as its tasklist input and splits it into multiple tasks based on the desired number of events per task.
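The splitting logic can be pictured as follows. This is only an illustrative sketch, not the actual mcctaskfarmer implementation; the function name `split_tasks` and its signature are hypothetical, and the real handler reads event counts from the SLCIO files themselves.

```python
# Hypothetical sketch of how a MarlinTaskList-style handler might split
# input files into per-event-range tasks. Names here are illustrative
# only; the real mcctaskfarmer API may differ.

def split_tasks(files, events_per_file, max_events):
    """Return (path, first_event, n_events) tuples covering every file."""
    tasks = []
    for path in files:
        n_total = events_per_file[path]
        # Walk through the file in chunks of at most max_events events.
        for first in range(0, n_total, max_events):
            n = min(max_events, n_total - first)
            tasks.append((path, first, n))
    return tasks
```

For example, a 25-event file with `max_events=10` would be split into three tasks covering events 0-9, 10-19, and 20-24.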
Add the following to a tasks.ini file (or any other name ending in .ini) located in your run directory. This defines a tasklist handler called trackperf that runs the example tracking-performance steering file over the supplied sample.
[trackperf]
TaskList = mcctaskfarmer.MarlinTaskList
steering = /global/u2/k/kkrizka/MuonCollider/TrackPerfWorkspace/example/actsseedckf_steer.xml
maxEventsPerJob = 10
MyAIDAProcessor.FileName = {workdir}/{SAMPLE}
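The `{workdir}` and `{SAMPLE}` placeholders are filled in per task, so each task writes its output into its own working directory. Conceptually this is plain string formatting; the sketch below is only an illustration (the values shown are made up), and the actual substitution inside mcctaskfarmer may differ in detail.

```python
# Conceptual illustration of per-task placeholder substitution.
# The workdir and SAMPLE values are hypothetical examples.
template = "{workdir}/{SAMPLE}"
filled = template.format(
    workdir="test_me/task0",
    SAMPLE="muonGun_sim_MuColl_v1_72.slcio",
)
# filled == "test_me/task0/muonGun_sim_MuColl_v1_72.slcio"
```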
The input task list is a list of SLCIO files. For example, muonGun_sim_MuColl_v1_BIB.filelist with:
/global/cfs/cdirs/atlas/kkrizka/MCC/samples/muonGun_sim_MuColl_v1_BIB/data_bibtest/muonGun_sim_MuColl_v1_72.slcio
/global/cfs/cdirs/atlas/kkrizka/MCC/samples/muonGun_sim_MuColl_v1_BIB/data_bibtest/muonGun_sim_MuColl_v1_73.slcio
/global/cfs/cdirs/atlas/kkrizka/MCC/samples/muonGun_sim_MuColl_v1_BIB/data_bibtest/muonGun_sim_MuColl_v1_74.slcio
/global/cfs/cdirs/atlas/kkrizka/MCC/samples/muonGun_sim_MuColl_v1_BIB/data_bibtest/muonGun_sim_MuColl_v1_75.slcio
/global/cfs/cdirs/atlas/kkrizka/MCC/samples/muonGun_sim_MuColl_v1_BIB/data_bibtest/muonGun_sim_MuColl_v1_76.slcio
/global/cfs/cdirs/atlas/kkrizka/MCC/samples/muonGun_sim_MuColl_v1_BIB/data_bibtest/muonGun_sim_MuColl_v1_77.slcio
...
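One convenient way to produce such a filelist is to glob for all SLCIO files under a sample directory and write their absolute paths, one per line. This is a small helper sketch, not part of either package; point `sample_dir` at your own sample area.

```python
# Build a .filelist by collecting all *.slcio files in a directory and
# writing their absolute paths, one per line, in sorted order.
from pathlib import Path

def write_filelist(sample_dir, output):
    """Write absolute paths of all *.slcio files; return how many."""
    paths = sorted(p.resolve() for p in Path(sample_dir).glob("*.slcio"))
    Path(output).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```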
To run a steering file (configured as a TaskList handler) over them, run the following command. It will run 10 tasks in parallel and store the output inside test_me:
pytaskfarmer.py -n 10 -r mcc_tperf -t trackperf -w test_me muonGun_sim_MuColl_v1_BIB.filelist
Do not run long tasks that use many cores on the login nodes! Instead, submit big jobs to the processing queue; see the NERSC documentation for details. Two common examples follow, along with a very important note.
Important Note
You reserve (and are charged for) a whole node, which means 32 cores. You have to fill it as efficiently as possible: do not run just a single process on a node. Ideally you would use -n 32 or -n 64 (each core supports two hardware threads) with PyTaskFarmer. Some manual tuning of the number of tasks per node is necessary, depending on the CPU and memory requirements of each task.
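The tuning boils down to simple arithmetic: the task count is capped both by the available hardware threads and by how many tasks fit in node memory. The sketch below illustrates this; the defaults (32 cores, 128 GB for a Haswell node) match NERSC's published specs, but the 4 GB per-task figure is an assumption you should replace with your own measurement.

```python
# Rough sizing helper for picking -n on a single node. The per-task
# memory figure is an assumption; measure your own tasks.

def tasks_per_node(cores=32, threads_per_core=2, node_mem_gb=128, task_mem_gb=4):
    """Parallel task count, capped by hyperthreads and by memory."""
    by_cpu = cores * threads_per_core   # 64 hardware threads
    by_mem = node_mem_gb // task_mem_gb # how many tasks fit in RAM
    return max(1, min(by_cpu, by_mem))
```

With the defaults this gives 32 tasks (memory-bound); lighter tasks at 1 GB each would allow the full 64 hardware threads.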
To run short (<4h) jobs "immediately":
salloc --signal=USR1@60 -N 1 -C haswell -q interactive -t 04:00:00 -- pytaskfarmer.py -n 32 -r mcc_tperf -t trackperf -w test_me muonGun_sim_MuColl_v1_BIB.filelist
To run long (<48h) jobs with a "day" latency:
sbatch --signal=USR1@60 -N 1 -C haswell -q regular -t 48:00:00 -- pytaskfarmer.py -n 32 -r mcc_tperf -t trackperf -w test_me muonGun_sim_MuColl_v1_BIB.filelist