Setting up Simfactory for a New Machine
To set up a new machine for Simfactory, you need to create four files inside the simfactory repo:
- mdb/machines/YourMachine.ini
- mdb/runscripts/YourMachine.run
- mdb/submitscripts/YourMachine.sub
- mdb/optionlists/YourMachine.cfg
In all cases, "YourMachine" should be replaced with whatever name you would like to supply for your machine.
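As a quick sanity check, the four paths listed above can be generated and tested for existence. This is a minimal sketch, not part of Simfactory: the function names and the `repo_root` argument are hypothetical, and it assumes you point `repo_root` at your simfactory checkout.

```python
from pathlib import Path


def machine_files(name, repo_root="simfactory"):
    """Return the four per-machine file paths for the given machine name.

    repo_root is a hypothetical argument: the path to your simfactory checkout.
    """
    root = Path(repo_root)
    return [
        root / "mdb" / "machines" / f"{name}.ini",
        root / "mdb" / "runscripts" / f"{name}.run",
        root / "mdb" / "submitscripts" / f"{name}.sub",
        root / "mdb" / "optionlists" / f"{name}.cfg",
    ]


def missing_files(name, repo_root="simfactory"):
    """Return the subset of the four files that does not exist yet."""
    return [p for p in machine_files(name, repo_root) if not p.exists()]


if __name__ == "__main__":
    for path in missing_files("YourMachine"):
        print("still to create:", path)
```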
Creating YourMachine.ini
Name and Alias Pattern
The name in square brackets at the top of the file should match the nickname; by convention, both are in lower case. The name entry usually matches the nickname as well, except that it may contain capital letters.
The aliaspattern should be a Python regular expression that identifies your machine by its hostname. SimFactory finds the hostname by calling the /usr/bin/hostname command. Note that if you put a name in ~/.hostname, it will use that instead of the output of /usr/bin/hostname.
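Before committing a pattern, it can help to try it against real hostnames in a Python session. The sketch below is illustrative only: the pattern and hostnames are made up, and it assumes the pattern is anchored with ^ and $ so that the exact matching function used does not matter.

```python
import re
import socket

# Hypothetical pattern: matches login nodes such as
# "login1.yourmachine.example.org", "login2.yourmachine.example.org", ...
ALIAS_PATTERN = r"^login\d+\.yourmachine\.example\.org$"


def matches_alias(hostname, pattern=ALIAS_PATTERN):
    """Return True if the hostname is identified by the alias pattern."""
    return re.search(pattern, hostname) is not None


if __name__ == "__main__":
    # Check the machine you are currently logged in to.
    here = socket.gethostname()
    print(here, "matches" if matches_alias(here) else "does not match")
```

Remember to escape the dots in the hostname; an unescaped "." matches any character and can make the pattern match unrelated machines.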
envsetup
The envsetup entry gives you a place to ensure the user has the correct environment for building on yourmachine. Depending on the machine's configuration, that may involve invoking "module load" commands or explicitly setting environment variables. RH: Does this require bash syntax? What if someone has csh?
Because envsetup is likely to need more than one line, our example uses the multiline syntax, which works the same way as a shell here-document. Use <<EOT to start, and EOT on a line by itself to end. Instead of EOT, you may use any string of your choosing.
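To make the multiline rule concrete, here is a small sketch of how such a block could be read. It is not Simfactory's actual parser, only an illustration of the start marker, the terminating line, and the freedom to choose the marker string; the function name is hypothetical.

```python
def parse_multiline(lines):
    """Parse 'key = value' lines, including 'key = <<MARKER' ... 'MARKER' blocks."""
    result = {}
    it = iter(lines)
    for line in it:
        # Skip comments and lines that are not key/value pairs.
        if line.lstrip().startswith("#") or "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if value.startswith("<<"):
            marker = value[2:].strip()
            body = []
            # Consume lines until the marker appears on a line by itself.
            for inner in it:
                if inner.strip() == marker:
                    break
                body.append(inner.strip())
            value = "\n".join(body)
        result[key] = value
    return result


example = """envsetup = <<EOT
    module load icc
    export LD_LIBRARY_PATH=/usr/local/lib64
EOT
makejobs = 16"""

if __name__ == "__main__":
    print(parse_multiline(example.splitlines()))
```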
disabled-thorns and enabled-thorns
Depending on how your machine is set up, you may need to enable or disable thorns, e.g. SystemTopology. On some machines, Slurm is misconfigured so that, even though the scheduler is supposed to give you the entire node, the taskset passed through Slurm confines you to just a few CPUs. When nodes are not shared, SystemTopology is helpful and should go in enabled-thorns.
On the other hand, on machines that are correctly configured and where nodes may be shared by more than one user, SystemTopology is likely to hurt performance and should be in the disabled-thorns list.
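One way to see whether a job is being confined to a subset of the node's CPUs is to compare the CPU affinity of a process with the total CPU count from inside a batch job. The following is a rough diagnostic sketch, not something Simfactory provides; the function name is hypothetical, and the affinity query is only available on Linux.

```python
import os


def affinity_report():
    """Return (allowed, total, confined) for the current process.

    allowed:  number of CPUs this process may run on
    total:    number of CPUs the operating system reports
    confined: True if the process is restricted to fewer CPUs than exist
    """
    if hasattr(os, "sched_getaffinity"):  # Linux only
        allowed = len(os.sched_getaffinity(0))
    else:
        # No affinity information available; assume no restriction.
        allowed = os.cpu_count() or 1
    total = os.cpu_count() or allowed
    return allowed, total, allowed < total


if __name__ == "__main__":
    allowed, total, confined = affinity_report()
    print(f"allowed CPUs: {allowed} of {total}; confined: {confined}")
```

If this reports a confined process on a node that the scheduler was supposed to hand over entirely, that matches the misconfiguration described above.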
[yourmachine]

# This machine description file is used internally by simfactory as a template
# during the sim setup and sim setup-silent commands
# Edit at your own risk

# Machine description
nickname = yourmachine
name = YourMachine
location = YourLocation
description = SomeLongerDescription
status = production
webpage = https://SomeWebPage

# How to access this machine via ssh
hostname = slurmjupyter.cct.lsu.edu
aliaspattern = PythonRegularExpression
envsetup = <<EOT
    module load icc
    export LD_LIBRARY_PATH=/usr/local/lib64
EOT
disabled-thorns = <<EOT
    OpenBLAS
EOT
enabled-thorns = <<EOT
    SystemTopology
EOT
optionlist = yourmachine.cfg
submitscript = yourmachine.sub
runscript = yourmachine.run
makejobs = 16
make = make -j@MAKEJOBS@

# Simulation management
basedir = /global/cscratch1/sd/@USER@/simulations
cpu = Two 2.3 GHz 16-core Haswell processors per node
cpufreq = 2.3
flop/cycle = 8
max-num-smt = 2
num-smt = 1
ppn = 32
spn = 2
max-num-threads = 64
num-threads = 16
memory = 131072
nodes = 1630
min-ppn = 32
allocation = NO_ALLOCATION
queue = regular
maxwalltime = 48:00:00
submit = sbatch @SCRIPTFILE@
getstatus = squeue -j @JOB_ID@
stop = scancel @JOB_ID@
submitpattern = 'Submitted batch job (\d+)'
statuspattern = '@JOB_ID@ '
queuedpattern = ' PD '
runningpattern = ' (CF|CG|R|TO) '
holdingpattern = '\(JobHeldUser\)'
scratchbasedir = /scratch2/scratchdirs/@USER@
exechost = hostname
exechostpattern = (.*)
stdout = cat @SIMULATION_NAME@.out
stderr = cat @SIMULATION_NAME@.err
stdout-follow = tail -n 100 -f @SIMULATION_NAME@.out @SIMULATION_NAME@.err
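Two mechanisms in the example file are easy to test in isolation: the submitpattern regular expression, whose capture group picks the job ID out of the sbatch output, and the @NAME@ placeholders that get replaced in command templates. The sketch below only illustrates those two ideas under the assumption that they behave as their names suggest; the helper functions are hypothetical and are not Simfactory's implementation.

```python
import re

# submitpattern from the example machine file, without the surrounding quotes.
SUBMIT_PATTERN = r"Submitted batch job (\d+)"


def extract_job_id(sbatch_output):
    """Return the job ID captured by the submit pattern, or None if absent."""
    match = re.search(SUBMIT_PATTERN, sbatch_output)
    return match.group(1) if match else None


def expand(template, values):
    """Replace @KEY@ placeholders with values; unknown keys are left untouched."""
    return re.sub(
        r"@([A-Z_]+)@",
        lambda m: str(values.get(m.group(1), m.group(0))),
        template,
    )


if __name__ == "__main__":
    job_id = extract_job_id("Submitted batch job 12345")
    print(expand("squeue -j @JOB_ID@", {"JOB_ID": job_id}))
    print(expand("make -j@MAKEJOBS@", {"MAKEJOBS": 16}))
```

Running your scheduler's real submit output through a check like this is a cheap way to confirm the pattern before submitting jobs through Simfactory.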