Setting up Simfactory for a New Machine

From Einstein Toolkit Documentation
Revision as of 10:11, 20 June 2023 by Sbrandt (talk | contribs)

To set up a new machine for Simfactory, you will need to create four files inside the simfactory repo:

  • mdb/machines/YourMachine.ini
  • mdb/runscripts/YourMachine.run
  • mdb/submitscripts/YourMachine.sub
  • mdb/optionlists/YourMachine.cfg

In all cases, "YourMachine" should be replaced with whatever name you would like to supply for your machine.

Creating YourMachine.ini

Name and Alias Pattern

Generally speaking, the section name in square brackets at the top and the nickname should match; by convention, both are entirely lower case. The name entry usually matches the nickname as well, except that it may contain capital letters (yourmachine and YourMachine in the example below).

The aliaspattern should be a Python regular expression that identifies your machine by its hostname. Simfactory finds the hostname by calling the /usr/bin/hostname command. Note that if you put a name in ~/.hostname, Simfactory uses that instead of the output of /usr/bin/hostname.
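A candidate pattern can be tried against a few hostnames in Python before it goes into the .ini file. The sketch below is only illustrative: the pattern and the hostnames are hypothetical, and it assumes the pattern is meant to cover the whole hostname, which is why it is anchored with ^ and $.

```python
import re

# Hypothetical pattern: login nodes named login1, login2, ... in one domain.
aliaspattern = r"^login[0-9]+\.yourmachine\.example\.org$"

def matches_machine(hostname, pattern=aliaspattern):
    """Return True if the alias pattern identifies this hostname."""
    return re.search(pattern, hostname) is not None

# Hypothetical hostnames: two login nodes and one compute node.
for host in ["login1.yourmachine.example.org",
             "login12.yourmachine.example.org",
             "compute001.yourmachine.example.org"]:
    print(host, matches_machine(host))
```

Escaping the dots keeps the pattern from accidentally matching unrelated hostnames, since an unescaped "." matches any character.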

envsetup

The envsetup entry is where you make sure the user has the correct environment for building on yourmachine. Depending on the machine's configuration, that may involve "module load" commands or explicitly setting environment variables. Open question (RH): does this require bash syntax, and what happens if the user's shell is csh?

Because envsetup is likely to need more than one line, our example uses the multiline syntax, which works like a shell here-document: start the value with <<EOT and end it with EOT on a line by itself. Any string of your choosing may be used in place of EOT.
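To make the delimiting rule concrete, here is a minimal Python sketch that reads "key = value" lines and collects a <<TAG ... TAG block into a single value. It is not Simfactory's own parser; the function name read_multiline_values is made up for this illustration, and details such as whitespace handling are assumptions.

```python
def read_multiline_values(lines):
    """Parse 'key = value' lines, collecting '<<TAG ... TAG' blocks as one value."""
    values = {}
    it = iter(lines)
    for line in it:
        # Skip comments and lines that are not assignments.
        if "=" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if value.startswith("<<"):
            tag = value[2:].strip()
            body = []
            # Consume lines until the tag appears on a line by itself.
            for inner in it:
                if inner.strip() == tag:
                    break
                body.append(inner.strip())
            value = "\n".join(body)
        values[key] = value
    return values

example = """\
envsetup = <<EOT
  module load icc
  export LD_LIBRARY_PATH=/usr/local/lib64
EOT
makejobs = 16
""".splitlines()

parsed = read_multiline_values(example)
print(parsed["envsetup"])
print(parsed["makejobs"])
```

Because the block ends only at a line consisting of the tag alone, the "=" inside the export line is treated as part of the envsetup value rather than as a new key.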

disabled-thorns and enabled-thorns

Depending on how your machine is set up, you may need to enable or disable thorns, e.g. SystemTopology. On some machines, Slurm is misconfigured so that, even though the scheduler is supposed to give you the entire node, the taskset passed through Slurm confines you to just a few CPUs. When nodes are not shared, SystemTopology is helpful and it is useful to put it in enabled-thorns.

On the other hand, on machines which are correctly configured and where nodes may be shared by more than one user, SystemTopology is likely to hurt performance and should be in the disabled-thorns list.
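One way to see whether a job is being confined to a subset of the node's CPUs is to compare the process's CPU affinity with the total CPU count from inside a running job. The following is a diagnostic sketch only, not part of Simfactory; the function name affinity_report is hypothetical, and os.sched_getaffinity is available on Linux only.

```python
import os

def affinity_report():
    """Return (CPUs this process may run on, CPUs on the node)."""
    total = os.cpu_count()
    # os.sched_getaffinity exists only on some platforms (e.g. Linux).
    if hasattr(os, "sched_getaffinity"):
        allowed = len(os.sched_getaffinity(0))
    else:
        allowed = None
    return allowed, total

allowed, total = affinity_report()
if allowed is None or total is None:
    print("CPU affinity information is not available on this platform")
elif allowed < total:
    print(f"confined to {allowed} of {total} CPUs")
else:
    print(f"all {total} CPUs are available")
```

If a job that was supposed to receive an exclusive node reports that it is confined to a few CPUs, that matches the misconfiguration described above.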

 [yourmachine]
 # This machine description file is used internally by simfactory as a template
 # during the sim setup and sim setup-silent commands
 # Edit at your own risk
 # Machine description
 nickname        = yourmachine
 name            = YourMachine
 location        = YourLocation
 description     = SomeLongerDescription
 status          = production
 webpage         = https://SomeWebPage
 # How to access this machine via ssh
 hostname        = slurmjupyter.cct.lsu.edu
 aliaspattern    = PythonRegularExpression
 envsetup = <<EOT
   module load icc
   export LD_LIBRARY_PATH=/usr/local/lib64
 EOT
 disabled-thorns = <<EOT
     OpenBLAS
 EOT
 enabled-thorns = <<EOT
    SystemTopology
 EOT
 optionlist      = yourmachine.cfg
 submitscript    = yourmachine.sub
 runscript       = yourmachine.run
 makejobs        = 16
 make            = make -j@MAKEJOBS@
 # Simulation management
 basedir         = /global/cscratch1/sd/@USER@/simulations
 cpu             = Two 2.3 GHz 16-core Haswell processors per node
 cpufreq         = 2.3
 flop/cycle      = 8
 max-num-smt     = 2
 num-smt         = 1
 ppn             = 32
 spn             = 2
 max-num-threads = 64
 num-threads     = 16
 memory          = 131072
 nodes           = 1630
 min-ppn         = 32
 allocation      = NO_ALLOCATION
 queue           = regular
 maxwalltime     = 48:00:00
 submit          = sbatch @SCRIPTFILE@
 getstatus       = squeue -j @JOB_ID@
 stop            = scancel @JOB_ID@
 submitpattern   = 'Submitted batch job (\d+)'
 statuspattern   = '@JOB_ID@ '
 queuedpattern   = ' PD '
 runningpattern  = ' (CF|CG|R|TO) '
 holdingpattern  = '\(JobHeldUser\)'
 scratchbasedir  = /scratch2/scratchdirs/@USER@
 exechost        = hostname
 exechostpattern = (.*)
 stdout          = cat @SIMULATION_NAME@.out
 stderr          = cat @SIMULATION_NAME@.err
 stdout-follow   = tail -n 100 -f @SIMULATION_NAME@.out @SIMULATION_NAME@.err
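Several entries above work together: submit runs a command, submitpattern extracts the job ID from its output, and that ID is substituted for @JOB_ID@ in getstatus and stop. The Python sketch below illustrates the idea with the values from the example. It is not Simfactory's implementation; the expand helper and the job number are made up for illustration.

```python
import re

def expand(template, variables):
    """Replace each @NAME@ placeholder with its value; leave unknown names as-is."""
    return re.sub(r"@([A-Z_]+)@",
                  lambda m: str(variables.get(m.group(1), m.group(0))),
                  template)

# submitpattern from the example above, without the surrounding quotes.
submitpattern = r"Submitted batch job (\d+)"

# Text of the kind sbatch prints on success; the job number is made up.
sbatch_output = "Submitted batch job 1234567"
job_id = re.search(submitpattern, sbatch_output).group(1)

print(expand("squeue -j @JOB_ID@", {"JOB_ID": job_id}))
print(expand("make -j@MAKEJOBS@", {"MAKEJOBS": 16}))
```

With these inputs the two lines printed are "squeue -j 1234567" and "make -j16". The parentheses in submitpattern matter: the first capture group is what becomes the job ID.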