Configuring a new machine
If your machine is not supported by SimFactory already, you will need to write your own option list, run script and (for a cluster) submit script.
When using SimFactory on a cluster, it needs to know a lot of information about the details of the cluster, provided in a "machine definition file" in simfactory/mdb/machines. For example, it needs to know the number of cores on each node. Copy one of the provided files, and adapt it to your machine. Getting this right is nontrivial.
When using SimFactory on a laptop or workstation (i.e. a machine for which SimFactory has no machine definition file), the "sim setup" command will write a suitable machine definition file automatically.
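On a workstation, a setup session typically looks like the following (run from the root of the Cactus checkout; `sim setup-silent` is the non-interactive variant used in the Einstein Toolkit tutorials):

```shell
# Interactively create a machine definition and defs.local.ini for
# this machine, prompting for user name, email address, etc.:
./simfactory/bin/sim setup

# Or accept all defaults without prompting:
./simfactory/bin/sim setup-silent
```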
The following is based on stampede.ini:
nickname gives the name of the machine, which can be used with simfactory's --machine option. It must be unique among all machine definition files.
    nickname    = stampede
    name        = Stampede
    location    = TACC
    description = A very large Linux cluster at TACC
    webpage     = http://www.tacc.utexas.edu/user-services/user-guides/stampede-user-guide
    status      = production
name, location, description, webpage and status describe the machine; they are used by simfactory when reporting on the machine, but are arbitrary otherwise.
    hostname     = stampede.tacc.utexas.edu
    rsynccmd     = /home1/00507/eschnett/rsync-3.0.9/bin/rsync
    envsetup     = <<EOT
        module load intel/15.0.2
        module load mvapich2/2.1
        module -q load hdf5
        module load fftw3
        module load gsl
        module load boost
        module load papi
    EOT
    aliaspattern = ^login(\.stampede\.tacc\.utexas\.edu)?$
hostname is the name of the login node that simfactory uses to log in to the machine, and rsynccmd is the rsync executable used when syncing the source tree to it. envsetup is executed before each simfactory command (in particular during building and when running a simulation) to ensure that a consistent set of libraries is loaded. Finally, aliaspattern is a regular expression used by simfactory to identify the machine; it must match all cluster login nodes.
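You can test an aliaspattern against candidate hostnames with a short snippet like the following (the hostnames are illustrative):

```python
import re

# aliaspattern from the machine definition above
aliaspattern = r"^login(\.stampede\.tacc\.utexas\.edu)?$"

def matches(hostname):
    """Return True if this pattern would identify the host as stampede."""
    return re.search(aliaspattern, hostname) is not None

print(matches("login"))                           # short login-node name
print(matches("login.stampede.tacc.utexas.edu"))  # fully qualified name
print(matches("stampede.tacc.utexas.edu"))        # not a login-node name
```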
    sourcebasedir  = /work/00507/@USER@
    disabled-thorns = <<EOT
        ExternalLibraries/BLAS
        ExternalLibraries/CGNS
        ExternalLibraries/curl
        LSUThorns/Twitter
        ExternalLibraries/flickcurl
        LSUThorns/Flickr
        ExternalLibraries/LAPACK
        ExternalLibraries/libxml2
        ExternalLibraries/Nirvana
        CarpetDev/CarpetIONirvana
        CarpetExtra/Nirvana
        ExternalLibraries/OpenSSL
    EOT
    enabled-thorns = <<EOT
        ExternalLibraries/OpenCL
        CactusExamples/HelloWorldOpenCL
        CactusExamples/WaveToyOpenCL
        CactusUtils/OpenCLRunTime
        CactusUtils/Accelerator
        McLachlan/ML_BSSN_CL
        McLachlan/ML_BSSN_CL_Helper
        McLachlan/ML_WaveToy_CL
        ExternalLibraries/OpenBLAS
        ExternalLibraries/pciutils
        ExternalLibraries/PETSc
        CactusElliptic/EllPETSc
        CactusElliptic/TATelliptic
        CactusElliptic/TATPETSc
    EOT
sourcebasedir is the root directory underneath which all Cactus source trees are located; it should be large enough to hold multiple compiled Cactus checkouts. Some clusters do not provide all the libraries needed to run every thorn in the Einstein Toolkit, or they require alternative libraries (e.g. OpenBLAS instead of LAPACK). The disabled-thorns and enabled-thorns lists let you choose which thorns to enable or disable in thornlists: enabled-thorns will remove a #DISABLED marker from matching lines in the thornlist, and disabled-thorns will add one.
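The effect on a thornlist can be sketched as follows; this is an illustration of the mechanism, not simfactory's actual implementation:

```python
def apply_thorn_settings(thornlist_lines, enabled, disabled):
    """Add or remove '#DISABLED' markers on thornlist lines, in the
    spirit of simfactory's enabled-thorns/disabled-thorns lists."""
    out = []
    for line in thornlist_lines:
        stripped = line.strip()
        thorn = stripped.replace("#DISABLED", "").strip()
        if thorn in disabled and not stripped.startswith("#DISABLED"):
            out.append("#DISABLED " + stripped)   # disable this thorn
        elif thorn in enabled and stripped.startswith("#DISABLED"):
            out.append(thorn)                     # re-enable this thorn
        else:
            out.append(line)
    return out

lines = ["ExternalLibraries/LAPACK", "#DISABLED ExternalLibraries/OpenBLAS"]
print(apply_thorn_settings(lines,
                           enabled={"ExternalLibraries/OpenBLAS"},
                           disabled={"ExternalLibraries/LAPACK"}))
# → ['#DISABLED ExternalLibraries/LAPACK', 'ExternalLibraries/OpenBLAS']
```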
    optionlist   = stampede-mvapich2.cfg
    submitscript = stampede.sub
    runscript    = stampede-mvapich2.run
    make         = make -j8
optionlist, submitscript and runscript are used when compiling and submitting simulations and are described in detail below. You can find examples in the subdirectories of simfactory/mdb. make is the command used to compile the code; it can contain extra arguments, for example to enable parallel compilation.
The final set of options deals with the queuing system and characteristics of the machine.
    basedir         = /scratch/00507/@USER@/simulations
    cpu             = Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
    cpufreq         = 2.7
    flop/cycle      = 8
    ppn             = 16
    spn             = 2
    mpn             = 2
    max-num-threads = 16
    num-threads     = 8
    memory          = 32768
    nodes           = 6400
    min-ppn         = 16
    allocation      = NO_ALLOCATION
    queue           = normal      # [normal, large, development]
    maxwalltime     = 48:00:00    # development has 4:0:0
    maxqueueslots   = 25          # there are 50, but jobs can take 2 slots
basedir is the root directory under which all simulations are created; it should live on a fast, parallel file system. cpu, cpufreq and flop/cycle describe the machine's processors and are currently unused by simfactory.
spn is the number of CPU sockets per node, mpn is the number of NUMA domains per node, and ppn is the number of cores per node (historically called processors, hence the name). ppn is passed by simfactory to the queuing system to request a certain number of cores per node.
max-num-threads is the maximum number of threads that can be used, typically the same as ppn.
num-threads is the default number of threads used, often the number
of cores in a single NUMA domain.
min-ppn is the minimum number of cores that need to be requested; often this is identical to ppn if the queuing system does not handle under-subscribing a node.
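These settings determine how processes and threads are laid out on the nodes. A sketch of the arithmetic, using the stampede values above (the number of nodes requested is a made-up example):

```python
# Values from the machine definition above
ppn = 16          # cores per node
num_threads = 8   # default OpenMP threads per MPI process
nodes_requested = 4   # hypothetical job size

procs_per_node = ppn // num_threads   # MPI processes per node
total_procs = procs_per_node * nodes_requested
total_cores = ppn * nodes_requested

print(procs_per_node, total_procs, total_cores)  # 2 8 64
```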
memory gives the amount of memory per node in MB and is currently only used by simfactory's utility scripts. nodes is the number of nodes in the cluster and is only used to abort if more nodes than are in the cluster are requested.
allocation gives the allocation to which to charge runs on clusters where computer time is accounted for (which is almost all clusters). queue is the default queue to submit to, often named "default", "batch", "production" or similar.
maxwalltime is the maximum allowed run time for a single job; if a longer-running simulation is requested, simfactory automatically splits it up into chunks no longer than this.
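The chunking can be sketched as follows; this illustrates the idea, not simfactory's actual code:

```python
def split_walltime(requested_hours, maxwalltime_hours):
    """Split a requested run time into job chunks, each no longer
    than the machine's maxwalltime."""
    chunks = []
    remaining = requested_hours
    while remaining > 0:
        chunk = min(remaining, maxwalltime_hours)
        chunks.append(chunk)
        remaining -= chunk
    return chunks

# A 120-hour simulation on a machine with maxwalltime = 48:00:00
print(split_walltime(120, 48))  # [48, 48, 24]
```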
maxqueueslots is the maximum
number of jobs that can be queued at the same time, a limit imposed on some
clusters to reduce the load on the queue scheduler.
Please see SimFactory's online documentation for the exact definition of the terms that SimFactory uses to refer to cores, nodes, CPUs, processing units etc.
    submit          = sbatch @SCRIPTFILE@; sleep 5   # sleep 60
    getstatus       = squeue -j @JOB_ID@
    stop            = scancel @JOB_ID@
    submitpattern   = Submitted batch job ([0-9]+)
    statuspattern   = '@JOB_ID@ '
    queuedpattern   = ' PD '
    runningpattern  = ' (CF|CG|R|TO) '
    holdingpattern  = ' S '
    #exechost        = head -n 1 SIMFACTORY/NODES
    #exechostpattern = ^(\S+)
    stdout          = cat @SIMULATION_NAME@.out
    stderr          = cat @SIMULATION_NAME@.err
    stdout-follow   = tail -n 100 -f @SIMULATION_NAME@.out @SIMULATION_NAME@.err
submit, getstatus and stop are used by simfactory as the commands to submit a new job to the queuing system, query the status of a running job, and cancel a running job. You can use @JOB_ID@ to refer to the job's identifier in them. The respective pattern variables are regular expressions that simfactory matches against the output of the commands.
submitpattern must capture (enclose in parentheses) the actual job id so that it can be referred to as the first captured group ($1 in sed).
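The capturing behaviour can be checked against a typical sbatch response; the job id below is made up:

```python
import re

submitpattern = r"Submitted batch job ([0-9]+)"

output = "Submitted batch job 1234567"   # hypothetical sbatch output
m = re.search(submitpattern, output)
job_id = m.group(1)   # the first captured group, $1 in sed terms
print(job_id)         # 1234567
```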
statuspattern is used to select the line in getstatus's output that contains the actual job state information. The queuedpattern, runningpattern and holdingpattern patterns are used to identify job states; whichever matches first (in the order listed above) determines the job state.
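The first-match-wins classification can be sketched like this; the squeue output lines are hypothetical:

```python
import re

# Patterns from the machine definition above, in matching order
patterns = [
    ("queued",  r" PD "),
    ("running", r" (CF|CG|R|TO) "),
    ("holding", r" S "),
]

def job_state(squeue_line):
    """Return the first state whose pattern matches the line."""
    for state, pattern in patterns:
        if re.search(pattern, squeue_line):
            return state
    return "unknown"

# Hypothetical squeue output lines for job 1234567
print(job_state(" 1234567  normal  sim  user  PD  0:00  4 (Priority)"))
print(job_state(" 1234567  normal  sim  user   R  1:23  4 c123-001"))
```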
stdout, stderr and stdout-follow are commands that simfactory executes during sim show-output to obtain the simulation's log output; stdout-follow is used when the --follow option is specified, to show the output of a running segment. On some clusters the log output for running segments is not directly accessible from the head nodes, and simfactory has to first log in to one of the cluster nodes to gain access to it. exechost and exechostpattern give the command and regular expression used to obtain the host to log in to. If they are set, stdout-follow is executed on that host, otherwise on the head node.
The options provided by Cactus are described in the Cactus documentation. This page provides additional information and recommendations.
The following is based on the ubuntu.cfg optionlist, which can be found in simfactory/mdb/optionlists.
    VERSION = 2012-09-28
Cactus will reconfigure when the VERSION string changes.
    CPP = cpp
    FPP = cpp
    CC  = gcc
    CXX = g++
    F77 = gfortran
    F90 = gfortran
The C and Fortran preprocessors, and the C, C++, Fortran 77 and Fortran 90 compilers, are specified by these options. You can specify a full path if the compiler you want to use is not available on your default path. Note that it is strongly recommended to use compilers from the same family; e.g. don't mix the Intel C Compiler with the GNU Fortran Compiler.
    CPPFLAGS = -DMPICH_IGNORE_CXX_SEEK
    FPPFLAGS = -traditional
    CFLAGS   = -g3 -march=native -std=gnu99
    CXXFLAGS = -g3 -march=native -std=gnu++0x
    F77FLAGS = -g3 -march=native -fcray-pointer -m128bit-long-double -ffixed-line-length-none
    F90FLAGS = -g3 -march=native -fcray-pointer -m128bit-long-double -ffixed-line-length-none
    LDFLAGS  = -rdynamic
Cactus thorns can be written in C or C++; Cactus supports the C99 and C++11 standards respectively. Additionally, the Einstein Toolkit requires the GNU extensions provided by the options gnu99 / gnu++0x (the pre-standard spelling of gnu++11 used by older compilers). If these extensions are not available, some Einstein Toolkit thorns will not compile.
-g3 ensures that debugging symbols are included in the object files. It is not necessary to set DEBUG = yes to get debugging symbols.
The rdynamic linker flag ensures that additional information is available in the executable for producing backtraces at runtime in the event of an internal error.
    LIBDIRS =
    C_LINE_DIRECTIVES = yes
    F_LINE_DIRECTIVES = yes
    DEBUG = no
    CPP_DEBUG_FLAGS = -DCARPET_DEBUG
    FPP_DEBUG_FLAGS = -DCARPET_DEBUG
    C_DEBUG_FLAGS   = -O0
    CXX_DEBUG_FLAGS = -O0
    F77_DEBUG_FLAGS = -O0
    F90_DEBUG_FLAGS = -O0
When DEBUG = yes is set (e.g. on the make command line or with SimFactory's --debug option), these debug flags are used. The intention here is to disable optimisation and enable additional code which may slow down execution but makes the code easier to debug.
    OPTIMISE = yes
    CPP_OPTIMISE_FLAGS = -DKRANC_VECTORS   # -DCARPET_OPTIMISE -DNDEBUG
    FPP_OPTIMISE_FLAGS =                   # -DCARPET_OPTIMISE -DNDEBUG
    C_OPTIMISE_FLAGS   = -O2 -ffast-math
    CXX_OPTIMISE_FLAGS = -O2 -ffast-math
    F77_OPTIMISE_FLAGS = -O2 -ffast-math
    F90_OPTIMISE_FLAGS = -O2 -ffast-math
    PROFILE = no
    CPP_PROFILE_FLAGS =
    FPP_PROFILE_FLAGS =
    C_PROFILE_FLAGS   = -pg
    CXX_PROFILE_FLAGS = -pg
    F77_PROFILE_FLAGS = -pg
    F90_PROFILE_FLAGS = -pg
    OPENMP = yes
    CPP_OPENMP_FLAGS = -fopenmp
    FPP_OPENMP_FLAGS = -fopenmp
    C_OPENMP_FLAGS   = -fopenmp
    CXX_OPENMP_FLAGS = -fopenmp
    F77_OPENMP_FLAGS = -fopenmp
    F90_OPENMP_FLAGS = -fopenmp
    WARN = yes
    CPP_WARN_FLAGS = -Wall
    FPP_WARN_FLAGS = -Wall
    C_WARN_FLAGS   = -Wall
    CXX_WARN_FLAGS = -Wall
    F77_WARN_FLAGS = -Wall
    F90_WARN_FLAGS = -Wall
The Einstein Toolkit thorns use a variety of third-party libraries such as MPI and HDF5. These are usually provided by helper thorns in the ExternalLibraries arrangement. As a general rule, to enable a capability FOO, add ExternalLibraries/FOO to your ThornList and set FOO_DIR to the directory where the include and lib directories are found.
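For example, to use a GSL installation in a non-standard location, the ThornList would contain the line ExternalLibraries/GSL, and the option list would set (the path below is illustrative):

```
GSL_DIR = /opt/gsl-2.7
```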
If no HDF5 options are given, then HDF5 will be used if it can be automatically detected from standard locations, and will be built from a source package in the HDF5 thorn if not. Alternatively you can specify HDF5_DIR to point to an HDF5 installation, for example
    HDF5_DIR = /usr/local/hdf5-1.9.1
The following options disable support for Fortran and C++ when building HDF5, as it is not required by the Einstein Toolkit.
    HDF5_ENABLE_FORTRAN = no
    HDF5_ENABLE_CXX     = no
    MPI_DIR      = /usr
    MPI_INC_DIRS = /usr/include/mpich2
    MPI_LIB_DIRS = /usr/lib
    MPI_LIBS     = mpich fmpich mpl
    PTHREADS_DIR = NO_BUILD
The submission script is used to submit a job to the queueing system. See the examples in simfactory/mdb/submitscripts, and create a new one for your cluster that uses the same queueing system.
The most important part of the run script is usually the set of modules that need to be loaded, and the mpirun command to use on the machine. See the examples in simfactory/mdb/runscripts, and create a new one for your cluster that is similar to one that already exists.
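A minimal MPI run script might look like the following sketch; the @...@ placeholders are substituted by simfactory when a segment starts, and the exact mpirun invocation depends on the cluster's MPI stack:

```shell
#! /bin/bash

echo "Starting simulation at $(date)"

# Make the environment match what the simulation expects
export OMP_NUM_THREADS=@NUM_THREADS@

# Launch the Cactus executable on the requested number of processes
mpirun -np @NUM_PROCS@ @EXECUTABLE@ -L 3 @PARFILE@

echo "Simulation finished at $(date)"
```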