Difference between revisions of "Simulation Factory Advanced Tutorial"

From Einstein Toolkit Documentation
(Archiving)
Line 1: Line 1:
The Simulation Factory is an effective method for controlling all facets of a Cactus simulation. It provides a central
+
The Simulation Factory simplifies many aspects of running Cactus-based
facility for managing an authoritative source tree, controlling and providing remote access to many commonly-used HPC machines including
+
simulations. It provides a central facility for managing authoritative
LONI and the TeraGrid, builds and compiles a Cactus source tree into many independent configurations, and can also manage a simulation all the
+
source tree versions, providing convenient access to remote HPC
way from creation to output.
+
systems, building Cactus source configurations, and managing
 +
simulations all the way from submission to archiving their output.
 +
 
 +
 
  
 
== Getting Started ==
 
== Getting Started ==
In order to begin using The Simulation Factory, it must be checked out from '''svn'''. The Simulation Factory typically resides in the '''simfactory'''
+
 
folder inside a Cactus source tree. This can be accomplished with the following '''svn''' command:
+
To begin using The Simulation Factory, it needs to be checked out from
 +
'''svn'''. The Simulation Factory is typically placed into a
 +
'''simfactory''' folder inside a Cactus source tree. This can be
 +
accomplished with the following '''svn''' command:
  
 
  svn co https://svn.cct.lsu.edu/repos/numrel/simfactory/branches/PYSIM_2010 simfactory
 
  svn co https://svn.cct.lsu.edu/repos/numrel/simfactory/branches/PYSIM_2010 simfactory
  
The Simulation Factory can also be placed in an independent location to be used with multiple Cactus source trees. This approach will be detailed later.
+
The Simulation Factory could also be placed in an independent location
 +
to be used with multiple Cactus source trees. This approach will be
 +
described later.
 +
 
 +
 
  
 
== Initial Setup ==
 
== Initial Setup ==
  
Once The Simulation Factory has been checked out from svn, the next step is to create two required configuration files. Assuming The Simulation Factory
+
Once the Simulation Factory has been checked out from svn, the next
has been checked out into the '''simfactory''' folder, this initial configuration can be accomplished with the following commands:
+
step is to configure it, telling it e.g. about your user name. The
 +
Simulation Factory comes with example configuration files that you can
 +
copy and modify. Assuming The Simulation Factory has been checked out
 +
into the '''simfactory''' folder, this initial configuration can be
 +
accomplished with the following commands:
 
   
 
   
cp simfactory/etc/defs.ini.example simfactory/etc/defs.ini
 
 
  cp simfactory/etc/defs.local.ini.simple simfactory/etc/defs.local.ini
 
  cp simfactory/etc/defs.local.ini.simple simfactory/etc/defs.local.ini
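If these commands are re-run later, a plain '''cp''' would overwrite a configuration file that has already been edited. The following is a minimal sketch of a guard against that. The helper name '''copy_if_missing''' is hypothetical and not part of the Simulation Factory.

```shell
# Hypothetical helper (not part of the Simulation Factory):
# copy an example file to its target only if the target does not exist yet.
copy_if_missing() {
    src=$1
    dst=$2
    if [ -e "$dst" ]; then
        echo "keeping existing $dst"
    else
        cp "$src" "$dst"
    fi
}

# Example usage, with the paths from the commands above:
#   copy_if_missing simfactory/etc/defs.ini.example simfactory/etc/defs.ini
#   copy_if_missing simfactory/etc/defs.local.ini.simple simfactory/etc/defs.local.ini
```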
  
Line 27: Line 40:
 
</ul>
 
</ul>
  
=== Additional Configuration ===
+
=== Additional Configurations ===
 +
 
 +
The Simulation Factory contains a database known as the Machine
 +
Database. This collection of information describes all the aspects
 +
that are unique about each individual HPC system, so that the
 +
Simulation Factory can provide a common interface for all systems that
 +
hides these differences.
 +
 
 +
The Machine Database consists of different sections, one for each
 +
machine. The section name is given in square brackets, e.g.
 +
'''[queenbee]'''. There is a special section '''[default]''' that
 +
provides default values for those properties that are not explicitly
 +
set in the machine-specific entries.
  
The Simulation Factory contains a database known as the Machine Database. This collection of information is used to define and help mitigate the uniqueness of each individual HPC machine. The Machine Database is an authoritative collection of information, and is generally not meant to be edited by a user. To add, or change properties of a Machine Database entry, '''simfactory/etc/defs.local.ini''' is used. For instance, if an alternative username, allocation, and sourcebasedir is needed for the machine '''queenbee''', you would add the following section:
+
The Machine Database is an authoritative collection of information,
 +
and is generally not meant to contain modifications that are only
 +
relevant for individual people. These local modifications are instead
 +
maintained in the file '''simfactory/etc/defs.local.ini''' described
 +
above, where one can add, change, or overwrite properties of Machine
 +
Database entries. For instance, if an alternative username,
 +
allocation, and sourcebasedir is needed for the machine
 +
'''queenbee''', you would add the following section there:
  
 
  [queenbee]
 
  [queenbee]
  user          = queenbee_username
+
  user          = QUEENBEE_USERNAME
  allocation    = queenbee_allocation
+
  allocation    = QUEENBEE_ALLOCATION
 
  sourcebasedir = /work/@USER@
 
  sourcebasedir = /work/@USER@
  
There are several macros that can aid in simplifying configuration. For configuration purposes, the most useful is '''@USER@'''. This macro expands to the '''user''' property of the Machine Database entry. If user was defined in the '''[default]''' section of '''simfactory/etc/defs.local.ini''' then it will contain that value. An expanded list of useful macros can be found in the [[#Macros]] section.
+
There are several macros that help simplify configuration entries.
 +
The most useful is probably '''@USER@''', which expands to the
 +
'''user''' property of the Machine Database entry.
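The effect of the '''@USER@''' macro can be illustrated with a plain text substitution. This is only a sketch of the idea, not how the Simulation Factory implements macro expansion. The helper name '''expand_user''' is hypothetical, and the sketch assumes the user name contains no characters that are special to sed (such as '''|''' or '''&''').

```shell
# Illustration only: mimic how a value containing @USER@ would expand.
# expand_user is a hypothetical helper, not a Simulation Factory command.
expand_user() {
    value=$1   # a property value, e.g. /work/@USER@
    user=$2    # the value of the "user" property
    printf '%s\n' "$value" | sed "s|@USER@|$user|g"
}

# Example usage:
#   expand_user '/work/@USER@' QUEENBEE_USERNAME
```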
 +
 
 +
For example, if you are using the same user name on many systems, but
 +
have a different user name on some systems, then you would set the
 +
common user name in the '''[default]''' section, and override this for
 +
those machines where your user name differs. The example
 +
'''simfactory/etc/defs.local.ini.complex''' has examples for this.
 +
 
 +
Most of the macros available in the Simulation Factory are described
 +
in the section [[#Macros]] below.
  
To get a list of preconfigured machines, issue the following command:
+
The command
 
   
 
   
 
  simfactory/sim list-machines
 
  simfactory/sim list-machines
 +
 +
outputs a list of all preconfigured machines that the Simulation
 +
Factory knows about.
  
 
=== Local Workstation Configuration ===
 
=== Local Workstation Configuration ===
  
In order to use a local workstation with The Simulation Factory, a Machine Database entry must be created. Before getting started, the hostname of the local
+
The Simulation Factory can only be used on a machine known to the
machine must be determined. It is through this hostname that The Simulation Factory matches a Machine Database entry to the executing machine. The hostname
+
Simulation Factory. This means that you may have to add an entry for
can be determined using the following command:
+
your workstation or notebook.
 +
 
 +
The first step is to determine the hostname of the local machine. It
 +
is through this hostname that the Simulation Factory matches a Machine
 +
Database entry to the machine on which it executes. The hostname can
 +
be determined using the following command:
 
   
 
   
 
  hostname
 
  hostname
  
Once you have the hostname, issue the following command:
+
Once you know the hostname, issue the following command to create a
 +
new Machine Database entry, starting from a generic example:
  
 
  cp simfactory/etc/mdb/generic.ini simfactory/etc/mdb/<hostname>.ini
 
  cp simfactory/etc/mdb/generic.ini simfactory/etc/mdb/<hostname>.ini
  
Edit '''simfactory/etc/mdb/<hostname>.ini''' and replace
+
Then edit '''simfactory/etc/mdb/<hostname>.ini''' and replace
  
 
<ul>
 
<ul>
 
<li> '''[generic]''' with '''[<hostname>]'''  
 
<li> '''[generic]''' with '''[<hostname>]'''  
 
<ul>
 
<ul>
<li> The section header for this machine database entry must be a unique value and must match the '''nickname''' property exactly.
+
<li>The section header for this machine database entry must be unique
 +
  among all Machine Database entries, and must match the '''nickname'''
 +
  property exactly.
 
</ul>
 
</ul>
 
<li> '''nickname = generic''' with '''nickname = <hostname>'''
 
<li> '''nickname = generic''' with '''nickname = <hostname>'''
 
<li> '''hostname = generic''' with '''hostname = <hostname>'''
 
<li> '''hostname = generic''' with '''hostname = <hostname>'''
<li> '''sourcebasedir = /home/@USER@''' with the correct '''root''' path under which all your Cactus source trees reside.  
+
<li> '''sourcebasedir = /home/@USER@''' with the correct '''root'''
<li> '''basedir = /home/@USER@/simulations''' with the desired folder for simulation output
+
  path under which all your Cactus source trees reside. (This is not
 +
  the Cactus directory itself, but the directory that contains the
 +
  Cactus directory.)
 +
<li> '''basedir = /home/@USER@/simulations''' with the desired folder
 +
  that will contain all simulation output. (This is explained in more
 +
  detail below in the section [[#Managing Simulations]].)
 
</ul>
 
</ul>
 +
You can ignore the properties '''user''', '''email''', and
 +
'''allocation''', as the values from the '''[default]''' section of
 +
'''simfactory/etc/defs.local.ini''' will propagate to this entry.
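
The copy and the first three replacements listed above could be scripted roughly as follows. This is a minimal sketch under stated assumptions: the helper name '''make_mdb_entry''' is hypothetical, and the sketch assumes that '''generic.ini''' contains the '''[generic]''', '''nickname = generic''', and '''hostname = generic''' lines exactly as quoted above, starting in the first column. The '''sourcebasedir''' and '''basedir''' entries still need to be edited by hand.

```shell
# Hypothetical helper: derive a Machine Database entry for a host
# from generic.ini. Assumes the "key = value" lines quoted in the list above.
make_mdb_entry() {
    src=$1      # e.g. simfactory/etc/mdb/generic.ini
    dst=$2      # e.g. simfactory/etc/mdb/<hostname>.ini
    host=$3     # e.g. the output of `hostname`
    sed -e "s/^\[generic\]/[$host]/" \
        -e "s/^\(nickname[[:space:]]*=[[:space:]]*\)generic[[:space:]]*\$/\1$host/" \
        -e "s/^\(hostname[[:space:]]*=[[:space:]]*\)generic[[:space:]]*\$/\1$host/" \
        "$src" > "$dst"
}

# Example usage:
#   host=$(hostname)
#   make_mdb_entry simfactory/etc/mdb/generic.ini "simfactory/etc/mdb/$host.ini" "$host"
```

Writing to a new file rather than editing in place leaves '''generic.ini''' untouched for later use.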
 +
  
user, email, and allocation can safely be ignored, as the values from the '''[default]''' section of '''simfactory/etc/defs.local.ini''' will propagate to this entry.
 
  
 
== Accessing Remote Systems ==
 
== Accessing Remote Systems ==
  
The Simulation Factory provides a convenient facility for handling remote communication and file transfer with any known machine. Using this facility, a user can synchronize
+
The Simulation Factory simplifies access to remote systems, both for
an authoritative source tree, get an interactive shell on the remote system, or execute a command, locally or remotely.  
+
transferring files and logging in. You can synchronise (replicate) an
 +
authoritative version of your Cactus source tree to remote systems,
 +
obtain an interactive shell, or execute commands.
  
 
=== Information Commands ===  
 
=== Information Commands ===  
  
The following commands can be used to discover information about a machine, or list all known, configured machines.
+
The following commands can be used to discover information about a
 +
machine, or list all known machines.
  
List all known machines
+
List all known machines:
  
 
  simfactory/sim list-machines
 
  simfactory/sim list-machines
  
List details about a single machine
+
List details about a single machine:
 
   
 
   
 
  simfactory/sim list-machine <machine>
 
  simfactory/sim list-machine <machine>
  
Print the current Machine Database to the screen
+
Print the complete Machine Database to the screen:
  
 
  simfactory/sim print-mdb
 
  simfactory/sim print-mdb
  
Print the Machine Database entry for a single machine
+
Print the Machine Database entry for a single machine:
  
 
  simfactory/sim print-mdb <machine>
 
  simfactory/sim print-mdb <machine>
  
Get the machine that The Simulation Factory is currently being executed on
+
Print the name of the machine on which the Simulation Factory is
 +
currently being executed:
 
   
 
   
 
  simfactory/sim print-machine
 
  simfactory/sim print-machine
 
  
 
=== Syncing ===
 
=== Syncing ===
  
Historically, Cactus and the Einstein Toolkit have not been installed into a central location, and instead are built on-demand for a certain thornlist. In order to aid this approach, The Simulation Factory has the ability to synchronize a Cactus/Einstein Toolkit developer's local, authoritative source tree to a remote HPC machine to be compiled and run.
+
Historically, Cactus and the Einstein Toolkit have not been installed
 +
into a central location on each machine, but are instead built
 +
on-demand by every user for a certain thorn list. (One of the
 +
advantages is that people can thus easily add their own thorns.) To
 +
help with this approach, the Simulation Factory provides a facility to
 +
synchronize a Cactus user's local, authoritative source tree to remote
 +
HPC systems, where it can then be compiled and run.
  
Remote access services are implemented on top of ssh, and ssh-like mechanisms such as gsi-ssh. Currently you must manually manage all ssh keys and passwords.
+
Remote access is implemented on top of ssh and other ssh-like
 +
mechanisms such as gsi-ssh. Currently, you must still manage all ssh
 +
keys and passwords manually. (We highly recommend using ssh keychain
 +
and ssh agents to avoid having to enter passwords multiple times.)
  
 
==== Configuration ====
 
==== Configuration ====
  
Before syncing a small amount of configuration must be performed. It is necessary to either verify the defaults are correct, or to define the correct values for the following keys
+
Before syncing a source tree to a remote system, a small amount of
 +
configuration must be performed. It is necessary to either verify that
 +
the defaults are correct, or to define the correct values for the
 +
following keys for the remote system in the Machine Database:
  
 
<ul>
 
<ul>
 
<li> '''sourcebasedir'''
 
<li> '''sourcebasedir'''
 
<ul>
 
<ul>
<li> The root directory under which the Cactus source tree will reside
+
<li>The root directory under which the Cactus source tree will reside
 
</ul>
 
</ul>
 
<li> '''basedir'''
 
<li> '''basedir'''
 
<ul>
 
<ul>
<li> The root directory which all simulation output will reside
+
<li>The root directory in which all simulation output will reside
 
</ul>
 
</ul>
 
<li> '''user'''
 
<li> '''user'''
 
<ul>
 
<ul>
<li> The username for remote access
+
<li>The user name on the remote system
 
</ul>
 
</ul>
 
</ul>
 
</ul>
  
You can see the configured values by issuing the following command
+
You can output the currently configured values by issuing the command
 
   
 
   
 
  simfactory/sim print-mdb <machine>
 
  simfactory/sim print-mdb <machine>
  
If the values for those entries need to be changed, edit '''simfactory/etc/defs.local.ini''' and add an entry for the machine being used. This entry will augment
+
If you need to change these values, then edit (on the local system)
the existing Machine Database entry, updating the default values with the values specified. An example for the machine '''queenbee''' can be seen in the [[#Additional Configuration]] section.
+
the file '''simfactory/etc/defs.local.ini''' and add a section for the
 +
remote machine. This entry will augment the existing Machine Database
 +
entry, updating or replacing the corresponding values. An example for
 +
the machine '''queenbee''' can be seen in the [[#Additional
 +
Configurations]] section.
  
Additionally, to see/modify the list of files and directories that are synchronized, edit '''simfactory/etc/defs.ini''' and find the following two keys
+
To see the list of files and directories that are synchronized, look
 +
at '''simfactory/etc/defs.ini''' and find the following three keys
  
 
<ul>
 
<ul>
<li> '''rsync-sources'''
+
<li> '''sync-sources'''
 
<ul>
 
<ul>
<li> The list of files and directories that will be copied when the option '''--sync-sourcetree''' is enabled
+
<li>The list of files and directories that will be transferred when
 +
the option '''--sync-sourcetree''' is enabled (on by default)
 
</ul>
 
</ul>
<li> '''rsync-parfiles'''
+
<li> '''sync-parfiles'''
 
<ul>
 
<ul>
<li> The list of files and directories that will be copied when the option '''--sync-parfiles''' is enabled. This list of files typically includes just parameter files.
+
<li>The list of files and directories that will be copied when the
 +
option '''--sync-parfiles''' is enabled (also on by default). This
 +
list of files typically includes just parameter files.
 
</ul>
 
</ul>
<li> '''rsync-excludes'''
+
<li> '''sync-excludes'''
 
<ul>
 
<ul>
<li> The list of files and directories that will be expressly excluded from syncing
+
<li>A list of files and directories that will be expressly excluded
 +
from syncing, such as '''CVS''' or '''.svn''' directories.
 
</ul>
 
</ul>
 
</ul>
 
</ul>
Line 150: Line 236:
 
==== Performing a Sync ====
 
==== Performing a Sync ====
  
A sync command takes two arguments, both of which default to '''true'''.  
+
A sync command takes two options, both of which default to '''true'''.
  
 
<ul>
 
<ul>
 
<li> '''sync-sourcetree'''
 
<li> '''sync-sourcetree'''
 
<ul>
 
<ul>
<li> Enable syncing of the list of files and folders specified by the aforementioned '''rsync-sources''' configuration entry.
+
<li>Synchronise the complete source tree, as specified in the
 +
aforementioned '''sync-sources''' configuration entry. This takes a
 +
few seconds or minutes, depending on the connection.
 
</ul>
 
</ul>
 
<li> '''sync-parfiles'''
 
<li> '''sync-parfiles'''
 
<ul>
 
<ul>
<li> Enable syncing of the list of files and folders specified by the aforementioned '''rsync-parfiles''' configuration entry.  
+
<li>Synchronise parameter files, as specified by the aforementioned
 +
'''sync-parfiles''' configuration entry. This is typically faster
 +
than synchronising the source tree.
 
</ul>
 
</ul>
 
</ul>
 
</ul>
  
A default sync can be performed by issuing the following command
+
Usually, you would issue the command:
  
 
  simfactory/sim sync <machine>
 
  simfactory/sim sync <machine>
  
To sync only parfiles, you can negate the '''--sync-sourcetree''' argument with the following command
+
To synchronise only parfiles, you can negate the
 +
'''--sync-sourcetree''' argument with the following command
  
 
  simfactory/sim sync <machine> --nosync-sourcetree
 
  simfactory/sim sync <machine> --nosync-sourcetree
  
If the desire is to perform a sync from one remote machine to another remote machine, this can be accomplished with the following command
+
If you want to synchronise not from the local machine, but from
 +
another remote machine, then use
 +
 
 +
  simfactory/sim --remote=<frommachine> sync <tomachine>
  
simfactory/sim sync <tomachine> --remotemachine=<frommachine>
+
This executes the synchronisation command on the machine
 +
'''<frommachine>'''.
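
When the same source tree is kept on several remote systems, the sync command can be repeated in a loop. The following is a minimal sketch; the wrapper name '''sync_all''' and the '''SIM''' variable are hypothetical, and only the '''sync <machine>''' command shown above is used.

```shell
# Hypothetical wrapper: run "sim sync" for several machines in turn.
# SIM defaults to the path used throughout this tutorial.
sync_all() {
    sim=${SIM:-simfactory/sim}
    for machine in "$@"; do
        # Stop at the first machine for which the sync fails.
        "$sim" sync "$machine" || return 1
    done
}

# Example usage (the second machine name is a placeholder):
#   sync_all queenbee <othermachine>
```

Stopping at the first failure keeps a broken connection or a wrong machine name from going unnoticed in a long list.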
  
 
=== Remote Login ===
 
=== Remote Login ===
  
The Simulation Factory provides the ability to receive an interactive shell on the remote system. This can be initiated with the following command
+
The Simulation Factory provides the ability to log in to a remote
 +
system. This is initiated with the command
  
 
  simfactory/sim login <machine>
 
  simfactory/sim login <machine>
 +
 +
This will automatically cd into the Cactus directory on the remote
 +
system.
  
 
=== Local/Remote Command Execution ===
 
=== Local/Remote Command Execution ===
  
To execute a command locally via The Simulation Factory, use the following command
+
To execute a command (locally) via the Simulation Factory, use the
 +
command
  
 
  simfactory/sim execute <command>
 
  simfactory/sim execute <command>
  
If the command is complex, and requires arguments, the command must be quoted. For example
+
The command will be executed in the Cactus directory on the remote
 +
system.
  
simfactory/sim execute "ls -al"
+
If the command is complex, and requires arguments, the command must be
 +
quoted. For example
  
To execute a remote command, use the following command
+
  simfactory/sim execute 'ls -al'
  
  simfactory/sim execute <command> --remotemachine=<machine>
+
To execute a remote command, use the command
 +
 
 +
  simfactory/sim --remote=<machine> execute <command>
  
 
An example of a complex command being executed remotely is
 
An example of a complex command being executed remotely is
 
   
 
   
  simfactory/sim execute "find . -name *.py -exec sed -i .bk -n s/foo/bar/g {} \;" --remotemachine=queenbee
+
  simfactory/sim --remote=queenbee execute 'find . -name "*.py" -exec sed -i.bk s/foo/bar/g {} \;'
 +
 
 +
 
  
== Cactus Build Configurations ==
+
== Build Cactus Configurations ==
  
The Simulation Factory provides a central facility for configuring and building Cactus source tree. When a Cactus source tree is compiled, The Simulation Factory creates a '''configuration'''
+
The Simulation Factory provides a central facility for configuring and
for the compiled executable, storing with it information such as the Cactus options list, and the provided submission and run scripts. This configuration represents the core of what is necessary to perform Cactus execution and submission.
+
building Cactus source trees. When a Cactus source tree is compiled,
 +
the Simulation Factory creates a '''configuration''' for the compiled
 +
executable, storing with it related information such as the Cactus
 +
options list, and the scripts necessary to submit and run jobs in a
 +
queuing system. This configuration is thus a self-contained entity
 +
containing everything that is necessary to perform Cactus simulations.
  
 
=== Information Commands ===
 
=== Information Commands ===
  
To list all existing configurations, use the following command
+
To list all existing Cactus configurations, use the following command
 
   
 
   
 
  simfactory/sim list-configurations
 
  simfactory/sim list-configurations
Line 212: Line 323:
 
=== Building a Configuration ===
 
=== Building a Configuration ===
  
In order to build a configuration, four pieces of information are needed.
+
To build a configuration, four pieces of information are required:
  
 
<ul>
 
<ul>
<li> Thornlist
+
<li>Thorn list
 
<ul>
 
<ul>
<li> Default: '''thornlist''' parameter of the Machine Database entry
+
<li>This defines which thorns are to be included into the configuration.
<li> Override: '''--thornlist=<thornlist>'''
+
<li>Default: '''thornlist''' parameter of the Machine Database entry
 +
<li>Override: '''--thornlist=<thornlist>'''
 +
<li>The default thorn list is probably not useful in many cases.
 
</ul>
 
</ul>
<li> Options List
+
<li>Option List
 
<ul>
 
<ul>
<li> Default: '''optionlist''' parameter of the Machine Database entry
+
<li>This specifies the compiler and build options that need to be used
<li> Override: '''--optionlist=<optionlist>'''
+
  to build Cactus on a particular system.
 +
<li>Default: '''optionlist''' parameter of the Machine Database entry
 +
<li>Override: '''--optionlist=<optionlist>'''
 +
<li>The Simulation Factory is supposed to contain good, working
 +
  default option lists for all supported systems. In fact, this is one
 +
  of the main strengths of the Simulation Factory. You should normally
 +
  not need to override the default.
 
</ul>
 
</ul>
<li> Submission Script
+
<li>Submission Script
 
<ul>
 
<ul>
<li> Default: '''submitscript''' parameter of the Machine Database entry
+
<li>This specifies how to submit a job to the queueing system on a
<li> Override: '''--submitscript=<submitscript>'''
+
  particular system.
 +
<li>Default: '''submitscript''' parameter of the Machine Database entry
 +
<li>Override: '''--submitscript=<submitscript>'''
 +
<li>Similar to the option list, the Simulation Factory is supposed to
 +
  contain good, working default submission scripts for all supported
 +
  systems.
 
</ul>
 
</ul>
<li> Run Script
+
<li>Run Script
 
<ul>
 
<ul>
<li> Default: '''runscript''' parameter of the Machine Database entry
+
<li>This specifies how to execute an MPI process on a particular
<li> Override: '''--runscript=<runscript>'''
+
  system; it is closely connected to the submission script.
 +
<li>Default: '''runscript''' parameter of the Machine Database entry
 +
<li>Override: '''--runscript=<runscript>'''
 +
<li>As with the submission script, the Simulation Factory is
 +
supposed to contain good, working default run scripts for all
 +
supported systems.
 
</ul>
 
</ul>
 
</ul>
 
</ul>
  
For any pre-configured Machine Database entry, the defaults for '''optionlist''', '''submitscript''', and '''runscript''' should suffice.
+
To build a configuration with a specific thornlist, issue the
 
+
following command:
To build a configuration with a specified thornlist, issue the following command:
 
  
 
  simfactory/sim build [<configurationname>] --thornlist=<thornlist>
 
  simfactory/sim build [<configurationname>] --thornlist=<thornlist>
  
If you choose to omit the configuration name, it will default to 'sim'. If one of the following options is specified, debug, profile, unsafe, and optimise, then the configuration name
+
If you choose to omit the configuration name, it will default to
will append the specified option onto the end of it. For instance, if you specify --debug with a configuration name 'mybuild', then the configuration name will be mybuild-debug
+
'sim'. (We recommend this.) You can in addition specify any of the
 
+
options
==== Additional Options ====
 
 
 
 
<ul>
 
<ul>
 
+
<li>--debug
<li> '''--debug'''
+
<li>--optimise (default)
<ul>
+
<li>--profile
<li> Enable debugging in the Cactus executable
 
 
</ul>
 
</ul>
 +
These options create configurations with debugging, optimisation (the
 +
default), or profiling enabled.
 +
If any of these options is specified, then the configuration name will
 +
be modified correspondingly, e.g. to 'sim-debug'.
  
<li> '''--optimise'''
+
==== Additional Options ====
<ul>
 
<li> Enable optimisation in the Cactus executable
 
<li> * Will be OFF if --debug is enabled.
 
</ul>
 
  
<li> '''--profile'''
 
 
<ul>
 
<ul>
<li> Build Cactus with profiling
 
</ul>
 
  
<li> '''--unsafe'''
+
<li>'''--reconfig'''
 
<ul>
 
<ul>
<li> Build Cactus with unsafe options
+
<li>Reconfigure before building, i.e. re-examine the configuration
 +
  options and re-run the CST stage. This happens automatically when
 +
  the option list changes.
 
</ul>
 
</ul>
  
<li> '''--reconfig'''
+
<li>'''--clean'''
 
<ul>
 
<ul>
<li> Force Cactus to reconfigure before building
+
<li>Clean the configuration (remove all object files etc.) before
</ul>
+
  building.
 
 
<li> '''--clean'''
 
<ul>
 
<li> Clean Cactus before building
 
 
</ul>
 
</ul>
  
Line 285: Line 405:
 
=== What's Produced ===
 
=== What's Produced ===
  
The Simulation Factory creates a configuration based upon the input parameters (or defaults) and the compiled executable. Configurations live in the '''configs''' folder inside the Cactus source tree, and compiled executables live inside the '''exe''' folder also inside the Cactus source tree. The following is an example directory structure of the compiled configuration '''sim'''
+
The Simulation Factory creates a configuration based upon the input
 +
parameters (or defaults) containing the compiled executable.
 +
Configurations live in the '''configs''' folder inside the Cactus
 +
source tree, and compiled executables live inside the '''exe'''
 +
folder also inside the Cactus source tree. The following is an example
 +
directory structure of a configuration called '''sim''':
  
 
  Cactus/
 
  Cactus/
Line 305: Line 430:
 
=== Script Locations ===
 
=== Script Locations ===
  
The Simulation Factory provides default scripts for every one of its preconfigured machines. These scripts can be found in the following locations
+
The Simulation Factory provides default scripts for all its
 +
preconfigured machines. These scripts can be found in the following
 +
locations
  
 
<ul>
 
<ul>
  
<li> '''Option Lists'''  
+
<li>'''Option Lists'''  
 
<ul>
 
<ul>
<li> MDB Key: optionlist
+
<li>MDB Key: optionlist
<li> Location: simfactory/etc/optionlists
+
<li>Location: simfactory/etc/optionlists
 
</ul>
 
</ul>
  
<li> '''Submit Scripts'''  
+
<li>'''Submit Scripts'''  
 
<ul>
 
<ul>
<li> MDB Key: submitscript
+
<li>MDB Key: submitscript
<li> Location: simfactory/etc/submitscripts
+
<li>Location: simfactory/etc/submitscripts
 
</ul>
 
</ul>
  
<li> '''Run Scripts'''  
+
<li>'''Run Scripts'''  
 
<ul>
 
<ul>
<li> MDB Key: runscript
+
<li>MDB Key: runscript
<li> Location: simfactory/etc/runscripts
+
<li>Location: simfactory/etc/runscripts
 
</ul>
 
</ul>
  
 
</ul>
 
</ul>
  
To determine, for instance, which option list queenbee uses by default, issue the following command
+
To determine, for instance, which option list Queen Bee uses by
 +
default, issue the command
  
 
  simfactory/sim print-mdb queenbee | grep optionlist
 
  simfactory/sim print-mdb queenbee | grep optionlist
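
The same approach extends to all three script-related keys. The following is a minimal sketch; the helper name '''show_scripts''' and the '''SIM''' variable are hypothetical, and the sketch assumes that '''print-mdb''' prints one property per line, as the grep example above implies.

```shell
# Hypothetical helper: show the option list, submit script, and run script
# configured for a machine. Assumes print-mdb prints one property per line.
show_scripts() {
    sim=${SIM:-simfactory/sim}
    "$sim" print-mdb "$1" | grep -E 'optionlist|submitscript|runscript'
}

# Example usage:
#   show_scripts queenbee
```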
 +
 +
  
 
== Managing Simulations ==  
 
== Managing Simulations ==  
  
The Simulation Factory provides a convenient, consistent facility for submitting, executing, and managing simulations. This is accomplished through two main commands: '''submit''' and '''run'''.
+
The Simulation Factory provides a convenient, consistent facility for
 +
submitting, running, and managing simulations. This is accomplished
 +
through two main commands, '''submit''' and '''run'''.
  
 
=== Information Commands ===
 
=== Information Commands ===
  
The status of all simulations can be seen with the following command
+
The status of all simulations on a particular machine can be seen with
 +
the following command
  
 
  simfactory/sim list-simulations
 
  simfactory/sim list-simulations
  
If a more detailed look at each simulation is required, the verbose option can be specified
+
If a more detailed look at each simulation is required, the verbose
 +
option can be specified
  
 
  simfactory/sim list-simulations --verbose
 
  simfactory/sim list-simulations --verbose
Line 349: Line 483:
 
=== Submitting a Simulation ===
 
=== Submitting a Simulation ===
  
Four primary pieces of information are necessary when submitting a simulation to the host queuing system. They are
+
Four primary pieces of information are necessary when submitting a
 +
simulation to the host queuing system. They are
  
 
<ul>
 
<ul>
  
<li> '''Configuration'''
+
<li>'''Configuration'''
 
<ul>
 
<ul>
<li> The Cactus build configuration to use.
+
<li>The Cactus configuration to run
<li> '''option''': --configuration
+
<li>'''option''': --configuration
<li> '''default''': "sim"
+
<li>'''default''': "sim"
 
</ul>  
 
</ul>  
  
<li> '''Parfile'''
+
<li>'''Parfile'''
 
<ul>
 
<ul>
<li> The Cactus parameter file to use
+
<li>The Cactus parameter file to use
<li> '''option''': --parfile
+
<li>'''option''': --parfile
 
</ul>  
 
</ul>  
  
<li> '''Walltime'''
+
<li>'''Walltime'''
 
<ul>
 
<ul>
<li> The amount of CPU time to request
+
<li>The total amount of wall time required
<li> '''option''': --walltime
+
<li>'''option''': --walltime
<li> '''default''': MDB Key: maxwalltime
+
<li>'''default''': MDB Key '''maxwalltime'''
 
</ul>  
 
</ul>  
  
<li> Procs
+
<li>Processors
 
<ul>
 
<ul>
<li> The number of processors to use  
+
<li>The total number of processors to use
<li> '''option''': --procs
+
<li>'''option''': --procs
<li> '''default''': 1
+
<li>'''default''': 1
 
</ul>  
 
</ul>  
  
 
</ul>  
 
</ul>  
  
'''--configuration''' only needs to be specified the '''first''' time you submit a simulation. Subsequent submissions of the same simulation will use whatever configuration was specified the first time. Here is an example of submitting a simulation named "static_tov" using the aforementioned options
+
The option '''--configuration''' only needs to be specified the first
 +
time you submit a simulation. Subsequent re-submissions of the same
 +
simulation (for restarting from checkpoints) will always use the same
 +
configuration that was specified the first time. Here is an example of
 +
submitting a simulation named "static_tov" using the aforementioned
 +
options:
  
 
  simfactory/sim submit static_tov --configuration sim-debug --parfile=par/static_tov.par --walltime=4:00:00 --procs=8
 
  simfactory/sim submit static_tov --configuration sim-debug --parfile=par/static_tov.par --walltime=4:00:00 --procs=8
  
It is possible to submit a simulation using shorthand, where you can specify the options in a certain order. If you don't specify a simulation name using
+
It is possible to submit a simulation using shorthand notation where
the shorthand syntax, it will attempt to derive the simulation name from the basename of the parfile specified.
+
you do not need to specify the option names, but have to specify the
 +
options in a certain order. If you don't specify a simulation name
 +
using the shorthand syntax, a simulation name will be derived from
 +
the parameter file name.
  
 
  simfactory/sim submit [<simulationname>] <parfile> <walltime> <procs>
 
  simfactory/sim submit [<simulationname>] <parfile> <walltime> <procs>
  
An example
+
An example is
  
 
  simfactory/sim submit par/static_tov.par 4:00:00 8
 
  simfactory/sim submit par/static_tov.par 4:00:00 8
 
  
 
==== Additional Options: Submission ====
 
==== Additional Options: Submission ====
Line 400: Line 542:
 
<ul>
 
<ul>
  
<li> '''Processors Per Node'''
+
<li>Number of OpenMP Threads
 
<ul>
 
<ul>
<li> The number of processors per node to use.  
+
<li>The number of OpenMP threads per MPI process. (You specify the
<li> '''option''': --ppn
+
  total number of processors (cores), and the number of OpenMP
<li> '''default''': 1
+
  threads; the number of MPI processes is then calculated
 +
  automatically.)
 +
<li>option: --num-threads
 +
<li>default: 1 (as if OpenMP was not used)
 
</ul>  
 
</ul>  
  
<li> '''Memory'''
+
<li>Allocation
 
<ul>
 
<ul>
<li> The amount of memory to use
+
<li>The allocation for the simulation, overriding the corresponding
<li> '''option''': --memory
+
  MDB entry
<li> '''default''': 1024
+
<li>option: --allocation
 +
<li>default: taken from the MDB
 
</ul>  
 
</ul>  
  
<li> '''cpufreq'''
+
<li>Queue
 
<ul>
 
<ul>
<li> The frequency of the CPU
+
<li>The queue for the simulation, overriding the corresponding MDB
<li> '''option''': --cpufreq
+
  entry
<li> '''default''': 0
+
<li>option: --queue
 +
<li>default: taken from the MDB
 
</ul>  
 
</ul>  
  
<li> '''allocation'''
+
<li>Processors per node
 
<ul>
 
<ul>
<li> The allocation for the simulation to use
+
<li>The number of processors per node requested from the queueing system
<li> '''option''': --allocation
+
<li>option: --ppn
<li> '''default''':  
+
<li>default: all processors on a node
 
</ul>  
 
</ul>  
  
<li> '''queue'''
+
<li>Used processors per node
 
<ul>
 
<ul>
<li> The queue for the simulation to use
+
<li>The number of processors per node that should actually be used,
<li> '''option''': --queue
+
  allowing under-using nodes even if the queueing system does not
<li> '''default''': "checkpt"
+
  allow it. (The remaining processors will idle and will remain
 +
  unused.)
 +
<li>option: --ppn-used
 +
<li>default: all processors on a node
 
</ul>  
 
</ul>  
 +
 
</ul>  
 
</ul>  
 
  
 
=== Running a Simulation ===
 
=== Running a Simulation ===
  
The Simulation Factory can execute a simulation directly, bypassing a queuing system. Running a simulation directly uses the same options, minus the walltime, as submitting a simulation, only using the '''run''' command instead. An example
+
The Simulation Factory can execute a simulation directly, bypassing
 
+
the queuing system. Running a simulation directly uses the same
simfactory/sim run static_tov --parfile=par/static_tov.par --procs=8
+
options, but ignores the wall time limit etc. You use the '''run'''
 +
command for this:
  
If this simulation does not exist, '''--configuration=<configuration>''' will need to be specified the first time the simulation is run.
+
simfactory/sim run static_tov --configuration sim-debug --parfile=par/static_tov.par --procs=8
  
 
==== Additional Options: Running ====
 
==== Additional Options: Running ====
Line 451: Line 602:
 
=== Other Simulation Commands ===
 
=== Other Simulation Commands ===
  
To launch an interactive session on a compute node, use the following command
+
To launch an interactive session on a compute node, use the command
  
 
  simfactory/sim interactive --procs=8 --walltime=4:00:00
 
  simfactory/sim interactive --procs=8 --walltime=4:00:00
  
To stop a simulation
+
This leads to a login shell on the compute node, but is otherwise
 +
similar to the submit command.
 +
 
 +
To stop a simulation:
 
   
 
   
  simfactory/sim stop <simulationname> [--restart-id=<restartid>]
+
  simfactory/sim stop <simulationname>
  
To purge (put in the basedir/TRASH folder) an existing simulation
+
To purge (put in the basedir/TRASH folder) an existing simulation:
  
  simfactory/sim purge <simulationname>[--restart-id=<restartid>]
+
  simfactory/sim purge <simulationname> [--restart-id=<restartid>]
  
To show the output for a given simulation
+
To show the current output (stdout and stderr) for a given simulation:
  
 
  simfactory/sim show-output <simulationname> [--restart-id=<restartid>]
 
  simfactory/sim show-output <simulationname> [--restart-id=<restartid>]
Line 469: Line 623:
 
=== What's Produced ===
 
=== What's Produced ===
  
When a simulation is run for the first time, all the necessary information from the Cactus build configuration is brought into a specific simulation folder created  
+
When a simulation is submitted for the first time, all necessary
underneath the '''basedir'''. Contained inside this folder, which has the same name as the specified simulation, are the executable, the run script, the submit script, the SIMFACTORY folder, a log file, and the output directories of each individual restart.
+
information from the Cactus build configuration is brought into a
 +
specific simulation folder created underneath the '''basedir'''
 +
directory. Contained inside this folder, which has the same name as
 +
the specified simulation, are the executable, run script, submit
 +
script, a SIMFACTORY folder, a log file, and the output directories
 +
for each individual restart.
  
Here is the contents of the simulation folder "btest" with several restarts in it
+
Simulations are self-contained, and once created do not rely on
 +
outside information. For example, recompiling the executable or
 +
changing the parameter file that were used to submit a simulation will
 +
not influence the simulation, since the simulation contains copies of
 +
both. This ensures that simulations can continue to run unperturbed
 +
even weeks after they have been created.
  
  [mwt@eric2 simulations]$ ls -al btest
+
Here are the contents of the simulation folder "static_tov" with
 +
several restarts in it:
 +
 
 +
  [mwt@eric2 simulations]$ ls -l static_tov
 
  total 32
 
  total 32
drwxr-xr-x  8 mwt lsuusers 4096 Sep 17 09:03 .
 
drwxr-xr-x  8 mwt lsuusers 4096 Sep 27 11:32 ..
 
 
  -rw-r--r--  1 mwt lsuusers    0 Sep 30 13:30 LOG
 
  -rw-r--r--  1 mwt lsuusers    0 Sep 30 13:30 LOG
  drwxr-xr-x  3 mwt lsuusers 4096 Aug 20 10:19 output-0001
+
  drwxr-xr-x  3 mwt lsuusers 4096 Aug 20 10:19 output-0000
  drwxr-xr-x  4 mwt lsuusers 4096 Aug 20 10:19 output-0002
+
  drwxr-xr-x  4 mwt lsuusers 4096 Aug 20 10:19 output-0001
  drwxr-xr-x  4 mwt lsuusers 4096 Aug 20 10:24 output-0003
+
  drwxr-xr-x  4 mwt lsuusers 4096 Aug 20 10:24 output-0002
  drwxr-xr-x  3 mwt lsuusers 4096 Aug 20 23:57 output-0004
+
  drwxr-xr-x  3 mwt lsuusers 4096 Aug 20 23:57 output-0003
  drwxr-xr-x  4 mwt lsuusers 4096 Sep 17 09:02 output-0005
+
  drwxr-xr-x  4 mwt lsuusers 4096 Sep 17 09:02 output-0004
 
  drwxr-xr-x  7 mwt lsuusers 4096 Aug 20 10:18 SIMFACTORY
 
  drwxr-xr-x  7 mwt lsuusers 4096 Aug 20 10:18 SIMFACTORY
  
 +
The SIMFACTORY folder contains the executable, the necessary script
 +
files for submission and execution, and a properties.ini file that is
 +
used by the Simulation Factory to store information about the
 +
simulation.
  
The SIMFACTORY folder contains the executable, the necessary script files needed for submitting and execution, and a properties.ini file that is used by the Simulation Factory to store
+
Each time a simulation is either run or submitted, a restart directory
information about the simulation.
+
is created underneath the simulation directory. This restart folder
 
+
has a name of the format "output-####", starting with "output-0000".
Each time a simulation is either run or submitted, a restart directory is created underneath the simulation directory. This restart folder has the format of output-####, starting with
+
Contained inside the restart folder are several internal files, the
output-0001. Contained inside the restart folder are several internal files, the output written to stdout and stderr from the simulation, and the simulation output itself. The simulation output is stored inside a directory named after the basename of the parameter file. An example output directory is below
+
output written to stdout and stderr from the simulation, and the
 
+
simulation output itself. The simulation output is typically stored
 +
inside a directory named after the basename of the parameter file. An
 +
example output directory is:
  
  [mwt@eric2 output-0001]$ ls -al
+
  [mwt@eric2 output-0001]$ ls -l
 
  total 172
 
  total 172
drwxr-xr-x  4 mwt lsuusers  4096 Sep 17 21:06 .
 
drwxr-xr-x  4 mwt lsuusers  4096 Sep 27 11:29 ..
 
 
  -rw-r--r--  1 mwt lsuusers      0 Sep 17 09:06 LOG
 
  -rw-r--r--  1 mwt lsuusers      0 Sep 17 09:06 LOG
 
  -rw-r--r--  1 mwt lsuusers      9 Sep 17 09:06 mpd_nodefile
 
  -rw-r--r--  1 mwt lsuusers      9 Sep 17 09:06 mpd_nodefile
Line 511: Line 680:
 
=== Script Locations ===
 
=== Script Locations ===
  
When a simulation is created, it copies the submit script and the run script from the build configuration into the basedir/<simulation>/SIMFACTORY folder. The executable goes in the exe/ folder, the run and submit scripts into the run/ folder, the Cactus options list into the cfg/ folder, and the parfile into the par/ folder. Below shows an example SIMFACTORY directory
+
When a simulation is created, it copies the submit script and the run
 +
script from the build configuration into the folder
 +
"basedir/<simulation>/SIMFACTORY". The executable goes in the "exe/"
 +
folder, the run and submit scripts into the "run/" folder, the Cactus
 +
options list into the "cfg/" folder, and the parfile into the "par/"
 +
folder. Below is an example SIMFACTORY directory:
  
  [mwt@eric2 SIMFACTORY]$ ls -alR
+
  [mwt@eric2 SIMFACTORY]$ ls -lR
 
  .:
 
  .:
 
  total 32
 
  total 32
Line 539: Line 713:
 
  -rw-r--r--  1 mwt lsuusers 1162 Sep 17 09:06 RunScript
 
  -rw-r--r--  1 mwt lsuusers 1162 Sep 17 09:06 RunScript
 
  -rw-r--r--  1 mwt lsuusers  410 Sep 17 09:06 SubmitScript
 
  -rw-r--r--  1 mwt lsuusers  410 Sep 17 09:06 SubmitScript
 +
 +
  
 
== Other Advanced Features ==
 
== Other Advanced Features ==
Line 544: Line 720:
 
=== Archiving ===
 
=== Archiving ===
  
Preliminary support for archiving of simulations has been added to The Simulation Factory using the Petashare data storage system. This archiving system can only be used from machines that have access to Petashare. The currently supported machines that have access to Petashare are
+
Preliminary support for archiving of simulations has been added to the
 +
Simulation Factory using the PetaShare data storage system on LONI.
 +
This archiving system can only be used from machines that have access
 +
to PetaShare. The currently supported machines that have access to
 +
PetaShare are
  
  queenbee
+
  bluedawg
  tezpur
+
  ducky
 
  eric
 
  eric
  ducky
+
  lacumba
 +
louie
 
  neptune
 
  neptune
zeke
 
spider
 
 
  oliver
 
  oliver
lacumba
 
 
  painter
 
  painter
bluedawg
 
 
  poseidon
 
  poseidon
  louie
+
  queenbee
 +
spider
 +
tezpur
 +
zeke
  
In order to use the Petashare archiving system, the Petashare [http://www.cct.lsu.edu/~sreekanth/petashare/downloads/pcommands/pcommands-2.3.tar.gz Pcommands], [http://www.cct.lsu.edu/~sreekanth/petashare/pcommands-manual(2.0).php (manual)] must be downloaded and installed into your local directory. Once that's done, add a new section in '''etc/defs.local.ini''' for the machine you wish to access petashare from, and add the necessary keys. An example for the machine queenbee is below
+
To use the PetaShare archiving system, the PetaShare
 +
[http://www.cct.lsu.edu/~sreekanth/petashare/downloads/pcommands/pcommands-2.3.tar.gz
 +
Pcommands],
 +
[http://www.cct.lsu.edu/~sreekanth/petashare/pcommands-manual(2.0).php
 +
(manual)] must be downloaded and installed onto your machine. Once
 +
that is done, add a new section in '''etc/defs.local.ini''' for the
 +
machine from which you wish to access PetaShare, and add the necessary
 +
MDB keys. An example for the machine Queen Bee is:
  
 
  [queenbee]
 
  [queenbee]
Line 569: Line 756:
 
  archivebasepath  = /tempZone/home/numrel/mwt/simulations
 
  archivebasepath  = /tempZone/home/numrel/mwt/simulations
  
archivebasepath is where the simulations will be stored on petashare.  
+
'''archivebasepath''' is where the simulations will be stored on
 +
PetaShare.
  
Once setup has been completed, you can archive an entire simulation (including all restarts) using the following command
+
Once this setup has been completed, you can archive an entire
 +
simulation (including all restarts) using the command
  
 
  simfactory/sim archive <simulationname>
 
  simfactory/sim archive <simulationname>
  
To archive just a single restart for a given simulation, you can issue the following command
+
To archive just a single restart for a given simulation, issue the
 +
command
 
   
 
   
  simfactory/sim archive <simulationname> -- restart-id=<restartid>
+
  simfactory/sim archive <simulationname> --restart-id=<restartid>
  
To get a list of all archived simulations, use the following command
+
To print a list of all archived simulations, use the command
  
 
  simfactory/sim list-archived-simulations
 
  simfactory/sim list-archived-simulations
  
To retreive an archived simulation, first use the '''list-archived-simulations''' command to retrieve the unique identifier for the simulation, and then issue the following command to retrieve a simulation. It will place the simulation in the current directory.
+
To retrieve an archived simulation, first use the
 +
'''list-archived-simulations''' command to retrieve the unique
 +
identifier for the simulation, and then issue the following command to
 +
retrieve a simulation. It will place the simulation in the current
 +
directory.
  
 
  simfactory/sim get-archived-simulation <archiveid>
 
  simfactory/sim get-archived-simulation <archiveid>
 
=== Macros ===
 

Revision as of 20:25, 6 October 2010

The Simulation Factory simplifies many aspects of running Cactus-based simulations. It provides a central facility for managing authoritative source tree versions, providing convenient access to remote HPC systems, building Cactus source configurations, and managing simulations all the way from submission to archiving their output.


Getting Started

To begin using The Simulation Factory, it needs to be checked out from svn. The Simulation Factory is typically placed into a simfactory folder inside a Cactus source tree. This can be accomplished with the following svn command:

svn co https://svn.cct.lsu.edu/repos/numrel/simfactory/branches/PYSIM_2010 simfactory

The Simulation Factory could also be placed in an independent location to be used with multiple Cactus source trees. This approach will be described later.


Initial Setup

Once the Simulation Factory has been checked out from svn, the next step is to configure it, telling it e.g. about your user name. The Simulation Factory comes with example configuration files that you can copy and modify. Assuming The Simulation Factory has been checked out into the simfactory folder, this initial configuration can be accomplished with the following commands:

cp simfactory/etc/defs.local.ini.simple simfactory/etc/defs.local.ini

Edit simfactory/etc/defs.local.ini and replace

  • YOUR_LOGIN with your usual username
  • YOUR@EMAIL.ADDRESS with your usual email address
  • YOUR_ALLOCATION with your usual allocation

Additional Configurations

The Simulation Factory contains a database known as the Machine Database. This collection of information describes all the aspects that are unique about each individual HPC system, so that the Simulation Factory can provide a common interface for all systems that hides these differences.

The Machine Database consists of different sections, one for each machine. The section name is given in square brackets, e.g. [queenbee]. There is a special section [default] that provides default values for those properties that are not explicitly set in the machine-specific entries.

The Machine Database is an authoritative collection of information, and is generally not meant to contain modifications that are only relevant for individual people. These local modifications are instead maintained in the file simfactory/etc/defs.local.ini described above, where one can add, change, or overwrite properties of Machine Database entries. For instance, if an alternative username, allocation, and sourcebasedir is needed for the machine queenbee, you would add the following section there:

[queenbee]
user          = QUEENBEE_USERNAME
allocation    = QUEENBEE_ALLOCATION
sourcebasedir = /work/@USER@

There are several macros that help simplify configuration entries. The most useful is probably @USER@, which expands to the user property of the Machine Database entry.

For example, if you are using the same user name on many systems, but have a different user name on some systems, then you would set the common user name in the [default] section, and override this for those machines where your user name differs. The example simfactory/etc/defs.local.ini.complex has examples for this.
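The interplay between the [default] section, a machine-specific section, and the @USER@ macro can be illustrated with a small Python sketch. This is not the Simulation Factory's own code; it only mimics the behaviour described above using Python's standard configparser module. The function name resolve and the values alice, alice_qb, and ALLOC_A are placeholders invented for this illustration.

```python
import configparser

# Hypothetical contents of a defs.local.ini-style file (placeholder values).
MDB_TEXT = """
[default]
user = alice
allocation = ALLOC_A

[queenbee]
user = alice_qb
sourcebasedir = /work/@USER@
"""

def resolve(text, machine):
    """Merge [default] with the machine section, then expand @USER@.

    Assumption: machine-specific values override the defaults, and @USER@
    expands to the resulting 'user' property, as described in the text.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    merged = dict(parser["default"])   # start from the defaults
    merged.update(parser[machine])     # machine entries take precedence
    user = merged["user"]
    return {key: value.replace("@USER@", user) for key, value in merged.items()}

entry = resolve(MDB_TEXT, "queenbee")
for key in sorted(entry):
    print(key, "=", entry[key])
```

In this sketch the allocation is inherited from [default], while user and sourcebasedir come from the [queenbee] section.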

Most of the macros available in the Simulation Factory are described in the section #Macros below.

The command

simfactory/sim list-machines

outputs a list of all preconfigured machines that the Simulation Factory knows about.

Local Workstation Configuration

The Simulation Factory can only be used on a machine known to the Simulation Factory. This means that you may have to add an entry for your workstation or notebook.

The first step is to determine the hostname of the local machine. It is through this hostname that the Simulation Factory matches a Machine Database entry to the machine on which it executes. The hostname can be determined using the following command:

hostname

Once you know the hostname, issue the following command to create a new Machine Database entry, starting from a generic example:

cp simfactory/etc/mdb/generic.ini simfactory/etc/mdb/<hostname>.ini

Then edit simfactory/etc/mdb/<hostname>.ini and replace

  • [generic] with [<hostname>]
    • The section header for this machine database entry must be unique among all Machine Database entries, and must match the nickname property exactly.
  • nickname = generic with nickname = <hostname>
  • hostname = generic with hostname = <hostname>
  • sourcebasedir = /home/@USER@ with the correct root path under which all your Cactus source trees reside. (This is not the Cactus directory itself, but the directory that contains the Cactus directory.)
  • basedir = /home/@USER@/simulations with the desired folder that will contain all simulation output. (This is explained in more detail below in the section #Managing Simulations.)

You can ignore the properties user, email, and allocation, as the values from the [default] section of simfactory/etc/defs.local.ini will propagate to this entry.


Accessing Remote Systems

The Simulation Factory simplifies access to remote systems, both for transferring files and logging in. You can synchronise (replicate) an authoritative version of your Cactus source tree to remote systems, obtain an interactive shell, or execute commands.

Information Commands

The following commands can be used to discover information about a machine, or list all known machines.

List all known machines:

simfactory/sim list-machines

List details about a single machine:

simfactory/sim list-machine <machine>

Print the complete Machine Database to the screen:

simfactory/sim print-mdb

Print the Machine Database entry for a single machine:

simfactory/sim print-mdb <machine>

Print the name of the machine on which the Simulation Factory is currently being executed:

simfactory/sim print-machine

Syncing

Historically, Cactus and the Einstein Toolkit have not been installed into a central location on each machine, but are instead built on-demand by every user for a certain thorn list. (One of the advantages is that people can thus easily add their own thorns.) To help with this approach, the Simulation Factory provides a facility to synchronize a Cactus user's local, authoritative source tree to remote HPC systems, where it can then be compiled and run.

Remote access is implemented on top of ssh and other ssh-like mechanisms such as gsi-ssh. Currently, you must still manage all ssh keys and passwords manually. (We highly recommend using ssh keychain and ssh agents to avoid having to enter passwords multiple times.)

Configuration

Before syncing a source tree to a remote system, a small amount of configuration must be performed. It is necessary to either verify that the defaults are correct, or to define the correct values for the following keys for the remote system in the Machine Database:

  • sourcebasedir
    • The root directory under which the Cactus source tree will reside
  • basedir
    • The root directory in which all simulation output will reside
  • user
    • The user name on the remote system

You can output the currently configured values by issuing the command

simfactory/sim print-mdb <machine>

If you need to change these values, then edit (on the local system) the file simfactory/etc/defs.local.ini and add a section for the remote machine. This entry will augment the existing Machine Database entry, updating or replacing the corresponding values. An example for the machine queenbee can be seen in the #Additional Configurations section.

To see the list of files and directories that are synchronized, look at simfactory/etc/defs.ini and find the following three keys

  • sync-sources
    • The list of files and directories that will be transferred when the option --sync-sourcetree is enabled (on by default)
  • sync-parfiles
    • The list of files and directories that will be copied when the option --sync-parfiles is enabled (also on by default). This list of files typically includes just parameter files.
  • sync-excludes
    • A list of files and directories that will be expressly excluded from syncing, such as e.g. CVS or .svn directories.

Performing a Sync

A sync command takes two options, both of which default to true.

  • sync-sourcetree
    • Synchronise the complete source tree, as specified in the aforementioned sync-sources configuration entry. This takes a few seconds or minutes, depending on the connection.
  • sync-parfiles
    • Synchronise parameter files, as specified by the aforementioned sync-parfiles configuration entry. This is typically faster than synchronising the source tree.

Usually, you would issue the command:

simfactory/sim sync <machine>

To synchronise only parfiles, you can negate the --sync-sourcetree argument with the following command

simfactory/sim sync <machine> --nosync-sourcetree

If you want to synchronise not from the local machine, but from another remote machine, then use

simfactory/sim --remote=<frommachine> sync <tomachine>

This executes the synchronisation command on the machine <frommachine>.

Remote Login

The Simulation Factory provides the ability to log in to a remote system. This is initiated with the command

simfactory/sim login <machine>

This will automatically cd into the Cactus directory on the remote system.

Local/Remote Command Execution

To execute a command (locally) via the Simulation Factory, use the command

simfactory/sim execute <command>

The command will be executed in the Cactus directory.

If the command is complex, and requires arguments, the command must be quoted. For example

simfactory/sim execute 'ls -al'

To execute a remote command, use the command

simfactory/sim --remote=<machine> execute <command>

An example of a complex command being executed remotely is

simfactory/sim --remote=queenbee execute 'find . -name "*.py" -exec sed -i.bk s/foo/bar/g {} \;'
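The quoting requirement can be made concrete with a short Python sketch that assembles such a command line and lets the standard shlex module add the quotes. The helper build_execute_argv is hypothetical and exists only for this illustration; it uses nothing beyond the execute command and the --remote option shown above.

```python
import shlex

def build_execute_argv(command, remote=None):
    """Build the argument list for 'sim execute' (illustrative helper).

    The whole command is passed as ONE argument, which is why it has to be
    quoted when typed into a shell.
    """
    argv = ["simfactory/sim"]
    if remote is not None:
        argv.append("--remote=" + remote)
    argv += ["execute", command]
    return argv

argv = build_execute_argv("ls -al", remote="queenbee")
# shlex.join (Python 3.8+) quotes each argument as a POSIX shell would need it.
print(shlex.join(argv))
```

Passing the argument list directly to a process launcher avoids shell quoting altogether; the quoted string is only needed when the command is typed interactively.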


Build Cactus Configurations

The Simulation Factory provides a central facility for configuring and building Cactus source trees. When a Cactus source tree is compiled, the Simulation Factory creates a configuration for the compiled executable, storing with it related information such as the Cactus options list, and the scripts necessary to submit and run jobs in a queuing system. This configuration is thus a self-contained entity containing everything that is necessary to perform Cactus simulations.

Information Commands

To list all existing Cactus configurations, use the following command

simfactory/sim list-configurations

Building a Configuration

To build a configuration, four pieces of information are required:

  • Thorn list
    • This defines which thorns are to be included into the configuration.
    • Default: thornlist parameter of the Machine Database entry
    • Override: --thornlist=<thornlist>
    • The default thorn list is probably not useful in many cases.
  • Option List
    • This specifies the compiler and build options that need to be used to build Cactus on a particular system.
    • Default: optionlist parameter of the Machine Database entry
    • Override: --optionlist=<optionlist>
    • The Simulation Factory is supposed to contain good, working default option lists for all supported systems. In fact, this is one of the main strengths of the Simulation Factory. You should normally not need to override the default.
  • Submission Script
    • This specifies how to submit a job to the queueing system on a particular system.
    • Default: submitscript parameter of the Machine Database entry
    • Override: --submitscript=<submitscript>
    • Similar to the option list, the Simulation Factory is supposed to contain good, working default submission scripts for all supported systems.
  • Run Script
    • This specifies how to execute an MPI process on a particular system; it is closely connected to the submission script.
    • Default: runscript parameter of the Machine Database entry
    • Override: --runscript=<runscript>
    • Same as with the submission script, the Simulation Factory is supposed to contain good, working default run scripts for all supported systems.

To build a configuration with a specific thornlist, issue the following command:

simfactory/sim build [<configurationname>] --thornlist=<thornlist>

If you choose to omit the configuration name, it will default to 'sim'. (We recommend this.) You can in addition specify any of the options

  • --debug
  • --optimise (default)
  • --profile

These options create configurations with debugging, optimisation (the default), or profiling enabled. If any of these options is specified, then the configuration name will be modified correspondingly, e.g. to 'sim-debug'.

Additional Options

  • --reconfig
    • Reconfigure before building, i.e. re-examine the configuration options and re-run the CST stage. This happens automatically when the option list changes.
  • --clean
    • Clean the configuration (remove all object files etc.) before building.

What's Produced

The Simulation Factory creates a configuration based upon the input parameters (or defaults) containing the compiled executable. Configurations live in the configs folder inside the Cactus source tree, and compiled executables live inside the exe folder, also inside the Cactus source tree. The following is an example directory structure of a configuration called sim:

Cactus/
Cactus/exe/
Cactus/exe/cactus_sim                                  * Follows the naming convention cactus_<configuration>

Cactus/configs/
Cactus/configs/sim/
Cactus/configs/sim/bindings/
Cactus/configs/sim/build/
Cactus/configs/sim/config-data/
Cactus/configs/sim/lib/
Cactus/configs/sim/scratch/
Cactus/configs/sim/OptionList
Cactus/configs/sim/RunScript
Cactus/configs/sim/SubmitScript
Cactus/configs/sim/ThornList

Script Locations

The Simulation Factory provides default scripts for all its preconfigured machines. These scripts can be found in the following locations

  • Option Lists
    • MDB Key: optionlist
    • Location: simfactory/etc/optionlists
  • Submit Scripts
    • MDB Key: submitscript
    • Location: simfactory/etc/submitscripts
  • Run Scripts
    • MDB Key: runscript
    • Location: simfactory/etc/runscripts

To determine, for instance, which option list Queen Bee uses by default, issue the command

simfactory/sim print-mdb queenbee | grep optionlist


Managing Simulations

The Simulation Factory provides a convenient, consistent facility for submitting, running, and managing simulations. This is accomplished through two main commands, submit and run.

Information Commands

The status of all simulations on a particular machine can be seen with the following command

simfactory/sim list-simulations

If a more detailed look at each simulation is required, the verbose option can be specified

simfactory/sim list-simulations --verbose

Submitting a Simulation

Four primary pieces of information are necessary when submitting a simulation to the host queuing system. They are

  • Configuration
    • The Cactus configuration to run
    • option: --configuration
    • default: "sim"
  • Parfile
    • The Cactus parameter file to use
    • option: --parfile
  • Walltime
    • The total amount of wall time required
    • option: --walltime
    • default: MDB Key maxwalltime
  • Processors
    • The total number of processors to use
    • option: --procs
    • default: 1

The option --configuration only needs to be specified the first time you submit a simulation. Subsequent re-submissions of the same simulation (for restarting from checkpoints) will always use the same configuration that was specified the first time. Here is an example of submitting a simulation named "static_tov" using the aforementioned options:

simfactory/sim submit static_tov --configuration sim-debug --parfile=par/static_tov.par --walltime=4:00:00 --procs=8

It is possible to submit a simulation using shorthand notation where you do not need to specify the option names, but have to specify the options in a certain order. If you don't specify a simulation name using the shorthand syntax, a simulation name will be derived from the parameter file name.

simfactory/sim submit [<simulationname>] <parfile> <walltime> <procs>

An example is

simfactory/sim submit par/static_tov.par 4:00:00 8
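As an illustration of how a simulation name could be derived from the parameter file name, the following Python sketch strips the directory and the extension from the parfile path. This is an assumption about the rule (based on the basename of the parameter file), not the Simulation Factory's actual implementation, and the function name is made up for this example.

```python
import os

def simulation_name_from_parfile(parfile):
    """Return the parfile's basename without directory or extension.

    Assumption: this mirrors the naming rule described above; the real
    implementation may differ in details.
    """
    return os.path.splitext(os.path.basename(parfile))[0]

name = simulation_name_from_parfile("par/static_tov.par")
print(name)
```

Under this assumption, the shorthand example above would create (or re-submit) a simulation called static_tov.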

Additional Options: Submission

  • Number of OpenMP Threads
    • The number of OpenMP threads per MPI process. (You specify the total number of processors (cores), and the number of OpenMP threads; the number of MPI processes is then calculated automatically.)
    • option: --num-threads
    • default: 1 (as if OpenMP was not used)
  • Allocation
    • The allocation for the simulation, overriding the corresponding MDB entry
    • option: --allocation
    • default: taken from the MDB
  • Queue
    • The queue for the simulation, overriding the corresponding MDB entry
    • option: --queue
    • default: taken from the MDB
  • Processors per node
    • The number of processors per node requested from the queueing system
    • option: --ppn
    • default: all processors on a node
  • Used processors per node
    • The number of processors per node that should actually be used, allowing under-using nodes even if the queueing system does not allow it. (The remaining processors will idle and will remain unused.)
    • option: --ppn-used
    • default: all processors on a node
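The relation between --procs, --num-threads, and the number of MPI processes can be sketched in Python as follows. The division rule follows the description of --num-threads above; the node count and the requirement that procs be a multiple of num_threads are assumptions made for this illustration, and the function layout is hypothetical.

```python
import math

def layout(procs, num_threads=1, ppn_used=None):
    """Return (mpi_processes, nodes) for a requested processor count.

    procs       -- total number of processors (cores), as given to --procs
    num_threads -- OpenMP threads per MPI process, as given to --num-threads
    ppn_used    -- processors actually used per node (--ppn-used), optional

    Assumptions: procs must be a multiple of num_threads, and the number of
    nodes is procs / ppn_used rounded up.
    """
    if procs % num_threads != 0:
        raise ValueError("procs must be a multiple of num_threads")
    mpi_processes = procs // num_threads
    nodes = None if ppn_used is None else math.ceil(procs / ppn_used)
    return mpi_processes, nodes

# 8 cores with 2 OpenMP threads each, using 4 cores per node
print(layout(8, num_threads=2, ppn_used=4))
```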

Running a Simulation

The Simulation Factory can execute a simulation directly, bypassing the queuing system. Running a simulation directly uses the same options as submitting one, but ignores queue-related settings such as the wall time limit. You use the run command for this:

simfactory/sim run static_tov --configuration sim-debug --parfile=par/static_tov.par --procs=8

Additional Options: Running

See #Additional Options: Submission

Other Simulation Commands

To launch an interactive session on a compute node, use the command

simfactory/sim interactive --procs=8 --walltime=4:00:00

This leads to a login shell on the compute node, but is otherwise similar to the submit command.

To stop a simulation:

simfactory/sim stop <simulationname>

To purge (put in the basedir/TRASH folder) an existing simulation:

simfactory/sim purge <simulationname> [--restart-id=<restartid>]

To show the current output (stdout and stderr) for a given simulation:

simfactory/sim show-output <simulationname> [--restart-id=<restartid>]
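
When these commands are driven from a script, it can help to assemble the argument list in one place. The Python sketch below only builds the argument lists for the stop, purge, and show-output commands shown above and does not execute anything. The helper sim_command is hypothetical, and the restart id is passed through unchanged because its exact format is not specified here.

```python
def sim_command(action, simulation, restart_id=None):
    """Build the argument list for one of the commands shown above.

    Only options that appear in this tutorial are used. The list could be
    handed to subprocess.run() from the Cactus source tree.
    """
    if action not in ("stop", "purge", "show-output"):
        raise ValueError("unsupported action: %s" % action)

    argv = ["simfactory/sim", action, simulation]
    if restart_id is not None:
        if action == "stop":
            # The tutorial shows no --restart-id option for stop.
            raise ValueError("stop does not take a restart id here")
        argv.append("--restart-id=%s" % restart_id)
    return argv


print(" ".join(sim_command("show-output", "static_tov")))
```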

What's Produced

When a simulation is submitted for the first time, all necessary information from the Cactus build configuration is copied into a simulation folder created underneath the basedir directory. This folder has the same name as the simulation and contains a SIMFACTORY folder (which holds the executable, the run script, and the submit script), a log file, and an output directory for each individual restart.

Simulations are self-contained, and once created do not rely on outside information. For example, recompiling the executable or changing the parameter file that was used to submit a simulation will not influence the simulation, since the simulation contains copies of both. This ensures that simulations can continue to run unperturbed even weeks after they have been created.

Here are the contents of the simulation folder "static_tov" with several restarts in it:

[mwt@eric2 simulations]$ ls -l static_tov
total 32
-rw-r--r--  1 mwt lsuusers    0 Sep 30 13:30 LOG
drwxr-xr-x  3 mwt lsuusers 4096 Aug 20 10:19 output-0000
drwxr-xr-x  4 mwt lsuusers 4096 Aug 20 10:19 output-0001
drwxr-xr-x  4 mwt lsuusers 4096 Aug 20 10:24 output-0002
drwxr-xr-x  3 mwt lsuusers 4096 Aug 20 23:57 output-0003
drwxr-xr-x  4 mwt lsuusers 4096 Sep 17 09:02 output-0004
drwxr-xr-x  7 mwt lsuusers 4096 Aug 20 10:18 SIMFACTORY

The SIMFACTORY folder contains the executable, the necessary script files for submission and execution, and a properties.ini file that is used by the Simulation Factory to store information about the simulation.

Each time a simulation is either run or submitted, a restart directory is created underneath the simulation directory. This restart folder has a name of the format "output-####", starting with "output-0000". Contained inside the restart folder are several internal files, the output written to stdout and stderr from the simulation, and the simulation output itself. The simulation output is typically stored inside a directory named after the basename of the parameter file. An example output directory is:

[mwt@eric2 output-0001]$ ls -l
total 172
-rw-r--r--  1 mwt lsuusers      0 Sep 17 09:06 LOG
-rw-r--r--  1 mwt lsuusers      9 Sep 17 09:06 mpd_nodefile
-rw-r--r--  1 mwt lsuusers     32 Sep 17 09:06 mpi_nodefile
-rw-r--r--  1 mwt lsuusers     33 Sep 17 09:06 NODELIST
drwxr-xr-x  3 mwt lsuusers  20480 Sep 17 16:12 qc0-mclachlan
-rw-------  1 mwt lsuusers   2520 Sep 17 21:06 qc0-mclachlan.err
-rw-------  1 mwt lsuusers 108210 Sep 17 21:06 qc0-mclachlan.out
-rw-r--r--  1 mwt lsuusers  13621 Sep 17 09:06 qc0-mclachlan.par
lrwxrwxrwx  1 mwt lsuusers     23 Sep 17 09:06 scratch -> /var/scratch/mwt/250072
drwxr-xr-x  2 mwt lsuusers   4096 Sep 17 09:06 SIMFACTORY
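
The "output-####" naming scheme for restart directories can be sketched in Python as follows. The helper names are hypothetical, and the rule for choosing the next restart id (one more than the largest existing id) is an assumption made for illustration rather than the Simulation Factory's documented behaviour.

```python
import re

# Matches restart directories such as "output-0000" and captures the id.
RESTART_PATTERN = re.compile(r"^output-(\d{4})$")


def restart_dir_name(restart_id):
    """Format a restart id as a directory name, e.g. 3 -> 'output-0003'."""
    return "output-%04d" % restart_id


def next_restart_id(existing_names):
    """Return the id after the largest existing one, or 0 if there is none."""
    ids = []
    for name in existing_names:
        match = RESTART_PATTERN.match(name)
        if match:
            ids.append(int(match.group(1)))
    return max(ids) + 1 if ids else 0


entries = ["LOG", "output-0000", "output-0001", "SIMFACTORY"]
print(restart_dir_name(next_restart_id(entries)))
```

Entries that do not match the pattern, such as LOG and SIMFACTORY, are ignored.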

Script Locations

When a simulation is created, the Simulation Factory copies the submit script and the run script from the build configuration into the folder "basedir/<simulation>/SIMFACTORY". The executable goes into the "exe/" folder, the run and submit scripts into the "run/" folder, the Cactus option list into the "cfg/" folder, and the parfile into the "par/" folder. An example SIMFACTORY directory is shown below:

[mwt@eric2 SIMFACTORY]$ ls -lR
.:
total 32
drwxr-xr-x  2 mwt lsuusers 4096 Sep 17 09:06 cfg
drwxr-xr-x  2 mwt lsuusers 4096 Sep 17 09:05 data
drwxr-xr-x  2 mwt lsuusers 4096 Sep 17 09:05 exe
drwxr-xr-x  2 mwt lsuusers 4096 Sep 17 09:06 par
-rw-r--r--  1 mwt lsuusers  740 Sep 17 09:06 properties.ini
drwxr-xr-x  2 mwt lsuusers 4096 Sep 17 09:06 run
./cfg:
total 12
-rw-r--r--  1 mwt lsuusers 4041 Sep 17 09:06 OptionList
./exe:
total 121408
-rwxr-xr-x  1 mwt lsuusers 124306159 Sep 17 09:06 cactus_sim
./par:
total 24
-rw-r--r--  1 mwt lsuusers 13621 Sep 17 09:06 qc0-mclachlan.par
./run:
total 16
-rw-r--r--  1 mwt lsuusers 1162 Sep 17 09:06 RunScript
-rw-r--r--  1 mwt lsuusers  410 Sep 17 09:06 SubmitScript
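
The folder layout described in this section can be summarised as a small lookup table. The Python sketch below is illustrative only: the dictionary keys and the function simfactory_path are hypothetical names, and only the subfolders explained in the text are included.

```python
import posixpath

# Kind of file -> subfolder of SIMFACTORY, as described in this section.
SIMFACTORY_LAYOUT = {
    "executable": "exe",
    "run script": "run",
    "submit script": "run",
    "option list": "cfg",
    "parameter file": "par",
}


def simfactory_path(basedir, simulation, kind, filename):
    """Build the path at which a file of the given kind would be stored."""
    subfolder = SIMFACTORY_LAYOUT[kind]
    return posixpath.join(basedir, simulation, "SIMFACTORY", subfolder, filename)


print(simfactory_path("basedir", "static_tov", "run script", "RunScript"))
```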


Other Advanced Features

Archiving

Preliminary support for archiving simulations has been added to the Simulation Factory using the PetaShare data storage system on LONI. This archiving system can only be used from machines that have access to PetaShare. The currently supported machines with access to PetaShare are

bluedawg
ducky
eric
lacumba
louie
neptune
oliver
painter
poseidon
queenbee
spider
tezpur
zeke

To use the PetaShare archiving system, the PetaShare [http://www.cct.lsu.edu/~sreekanth/petashare/downloads/pcommands/pcommands-2.3.tar.gz Pcommands] ([http://www.cct.lsu.edu/~sreekanth/petashare/pcommands-manual(2.0).php manual]) must be downloaded and installed on your machine. Once that is done, add a new section in etc/defs.local.ini for the machine from which you wish to access PetaShare, and add the necessary MDB keys. An example for the machine Queen Bee is:

[queenbee]
# archive information
archivetype      = petashare
archiveuser      = numrel
archivetoolspath = /home/mwt/tools/pcommands-2.3/bin
archivebasepath  = /tempZone/home/numrel/mwt/simulations

archivebasepath is where the simulations will be stored on PetaShare.
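
Before archiving, it can be useful to check that all four archive keys are present in the machine's section. The Python sketch below does this with the standard configparser module, using the Queen Bee example above as input. It is only an illustration: the function read_archive_settings is hypothetical, and the Simulation Factory's own MDB parser may treat the file differently.

```python
import configparser

ARCHIVE_KEYS = ("archivetype", "archiveuser", "archivetoolspath", "archivebasepath")

EXAMPLE = """\
[queenbee]
# archive information
archivetype      = petashare
archiveuser      = numrel
archivetoolspath = /home/mwt/tools/pcommands-2.3/bin
archivebasepath  = /tempZone/home/numrel/mwt/simulations
"""


def read_archive_settings(text, machine):
    """Return the archive keys for one machine, failing if any is missing."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser[machine]
    missing = [key for key in ARCHIVE_KEYS if key not in section]
    if missing:
        raise KeyError("missing archive keys: " + ", ".join(missing))
    return {key: section[key] for key in ARCHIVE_KEYS}


settings = read_archive_settings(EXAMPLE, "queenbee")
print(settings["archivebasepath"])
```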

Once this setup has been completed, you can archive an entire simulation (including all restarts) using the command

simfactory/sim archive <simulationname>

To archive just a single restart for a given simulation, issue the command

simfactory/sim archive <simulationname> --restart-id=<restartid>

To print a list of all archived simulations, use the command

simfactory/sim list-archived-simulations

To retrieve an archived simulation, first use the list-archived-simulations command to find the unique identifier of the simulation, and then issue the following command. It will place the simulation in the current directory.

simfactory/sim get-archived-simulation <archiveid>