Simulation Factory Advanced Tutorial

From Einstein Toolkit Documentation
Revision as of 19:01, 30 September 2010

The Simulation Factory is an effective tool for controlling all facets of a Cactus simulation. It provides a central facility for:

- managing an authoritative source tree;
- providing remote access to many commonly used HPC machines, including LONI and the TeraGrid;
- building a Cactus source tree into many independent configurations;
- managing a simulation all the way from creation to output.

Getting Started

To begin using The Simulation Factory, check it out from svn. The Simulation Factory typically resides in the simfactory folder inside a Cactus source tree, and can be checked out there with the following svn command:

svn co https://svn.cct.lsu.edu/repos/numrel/simfactory/branches/PYSIM_2010 simfactory

The Simulation Factory can also be placed in an independent location to be used with multiple Cactus source trees. This approach will be detailed later.

Initial Setup

Once The Simulation Factory has been checked out from svn, the next step is to create two required configuration files. Assuming The Simulation Factory has been checked out into the simfactory folder, this initial configuration can be accomplished with the following commands:

cp simfactory/etc/defs.ini.example simfactory/etc/defs.ini
cp simfactory/etc/defs.local.ini.simple simfactory/etc/defs.local.ini

Edit simfactory/etc/defs.local.ini and replace

  • YOUR_LOGIN with your usual username
  • YOUR@EMAIL.ADDRESS with your usual email address
  • YOUR_ALLOCATION with your usual allocation

Additional Configuration

The Simulation Factory contains a database known as the Machine Database. This collection of information describes each individual HPC machine and helps to smooth over the differences between them. The Machine Database is authoritative and is generally not meant to be edited by a user. To add or change properties of a Machine Database entry, use simfactory/etc/defs.local.ini instead. For instance, if an alternative username, allocation, and sourcebasedir are needed for the machine queenbee, you would add the following section:

[queenbee]
user          = queenbee_username
allocation    = queenbee_allocation
sourcebasedir = /work/@USER@

Several macros can help simplify configuration. For configuration purposes, the most useful is @USER@, which expands to the user property of the Machine Database entry. If user is defined in the [default] section of simfactory/etc/defs.local.ini, the user property (and therefore @USER@) takes that value unless a machine section overrides it. An expanded list of useful macros can be found in the #Macros section.
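The following minimal Python sketch illustrates the lookup just described. It is not The Simulation Factory's own code. It assumes that a macro @NAME@ expands to the lowercase key name, looked up first in the machine's section and then in [default]. That matches the @USER@ behaviour above, but other macros may work differently. The function names are hypothetical.

```python
import configparser
import re

# Hypothetical macro syntax: @NAME@ refers to the lowercase key "name".
MACRO = re.compile(r"@([A-Z_]+)@")

def resolve_key(cfg, machine, key):
    """Return key from the machine's section, falling back to [default]."""
    if cfg.has_option(machine, key):
        return cfg.get(machine, key)
    return cfg.get("default", key)

def expand_macros(value, cfg, machine):
    """Replace every @NAME@ in value with the resolved value of key 'name'."""
    return MACRO.sub(lambda m: resolve_key(cfg, machine, m.group(1).lower()), value)

if __name__ == "__main__":
    # Interpolation is disabled so configparser leaves '@' and '%' untouched.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.read_string(
        "[default]\nuser = alice\n\n"
        "[queenbee]\nuser = queenbee_username\nsourcebasedir = /work/@USER@\n"
    )
    print(expand_macros(cfg.get("queenbee", "sourcebasedir"), cfg, "queenbee"))
```

With this model, a machine without its own user key picks up the [default] value, while the queenbee section's own user wins.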

To get a list of preconfigured machines, issue the following command:

simfactory/sim list-machines

Local Workstation Configuration

In order to use a local workstation with The Simulation Factory, a Machine Database entry must be created. Before getting started, the hostname of the local machine must be determined. It is through this hostname that The Simulation Factory matches a Machine Database entry to the executing machine. The hostname can be determined using the following command:

hostname

Once you have the hostname, issue the following command:

cp simfactory/etc/mdb/generic.ini simfactory/etc/mdb/<hostname>.ini

Edit simfactory/etc/mdb/<hostname>.ini and replace

  • [generic] with [<hostname>]
    • The section header for this machine database entry must be a unique value and must match the nickname property exactly.
  • nickname = generic with nickname = <hostname>
  • hostname = generic with hostname = <hostname>
  • sourcebasedir = /home/@USER@ with the correct root path under which all your Cactus source trees reside.
  • basedir = /home/@USER@/simulations with the desired folder for simulation output

user, email, and allocation can safely be ignored, as the values from the [default] section of simfactory/etc/defs.local.ini will propagate to this entry.
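The edits listed above can also be scripted. This is a minimal Python sketch, assuming generic.ini is a plain INI file with a single [generic] section. It renames the section, sets nickname and hostname, and optionally replaces sourcebasedir and basedir. The function name is hypothetical.

```python
import configparser
import io
import socket

def local_entry_from_generic(generic_text, hostname=None,
                             sourcebasedir=None, basedir=None):
    """Return the text of a <hostname>.ini entry derived from generic.ini."""
    # socket.gethostname() usually reports the same name as the hostname command.
    hostname = hostname or socket.gethostname()
    src = configparser.ConfigParser(interpolation=None)
    src.read_string(generic_text)
    entry = dict(src["generic"])
    # The section header and the nickname property must both match the hostname.
    entry["nickname"] = hostname
    entry["hostname"] = hostname
    if sourcebasedir is not None:
        entry["sourcebasedir"] = sourcebasedir
    if basedir is not None:
        entry["basedir"] = basedir
    out = configparser.ConfigParser(interpolation=None)
    out[hostname] = entry
    buf = io.StringIO()
    out.write(buf)
    return buf.getvalue()
```

Writing the returned text to simfactory/etc/mdb/<hostname>.ini replaces the manual edit. Note that configparser drops any comments present in generic.ini.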

Accessing Remote Systems

The Simulation Factory provides a convenient facility for handling remote communication and file transfer with any known machine. Using this facility, a user can synchronize an authoritative source tree, get an interactive shell on the remote system, or execute a command, locally or remotely.

Information Commands

The following commands can be used to discover information about a machine, or list all known, configured machines.

List all known machines

simfactory/sim list-machines

List details about a single machine

simfactory/sim list-machine <machine>

Print the current Machine Database to the screen

simfactory/sim print-mdb

Print the Machine Database entry for a single machine

simfactory/sim print-mdb <machine>

Get the machine that The Simulation Factory is currently being executed on

simfactory/sim print-machine


Syncing

Historically, Cactus and the Einstein Toolkit have not been installed into a central location. Instead, they are built on demand for a certain thornlist. To aid this approach, The Simulation Factory can synchronize a Cactus/Einstein Toolkit developer's local, authoritative source tree to a remote HPC machine, where it is compiled and run.

Remote access services are implemented on top of ssh, and ssh-like mechanisms such as gsi-ssh. Currently you must manually manage all ssh keys and passwords.

Configuration

Before syncing, a small amount of configuration must be performed: either verify that the defaults are correct, or define the correct values for the following keys

  • sourcebasedir
    • The root directory under which the Cactus source tree will reside
  • basedir
    • The root directory under which all simulation output will reside
  • user
    • The username for remote access

You can see the configured values by issuing the following command

simfactory/sim print-mdb <machine>

If any of these values need to be changed, edit simfactory/etc/defs.local.ini and add a section for the machine being used. This section augments the existing Machine Database entry, replacing the default values with the values specified. An example for the machine queenbee can be seen in the #Additional Configuration section.
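The augmenting behaviour can be pictured as a dictionary overlay. In this minimal Python sketch, the Machine Database entry is modelled as a plain dict and the function name is hypothetical. Keys in the local machine section replace the defaults, new keys are added, and everything else is left unchanged.

```python
import configparser

def effective_entry(mdb_entry, local_cfg, machine):
    """Overlay the [machine] section of defs.local.ini onto an MDB entry."""
    merged = dict(mdb_entry)  # never modify the authoritative entry in place
    if local_cfg.has_section(machine):
        for key in local_cfg.options(machine):
            merged[key] = local_cfg.get(machine, key)
    return merged
```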

Additionally, to see or modify the list of files and directories that are synchronized, edit simfactory/etc/defs.ini and find the following three keys

  • rsync-sources
    • The list of files and directories that will be copied when the option --sync-sourcetree is enabled
  • rsync-parfiles
    • The list of files and directories that will be copied when the option --sync-parfiles is enabled. This list of files typically includes just parameter files.
  • rsync-excludes
    • The list of files and directories that will be expressly excluded from syncing

Performing a Sync

A sync command accepts two options, both of which are enabled by default.

  • sync-sourcetree
    • Enable syncing of the list of files and folders specified by the aforementioned rsync-sources configuration entry.
  • sync-parfiles
    • Enable syncing of the list of files and folders specified by the aforementioned rsync-parfiles configuration entry.

A default sync can be performed by issuing the following command

simfactory/sim sync <machine>

To sync only parfiles, you can negate the --sync-sourcetree argument with the following command

simfactory/sim sync <machine> --nosync-sourcetree

To sync from one remote machine to another remote machine, use the following command

simfactory/sim sync <tomachine> --remotemachine=<frommachine>
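The exact transfer command that The Simulation Factory runs is not documented here. The following Python sketch only assumes an rsync-over-ssh transfer, which the rsync-* key names suggest. It shows how --sync-sourcetree and --sync-parfiles choose which lists are sent, and how rsync-excludes applies to both. The function name and argument layout are hypothetical.

```python
def build_rsync_command(sources, parfiles, excludes, user, host, remote_dir,
                        sync_sourcetree=True, sync_parfiles=True):
    """Return an rsync argv for the enabled lists, or None if nothing is enabled."""
    paths = []
    if sync_sourcetree:
        paths.extend(sources)    # entries from rsync-sources
    if sync_parfiles:
        paths.extend(parfiles)   # entries from rsync-parfiles
    if not paths:
        return None
    # -a preserves permissions and times, -z compresses in transit, and
    # --relative recreates the listed paths' directory structure remotely.
    cmd = ["rsync", "-az", "--relative"]
    cmd += [f"--exclude={pattern}" for pattern in excludes]  # rsync-excludes
    cmd += paths
    cmd.append(f"{user}@{host}:{remote_dir}/")
    return cmd
```

Passing sync_sourcetree=False mirrors --nosync-sourcetree: only the parfile list is transferred.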

Remote Login

The Simulation Factory provides the ability to receive an interactive shell on the remote system. This can be initiated with the following command

simfactory/sim login <machine>

Local/Remote Command Execution

To execute a command locally via The Simulation Factory, use the following command

simfactory/sim execute <command>

If the command takes arguments, it must be quoted so that it is passed as a single argument. For example

simfactory/sim execute "ls -al"

To execute a remote command, use the following command

simfactory/sim execute <command> --remotemachine=<machine>

An example of a complex command being executed remotely is

simfactory/sim execute "find . -name '*.py' -exec sed -i.bk 's/foo/bar/g' {} \;" --remotemachine=queenbee
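When a command like this is generated by a script rather than typed by hand, quoting each argument individually avoids mistakes such as an unquoted *.py being expanded by the shell. This is a minimal Python sketch, assuming that execute hands the quoted string to a POSIX shell on the target machine. It requires Python 3.8 or later for shlex.join.

```python
import shlex

# Arguments of the find/sed example, one list element per shell argument.
argv = ["find", ".", "-name", "*.py", "-exec",
        "sed", "-i.bk", "s/foo/bar/g", "{}", ";"]

# shlex.join quotes each element so that a POSIX shell splits the
# resulting string back into exactly the same arguments.
command = shlex.join(argv)

# Argument vector for running simfactory itself, e.g. with subprocess.run().
sim_argv = ["simfactory/sim", "execute", command, "--remotemachine=queenbee"]
print(command)
```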

Cactus Build Configurations

The Simulation Factory provides a central facility for configuring and building a Cactus source tree. When a Cactus source tree is compiled, The Simulation Factory creates a configuration for the compiled executable. It stores with the configuration information such as the Cactus option list and the provided submission and run scripts. This configuration represents the core of what is necessary to perform Cactus execution and submission.

Information Commands

To list all existing configurations, use the following command

simfactory/sim list-configurations

Building a Configuration

In order to build a configuration, four pieces of information are needed.

  • Thornlist
    • Default: thornlist parameter of the Machine Database entry
    • Override: --thornlist=<thornlist>
  • Options List
    • Default: optionlist parameter of the Machine Database entry
    • Override: --optionlist=<optionlist>
  • Submission Script
    • Default: submitscript parameter of the Machine Database entry
    • Override: --submitscript=<submitscript>
  • Run Script
    • Default: runscript parameter of the Machine Database entry
    • Override: --runscript=<runscript>

For any pre-configured Machine Database entry, the defaults for optionlist, submitscript, and runscript should suffice.

To build a configuration with a specified thornlist, issue the following command:

simfactory/sim build [<configurationname>] --thornlist=<thornlist>

If you omit the configuration name, it defaults to 'sim'. If one of the options --debug, --profile, --unsafe, or --optimise is specified, the option name is appended to the configuration name. For instance, if you specify --debug with the configuration name 'mybuild', the configuration name will be mybuild-debug.
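The naming rule can be expressed as a small Python function. This is a sketch only: the text covers a single option, so the ordering of suffixes when several options are given is an assumption. The function name is hypothetical.

```python
SUFFIX_OPTIONS = ("debug", "profile", "unsafe", "optimise")

def configuration_name(name=None, **options):
    """Derive a configuration name from an optional base name and build flags."""
    unknown = set(options) - set(SUFFIX_OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    base = name or "sim"
    # Assumption: with several flags, suffixes follow the order listed above.
    suffixes = [opt for opt in SUFFIX_OPTIONS if options.get(opt)]
    return "-".join([base] + suffixes)
```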

Additional Options

  • --debug
    • Enable debugging in the Cactus executable
  • --optimise
    • Enable optimisation in the Cactus executable
    • Will be OFF if --debug is enabled.
  • --profile
    • Build Cactus with profiling
  • --unsafe
    • Build Cactus with unsafe options
  • --reconfig
    • Force Cactus to reconfigure before building
  • --clean
    • Clean Cactus before building

What's Produced

The Simulation Factory creates a configuration based upon the input parameters (or defaults) and the compiled executable. Configurations live in the configs folder inside the Cactus source tree, and compiled executables live inside the exe folder also inside the Cactus source tree. The following is an example directory structure of the compiled configuration sim

Cactus/
Cactus/exe/
Cactus/exe/cactus_sim                                  * Follows the naming convention cactus_<configuration>

Cactus/configs/
Cactus/configs/sim/
Cactus/configs/sim/bindings/
Cactus/configs/sim/build/
Cactus/configs/sim/config-data/
Cactus/configs/sim/lib/
Cactus/configs/sim/scratch/
Cactus/configs/sim/OptionList
Cactus/configs/sim/RunScript
Cactus/configs/sim/SubmitScript
Cactus/configs/sim/ThornList
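Scripts that locate a build can derive these paths from the configuration name. This minimal Python sketch follows the tree above and the cactus_<configuration> naming convention. The function name is hypothetical.

```python
from pathlib import PurePosixPath

def configuration_paths(cactus_root, config):
    """Return where a configuration's executable and stored files are expected."""
    root = PurePosixPath(cactus_root)
    config_dir = root / "configs" / config
    return {
        "executable": root / "exe" / f"cactus_{config}",
        "config_dir": config_dir,
        "optionlist": config_dir / "OptionList",
        "runscript": config_dir / "RunScript",
        "submitscript": config_dir / "SubmitScript",
        "thornlist": config_dir / "ThornList",
    }
```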

Script Locations

The Simulation Factory provides default scripts for every one of its preconfigured machines. These scripts can be found in the following locations

  • Option Lists
    • MDB Key: optionlist
    • Location: simfactory/etc/optionlists
  • Submit Scripts
    • MDB Key: submitscript
    • Location: simfactory/etc/submitscripts
  • Run Scripts
    • MDB Key: runscript
    • Location: simfactory/etc/runscripts

To determine, for instance, which option list queenbee uses by default, issue the following command

simfactory/sim print-mdb queenbee | grep optionlist

Managing Simulations

The Simulation Factory provides a convenient, consistent facility for submitting, executing, and managing simulations. This is accomplished through two main commands: submit and run.

Information Commands

The status of all simulations can be seen with the following command

simfactory/sim list-simulations

If a more detailed look at each simulation is required, the verbose option can be specified

simfactory/sim list-simulations --verbose

Submitting a Simulation

Four primary pieces of information are necessary when submitting a simulation to the host queuing system. They are

  • Configuration
    • The Cactus build configuration to use.
    • option: --configuration
    • default: "sim"
  • Parfile
    • The Cactus parameter file to use
    • option: --parfile
  • Walltime
    • The amount of wall-clock time to request
    • option: --walltime
    • default: MDB Key: maxwalltime
  • Procs
    • The number of processors to use
    • option: --procs
    • default: 1

--configuration only needs to be specified the first time you submit a simulation. Subsequent submissions of the same simulation will use whatever configuration was specified the first time. Here is an example of submitting a simulation named "static_tov" using the aforementioned options

simfactory/sim submit static_tov --configuration=sim-debug --parfile=par/static_tov.par --walltime=4:00:00 --procs=8
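Before submitting, it can help to check a requested walltime against the machine's maxwalltime. This is a minimal Python sketch, assuming both values use the H:MM:SS form of the example above. It does not describe how The Simulation Factory itself handles a request that exceeds the limit, and the helper names are hypothetical.

```python
def walltime_seconds(walltime):
    """Convert an H:MM:SS walltime string such as "4:00:00" to seconds."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds

def within_limit(walltime, maxwalltime):
    """True if the requested walltime does not exceed maxwalltime."""
    return walltime_seconds(walltime) <= walltime_seconds(maxwalltime)
```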

It is also possible to submit a simulation using shorthand, giving the options positionally in a fixed order. If you don't specify a simulation name using the shorthand syntax, the simulation name is derived from the basename of the specified parfile.

simfactory/sim submit [<simulationname>] <parfile> <walltime> <procs>

An example

simfactory/sim submit par/static_tov.par 4:00:00 8
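The positional rule can be sketched as follows. This is a minimal Python illustration, not The Simulation Factory's argument parser. It assumes the name is present exactly when four arguments are given, and that "basename" means the file name without its .par extension, consistent with the output directory naming described later. The function name is hypothetical.

```python
import os

def parse_submit_shorthand(args):
    """Split [<simulationname>] <parfile> <walltime> <procs> into its parts."""
    if len(args) == 4:
        name, parfile, walltime, procs = args
    elif len(args) == 3:
        parfile, walltime, procs = args
        # No name given: use the parfile's basename without its extension.
        name = os.path.splitext(os.path.basename(parfile))[0]
    else:
        raise ValueError("expected [<simulationname>] <parfile> <walltime> <procs>")
    return name, parfile, walltime, int(procs)
```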


Additional Options: Submission

  • Processors Per Node
    • The number of processors per node to use.
    • option: --ppn
    • default: 1
  • Memory
    • The amount of memory to use
    • option: --memory
    • default: 1024
  • cpufreq
    • The frequency of the CPU
    • option: --cpufreq
    • default: 0
  • allocation
    • The allocation for the simulation to use
    • option: --allocation
    • default:
  • queue
    • The queue for the simulation to use
    • option: --queue
    • default: "checkpt"
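The resource calculation The Simulation Factory performs is not described here. The usual relationship between a total processor count and processors per node is a ceiling division, sketched below in Python. The function name is hypothetical.

```python
def nodes_needed(procs, ppn=1):
    """Number of nodes needed to place procs processes at ppn per node."""
    if procs < 1 or ppn < 1:
        raise ValueError("procs and ppn must be positive")
    return -(-procs // ppn)  # ceiling division without floating point
```

For example, --procs=8 with --ppn=4 would occupy two nodes under this model.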


Running a Simulation

The Simulation Factory can also execute a simulation directly, bypassing the queuing system. Running a simulation directly uses the same options as submitting one, minus the walltime, but with the run command instead of submit. An example

simfactory/sim run static_tov --parfile=par/static_tov.par --procs=8

If this simulation does not exist, --configuration=<configuration> will need to be specified the first time the simulation is run.

Additional Options: Running

See #Additional Options: Submission

Other Simulation Commands

To launch an interactive session on a compute node, use the following command

simfactory/sim interactive --procs=8 --walltime=4:00:00

To stop a simulation

simfactory/sim stop <simulationname> [--restart-id=<restartid>]

To purge an existing simulation (move it into the basedir/TRASH folder)

simfactory/sim purge <simulationname> [--restart-id=<restartid>]

To show the output for a given simulation

simfactory/sim show-output <simulationname> [--restart-id=<restartid>]

What's Produced

When a simulation is run for the first time, all the necessary information from the Cactus build configuration is copied into a simulation folder created underneath the basedir. This folder has the same name as the specified simulation. It contains the SIMFACTORY folder (which holds the executable, the run script, and the submit script), a log file, and the output directories of each individual restart.

Here are the contents of the simulation folder "btest" with several restarts in it

[mwt@eric2 simulations]$ ls -al btest
total 32
drwxr-xr-x  8 mwt lsuusers 4096 Sep 17 09:03 .
drwxr-xr-x  8 mwt lsuusers 4096 Sep 27 11:32 ..
-rw-r--r--  1 mwt lsuusers    0 Sep 30 13:30 LOG
drwxr-xr-x  3 mwt lsuusers 4096 Aug 20 10:19 output-0001
drwxr-xr-x  4 mwt lsuusers 4096 Aug 20 10:19 output-0002
drwxr-xr-x  4 mwt lsuusers 4096 Aug 20 10:24 output-0003
drwxr-xr-x  3 mwt lsuusers 4096 Aug 20 23:57 output-0004
drwxr-xr-x  4 mwt lsuusers 4096 Sep 17 09:02 output-0005
drwxr-xr-x  7 mwt lsuusers 4096 Aug 20 10:18 SIMFACTORY


The SIMFACTORY folder contains the executable, the necessary script files needed for submitting and execution, and a properties.ini file that is used by the Simulation Factory to store information about the simulation.

Each time a simulation is either run or submitted, a restart directory is created underneath the simulation directory. Restart directories are named output-####, starting with output-0001. Each one contains several internal files, the output written to stdout and stderr by the simulation, and the simulation output itself. The simulation output is stored inside a directory named after the basename of the parameter file. An example restart directory is below


[mwt@eric2 output-0001]$ ls -al
total 172
drwxr-xr-x  4 mwt lsuusers   4096 Sep 17 21:06 .
drwxr-xr-x  4 mwt lsuusers   4096 Sep 27 11:29 ..
-rw-r--r--  1 mwt lsuusers      0 Sep 17 09:06 LOG
-rw-r--r--  1 mwt lsuusers      9 Sep 17 09:06 mpd_nodefile
-rw-r--r--  1 mwt lsuusers     32 Sep 17 09:06 mpi_nodefile
-rw-r--r--  1 mwt lsuusers     33 Sep 17 09:06 NODELIST
drwxr-xr-x  3 mwt lsuusers  20480 Sep 17 16:12 qc0-mclachlan
-rw-------  1 mwt lsuusers   2520 Sep 17 21:06 qc0-mclachlan.err
-rw-------  1 mwt lsuusers 108210 Sep 17 21:06 qc0-mclachlan.out
-rw-r--r--  1 mwt lsuusers  13621 Sep 17 09:06 qc0-mclachlan.par
lrwxrwxrwx  1 mwt lsuusers     23 Sep 17 09:06 scratch -> /var/scratch/mwt/250072
drwxr-xr-x  2 mwt lsuusers   4096 Sep 17 09:06 SIMFACTORY
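Post-processing scripts often need to work with these names. This minimal Python sketch assumes the next restart is numbered one past the highest existing output-#### folder, and that the output directory is the parfile name without .par (as qc0-mclachlan above). The helper names are hypothetical.

```python
import os
import re

RESTART = re.compile(r"^output-(\d+)$")

def next_restart_name(entries):
    """Return the next output-#### name given a simulation folder's entries."""
    numbers = []
    for entry in entries:
        match = RESTART.match(entry)
        if match:
            numbers.append(int(match.group(1)))
    return f"output-{max(numbers, default=0) + 1:04d}"

def output_dir_name(parfile):
    """Directory inside a restart that holds the simulation output."""
    return os.path.splitext(os.path.basename(parfile))[0]
```

For a real folder, next_restart_name(os.listdir("btest")) would apply the same rule to the listing shown earlier.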

Script Locations

When a simulation is created, The Simulation Factory copies the following into the basedir/<simulation>/SIMFACTORY folder:

- the executable, into the exe/ folder;
- the run and submit scripts from the build configuration, into the run/ folder;
- the Cactus option list, into the cfg/ folder;
- the parfile, into the par/ folder.

Below is an example SIMFACTORY directory

[mwt@eric2 SIMFACTORY]$ ls -alR
.:
total 32
drwxr-xr-x  2 mwt lsuusers 4096 Sep 17 09:06 cfg
drwxr-xr-x  2 mwt lsuusers 4096 Sep 17 09:05 data
drwxr-xr-x  2 mwt lsuusers 4096 Sep 17 09:05 exe
drwxr-xr-x  2 mwt lsuusers 4096 Sep 17 09:06 par
-rw-r--r--  1 mwt lsuusers  740 Sep 17 09:06 properties.ini
drwxr-xr-x  2 mwt lsuusers 4096 Sep 17 09:06 run

./cfg:
total 12
-rw-r--r--  1 mwt lsuusers 4041 Sep 17 09:06 OptionList

./exe:
total 121408
-rwxr-xr-x  1 mwt lsuusers 124306159 Sep 17 09:06 cactus_sim

./par:
total 24
-rw-r--r--  1 mwt lsuusers 13621 Sep 17 09:06 qc0-mclachlan.par

./run:
total 16
-rw-r--r--  1 mwt lsuusers 1162 Sep 17 09:06 RunScript
-rw-r--r--  1 mwt lsuusers  410 Sep 17 09:06 SubmitScript
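The layout above can be summarised as a mapping from each copied file to its folder. This minimal Python sketch follows the listing. The basedir in the test is hypothetical, and so is the function name.

```python
from pathlib import PurePosixPath

def simfactory_layout(basedir, simulation, configuration, parfile):
    """Map each copied file to its location inside <basedir>/<simulation>/SIMFACTORY."""
    sf = PurePosixPath(basedir) / simulation / "SIMFACTORY"
    return {
        "executable": sf / "exe" / f"cactus_{configuration}",
        "runscript": sf / "run" / "RunScript",
        "submitscript": sf / "run" / "SubmitScript",
        "optionlist": sf / "cfg" / "OptionList",
        "parfile": sf / "par" / PurePosixPath(parfile).name,
        "properties": sf / "properties.ini",
    }
```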

Other Advanced Features

Macros

Archiving