Difference between revisions of "Simulation Factory Advanced Tutorial"
Revision as of 19:01, 30 September 2010
The Simulation Factory is a tool for controlling all facets of a Cactus simulation. It provides a central facility for managing an authoritative source tree, controls and provides remote access to many commonly used HPC machines including LONI and the TeraGrid, builds a Cactus source tree into many independent configurations, and manages a simulation all the way from creation to output.
Getting Started
To begin using The Simulation Factory, check it out from svn. It typically resides in the simfactory folder inside a Cactus source tree. The checkout can be accomplished with the following svn command:
svn co https://svn.cct.lsu.edu/repos/numrel/simfactory/branches/PYSIM_2010 simfactory
The Simulation Factory can also be placed in an independent location to be used with multiple Cactus source trees. This approach will be detailed later.
Initial Setup
Once The Simulation Factory has been checked out from svn, the next step is to create two required configuration files. Assuming The Simulation Factory has been checked out into the simfactory folder, this initial configuration can be accomplished with the following commands:
cp simfactory/etc/defs.ini.example simfactory/etc/defs.ini
cp simfactory/etc/defs.local.ini.simple simfactory/etc/defs.local.ini
Edit simfactory/etc/defs.local.ini and replace
- YOUR_LOGIN with your usual username
- YOUR@EMAIL.ADDRESS with your usual email address
- YOUR_ALLOCATION with your usual allocation
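The three substitutions above can also be scripted. The following is a minimal Python sketch and is not part of The Simulation Factory itself. The function name fill_placeholders and the sample template text are assumptions made for illustration; only the three placeholder names come from this tutorial.

```python
# Hypothetical helper: replaces the three placeholders named in the tutorial.
PLACEHOLDERS = ("YOUR_LOGIN", "YOUR@EMAIL.ADDRESS", "YOUR_ALLOCATION")


def fill_placeholders(text, login, email, allocation):
    """Return text with each placeholder replaced by the given value."""
    values = dict(zip(PLACEHOLDERS, (login, email, allocation)))
    for placeholder, value in values.items():
        text = text.replace(placeholder, value)
    return text


# Illustrative template only; the real defs.local.ini may contain other keys.
SAMPLE_TEMPLATE = """[default]
user = YOUR_LOGIN
email = YOUR@EMAIL.ADDRESS
allocation = YOUR_ALLOCATION
"""

if __name__ == "__main__":
    print(fill_placeholders(SAMPLE_TEMPLATE, "alice", "alice@example.org", "my_alloc"))
```

In practice the text would be read from and written back to simfactory/etc/defs.local.ini; editing the file by hand is equally valid.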
Additional Configuration
The Simulation Factory contains a database known as the Machine Database. This collection of information describes each individual HPC machine and helps hide the differences between them. The Machine Database is an authoritative collection of information, and is generally not meant to be edited by a user. To add or change properties of a Machine Database entry, edit simfactory/etc/defs.local.ini instead. For instance, if an alternative username, allocation, and sourcebasedir are needed for the machine queenbee, you would add the following section:
[queenbee]
user = queenbee_username
allocation = queenbee_allocation
sourcebasedir = /work/@USER@
There are several macros that can aid in simplifying configuration. For configuration purposes, the most useful is @USER@. This macro expands to the user property of the Machine Database entry. If user is not set for the machine itself but was defined in the [default] section of simfactory/etc/defs.local.ini, then the macro expands to that value. An expanded list of useful macros can be found in the #Macros section.
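To make the interaction between the [default] section, a machine section, and the @USER@ macro concrete, here is a small Python sketch. It is an illustration of the behaviour described above, not the tool's actual implementation. The function name resolve_entry and the sample values (alice, my_allocation) are hypothetical, and the sketch assumes that machine-specific keys simply take precedence over [default] keys.

```python
import configparser

# Sample defs.local.ini content; the [default] values are invented for the example.
SAMPLE = """
[default]
user = alice
email = alice@example.org
allocation = my_allocation

[queenbee]
user = queenbee_username
allocation = queenbee_allocation
sourcebasedir = /work/@USER@
"""


def resolve_entry(ini_text, machine):
    """Merge [default] with the machine section, then expand @USER@."""
    # Interpolation is disabled so values are read exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    entry = dict(parser["default"]) if parser.has_section("default") else {}
    if parser.has_section(machine):
        entry.update(parser[machine])  # machine-specific keys win
    user = entry.get("user", "")
    return {key: value.replace("@USER@", user) for key, value in entry.items()}


if __name__ == "__main__":
    for key, value in sorted(resolve_entry(SAMPLE, "queenbee").items()):
        print(f"{key} = {value}")
```

Here sourcebasedir for queenbee resolves using the user value from the [queenbee] section, while email is inherited from [default].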
To get a list of preconfigured machines, issue the following command:
simfactory/sim list-machines
Local Workstation Configuration
In order to use a local workstation with The Simulation Factory, a Machine Database entry must be created. Before getting started, the hostname of the local machine must be determined. It is through this hostname that The Simulation Factory matches a Machine Database entry to the executing machine. The hostname can be determined using the following command:
hostname
Once you have the hostname, issue the following command:
cp simfactory/etc/mdb/generic.ini simfactory/etc/mdb/<hostname>.ini
Edit simfactory/etc/mdb/<hostname>.ini and replace
- [generic] with [<hostname>]
- The section header for this machine database entry must be a unique value and must match the nickname property exactly.
- nickname = generic with nickname = <hostname>
- hostname = generic with hostname = <hostname>
- sourcebasedir = /home/@USER@ with the correct root path under which all your Cactus source trees reside.
- basedir = /home/@USER@/simulations with the desired folder for simulation output
user, email, and allocation can safely be ignored, as the values from the [default] section of simfactory/etc/defs.local.ini will propagate to this entry.
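The edits to the copied generic.ini can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name make_local_entry is hypothetical, and the sample template contains just the lines quoted in this section, whereas the real generic.ini may hold more keys.

```python
import socket

# Only the lines mentioned in this tutorial; the real file may contain more.
SAMPLE_GENERIC = """[generic]
nickname = generic
hostname = generic
sourcebasedir = /home/@USER@
basedir = /home/@USER@/simulations
"""


def make_local_entry(template_text, hostname):
    """Rewrite the section header, nickname, and hostname for this machine."""
    replacements = {
        "[generic]": f"[{hostname}]",
        "nickname = generic": f"nickname = {hostname}",
        "hostname = generic": f"hostname = {hostname}",
    }
    lines = []
    for line in template_text.splitlines():
        # Lines that are not one of the three targets are kept unchanged.
        lines.append(replacements.get(line.strip(), line))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # socket.gethostname() plays the role of the hostname command here.
    print(make_local_entry(SAMPLE_GENERIC, socket.gethostname()))
```

The sourcebasedir and basedir lines are left untouched and still need to be adjusted by hand to match the local directory layout.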
Accessing Remote Systems
The Simulation Factory provides a convenient facility for handling remote communication and file transfer with any known machine. Using this facility, a user can synchronize an authoritative source tree, get an interactive shell on the remote system, or execute a command, locally or remotely.
Information Commands
The following commands can be used to discover information about a machine, or list all known, configured machines.
List all known machines
simfactory/sim list-machines
List details about a single machine
simfactory/sim list-machine <machine>
Print the current Machine Database to the screen
simfactory/sim print-mdb
Print the Machine Database entry for a single machine
simfactory/sim print-mdb <machine>
Get the machine that The Simulation Factory is currently being executed on
simfactory/sim print-machine
Syncing
Historically, Cactus and the Einstein Toolkit have not been installed into a central location, and instead are built on demand for a certain thornlist. In order to aid this approach, The Simulation Factory has the ability to synchronize a Cactus/Einstein Toolkit developer's local, authoritative source tree to a remote HPC machine to be compiled and run.
Remote access services are implemented on top of ssh, and ssh-like mechanisms such as gsi-ssh. Currently you must manually manage all ssh keys and passwords.
Configuration
Before syncing, a small amount of configuration must be performed. It is necessary either to verify that the defaults are correct, or to define the correct values for the following keys:
- sourcebasedir
- The root directory under which the Cactus source tree will reside
- basedir
- The root directory under which all simulation output will reside
- user
- The username for remote access
You can see the configured values by issuing the following command
simfactory/sim print-mdb <machine>
If the values for those entries need to be changed, edit simfactory/etc/defs.local.ini and add an entry for the machine being used. This entry will augment the existing Machine Database entry, updating the default values with the values specified. An example for the machine queenbee can be seen in the #Additional Configuration section.
Additionally, to see or modify the list of files and directories that are synchronized, edit simfactory/etc/defs.ini and find the following three keys
- rsync-sources
- The list of files and directories that will be copied when the option --sync-sourcetree is enabled
- rsync-parfiles
- The list of files and directories that will be copied when the option --sync-parfiles is enabled. This list of files typically includes just parameter files.
- rsync-excludes
- The list of files and directories that will be expressly excluded from syncing
Performing a Sync
A sync command takes two arguments, both of which default to true.
- sync-sourcetree
- Enable syncing of the list of files and folders specified by the aforementioned rsync-sources configuration entry.
- sync-parfiles
- Enable syncing of the list of files and folders specified by the aforementioned rsync-parfiles configuration entry.
A default sync can be performed by issuing the following command
simfactory/sim sync <machine>
To sync only parfiles, you can negate the --sync-sourcetree argument with the following command
simfactory/sim sync <machine> --nosync-sourcetree
If the desire is to perform a sync from one remote machine to another remote machine, this can be accomplished with the following command
simfactory/sim sync <tomachine> --remotemachine=<frommachine>
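The relationship between the two sync options and the rsync-sources, rsync-parfiles, and rsync-excludes keys can be sketched as below. This is a hedged illustration, not the tool's code: the function names, the sample paths, and the host name are hypothetical, and the tutorial does not state which rsync flags The Simulation Factory actually passes, so -a is an assumption.

```python
def select_sync_paths(rsync_sources, rsync_parfiles,
                      sync_sourcetree=True, sync_parfiles=True):
    """Pick the path lists enabled by the two sync options (both default on)."""
    paths = []
    if sync_sourcetree:
        paths.extend(rsync_sources)
    if sync_parfiles:
        paths.extend(rsync_parfiles)
    return paths


def build_rsync_argv(paths, excludes, user, host, sourcebasedir):
    """Assemble an rsync argument list; the flag choice is an assumption."""
    argv = ["rsync", "-a"]
    argv += [f"--exclude={pattern}" for pattern in excludes]
    argv += list(paths)
    argv.append(f"{user}@{host}:{sourcebasedir}/")
    return argv


if __name__ == "__main__":
    # Equivalent in spirit to: simfactory/sim sync <machine> --nosync-sourcetree
    paths = select_sync_paths(["src", "arrangements"], ["par"],
                              sync_sourcetree=False)
    print(build_rsync_argv(paths, ["*.o"], "alice", "hpc.example.org", "/work/alice"))
```

With sync_sourcetree disabled, only the parfile paths remain in the transfer list, which mirrors the --nosync-sourcetree example above.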
Remote Login
The Simulation Factory provides the ability to receive an interactive shell on the remote system. This can be initiated with the following command
simfactory/sim login <machine>
Local/Remote Command Execution
To execute a command locally via The Simulation Factory, use the following command
simfactory/sim execute <command>
If the command takes arguments, the whole command must be quoted so that it is passed as a single argument. For example
simfactory/sim execute "ls -al"
To execute a remote command, use the following command
simfactory/sim execute <command> --remotemachine=<machine>
An example of a complex command being executed remotely is
simfactory/sim execute "find . -name '*.py' -exec sed -i.bk 's/foo/bar/g' {} \;" --remotemachine=queenbee
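When such a command is launched from a script, quoting mistakes can be avoided by passing the command as a single list element. The Python sketch below shows the idea; the helper names are hypothetical, and only the execute subcommand and the --remotemachine option are taken from this tutorial.

```python
import shlex


def build_execute_argv(command, remotemachine=None):
    """Build the argument list for 'sim execute'; command stays one argument."""
    argv = ["simfactory/sim", "execute", command]
    if remotemachine is not None:
        argv.append(f"--remotemachine={remotemachine}")
    return argv


def as_shell_line(argv):
    """Render the argument list as a safely quoted shell command line."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    argv = build_execute_argv("ls -al", remotemachine="queenbee")
    print(as_shell_line(argv))
    # The list could be handed to subprocess.run(argv) without further quoting.
```

Because the command is one list element, no shell on the local side splits it into separate words.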
Cactus Build Configurations
The Simulation Factory provides a central facility for configuring and building a Cactus source tree. When a Cactus source tree is compiled, The Simulation Factory creates a configuration for the compiled executable, storing with it information such as the Cactus options list and the provided submission and run scripts. This configuration represents the core of what is necessary to perform Cactus execution and submission.
Information Commands
To list all existing configurations, use the following command
simfactory/sim list-configurations
Building a Configuration
In order to build a configuration, four pieces of information are needed.
- Thornlist
- Default: thornlist parameter of the Machine Database entry
- Override: --thornlist=<thornlist>
- Options List
- Default: optionlist parameter of the Machine Database entry
- Override: --optionlist=<optionlist>
- Submission Script
- Default: submitscript parameter of the Machine Database entry
- Override: --submitscript=<submitscript>
- Run Script
- Default: runscript parameter of the Machine Database entry
- Override: --runscript=<runscript>
For any pre-configured Machine Database entry, the defaults for optionlist, submitscript, and runscript should suffice.
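The default-and-override rule for the four build inputs can be summarised in a few lines of Python. This is a sketch of the rule as described above, not the tool's own code; the function name resolve_build_inputs and the sample file names are hypothetical.

```python
BUILD_KEYS = ("thornlist", "optionlist", "submitscript", "runscript")


def resolve_build_inputs(mdb_entry, overrides=None):
    """Use a command-line override when given, else the MDB entry's value."""
    overrides = overrides or {}
    resolved = {}
    for key in BUILD_KEYS:
        value = overrides.get(key)
        if value is None:
            value = mdb_entry.get(key)
        if value is None:
            raise KeyError(f"no value for '{key}' in overrides or MDB entry")
        resolved[key] = value
    return resolved


if __name__ == "__main__":
    # Invented sample values standing in for a Machine Database entry.
    mdb_entry = {
        "optionlist": "machine.cfg",
        "submitscript": "machine.sh",
        "runscript": "machine.sh",
    }
    print(resolve_build_inputs(mdb_entry, {"thornlist": "my.th"}))
```

In this example the entry has no thornlist, so the build only succeeds because one is supplied as an override, just as --thornlist would do on the command line.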
To build a configuration with a specified thornlist, issue the following command:
simfactory/sim build [<configurationname>] --thornlist=<thornlist>
If you omit the configuration name, it defaults to 'sim'. If one of the options --debug, --profile, --unsafe, or --optimise is specified, that option's name is appended to the configuration name. For instance, if you specify --debug with the configuration name 'mybuild', the resulting configuration name is mybuild-debug.
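The naming rule can be written down as a short Python function. This is a sketch of the rule stated above; the function name is hypothetical, and the case where several of these options are given at once is deliberately left out because the tutorial does not describe it.

```python
NAME_SUFFIX_OPTIONS = ("debug", "profile", "unsafe", "optimise")


def configuration_name(name="sim", option=None):
    """Return the configuration name, with the option appended if one is given."""
    if option is None:
        return name
    if option not in NAME_SUFFIX_OPTIONS:
        raise ValueError(f"unknown build option: {option}")
    return f"{name}-{option}"


if __name__ == "__main__":
    print(configuration_name())                    # default name
    print(configuration_name("mybuild", "debug"))  # name with option appended
```

Under this rule, building the default configuration with --debug gives a configuration called sim-debug.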
Additional Options
- --debug
- Enable debugging in the Cactus executable
- --optimise
- Enable optimisation in the Cactus executable
- * Will be OFF if --debug is enabled.
- --profile
- Build Cactus with profiling
- --unsafe
- Build Cactus with unsafe options
- --reconfig
- Force Cactus to reconfigure before building
- --clean
- Clean Cactus before building
What's Produced
The Simulation Factory creates a configuration based upon the input parameters (or defaults) and the compiled executable. Configurations live in the configs folder inside the Cactus source tree, and compiled executables live inside the exe folder also inside the Cactus source tree. The following is an example directory structure of the compiled configuration sim
Cactus/
Cactus/exe/
Cactus/exe/cactus_sim    * Follows the naming convention cactus_<configuration>
Cactus/configs/
Cactus/configs/sim/
Cactus/configs/sim/bindings/
Cactus/configs/sim/build/
Cactus/configs/sim/config-data/
Cactus/configs/sim/lib/
Cactus/configs/sim/scratch/
Cactus/configs/sim/OptionList
Cactus/configs/sim/RunScript
Cactus/configs/sim/SubmitScript
Cactus/configs/sim/ThornList
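For scripts that need to locate these files, the layout above can be expressed as a small Python helper. This is only a sketch derived from the example listing; the function name configuration_layout is hypothetical.

```python
from pathlib import PurePosixPath


def configuration_layout(source_tree, configuration):
    """Return the expected paths for a build configuration."""
    root = PurePosixPath(source_tree)
    config_dir = root / "configs" / configuration
    return {
        # Executables follow the naming convention cactus_<configuration>.
        "executable": root / "exe" / f"cactus_{configuration}",
        "config_dir": config_dir,
        "optionlist": config_dir / "OptionList",
        "runscript": config_dir / "RunScript",
        "submitscript": config_dir / "SubmitScript",
        "thornlist": config_dir / "ThornList",
    }


if __name__ == "__main__":
    for key, path in configuration_layout("Cactus", "sim").items():
        print(f"{key}: {path}")
```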
Script Locations
The Simulation Factory provides default scripts for every one of its preconfigured machines. These scripts can be found in the following locations
- Option Lists
- MDB Key: optionlist
- Location: simfactory/etc/optionlists
- Submit Scripts
- MDB Key: submitscript
- Location: simfactory/etc/submitscripts
- Run Scripts
- MDB Key: runscript
- Location: simfactory/etc/runscripts
To determine, for instance, which option list queenbee uses by default, issue the following command
simfactory/sim print-mdb queenbee | grep optionlist
Managing Simulations
The Simulation Factory provides a convenient, consistent facility for submitting, executing, and managing simulations. This is accomplished through two main commands: submit and run.
Information Commands
The status of all simulations can be seen with the following command
simfactory/sim list-simulations
If a more detailed look at each simulation is required, the verbose option can be specified
simfactory/sim list-simulations --verbose
Submitting a Simulation
Four primary pieces of information are necessary when submitting a simulation to the host queuing system. They are
- Configuration
- The Cactus build configuration to use.
- option: --configuration
- default: "sim"
- Parfile
- The Cactus parameter file to use
- option: --parfile
- Walltime
- The amount of wall-clock time to request
- option: --walltime
- default: MDB Key: maxwalltime
- Procs
- The number of processors to use
- option: --procs
- default: 1
--configuration only needs to be specified the first time you submit a simulation. Subsequent submissions of the same simulation will use whatever configuration was specified the first time. Here is an example of submitting a simulation named "static_tov" using the aforementioned options
simfactory/sim submit static_tov --configuration=sim-debug --parfile=par/static_tov.par --walltime=4:00:00 --procs=8
It is possible to submit a simulation using shorthand, where you can specify the options in a certain order. If you don't specify a simulation name using the shorthand syntax, it will attempt to derive the simulation name from the basename of the parfile specified.
simfactory/sim submit [<simulationname>] <parfile> <walltime> <procs>
An example
simfactory/sim submit par/static_tov.par 4:00:00 8
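The shorthand form can be understood through the following Python sketch, which interprets the positional arguments in the order given above. It is an illustration rather than the tool's parser: the function name is hypothetical, and it assumes that the derived simulation name is the parfile's basename with its extension removed.

```python
from pathlib import PurePosixPath


def parse_submit_shorthand(args):
    """Interpret [<simulationname>] <parfile> <walltime> <procs>."""
    if len(args) == 4:
        name, parfile, walltime, procs = args
    elif len(args) == 3:
        parfile, walltime, procs = args
        # Assumption: the name is the parfile basename without its extension.
        name = PurePosixPath(parfile).stem
    else:
        raise ValueError("expected [<simulationname>] <parfile> <walltime> <procs>")
    return {
        "simulation": name,
        "parfile": parfile,
        "walltime": walltime,
        "procs": int(procs),
    }


if __name__ == "__main__":
    print(parse_submit_shorthand(["par/static_tov.par", "4:00:00", "8"]))
```

For the example above, the derived simulation name is static_tov.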
Additional Options: Submission
- Processors Per Node
- The number of processors per node to use.
- option: --ppn
- default: 1
- Memory
- The amount of memory to use
- option: --memory
- default: 1024
- cpufreq
- The frequency of the CPU
- option: --cpufreq
- default: 0
- allocation
- The allocation for the simulation to use
- option: --allocation
- default:
- queue
- The queue for the simulation to use
- option: --queue
- default: "checkpt"
Running a Simulation
The Simulation Factory can execute a simulation directly, bypassing a queuing system. Running a simulation directly uses the same options, minus the walltime, as submitting a simulation, only using the run command instead. An example
simfactory/sim run static_tov --parfile=par/static_tov.par --procs=8
If this simulation does not exist, --configuration=<configuration> will need to be specified the first time the simulation is run.
Additional Options: Running
See #Additional Options: Submission
Other Simulation Commands
To launch an interactive session on a compute node, use the following command
simfactory/sim interactive --procs=8 --walltime=4:00:00
To stop a simulation
simfactory/sim stop <simulationname> [--restart-id=<restartid>]
To purge (put in the basedir/TRASH folder) an existing simulation
simfactory/sim purge <simulationname> [--restart-id=<restartid>]
To show the output for a given simulation
simfactory/sim show-output <simulationname> [--restart-id=<restartid>]
What's Produced
When a simulation is run for the first time, all the necessary information from the Cactus build configuration is brought into a specific simulation folder created underneath the basedir. This folder, which has the same name as the specified simulation, contains the SIMFACTORY folder (which holds the executable, the run script, and the submit script), a log file, and the output directories of each individual restart.
Here are the contents of the simulation folder "btest" with several restarts in it
[mwt@eric2 simulations]$ ls -al btest
total 32
drwxr-xr-x 8 mwt lsuusers 4096 Sep 17 09:03 .
drwxr-xr-x 8 mwt lsuusers 4096 Sep 27 11:32 ..
-rw-r--r-- 1 mwt lsuusers 0 Sep 30 13:30 LOG
drwxr-xr-x 3 mwt lsuusers 4096 Aug 20 10:19 output-0001
drwxr-xr-x 4 mwt lsuusers 4096 Aug 20 10:19 output-0002
drwxr-xr-x 4 mwt lsuusers 4096 Aug 20 10:24 output-0003
drwxr-xr-x 3 mwt lsuusers 4096 Aug 20 23:57 output-0004
drwxr-xr-x 4 mwt lsuusers 4096 Sep 17 09:02 output-0005
drwxr-xr-x 7 mwt lsuusers 4096 Aug 20 10:18 SIMFACTORY
The SIMFACTORY folder contains the executable, the necessary script files needed for submitting and execution, and a properties.ini file that is used by The Simulation Factory to store information about the simulation.
Each time a simulation is either run or submitted, a restart directory is created underneath the simulation directory. This restart folder has the format of output-####, starting with output-0001. Contained inside the restart folder are several internal files, the output written to stdout and stderr from the simulation, and the simulation output itself. The simulation output is stored inside a directory named after the basename of the parameter file. An example output directory is below
[mwt@eric2 output-0001]$ ls -al
total 172
drwxr-xr-x 4 mwt lsuusers 4096 Sep 17 21:06 .
drwxr-xr-x 4 mwt lsuusers 4096 Sep 27 11:29 ..
-rw-r--r-- 1 mwt lsuusers 0 Sep 17 09:06 LOG
-rw-r--r-- 1 mwt lsuusers 9 Sep 17 09:06 mpd_nodefile
-rw-r--r-- 1 mwt lsuusers 32 Sep 17 09:06 mpi_nodefile
-rw-r--r-- 1 mwt lsuusers 33 Sep 17 09:06 NODELIST
drwxr-xr-x 3 mwt lsuusers 20480 Sep 17 16:12 qc0-mclachlan
-rw------- 1 mwt lsuusers 2520 Sep 17 21:06 qc0-mclachlan.err
-rw------- 1 mwt lsuusers 108210 Sep 17 21:06 qc0-mclachlan.out
-rw-r--r-- 1 mwt lsuusers 13621 Sep 17 09:06 qc0-mclachlan.par
lrwxrwxrwx 1 mwt lsuusers 23 Sep 17 09:06 scratch -> /var/scratch/mwt/250072
drwxr-xr-x 2 mwt lsuusers 4096 Sep 17 09:06 SIMFACTORY
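The output-#### numbering of restart folders can be illustrated with a short Python sketch. It assumes that the next restart simply takes the highest existing number plus one, which the tutorial does not state explicitly; the function name next_restart_dir is hypothetical.

```python
import re

# Matches restart folders such as output-0001.
RESTART_PATTERN = re.compile(r"^output-(\d{4})$")


def next_restart_dir(existing_names):
    """Return the next output-#### name, starting at output-0001."""
    ids = []
    for name in existing_names:
        match = RESTART_PATTERN.match(name)
        if match:
            ids.append(int(match.group(1)))
    next_id = max(ids, default=0) + 1
    return f"output-{next_id:04d}"


if __name__ == "__main__":
    listing = ["LOG", "output-0001", "output-0002", "SIMFACTORY"]
    print(next_restart_dir(listing))
```

Entries such as LOG and SIMFACTORY are ignored because they do not match the restart pattern.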
Script Locations
When a simulation is created, The Simulation Factory copies the files it needs from the build configuration into the basedir/<simulation>/SIMFACTORY folder. The executable goes in the exe/ folder, the run and submit scripts into the run/ folder, the Cactus options list into the cfg/ folder, and the parfile into the par/ folder. An example SIMFACTORY directory is shown below
[mwt@eric2 SIMFACTORY]$ ls -alR
.:
total 32
drwxr-xr-x 2 mwt lsuusers 4096 Sep 17 09:06 cfg
drwxr-xr-x 2 mwt lsuusers 4096 Sep 17 09:05 data
drwxr-xr-x 2 mwt lsuusers 4096 Sep 17 09:05 exe
drwxr-xr-x 2 mwt lsuusers 4096 Sep 17 09:06 par
-rw-r--r-- 1 mwt lsuusers 740 Sep 17 09:06 properties.ini
drwxr-xr-x 2 mwt lsuusers 4096 Sep 17 09:06 run

./cfg:
total 12
-rw-r--r-- 1 mwt lsuusers 4041 Sep 17 09:06 OptionList

./exe:
total 121408
-rwxr-xr-x 1 mwt lsuusers 124306159 Sep 17 09:06 cactus_sim

./par:
total 24
-rw-r--r-- 1 mwt lsuusers 13621 Sep 17 09:06 qc0-mclachlan.par

./run:
total 16
-rw-r--r-- 1 mwt lsuusers 1162 Sep 17 09:06 RunScript
-rw-r--r-- 1 mwt lsuusers 410 Sep 17 09:06 SubmitScript
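The placement of files inside the SIMFACTORY folder can be captured in a small lookup table. The Python sketch below is an illustration built from the description in this section; the function name, the kind labels, and the sample basedir are hypothetical.

```python
from pathlib import PurePosixPath

# Subfolder of SIMFACTORY used for each kind of file, as described above.
SUBFOLDER_FOR = {
    "executable": "exe",
    "runscript": "run",
    "submitscript": "run",
    "optionlist": "cfg",
    "parfile": "par",
}


def simfactory_path(basedir, simulation, kind, filename):
    """Return where a file of the given kind is stored for a simulation."""
    return (PurePosixPath(basedir) / simulation / "SIMFACTORY"
            / SUBFOLDER_FOR[kind] / filename)


if __name__ == "__main__":
    # The basedir value here is an invented example.
    print(simfactory_path("/home/mwt/simulations", "btest", "parfile", "qc0-mclachlan.par"))
```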