Difference between revisions of "Improving the new user experience"
Latest revision as of 14:47, 5 September 2017
Summary
We would like to improve the experience of new users of the Einstein Toolkit. Some of us will work on this at the North American Einstein Toolkit workshop in August 2017.
If you have suggestions for improvements, or things which don't work well right now, please add them below.
We probably won't have time during the 2-day workshop to address all that is listed here, but we will make an effort to make progress on one or two items.
Brainstorming
(Roughly in order of how likely we are to work on it at the workshop)
Obstacles faced by new users
Work on these first:
- Configuring the ET on a new machine is very difficult (even just compiling, let alone interactions with queuing systems etc.)
- It's particularly frustrating to me (Ian) that simfactory cannot figure out even simple things about a new machine (e.g. ppn), and if you don't get this right you aren't allowed to run multi-threaded jobs, etc.
- [RH] We need to distinguish clusters and laptops. For laptops auto-detection makes sense; for clusters the login nodes can be quite different from the compute nodes (worst case would be compiling on a Broadwell CPU and running on a KNL CPU). Also see the comments on build parallelism below.
- We have a patch (https://trac.einsteintoolkit.org/ticket/2059) from Mikael Sahrling which detects the number of cores. Let's commit this during the workshop.
- Steve is working on automatically detecting the OS to choose the optionlist, and also something to do with which packages need to be installed (https://bitbucket.org/simfactory/simfactory2/branch/os_detect).
- SimFactory should be able to figure out everything that it asks for in sim setup (user name, which optionlist to use for a machine, source base dir).
- SimFactory fails to get the source base dir right if run on a supported machine from a different directory (https://trac.einsteintoolkit.org/ticket/2056).
- There are too many tutorials
- The following are listed on the wiki:
- Tutorial for New Users
- Simplified Tutorial for New Users
- Getting Started for Cactus Experts - out of date and unnecessary; get rid of it.
- Compiling the Einstein Toolkit
- There are also the Cactus tutorials. Out of date? Found to be helpful. Maybe we can merge things in.
- Have either a single tutorial or a set of non-overlapping tutorials
- The names of the tutorials do not allow users to distinguish what they are for. For example, the Tutorial for New Users and Simplified Tutorial for New Users differ in that the former is run on Queen Bee, and the latter is run on a user's own laptop or workstation.
- Perhaps all the above should be consolidated
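The core-detection idea discussed above could be sketched roughly as follows. This is only an illustration and not the patch from ticket 2059 or any existing SimFactory code: the function names `detect_local_cores` and `suggest_ppn` and the `is_cluster` flag are hypothetical. It follows the [RH] remark that auto-detection makes sense on a laptop, while on a cluster the login node may not match the compute nodes, so there the value has to be given explicitly.

```python
import os


def detect_local_cores():
    """Return the number of CPU cores this process may use (at least 1)."""
    # On Linux, sched_getaffinity respects CPU affinity masks, which can be
    # smaller than the total core count. It is not available on macOS.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    # os.cpu_count() may return None if the count cannot be determined.
    return os.cpu_count() or 1


def suggest_ppn(is_cluster, configured_ppn=None):
    """Hypothetical helper: choose a ppn value for a machine.

    An explicitly configured value always wins. Auto-detection is only
    used for laptops/workstations, because on a cluster the login node
    can differ from the compute nodes.
    """
    if configured_ppn is not None:
        return configured_ppn
    if is_cluster:
        raise ValueError("ppn must be set explicitly for cluster machines")
    return detect_local_cores()


if __name__ == "__main__":
    print("Suggested ppn for this laptop/workstation:", suggest_ppn(is_cluster=False))
```

Refusing to guess on clusters is a deliberate choice in this sketch; a real implementation might instead query the queuing system, which is outside what this example assumes.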
Do these later:
- The ET takes a long time to compile
- Maybe have multiple thornlists
- Maybe add tags to thorns, so that you can build just one type of thorn (e.g. those for BBH, or for NS, or something like that)
- [RH] sounds a lot like individual thornlists. Note that we already have an "include" mechanism for thornlists, so we can provide thornlist "fragments" (this should not be the default though)
- Parallel build by default
- [RH] need to figure out how many cores to use. Note: login nodes can have a different number of cores than compute nodes (e.g. 64 vs 32 on BW) and login nodes can have limits on allowed parallelism (e.g. the UIUC campus cluster will kill make if more than 4 identical processes are running)
- Linking stage very slow - Roland has some ideas
- Formaline seems to always be a suspect
- [RH] Formaline is likely not the culprit for the linking stage; it does not contribute much (a couple of tens of MB). The major size factor is actually the debug information from "-g".
- Don't build MPI without warning the user. Maybe abort and give the command or instructions to build it.
- [RH] that is already implemented, the option is usually called "NO_BUILD" and just needs to be added to the option lists
- Easy-ish on a new machine, but hard on a machine which already has a lot of packages installed, as they may be conflicting
- The ET has a lot of dependencies, or compiles a lot of libraries (which sometimes don't compile successfully)
- Is the self-built version of OpenMPI ever actually usable?
- [RH] yes, I have used it on at least one cluster.
- Erik has talked about using Spack for libraries. Maybe we should push for this. Erik now prefers "nix".
- [RH] this seems to me to shift the dependencies onto whatever (apparently rapidly evolving) build system we use. Fewer dependencies, to be sure, but only as long as the build system is able to cope with all our clusters; otherwise we will end up patching the build system.
- Many of the examples do not work
- Test automatically
- Add comments to indicate which ones will not work with the toolkit
- List examples on the ET website - those that work.
- At a previous workshop, this was evaluated, but there was no resolution (Fixing examples)
- On slower laptops, the build stage regularly hangs and has to be killed and restarted (which almost always solves the problem). Can we figure out which components are responsible, and omit them from quick-start tutorials?
- [RH] if the slower laptop also has less memory (or is a VM) then I would first try to monitor how much memory the linker in particular consumes. Testing this on my workstation, it uses 1-2GB of RAM for a full ET build. On a 32bit VM with a cut down thornlist (no Formaline) it uses ~700MB. Similarly, some C++ code takes a huge amount of memory to compile.
- Comments: Some of the documentation (tex) files that come with thorns are blank. If I need more information about how to use a thorn with missing documentation, I usually look in the ccl files for clues. It would be nice to have any additional information from the ccl files in the tex files as well. (Lump all the details about a thorn in the same place.)
- The mailing list and one or two calls I've joined so far were a huge help. But sometimes asking "beginner questions" over public channels makes me feel as if I'm wasting people's time. Up-to-date documentation with lots of details about parameters would be amazing! (Also, of course, time-consuming to write!)
- I don't think this is very important, but a resources page listing some good textbooks or papers for getting started in numerical relativity might be nice for people who come in without prior knowledge.
- Tutorials don't have sophisticated examples
- Important for examples to be able to run on laptops
- Jupyter notebooks as an alternative (replacement?) to tutorials
- [RH] should also generate static html pages from them to serve to users that don't have jupyter running
- Automatically-tested tutorials?
- Missing tutorials/documentation, e.g.
- How to do checkpoint/restart
- How to run with fixed mesh refinement
- How to set up simfactory on a new machine
- Hints and tips
- Have some more advanced tutorials
- Simplified tutorial is hard to read - big block for prerequisites
- [RH] the block is needed as far as I can tell (keep in mind that I am the person who keeps the tutorial up to date at each release and tests it in a freshly installed set of VMs each time). It is only big because it has to cover multiple OSes and has a chicken-and-egg problem: one cannot refer to the content of e.g. option lists yet, since we cannot even expect to be able to use curl to download GetComponents, much less svn/git to get repos.
- Queen Bee thing is hard to maintain - Steve wants to switch away from this
- ET thorn documentation doesn't work in Chrome - see e.g. ADMAnalysis. Need to fix tex4ht? Or switch to another system for the documentation.
- Either switch to using something else (markdown?) for docs?
- [RH] that seems to only circumvent the issue. Docs in LaTeX seem to be the only viable way since we need to be able to include equations, bibtex references etc. Possibly one needs a newer version of the tex->html converter (or a different one)?
- or make the PDFs available
- [RH] this should only be done in addition to the html pages
- Demos: Jupyter example didn't work with Safari
- List pointers/tutorials/documentation for things that pretty much everyone is going to need to know, especially for advanced features.
- Subversion checkout even with homebrew on OSX doesn't work due to lack of certificates
- [RH] not sure if this is still true. It worked for me with the OS provided svn (Sierra) when I last tried, but then my MacOS is hardly "pristine" so I may have added a fix and forgotten about it. Which repo was affected?
- curl, git, svn are not installed on a default ubuntu, so the first line of the tutorial doesn't work
- [RH] the (simplified tutorial) prereq says "sudo apt-get install ... git subversion curl" for this reason.
- The OS-specific optionlists should have the exact command(s) needed to install the packages, not split unnecessarily into several parts with text that needs to be read
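As an illustration of the point about curl, git and svn missing on a default Ubuntu, a tutorial could start with a small prerequisite check along these lines. This is a sketch under assumptions, not part of any existing ET tooling: the names `REQUIRED_TOOLS`, `UBUNTU_PACKAGES` and `missing_tools` are made up for this example, and the package names are the ones quoted from the simplified tutorial's prerequisite line above.

```python
import shutil

# Commands the first steps of the tutorial rely on (from the list above).
REQUIRED_TOOLS = ("curl", "git", "svn")

# Ubuntu package providing each command, as quoted from the simplified
# tutorial prerequisites ("git subversion curl"). Hypothetical mapping name.
UBUNTU_PACKAGES = {"curl": "curl", "git": "git", "svn": "subversion"}


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the commands from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        packages = " ".join(UBUNTU_PACKAGES.get(tool, tool) for tool in missing)
        print("Missing commands:", ", ".join(missing))
        print("On Ubuntu these are provided by the packages:", packages)
    else:
        print("All required commands were found.")
```

The check only reports what is missing; it does not install anything, so it can be run safely before any of the tutorial's other steps.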
New tutorial scheme
We should have documentation which can be used as a complete reference for how things work. We should also have "tutorials" which are instructions which can be followed step-by-step to achieve a certain goal. We should not mix the two. The thing that new users first want to do is to set up the Einstein Toolkit so that they can run it. Let's write a "Setup tutorial" for this purpose. This will include downloading, configuring, compiling, etc. Ideally, this should apply to any machine. Then we can have additional tutorials for accomplishing other tasks.
Tutorials:
- Setup Tutorial - Prerequisites, download, configure, compile
- Test simulation - Hello World thorn (need to write it)
- More advanced examples
- WaveToy example from Steve's tutorial
- Static TOV example
- Gallery: Multipatch wave equation
- Gallery: BBH example (if on a cluster)