Remote Mini-Workshop Series

From Einstein Toolkit Documentation
Revision as of 10:13, 8 December 2016 by 38.104.158.162 (talk) (To-Do)

Quite a few interesting mini-projects are being undertaken at the moment. It is worthwhile to advertise these to the larger community to invite participation. In our weekly calls we decided that we should set aside a few hours or half a day for one of these. I now suggest that we turn this into a mini-series, where we pick from the list below until we run out of interest. Maybe this will keep us busy until Christmas.

We picked Wednesday 9:00 EST as the meeting time. We'll (probably) meet on Google Hangouts; details TBA here.

  1. Spack: installing external package https://github.com/LLNL/spack [Erik]
  2. SimulationIO: a new file format that's easy to read https://github.com/eschnett/SimulationIO
  3. FunHPC (multi-threading with futures): overview https://bitbucket.org/eschnett/funhpc.cxx [Erik, Christian, Ian]
  4. FunHPC (multi-threading with futures): shoehorning this into Cactus [Erik, Christian, Ian]
  5. StencilOps: more efficient finite differencing stencils in Kranc [Ian]
  6. DG: Jonah and my new DG formulation that can replace FD methods https://arxiv.org/abs/1604.00075 [Federico]
  7. The "distribute" script: testing the Einstein Toolkit on HPC systems
  8. Towards a Kranc implementation of a hydro formulation [Ian, Federico]

If you are interested in one of these topics, then add your name in square brackets after the topic.

If you are interested in presenting a topic yourself, then add a new item to the list.

Mini-Workshop #1: Wed, Dec 7, 2016, 9:00 EST

Topic: FunHPC (multi-threading with futures): overview https://bitbucket.org/eschnett/funhpc.cxx

Venue: Google Hangouts https://hangouts.google.com/call/jjkffrrvmnbhrooiyjxhfeb2ume

Agenda:

  • FunHPC design overview
  • Comparison to OpenMP
  • CPU vs. memory performance
  • Cache and multi-threading, loop tiling
  • How to parallelize an application via FunHPC
  • Building and installing
  • Examples
  • Benchmarks

Building and Installing

FunHPC is available on BitBucket https://bitbucket.org/eschnett/funhpc.cxx . It requires several other packages to be installed as well, namely

  • cereal: Serializing C++ objects
  • hwloc: Determining the hardware (core, cache) layout
  • jemalloc: Fast multi-threaded memory manager (malloc replacement)
  • OpenMPI: FunHPC prefers this MPI library
  • Qthreads: Fine-grained multi-threading (providing a C interface)

To install FunHPC from scratch, you need to install these other libraries first, and then edit FunHPC's Makefile. Google Test is also required, but will be downloaded automatically. Apologies for this unprofessional setup. In the future, FunHPC should be converted to use CMake, and Google Test should be packaged as part of it.

When you run "make", you need to set certain variables, either in the environment or on the command line:

  • CEREAL_DIR (has to be set in the Makefile)
  • HWLOC_DIR
  • JEMALLOC_DIR
  • QTHREADS_DIR
  • CXX
  • MPICXX
  • MPIRUN

For example:

 make CEREAL_DIR=... HWLOC_DIR=... JEMALLOC_DIR=... QTHREADS_DIR=... CXX=c++ MPICXX=mpicxx MPIRUN=mpirun

I have installed FunHPC and all its dependencies on Wheeler (Caltech) into the directory /home/eschnett/src/spack-view . This includes a recent version of GCC that was used to build these libraries. If you want to use this, then I highly recommend using this version of GCC as well as all the other software installed in this directory (e.g. HDF5, PAPI, and many more) instead of combining these with system libraries.
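If all dependencies live under one prefix, as in the spack-view directory above, the "make" arguments can be generated instead of typed by hand. The following is only a minimal sketch: the helper name funhpc_make_args and the FUNHPC_PREFIX variable are made up for illustration, and it assumes that every *_DIR variable expects the common prefix and that g++ and mpicxx sit in its bin directory. Check FunHPC's Makefile before relying on any of this. As noted above, CEREAL_DIR may also have to be set in the Makefile itself.

```shell
#!/bin/sh
# Hypothetical helper (not part of FunHPC): print one VAR=value make
# argument per line, assuming all dependencies share the prefix "$1".
funhpc_make_args() {
    for var in CEREAL_DIR HWLOC_DIR JEMALLOC_DIR QTHREADS_DIR; do
        printf '%s=%s\n' "$var" "$1"
    done
    # Assumption: compiler and MPI wrappers are installed in PREFIX/bin.
    printf 'CXX=%s/bin/g++\n' "$1"
    printf 'MPICXX=%s/bin/mpicxx\n' "$1"
    printf 'MPIRUN=%s/bin/mpirun\n' "$1"
}

# FUNHPC_PREFIX is a hypothetical override; the default is the Wheeler path.
prefix="${FUNHPC_PREFIX:-/home/eschnett/src/spack-view}"

# Dry run: only print the command line (paths must not contain spaces).
echo make $(funhpc_make_args "$prefix")
```

Once the printed command looks right, drop the "echo" to actually build.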

As a side note, Roland Haas says that the Simfactory configuration for Wheeler is using this directory. This is not really relevant yet since we won't be using Cactus in the beginning.

Running FunHPC Applications

FunHPC applications are MPI applications, but we are not interested in using MPI today. We might still need to use mpirun, but only in a trivial way.

Qthreads etc. use environment variables to change certain settings. Some settings are necessary to prevent problems. These "problems" are usually resource exhaustion (e.g. not enough stack space), all of which Unix helpfully translates into "Segmentation fault". I usually set these environment variables:

 export QTHREAD_NUM_SHEPHERDS="${nshep}"
 export QTHREAD_NUM_WORKERS_PER_SHEPHERD="${nwork}"
 export QTHREAD_STACK_SIZE=8388608 # bytes
 export QTHREAD_GUARD_PAGES=0      # 0, 1
 export QTHREAD_INFO=1

Here "nshep" is the number of sockets (aka NUMA nodes), and "nwork" the number of cores per socket. You can find these e.g. via "hwloc-info". On Wheeler:

 $ ~/src/spack-view/bin/hwloc-info
 depth 0:        1 Machine (type #1)
  depth 1:       2 NUMANode (type #2)
   depth 2:      2 Package (type #3)
    depth 3:     2 L3Cache (type #4)
     depth 4:    24 L2Cache (type #4)
      depth 5:   24 L1dCache (type #4)
       depth 6:  24 L1iCache (type #4)
        depth 7: 24 Core (type #5)
         depth 8:        24 PU (type #6)

Thus I choose "nshep=2" (two NUMA nodes) and "nwork=12" (24 cores divided by 2 sockets) on Wheeler.
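The same numbers can be extracted from the "hwloc-info" output automatically. This is a rough sketch, not something FunHPC or hwloc provides: hwloc_count is a hypothetical helper that scans each line for an object type name and prints the count in front of it. To stay self-contained, the example feeds it a few lines copied from the Wheeler output above rather than calling hwloc-info.

```shell
#!/bin/sh
# Hypothetical helper: read hwloc-info output on stdin and print the
# number of objects of type "$1" (taken from the first matching line).
hwloc_count() {
    awk -v type="$1" '{
        for (i = 2; i <= NF; i++)
            if ($i == type) { print $(i - 1); exit }
    }'
}

# Excerpt of the Wheeler output shown above.
sample='depth 0:        1 Machine (type #1)
 depth 1:       2 NUMANode (type #2)
   depth 2:      2 Package (type #3)
        depth 7: 24 Core (type #5)'

nshep=$(printf '%s\n' "$sample" | hwloc_count NUMANode)
ncore=$(printf '%s\n' "$sample" | hwloc_count Core)
# Cores per socket, assuming the cores are spread evenly over the sockets.
nwork=$((ncore / nshep))

echo "nshep=${nshep} nwork=${nwork}"
```

On a real machine, replace the printf pipeline with "hwloc-info | hwloc_count NUMANode" (and likewise for Core).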

By default, Qthreads chooses a rather small stack size of 8 kByte per thread. If a thread uses more stack space, random memory will be overwritten. You can enable guard pages (QTHREAD_GUARD_PAGES=1), which is good for debugging: this catches many cases where the stack overflows. Finally, Qthreads can produce info output at startup (QTHREAD_INFO=1) that might be helpful.
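To make the stack size less error-prone, it can be given in MiB and converted to bytes, with guard pages switched on for debugging runs. This is a small sketch under my own assumptions: qthreads_env is a hypothetical helper, and only the environment variable names are taken from the settings listed above.

```shell
#!/bin/sh
# Hypothetical helper: print export lines for the Qthreads settings.
# Arguments: NSHEP NWORK STACK_MIB GUARD_PAGES(0 or 1)
qthreads_env() {
    echo "export QTHREAD_NUM_SHEPHERDS=$1"
    echo "export QTHREAD_NUM_WORKERS_PER_SHEPHERD=$2"
    # The stack size is given in bytes: MiB * 1024 * 1024.
    echo "export QTHREAD_STACK_SIZE=$(($3 * 1024 * 1024))"
    echo "export QTHREAD_GUARD_PAGES=$4"
    echo "export QTHREAD_INFO=1"
}

# Debugging setup for Wheeler: 2 shepherds, 12 workers each,
# 8 MiB stacks (8388608 bytes), guard pages enabled.
qthreads_env 2 12 8 1
```

The printed lines can be applied to the current shell with eval "$(qthreads_env 2 12 8 1)"; pass 0 as the last argument to turn guard pages off again.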

On Wheeler:

 ~eschnett/src/spack-view/bin/mpirun -np 1 -x QTHREAD_NUM_SHEPHERDS=2 -x QTHREAD_NUM_WORKERS_PER_SHEPHERD=12 -x QTHREAD_STACK_SIZE=1000000 ~eschnett/src/spack-view/bin/fibonacci

To-Do

This is a wiki -- everybody should add missing items here

  • Put loop parallelization example onto wiki (and make it compile)
  • [done] Correct broken FunHPC grid self-test
  • Maybe: Make FunHPC compile with Clang on Darwin
  • Announce next meeting (Wed Dec. 14, 12:00 EST)
  • Maybe: Set up FunHPC on Bethe or Fermi (if Frank can't get access to Wheeler)
  • Add pointers to http://cppreference.com to wiki (for async, future)
  • Describe future, shared_future; async's launch:: options
  • [done] Provide make wrapper for Wheeler
  • Make sure all FunHPC examples run on Wheeler
  • If possible: look at weird performance numbers (350 ms vs. 3500 ms on Wheeler's head node); run on compute node instead?
  • Add pointers to package web sites to build instructions
  • Describe Cereal patch