Remote Mini-Workshop Series

From Einstein Toolkit Documentation
Revision as of 15:03, 9 December 2016 by Eschnett (talk | contribs) (To-Do)

Quite a few interesting mini-projects are being undertaken at the moment. It is worthwhile to advertise these to the larger community to invite participation. In our weekly calls we decided that we should set aside a few hours or half a day for one of these. I now suggest that we turn this into a mini-series, where we pick from the list below until we run out of interest. Maybe this will keep us busy until Christmas.

We picked Wednesday 9:00 EST as the meeting time. We'll meet on Google Hangouts (probably); details TBA here.

  1. Spack: installing external package https://github.com/LLNL/spack [Erik]
  2. SimulationIO: a new file format that's easy to read https://github.com/eschnett/SimulationIO
  3. FunHPC (multi-threading with futures): overview https://bitbucket.org/eschnett/funhpc.cxx [Erik, Christian, Ian]
  4. FunHPC (multi-threading with futures): shoehorning this into Cactus [Erik, Christian, Ian]
  5. StencilOps: more efficient finite differencing stencils in Kranc [Ian]
  6. DG: Jonah's and my new DG formulation that can replace FD methods https://arxiv.org/abs/1604.00075 [Federico]
  7. The "distribute" script: testing the Einstein Toolkit on HPC systems
  8. Towards a Kranc implementation of a hydro formulation [Ian, Federico]

If you are interested in one of these topics, then add your name in square brackets after the topic.

If you are interested in presenting a topic yourself, then add a new item to the list.

Mini-Workshop #1: Wed, Dec 7, 2016, 9:00 EST

Topic: FunHPC (multi-threading with futures): overview https://bitbucket.org/eschnett/funhpc.cxx

Venue: Google Hangouts https://hangouts.google.com/call/jjkffrrvmnbhrooiyjxhfeb2ume

Agenda:

  • FunHPC design overview
  • Comparison to OpenMP
  • CPU vs. memory performance
  • Cache and multi-threading, loop tiling
  • How to parallelize an application via FunHPC
  • Building and installing
  • Examples
  • Benchmarks

Building and Installing

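FunHPC is available on BitBucket https://bitbucket.org/eschnett/funhpc.cxx . It requires several other packages to be installed as well, namely Cereal, hwloc, jemalloc, and Qthreads, plus an MPI implementation (these correspond to the make variables listed further down).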

To install FunHPC from scratch, you need to install these other libraries first, and then edit FunHPC's Makefile. Google Test is also required, but will be downloaded automatically. Apologies for this unprofessional setup. In the future, FunHPC should be converted to use CMake, and Google Test should be packaged as part of it.

The Cereal package requires a patch. This patch makes it distinguish between regular pointers and function pointers. Regular pointers cannot be serialized since it is unclear whether they are valid, and if so, how the target should be allocated or freed. Function pointers, however, can be serialized -- we assume they point to functions, which are constants, so that no memory management issues arise. You need to apply the following patch:

 --- old/include/cereal/types/common.hpp
 +++ new/include/cereal/types/common.hpp
 @@ -106,14 +106,16 @@
      t = reinterpret_cast<typename common_detail::is_enum<T>::type const &>( value );
    }
 
 +#ifndef CEREAL_ENABLE_RAW_POINTER_SERIALIZATION
    //! Serialization for raw pointers
    /*! This exists only to throw a static_assert to let users know we don't support raw pointers. */
    template <class Archive, class T> inline
    void CEREAL_SERIALIZE_FUNCTION_NAME( Archive &, T * & )
    {
      static_assert(cereal::traits::detail::delay_static_assert<T>::value,
        "Cereal does not support serializing raw pointers - please use a smart pointer");
    }
 +#endif
 
    //! Serialization for C style arrays
    template <class Archive, class T> inline
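
One way to apply this patch is with the standard "patch" tool. The following is only a minimal sketch: it assumes the patch above was saved to a file and that you have an unpacked Cereal source tree. The function name "apply_cereal_patch" and both arguments are hypothetical. Since the paths in the patch start with "old/" and "new/", "-p1" strips that first component.

```shell
# Sketch: apply a unified diff to a source tree, checking first that it applies cleanly.
# Usage: apply_cereal_patch <cereal-source-dir> <patch-file>
# Assumption: a "patch" that supports --dry-run (e.g. GNU patch).
apply_cereal_patch() (
    src_dir="$1"
    patch_file="$2"
    # -p1 strips the leading "old/" or "new/" path component used in the patch.
    # -N skips patches that look already applied instead of prompting.
    # --dry-run only checks; nothing is modified if the check fails.
    patch -d "$src_dir" -p1 -N --dry-run < "$patch_file" > /dev/null || {
        echo "patch does not apply cleanly to $src_dir" >&2
        exit 1
    }
    patch -d "$src_dir" -p1 -N < "$patch_file"
)
```

A call might then look like "apply_cereal_patch ~/src/cereal ~/cereal-function-pointers.patch", where both paths are placeholders for wherever you keep the sources and the patch file.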


When you run "make", you need to set certain variables on the command line:

  • CEREAL_DIR (have to set in Makefile)
  • HWLOC_DIR
  • JEMALLOC_DIR
  • QTHREADS_DIR
  • CXX
  • MPICXX
  • MPIRUN

For example:

 make CEREAL_DIR=... HWLOC_DIR=... JEMALLOC_DIR=... QTHREADS_DIR=... CXX=c++ MPICXX=mpicxx MPIRUN=mpirun
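
If all dependencies happen to live under one common installation prefix (as in the Spack view described below), the variable list can be generated rather than typed. This is a sketch under that assumption; the function name "funhpc_make_vars" is hypothetical, and the compiler and launcher names are simply the ones from the example above.

```shell
# Sketch: print the make variables from the example above, one per line,
# assuming every dependency is installed under one common prefix (an assumption).
# Usage: funhpc_make_vars <prefix>
funhpc_make_vars() (
    prefix="$1"
    for var in CEREAL_DIR HWLOC_DIR JEMALLOC_DIR QTHREADS_DIR; do
        printf '%s=%s\n' "$var" "$prefix"
    done
    # Compiler and MPI launcher names as in the example above.
    printf '%s\n' "CXX=c++" "MPICXX=mpicxx" "MPIRUN=mpirun"
)
```

With a prefix that contains no whitespace, this can be used as "make $(funhpc_make_vars /path/to/prefix)". Note that, per the list above, CEREAL_DIR may additionally have to be set in the Makefile itself.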

I have installed FunHPC and all its dependencies on Wheeler (Caltech) into the directory /home/eschnett/src/spack-view . This includes a recent version of GCC that was used to build these libraries. If you want to use this, then I highly recommend using this version of GCC as well as all the other software installed in this directory (e.g. HDF5, PAPI, and many more) instead of combining these with system libraries.

As a side note, Roland Haas says that the Simfactory configuration for Wheeler is using this directory. This is not really relevant yet since we won't be using Cactus in the beginning.

Running FunHPC Applications

FunHPC is an MPI application, but we are not interested in using MPI today. We might still need to use mpirun, but only in a trivial way.

Qthreads etc. use environment variables to change certain settings. Some settings are necessary to prevent problems. These "problems" are usually resource exhaustion (e.g. not enough stack space), all of which Unix helpfully translates into "Segmentation fault". I usually set these environment variables:

 export QTHREAD_NUM_SHEPHERDS="${nshep}"
 export QTHREAD_NUM_WORKERS_PER_SHEPHERD="${nwork}"
 export QTHREAD_STACK_SIZE=8388608 # bytes
 export QTHREAD_GUARD_PAGES=0      # 0, 1
 export QTHREAD_INFO=1

Here "nshep" is the number of sockets (aka NUMA nodes), and "nwork" the number of cores per socket. You can find these e.g. via "hwloc-info". On Wheeler:

 $ ~/src/spack-view/bin/hwloc-info
 depth 0:        1 Machine (type #1)
  depth 1:       2 NUMANode (type #2)
   depth 2:      2 Package (type #3)
    depth 3:     2 L3Cache (type #4)
     depth 4:    24 L2Cache (type #4)
      depth 5:   24 L1dCache (type #4)
       depth 6:  24 L1iCache (type #4)
        depth 7: 24 Core (type #5)
         depth 8: 24 PU (type #6)

Thus I choose "nshep=2" (2 NUMA nodes) and "nwork=12" (24 cores / 2 NUMA nodes) on Wheeler.
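
The same choice can be derived mechanically from the "hwloc-info" output. The following is a rough sketch, assuming output lines of the form shown above (a count followed by the object type name); the function name "qthreads_layout" is hypothetical. It uses one shepherd per NUMA node and divides the cores evenly among them.

```shell
# Sketch: read hwloc-info output on stdin and print suggested Qthreads settings.
# Assumes lines like "depth 1: 2 NUMANode (type #2)" as shown above.
qthreads_layout() {
    awk '
        {
            # The count is the field immediately before the object type name.
            for (i = 2; i <= NF; i++) {
                if ($i == "NUMANode") numa = $(i - 1)
                if ($i == "Core")     cores = $(i - 1)
            }
        }
        END {
            # Fail if either count is missing from the input.
            if (numa < 1 || cores < 1) exit 1
            # One shepherd per NUMA node, one worker per core in that node.
            printf "nshep=%d nwork=%d\n", numa, int(cores / numa)
        }
    '
}
```

For example, "eval "$(hwloc-info | qthreads_layout)"" would set "nshep" and "nwork" in the current shell. If the output format differs on your system, the function returns a non-zero status instead of guessing.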

By default, Qthreads chooses a rather small stack size of 8 kByte per thread. If a thread uses more stack space, random memory will be overwritten. You can enable guard pages, which is good for debugging. This will catch many cases where the stack overflows. Finally, Qthreads can produce info output at startup that might be helpful.
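
For a debugging run, the settings described here could be collected as follows. This is a sketch, not a recommended configuration: the function name "qthreads_debug_env" and the 8 MiB default are assumptions (the latter chosen to match the value used earlier on this page), and only the environment variables already listed above are used.

```shell
# Sketch: settings for a debugging run with a larger stack and guard pages on.
# Usage: qthreads_debug_env [stack size in MiB]   (default of 8 is an assumption)
qthreads_debug_env() {
    stack_mib="${1:-8}"
    # QTHREAD_STACK_SIZE is given in bytes: 8 MiB = 8 * 1024 * 1024 = 8388608.
    export QTHREAD_STACK_SIZE=$((stack_mib * 1024 * 1024))
    export QTHREAD_GUARD_PAGES=1   # catch many stack overflows
    export QTHREAD_INFO=1          # print Qthreads info output at startup
}
```

Calling "qthreads_debug_env" before starting the application sets the variables in the current shell; for production runs the guard pages can be switched off again with "export QTHREAD_GUARD_PAGES=0".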

On Wheeler:

 ~eschnett/src/spack-view/bin/mpirun -np 1 -x QTHREAD_NUM_SHEPHERDS=2 -x QTHREAD_NUM_WORKERS_PER_SHEPHERD=12 -x QTHREAD_STACK_SIZE=1000000 ~eschnett/src/spack-view/bin/fibonacci
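
To avoid retyping this long command line, it can be wrapped in a small function. This is a sketch that assumes an mpirun accepting "-x VAR=value" as in the command above; the function name "run_funhpc" and the "DRY_RUN" and "MPIRUN" shell variables are hypothetical conventions introduced only for this example.

```shell
# Sketch: run a FunHPC program on a single MPI process, forwarding Qthreads settings.
# Usage: run_funhpc <nshep> <nwork> <program> [args...]
# Set DRY_RUN=1 to print the command instead of running it (hypothetical convention).
run_funhpc() (
    nshep="$1"
    nwork="$2"
    shift 2
    # Build the full command line in the positional parameters.
    set -- "${MPIRUN:-mpirun}" -np 1 \
        -x "QTHREAD_NUM_SHEPHERDS=${nshep}" \
        -x "QTHREAD_NUM_WORKERS_PER_SHEPHERD=${nwork}" \
        -x "QTHREAD_STACK_SIZE=${QTHREAD_STACK_SIZE:-8388608}" \
        "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
)
```

On Wheeler this might be called as "MPIRUN=~eschnett/src/spack-view/bin/mpirun run_funhpc 2 12 ~eschnett/src/spack-view/bin/fibonacci". Running it with "DRY_RUN=1" first shows the exact command that would be executed.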

To-Do

This is a wiki -- everybody should add missing items here.

  • Put loop parallelization example onto wiki (and make it compile)
  • Maybe: Make FunHPC compile with Clang on Darwin
  • Announce next meeting (Wed Dec. 14, 12:00 EST)
  • Maybe: Set up FunHPC on Bethe or Fermi (if Frank can't get access to Wheeler)
  • Add pointers to http://cppreference.com to wiki (for async, future)
  • Describe future, shared_future; async's launch:: options
  • Make sure all FunHPC examples run on Wheeler
  • If possible: look at weird performance numbers (350 ms vs. 3500 ms on Wheeler's head node); run on compute node instead?

Done:

  • Correct broken FunHPC grid self-test
  • Provide make wrapper for Wheeler
  • Describe Cereal patch
  • Add pointers to package web sites to build instructions