Vectorisation

From Einstein Toolkit Documentation
Revision as of 15:38, 7 April 2011 by Eschnett (talk | contribs)
Jump to: navigation, search

The goal of vectorisation is to improve single-core performance by making use of advanced CPU features, such as using vector or cache-management instructions. On a modern CPU, using vector instructions can theoretically improve performance by a factor of two or four, and improving cache management can improve performance by an order of magnitude. In practice, the observed improvements are much smaller, but may still be significant.

In principle, it is the compiler's task to use these instructions. In practice, our code is often too complex for the compiler to understand, and these optimisations therefore do not occur. Below, we take a two-levels approach:

  • Create an API that hides machine-specific details
  • Use automated code generation that targets this API instead of C/C++ directly

Example

For example, Intel/AMD CPUs offer vectors consisting of two double precision numbers; this datatype is called __m128d. To add two such numbers, one calls the builtin function _mm_add_pd(x,y) (which the compiler translates into a single machine instruction, avoiding the function call).