Single-node benchmark results

From Einstein Toolkit Documentation
Revision as of 11:08, 10 December 2014 by Eschnett (talk | contribs) (Single-node benchmark results)

Benchmark: evolve the Einstein equations on a uniform grid (unigrid) and measure single-node performance. We report the amortized time per grid point update (gp), per core. Smaller is better.

To obtain per-node timings, divide the per-core time by the number of cores on each node. To obtain the performance/cost ratio, multiply the per-node timing by the cost per unit of node time (e.g. 16 SU/h for Blue Waters).
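The two conversions above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the 32 cores for Blue Waters are taken from the raw results on this page, and the sketch assumes that the quoted 16 SU/h is charged per node-hour.

```python
def per_node_time_us(time_per_core_us: float, cores_per_node: int) -> float:
    """Amortized time per grid point update for a whole node, in microseconds."""
    return time_per_core_us / cores_per_node


def cost_su_per_gp(time_per_core_us: float, cores_per_node: int,
                   su_per_node_hour: float) -> float:
    """Cost of one grid point update in service units (SU).

    Assumes the machine charges su_per_node_hour SU for one node-hour.
    """
    seconds_per_gp = per_node_time_us(time_per_core_us, cores_per_node) * 1e-6
    return seconds_per_gp / 3600.0 * su_per_node_hour


# Blue Waters: 7.56141 us/gp per core, 32 cores per node, 16 SU/h.
bw_node_us = per_node_time_us(7.56141, 32)
bw_cost = cost_su_per_gp(7.56141, 32, 16.0)
print(f"Blue Waters: {bw_node_us:.4f} us/gp per node, {bw_cost:.3e} SU/gp")
```

For Blue Waters this gives roughly 0.236 μs per grid point update per node, or about 1.05e-9 SU per grid point update.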

The procs, threads, points, and iters columns describe the benchmark configuration, which is tuned separately for each machine.

The flop column measures how many floating-point operations (flop) a core could theoretically have performed in the time it takes to evaluate one grid point update. This measures the floating point efficiency of the code; large numbers indicate that further optimizations may be possible. The actual flop count is about 5000; this number is not exactly defined because various more or less obvious optimizations are possible.
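As a rough illustration of how to read the flop column, the sketch below derives two quantities from a table row. It assumes that flop/gp is the time per grid point update multiplied by the core's theoretical peak rate (which is how the column is described above), and it uses the approximate actual count of 5000 flop. The function names are hypothetical.

```python
ACTUAL_FLOP_PER_GP = 5000.0  # approximate value quoted in the text


def implied_peak_gflops(flop_per_gp: float, time_us_per_gp: float) -> float:
    """Peak rate per core (Gflop/s) implied by one table row.

    Assumes flop/gp = (time per gp) * (theoretical peak flop rate per core).
    """
    return flop_per_gp / (time_us_per_gp * 1e-6) / 1e9


def flop_efficiency(flop_per_gp: float) -> float:
    """Fraction of the theoretical peak that does useful work, approximately."""
    return ACTUAL_FLOP_PER_GP / flop_per_gp


# Row for Mike (LSU): 70667.6 flop/gp, 3.39748 us/gp.
print(f"implied peak: {implied_peak_gflops(70667.6, 3.39748):.1f} Gflop/s per core")
print(f"efficiency:   {flop_efficiency(70667.6):.1%}")
```

For Mike this yields an implied peak of about 20.8 Gflop/s per core and an efficiency of about 7%. Because the actual flop count is only approximately defined, the efficiency is an estimate.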


Benchmark Results

Date: 2014-12-10

machine       location  procs  threads  points  iters  flop/gp  time[μs]/gp
Mike          LSU           8        2  179^3      87  70667.6      3.39748
Shelob        LSU           8        2  179^3      87  70809.9      3.40432
Stampede      TACC          8        2  179^3      90  77383.6      3.58258
Gordon        SDSC          8        2  179^3      87  77075.6      3.70556
Datura        AEI          12        1  163^3      44  51019.6      4.78428
Blue Waters   NCSA         32        1  226^3      40  74101.8      7.56141
Trestles      SDSC         32        1  226^3      39  86707.5      9.03204
SHC           Caltech       2        4  143^3      19  55291.1       11.519
Stampede-MIC  TACC         60        4  140^3      57   501478       28.493

Note: Stampede-MIC runs on the MIC (Intel Xeon Phi) accelerators of Stampede. Our code has been optimized for CPUs but not yet for accelerators, which explains the poor performance there.


Raw results

[2014-12-07 09:52:39]: benchmarks-mike-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.39748e-06 sec, 70667.6 flop
[2014-12-07 15:04:09]: benchmarks-shelob-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.40432e-06 sec, 70809.9 flop
[2014-12-07 15:20:52]: benchmarks-trestles-Cbeta-sim-minkowski-amr-lev01-grid000226-iter019969-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 9.03204e-06 sec, 86707.5 flop
[2014-12-07 17:57:56]: benchmarks-shc-Cbeta-sim-minkowski-amr-lev01-grid000143-iter009729-nodes0001-cores000008-procs000002-threads0004-smt01-run0000: 1.1519e-05 sec, 55291.1 flop
[2014-12-08 14:13:33]: benchmarks-gordon-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.70556e-06 sec, 77075.6 flop
[2014-12-08 19:47:39]: benchmarks-bluewaters-Cbeta-sim-minkowski-amr-lev01-grid000226-iter020481-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 7.56141e-06 sec, 74101.8 flop
[2014-12-08 22:00:05]: benchmarks-stampede-Cbeta-sim-minkowski-amr-lev01-grid000179-iter046081-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.58258e-06 sec, 77383.6 flop
[2014-12-09 12:02:39]: benchmarks-stampede-mic-Cbeta-sim-minkowski-amr-lev01-grid000140-iter029185-nodes0001-cores000060-procs000060-threads0004-smt04-run0000: 2.8493e-05 sec, 501478 flop
[2014-12-09 14:22:26]: benchmarks-datura-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000012-threads0001-smt01-run0000: 4.78428e-06 sec, 51019.6 flop
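
The raw result lines follow a regular naming pattern, so they can be turned back into table rows with a short script. The following is a minimal sketch, not part of the benchmark suite: the regular expression and the function name `parse_result` are hypothetical, and the meaning of each field is inferred from the labels embedded in the run names above.

```python
import re

# Pattern inferred from the raw result lines on this page (an assumption,
# not an official format specification).
LINE_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]: "
    r"benchmarks-(?P<machine>.+?)-Cbeta-sim-minkowski-amr-"
    r"lev(?P<lev>\d+)-grid(?P<grid>\d+)-iter(?P<iter>\d+)-"
    r"nodes(?P<nodes>\d+)-cores(?P<cores>\d+)-procs(?P<procs>\d+)-"
    r"threads(?P<threads>\d+)-smt(?P<smt>\d+)-run(?P<run>\d+): "
    r"(?P<seconds>\S+) sec, (?P<flop>\S+) flop"
)

INT_FIELDS = ("lev", "grid", "iter", "nodes", "cores",
              "procs", "threads", "smt", "run")


def parse_result(line: str) -> dict:
    """Parse one raw result line into a dict of typed fields."""
    match = LINE_RE.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    fields = match.groupdict()
    for name in INT_FIELDS:
        fields[name] = int(fields[name])      # zero-padded decimal integers
    fields["seconds"] = float(fields["seconds"])
    fields["flop"] = float(fields["flop"])
    return fields


example = (
    "[2014-12-09 12:02:39]: benchmarks-stampede-mic-Cbeta-sim-minkowski-amr-"
    "lev01-grid000140-iter029185-nodes0001-cores000060-procs000060-"
    "threads0004-smt04-run0000: 2.8493e-05 sec, 501478 flop"
)
row = parse_result(example)
print(row["machine"], row["grid"], row["procs"], row["threads"],
      row["seconds"] * 1e6, row["flop"])
```

The machine name is matched lazily so that hyphenated names such as stampede-mic are kept whole.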