Difference between revisions of "Single-node benchmark results"

(Single-node benchmark results)
(Benchmark Results)
Line 36: Line 36:
  |-
  | Datura        || AEI      ||    12 ||      1 ||  163^3 ||    44 ||  51019.6 ||    4.78428
+ |-
+ | Zwicky        || Caltech  ||    6 ||      2 ||  163^3 ||    44 ||  55116.6 ||    5.18013
  |-
  | Blue Waters    || NCSA      ||    32 ||      1 ||  226^3 ||    40 ||  74101.8 ||    7.56141
Line 49: Line 51:
  Stampede. Our code has been optimized for CPUs, but not yet for
  accelerators, explaining the poor performance there.

  ==Raw results==

Revision as of 11:32, 10 December 2014

Benchmark: Evolve the Einstein equations on a unigrid and measure single-node performance. We report the amortized time per grid point update (gp), per core. Smaller is better.

To obtain per-node timings, divide by the number of cores on each node. To obtain the performance/cost ratio, multiply the per-node timing by the per-node-time cost (e.g. 16 SU/h for Blue Waters).
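The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the core count of 32 for Blue Waters is read off the `cores000032` field of its raw result line, and the 16 SU/h rate is the example quoted above. Multiplying a time by a cost rate yields a cost per grid point update, so that is what the second function returns.

```python
def per_node_time_us(time_per_gp_core_us, cores_per_node):
    """Per-node time per grid point update, in microseconds."""
    return time_per_gp_core_us / cores_per_node


def cost_per_gp_su(per_node_us, su_per_node_hour):
    """Cost of one grid point update in service units (SU)."""
    node_hours = per_node_us * 1e-6 / 3600.0  # microseconds -> hours
    return node_hours * su_per_node_hour


# Blue Waters row: 7.56141 us per gp per core, 32 cores, 16 SU per node-hour
bw_node_us = per_node_time_us(7.56141, 32)
bw_cost = cost_per_gp_su(bw_node_us, 16.0)
print(f"{bw_node_us:.6f} us per gp per node, {bw_cost:.4e} SU per gp")
```

Applying the same two steps to every row allows machines with different per-node prices to be compared on cost rather than on raw speed.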

The procs, threads, points, and iters columns describe the particular benchmarking configuration, which is optimized for each machine.

The flop column measures how many floating-point operations (flop) a core could theoretically have performed in the time it takes to evaluate one grid point update. This measures the floating point efficiency of the code; large numbers indicate that further optimizations may be possible. The actual flop count is about 5000; this number is not exactly defined because various more or less obvious optimizations are possible.
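As a rough illustration of how the flop column relates to the timing column, the sketch below recovers the per-core peak rate implied by a table row and the fraction of that peak the code would reach if one update costs about 5000 flop. The function names are hypothetical, and the 5000 figure is the approximate count quoted above, so the resulting efficiency is an estimate only.

```python
ACTUAL_FLOP_PER_GP = 5000.0  # approximate count quoted in the text


def implied_peak_flop_per_s(flop_per_gp, time_us_per_gp):
    """Peak per-core rate implied by the flop/gp and time/gp columns."""
    return flop_per_gp / (time_us_per_gp * 1e-6)


def floating_point_efficiency(flop_per_gp):
    """Estimated fraction of the theoretical peak actually used."""
    return ACTUAL_FLOP_PER_GP / flop_per_gp


# Mike row: 70667.6 flop/gp, 3.39748 us/gp
mike_peak = implied_peak_flop_per_s(70667.6, 3.39748)
mike_eff = floating_point_efficiency(70667.6)
print(f"{mike_peak / 1e9:.1f} Gflop/s per core, efficiency {mike_eff:.1%}")
```

For the Mike row this gives an implied peak of about 20.8 Gflop/s per core and an estimated efficiency of roughly 7%.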


Benchmark Results

Date: 2014-12-10

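{|
! machine !! location !! procs !! threads !! points !! iters !! flop/gp !! time[μs]/gp
|-
| Mike || LSU || 8 || 2 || 179^3 || 87 || 70667.6 || 3.39748
|-
| Shelob || LSU || 8 || 2 || 179^3 || 87 || 70809.9 || 3.40432
|-
| Stampede || TACC || 8 || 2 || 179^3 || 90 || 77383.6 || 3.58258
|-
| Gordon || SDSC || 8 || 2 || 179^3 || 87 || 77075.6 || 3.70556
|-
| Datura || AEI || 12 || 1 || 163^3 || 44 || 51019.6 || 4.78428
|-
| Zwicky || Caltech || 6 || 2 || 163^3 || 44 || 55116.6 || 5.18013
|-
| Blue Waters || NCSA || 32 || 1 || 226^3 || 40 || 74101.8 || 7.56141
|-
| Trestles || SDSC || 32 || 1 || 226^3 || 39 || 86707.5 || 9.03204
|-
| SHC || Caltech || 2 || 4 || 143^3 || 19 || 55291.1 || 11.519
|-
| Stampede-MIC || TACC || 60 || 4 || 140^3 || 57 || 501478 || 28.493
|}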

Note: Stampede-MIC runs on the MIC (Intel Xeon Phi) accelerators of Stampede. Our code has been optimized for CPUs, but not yet for accelerators, which explains the poor performance there.

Raw results

[2014-12-07 09:52:39]: benchmarks-mike-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.39748e-06 sec, 70667.6 flop
[2014-12-07 15:04:09]: benchmarks-shelob-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.40432e-06 sec, 70809.9 flop
[2014-12-07 15:20:52]: benchmarks-trestles-Cbeta-sim-minkowski-amr-lev01-grid000226-iter019969-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 9.03204e-06 sec, 86707.5 flop
[2014-12-07 17:57:56]: benchmarks-shc-Cbeta-sim-minkowski-amr-lev01-grid000143-iter009729-nodes0001-cores000008-procs000002-threads0004-smt01-run0000: 1.1519e-05 sec, 55291.1 flop
[2014-12-08 14:13:33]: benchmarks-gordon-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.70556e-06 sec, 77075.6 flop
[2014-12-08 19:47:39]: benchmarks-bluewaters-Cbeta-sim-minkowski-amr-lev01-grid000226-iter020481-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 7.56141e-06 sec, 74101.8 flop
[2014-12-08 22:00:05]: benchmarks-stampede-Cbeta-sim-minkowski-amr-lev01-grid000179-iter046081-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.58258e-06 sec, 77383.6 flop
[2014-12-09 12:02:39]: benchmarks-stampede-mic-Cbeta-sim-minkowski-amr-lev01-grid000140-iter029185-nodes0001-cores000060-procs000060-threads0004-smt04-run0000: 2.8493e-05 sec, 501478 flop
[2014-12-09 14:22:26]: benchmarks-datura-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000012-threads0001-smt01-run0000: 4.78428e-06 sec, 51019.6 flop
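The raw result lines above follow a regular pattern, so they can be turned into table rows mechanically. The following is a minimal sketch, not the script actually used to produce the table: the regular expression and the `parse_result` helper are hypothetical, and the meaning of each field (for example, `grid` as points per dimension) is inferred from comparing the run names with the table.

```python
import re

# Hypothetical pattern, inferred from the run names listed above.
RESULT_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]: "
    r"benchmarks-(?P<machine>.+?)-Cbeta-.*"
    r"-grid(?P<grid>\d+)-iter(?P<iter>\d+)-nodes(?P<nodes>\d+)"
    r"-cores(?P<cores>\d+)-procs(?P<procs>\d+)-threads(?P<threads>\d+)"
    r"-smt(?P<smt>\d+)-run(?P<run>\d+): "
    r"(?P<seconds>[0-9.eE+-]+) sec, (?P<flop>[0-9.eE+-]+) flop"
)

INT_FIELDS = ("grid", "iter", "nodes", "cores", "procs", "threads", "smt", "run")


def parse_result(line):
    """Parse one raw result line into a dict, or return None if it does not match."""
    match = RESULT_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    for key in INT_FIELDS:
        record[key] = int(record[key])
    record["seconds"] = float(record["seconds"])
    record["flop"] = float(record["flop"])
    return record


sample = (
    "[2014-12-07 09:52:39]: benchmarks-mike-Cbeta-sim-minkowski-amr-lev01-"
    "grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-"
    "smt01-run0000: 3.39748e-06 sec, 70667.6 flop"
)
record = parse_result(sample)
print(record["machine"], record["grid"], record["seconds"] * 1e6, record["flop"])
```

The time in the table is the `seconds` value converted to microseconds, and the flop/gp column is the trailing `flop` value.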