Single-node benchmark results

From Einstein Toolkit Documentation

Latest revision as of 23:52, 20 December 2014

Benchmark: Evolve the Einstein equations on a unigrid and measure single-node performance. We measure the amortized time per grid point update (gp), per core. Smaller is better.

To obtain per-node timings, divide by the number of cores on each node. To obtain the performance/cost ratio, multiply the per-node timing by the per-node-time cost (e.g. 16 SU/h for Blue Waters).
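The two conversions described above can be sketched in a few lines of Python. This is only an illustration, not part of the benchmark scripts: the function names are hypothetical, the core count of 32 for Blue Waters is read off the `cores000032` field in the raw results, and the 16 SU/h figure is assumed to be a charge per node-hour.

```python
def per_node_time_us(per_core_time_us: float, cores_per_node: int) -> float:
    """Per-node time per grid point update, in microseconds.

    Follows the rule in the text: divide the per-core timing by the
    number of cores on the node.
    """
    return per_core_time_us / cores_per_node


def cost_per_gp_su(node_time_us: float, su_per_node_hour: float) -> float:
    """Cost of one grid point update in service units (SU).

    Assumes the charge rate is given per node-hour (an assumption).
    """
    seconds = node_time_us * 1e-6
    return seconds / 3600.0 * su_per_node_hour


# Blue Waters row: 7.56141 us per gp per core, 32 cores, 16 SU/h
blue_waters_node_us = per_node_time_us(7.56141, 32)
blue_waters_cost_su = cost_per_gp_su(blue_waters_node_us, 16.0)

print(f"per-node time: {blue_waters_node_us:.5f} us/gp")
print(f"cost: {blue_waters_cost_su:.3e} SU/gp")
```

Comparing machines by cost then amounts to evaluating `cost_per_gp_su` for each row with that machine's own charge rate, which this page lists only for Blue Waters.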

The procs, threads, points, and iters columns describe the particular benchmarking configuration. These values are optimized for each machine.

The flop column measures how many floating-point operations (flop) a core could theoretically have performed in the time it takes to evaluate one grid point update. This measures the floating point efficiency of the code; large numbers indicate that further optimizations may be possible. The actual flop count is about 5000; this number is not exactly defined because various more or less obvious optimizations are possible.
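As a rough illustration of how the flop column can be read, the sketch below recovers the per-core peak rate implied by a table row and estimates the fraction of peak actually used. It assumes the approximate figure of 5000 flop per grid point update quoted above; the function names are hypothetical and the result is only as accurate as that estimate.

```python
ACTUAL_FLOP_PER_GP = 5000.0  # approximate count quoted in the text


def implied_peak_gflops(flop_per_gp: float, time_us_per_gp: float) -> float:
    """Per-core peak rate (Gflop/s) implied by a table row.

    The flop column is the number of operations a core could have
    performed during one grid point update, so dividing it by the
    update time gives the theoretical peak rate that was assumed.
    """
    return flop_per_gp / (time_us_per_gp * 1e-6) / 1e9


def flop_efficiency(flop_per_gp: float) -> float:
    """Estimated fraction of peak floating-point performance achieved."""
    return ACTUAL_FLOP_PER_GP / flop_per_gp


# Mike row of the results table: 70667.6 flop/gp, 3.39748 us/gp
mike_peak = implied_peak_gflops(70667.6, 3.39748)
mike_eff = flop_efficiency(70667.6)

print(f"implied peak: {mike_peak:.1f} Gflop/s per core")
print(f"efficiency:   {mike_eff:.1%}")
```

For the Mike row this gives an implied peak of about 20.8 Gflop/s per core and an efficiency of roughly 7%, consistent with the remark that large flop values leave room for further optimization.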


Benchmark Results

Date: 2014-12-10

machine       location  procs  threads  points  iters   flop/gp  time[μs]/gp
Mike          LSU           8        2   179^3     87   70667.6      3.39748
Shelob        LSU           8        2   179^3     87   70809.9      3.40432
Stampede      TACC          8        2   179^3     90   77383.6      3.58258
Gordon        SDSC          8        2   179^3     87   77075.6      3.70556
Philip        LSU           4        4   143^3     48   52681.0      4.49496
Datura        AEI          12        1   163^3     44   51019.6      4.78428
Zwicky        Caltech       6        2   163^3     44   55116.6      5.18013
Blue Waters   NCSA         32        1   226^3     40   74101.8      7.56141
Hopper        NERSC        24        1   205^3     35   75829.6      9.02733
Trestles      SDSC         32        1   226^3     39   86707.5      9.03204
SHC           Caltech       2        4   143^3     19   55291.1     11.519
Stampede-MIC  TACC         60        4   140^3     57  501478       28.493

Note: Stampede-MIC runs on the MIC (Intel Xeon Phi) accelerators of Stampede. Our code has been optimized for CPUs but not yet for accelerators, which explains the poor performance there.

Raw results

[2014-12-07 09:52:39]: benchmarks-mike-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.39748e-06 sec, 70667.6 flop
[2014-12-07 15:04:09]: benchmarks-shelob-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.40432e-06 sec, 70809.9 flop
[2014-12-07 15:20:52]: benchmarks-trestles-Cbeta-sim-minkowski-amr-lev01-grid000226-iter019969-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 9.03204e-06 sec, 86707.5 flop
[2014-12-07 17:57:56]: benchmarks-shc-Cbeta-sim-minkowski-amr-lev01-grid000143-iter009729-nodes0001-cores000008-procs000002-threads0004-smt01-run0000: 1.1519e-05 sec, 55291.1 flop
[2014-12-08 14:13:33]: benchmarks-gordon-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.70556e-06 sec, 77075.6 flop
[2014-12-08 19:47:39]: benchmarks-bluewaters-Cbeta-sim-minkowski-amr-lev01-grid000226-iter020481-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 7.56141e-06 sec, 74101.8 flop
[2014-12-08 22:00:05]: benchmarks-stampede-Cbeta-sim-minkowski-amr-lev01-grid000179-iter046081-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.58258e-06 sec, 77383.6 flop
[2014-12-09 12:02:39]: benchmarks-stampede-mic-Cbeta-sim-minkowski-amr-lev01-grid000140-iter029185-nodes0001-cores000060-procs000060-threads0004-smt04-run0000: 2.8493e-05 sec, 501478 flop
[2014-12-09 14:22:26]: benchmarks-datura-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000012-threads0001-smt01-run0000: 4.78428e-06 sec, 51019.6 flop
[2014-12-10 11:58:10]: benchmarks-zwicky-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000006-threads0002-smt01-run0000: 5.18013e-06 sec, 55116.6 flop
[2014-12-11 17:54:39]: benchmarks-hopper-Cbeta-sim-minkowski-amr-lev01-grid000205-iter017921-nodes0001-cores000024-procs000024-threads0001-smt01-run0000: 9.02733e-06 sec, 75829.6 flop
[2014-12-20 18:15:31]: benchmarks-philip-Cbeta-sim-minkowski-amr-lev01-grid000143-iter024577-nodes0001-cores000008-procs000004-threads0004-smt02-run0000: 4.49496e-06 sec, 52681 flop
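
The raw result lines follow a regular pattern, so they can be turned into table rows mechanically. The following is a minimal sketch, assuming the line layout shown above; the regular expression, the field names, and the `parse_result_line` helper are illustrative and not part of the benchmark scripts. The meaning of each field is inferred from its label in the run name.

```python
import re

# Pattern inferred from the raw result lines on this page (an assumption,
# not an official format description).
LINE_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]: "
    r"benchmarks-(?P<machine>.+?)-Cbeta-sim-minkowski-amr-"
    r"lev(?P<lev>\d+)-grid(?P<grid>\d+)-iter(?P<iter>\d+)-"
    r"nodes(?P<nodes>\d+)-cores(?P<cores>\d+)-procs(?P<procs>\d+)-"
    r"threads(?P<threads>\d+)-smt(?P<smt>\d+)-run(?P<run>\d+): "
    r"(?P<seconds>[0-9.eE+-]+) sec, (?P<flop>[0-9.eE+-]+) flop"
)

INT_FIELDS = ("lev", "grid", "iter", "nodes", "cores",
              "procs", "threads", "smt", "run")


def parse_result_line(line: str) -> dict:
    """Parse one raw result line into a dictionary of typed fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized result line: {line!r}")
    record = dict(match.groupdict())
    for key in INT_FIELDS:
        record[key] = int(record[key])
    record["seconds"] = float(record["seconds"])
    record["flop"] = float(record["flop"])
    # The table reports the time per grid point update in microseconds.
    record["time_us"] = record["seconds"] * 1e6
    return record


example = (
    "[2014-12-09 12:02:39]: benchmarks-stampede-mic-Cbeta-sim-minkowski-amr-"
    "lev01-grid000140-iter029185-nodes0001-cores000060-procs000060-"
    "threads0004-smt04-run0000: 2.8493e-05 sec, 501478 flop"
)
rec = parse_result_line(example)
print(rec["machine"], rec["procs"], rec["threads"], rec["time_us"], rec["flop"])
```

The machine name is matched non-greedily so that hyphenated names such as `stampede-mic` are captured whole.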