Single-node benchmark results
Revision as of 18:41, 11 December 2014
Benchmark: Evolve the Einstein equations on a unigrid, single-node performance. We measure the amortized time per grid point update (gp), per core. Smaller is better.
To obtain per-node timings, divide by the number of cores on each node. To obtain the performance/cost ratio, multiply the per-node timing by the per-node-time cost (e.g. 16 SU/h for Blue Waters).
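As a rough illustration of the two conversions described above, the following sketch takes the Blue Waters row (7.56141 μs per gp per core, 32 cores per node according to the raw results, 16 SU per node-hour) and computes the per-node timing and the cost per grid point update. The function names are hypothetical and chosen only for this example.

```python
def per_node_time_us(time_us_per_gp_core: float, cores_per_node: int) -> float:
    """Per-node time per grid point update: per-core time divided by core count."""
    return time_us_per_gp_core / cores_per_node


def cost_per_gp_su(time_us_per_gp_core: float, cores_per_node: int,
                   su_per_node_hour: float) -> float:
    """Cost in SU of one grid point update, given a per-node-hour charge rate."""
    node_seconds = per_node_time_us(time_us_per_gp_core, cores_per_node) * 1e-6
    return node_seconds * su_per_node_hour / 3600.0


# Blue Waters numbers taken from the table and raw results on this page.
bw_node_us = per_node_time_us(7.56141, 32)
bw_cost = cost_per_gp_su(7.56141, 32, 16.0)
print(f"per-node time: {bw_node_us:.6f} us/gp, cost: {bw_cost:.4e} SU/gp")
```

Comparing machines this way needs the charge rate of each machine, which this page gives only for Blue Waters.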
The procs, threads, points, and iters columns describe the particular benchmarking configuration. These are optimized for each machine.
The flop/gp column gives the number of floating-point operations (flop) a core could theoretically have performed in the time it takes to evaluate one grid point update. This indicates the floating-point efficiency of the code; large numbers indicate that further optimizations may be possible. The actual flop count per grid point update is about 5000; this number is not exactly defined because various more or less obvious optimizations are possible.
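To make the meaning of the flop/gp column concrete, this sketch recovers the per-core peak rate implied by a table row (flop/gp divided by time per gp) and estimates the fraction of peak actually used, taking the approximate count of 5000 flop per update quoted above. The function names are hypothetical, and the efficiency figure is only as accurate as that rough 5000 estimate.

```python
APPROX_ACTUAL_FLOP = 5000.0  # approximate value quoted in the text, not exact


def implied_peak_gflops(flop_per_gp: float, time_us_per_gp: float) -> float:
    """Theoretical per-core peak (Gflop/s) implied by one table row."""
    return flop_per_gp / (time_us_per_gp * 1e-6) / 1e9


def efficiency(flop_per_gp: float, actual_flop: float = APPROX_ACTUAL_FLOP) -> float:
    """Fraction of theoretical peak used, assuming ~actual_flop per update."""
    return actual_flop / flop_per_gp


# Mike row from the table: 70667.6 flop/gp at 3.39748 us/gp.
mike_peak = implied_peak_gflops(70667.6, 3.39748)   # roughly 20.8 Gflop/s per core
mike_eff = efficiency(70667.6)                      # roughly 7% of peak
mic_eff = efficiency(501478)                        # roughly 1% of peak on Stampede-MIC
print(f"{mike_peak:.1f} Gflop/s, {mike_eff:.1%}, {mic_eff:.1%}")
```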
Benchmark Results
Date: 2014-12-10
machine | location | procs | threads | points | iters | flop/gp | time[μs]/gp |
---|---|---|---|---|---|---|---|
Mike | LSU | 8 | 2 | 179^3 | 87 | 70667.6 | 3.39748 |
Shelob | LSU | 8 | 2 | 179^3 | 87 | 70809.9 | 3.40432 |
Stampede | TACC | 8 | 2 | 179^3 | 90 | 77383.6 | 3.58258 |
Gordon | SDSC | 8 | 2 | 179^3 | 87 | 77075.6 | 3.70556 |
Datura | AEI | 12 | 1 | 163^3 | 44 | 51019.6 | 4.78428 |
Zwicky | Caltech | 6 | 2 | 163^3 | 44 | 55116.6 | 5.18013 |
Blue Waters | NCSA | 32 | 1 | 226^3 | 40 | 74101.8 | 7.56141 |
Trestles | SDSC | 32 | 1 | 226^3 | 39 | 86707.5 | 9.03204 |
SHC | Caltech | 2 | 4 | 143^3 | 19 | 55291.1 | 11.519 |
Stampede-MIC | TACC | 60 | 4 | 140^3 | 57 | 501478 | 28.493 |
Note: Stampede-MIC runs on the MIC (Intel Xeon Phi) accelerators of Stampede. Our code has been optimized for CPUs, but not yet for accelerators, which explains the poor performance there.
Raw results
[2014-12-07 09:52:39]: benchmarks-mike-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.39748e-06 sec, 70667.6 flop
[2014-12-07 15:04:09]: benchmarks-shelob-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.40432e-06 sec, 70809.9 flop
[2014-12-07 15:20:52]: benchmarks-trestles-Cbeta-sim-minkowski-amr-lev01-grid000226-iter019969-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 9.03204e-06 sec, 86707.5 flop
[2014-12-07 17:57:56]: benchmarks-shc-Cbeta-sim-minkowski-amr-lev01-grid000143-iter009729-nodes0001-cores000008-procs000002-threads0004-smt01-run0000: 1.1519e-05 sec, 55291.1 flop
[2014-12-08 14:13:33]: benchmarks-gordon-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.70556e-06 sec, 77075.6 flop
[2014-12-08 19:47:39]: benchmarks-bluewaters-Cbeta-sim-minkowski-amr-lev01-grid000226-iter020481-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 7.56141e-06 sec, 74101.8 flop
[2014-12-08 22:00:05]: benchmarks-stampede-Cbeta-sim-minkowski-amr-lev01-grid000179-iter046081-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.58258e-06 sec, 77383.6 flop
[2014-12-09 12:02:39]: benchmarks-stampede-mic-Cbeta-sim-minkowski-amr-lev01-grid000140-iter029185-nodes0001-cores000060-procs000060-threads0004-smt04-run0000: 2.8493e-05 sec, 501478 flop
[2014-12-09 14:22:26]: benchmarks-datura-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000012-threads0001-smt01-run0000: 4.78428e-06 sec, 51019.6 flop
[2014-12-10 11:58:10]: benchmarks-zwicky-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000006-threads0002-smt01-run0000: 5.18013e-06 sec, 55116.6 flop
[2014-12-11 17:54:39]: benchmarks-hopper-Cbeta-sim-minkowski-amr-lev01-grid000205-iter017921-nodes0001-cores000024-procs000024-threads0001-smt01-run0000: 9.02733e-06 sec, 75829.6 flop
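The raw result lines follow a regular naming pattern, so they can be turned into table rows mechanically. The sketch below is one possible parser, not the script that produced the table; the regular expression, the field names, and the `parse_raw_results` helper are assumptions inferred from the lines shown on this page.

```python
import re

# Pattern inferred from the raw lines above; machine names may contain hyphens
# (e.g. "stampede-mic"), so the machine group is matched lazily up to "-Cbeta-".
RESULT_RE = re.compile(
    r"\[(?P<stamp>[^\]]+)\]:\s*"
    r"benchmarks-(?P<machine>.+?)-Cbeta-\S*?"
    r"-grid(?P<grid>\d+)-iter(?P<iter>\d+)-nodes(?P<nodes>\d+)"
    r"-cores(?P<cores>\d+)-procs(?P<procs>\d+)-threads(?P<threads>\d+)"
    r"-smt(?P<smt>\d+)-run(?P<run>\d+):\s*"
    r"(?P<sec>[0-9.eE+-]+) sec, (?P<flop>[0-9.eE+-]+) flop"
)

INT_FIELDS = ("grid", "iter", "nodes", "cores", "procs", "threads", "smt", "run")


def parse_raw_results(text: str) -> list[dict]:
    """Extract one dict per raw result entry found anywhere in the text."""
    rows = []
    for m in RESULT_RE.finditer(text):
        row = {"stamp": m["stamp"], "machine": m["machine"]}
        row.update({k: int(m[k]) for k in INT_FIELDS})
        row["time_us"] = float(m["sec"]) * 1e6   # seconds -> microseconds per gp
        row["flop"] = float(m["flop"])
        rows.append(row)
    return rows


SAMPLE = """\
[2014-12-09 12:02:39]: benchmarks-stampede-mic-Cbeta-sim-minkowski-amr-lev01-grid000140-iter029185-nodes0001-cores000060-procs000060-threads0004-smt04-run0000: 2.8493e-05 sec, 501478 flop
[2014-12-09 14:22:26]: benchmarks-datura-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000012-threads0001-smt01-run0000: 4.78428e-06 sec, 51019.6 flop
"""

rows = parse_raw_results(SAMPLE)
for r in sorted(rows, key=lambda r: r["time_us"]):
    print(r["machine"], r["procs"], r["threads"], f"{r['grid']}^3", r["flop"], r["time_us"])
```

Sorting by `time_us` reproduces the ordering used in the results table, where smaller is better.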