Single-node benchmark results
Latest revision as of 23:52, 20 December 2014
Benchmark: Evolve the Einstein equations on a unigrid; single-node performance. We measure the amortized time per grid point update (gp), per core. Smaller is better.
To obtain per-node timings, divide by the number of cores per node. To obtain the cost per grid point update (the inverse of the performance/cost ratio), multiply the per-node timing by the cost of one node-hour (e.g. 16 SU/h for Blue Waters).
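As a worked example of the two conversions above, here is a minimal Python sketch using the Blue Waters row. It assumes the 16 SU/h rate is charged per node-hour and takes 32 cores per node from the Blue Waters raw result (cores000032); the variable names are illustrative only.

```python
# Minimal sketch: per-node time and cost per grid point update (gp) for the
# Blue Waters row. Assumes the 16 SU/h example rate is per node-hour.
time_per_gp_per_core_us = 7.56141  # table: time[us]/gp (per core)
cores_per_node = 32                # raw results: cores000032
su_per_node_hour = 16.0            # example rate from the text

# Per-node timing: all cores of the node update grid points in parallel.
time_per_gp_per_node_us = time_per_gp_per_core_us / cores_per_node

# Cost per gp: node time converted from microseconds to hours, times the rate.
su_per_gp = time_per_gp_per_node_us * 1e-6 / 3600.0 * su_per_node_hour

print(f"per-node time: {time_per_gp_per_node_us:.5f} us/gp")
print(f"cost: {su_per_gp * 1e9:.3f} SU per 10^9 gp updates")
```

With these inputs the per-node time is about 0.236 μs/gp, and the cost is about 1.05 SU per 10^9 grid point updates.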
The procs, threads, points, and iters columns describe the particular benchmarking configuration, which was tuned separately for each machine.
The flop/gp column gives the number of floating-point operations (flop) a core could theoretically have performed in the time it takes to evaluate one grid point update. This measures the floating-point efficiency of the code; large numbers indicate that further optimizations may be possible. The actual flop count per grid point update is about 5000; it is not exactly defined, because various more or less obvious optimizations are possible.
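A minimal Python sketch of how the flop/gp column can be read, assuming it equals the theoretical per-core peak flop rate multiplied by the time per grid point update, and using the approximate figure of 5000 flop per update from the text. The helper functions are hypothetical.

```python
# Minimal sketch, assuming flop/gp = (theoretical peak flop/s of one core)
# * (time per grid point update). ACTUAL_FLOP_PER_GP is the approximate
# figure quoted in the text, not an exact count.
ACTUAL_FLOP_PER_GP = 5000.0

# machine: (flop/gp, time[us]/gp), copied from the benchmark table
ROWS = {
    "Mike": (70667.6, 3.39748),
    "Datura": (51019.6, 4.78428),
    "Blue Waters": (74101.8, 7.56141),
    "Stampede-MIC": (501478.0, 28.493),
}


def implied_peak_gflops(flop_per_gp, time_us):
    """Per-core peak rate (Gflop/s) implied by the flop/gp column."""
    return flop_per_gp / (time_us * 1e-6) / 1e9


def efficiency(flop_per_gp):
    """Fraction of the theoretical peak that the code actually uses."""
    return ACTUAL_FLOP_PER_GP / flop_per_gp


for machine, (flop_per_gp, time_us) in ROWS.items():
    print(f"{machine:13s} peak ~{implied_peak_gflops(flop_per_gp, time_us):5.1f} "
          f"Gflop/s/core, efficiency ~{efficiency(flop_per_gp):.1%}")
```

For Mike this gives an implied peak of about 20.8 Gflop/s per core and an efficiency of roughly 7%; for Stampede-MIC, roughly 1%.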
Benchmark Results
Date: 2014-12-10
machine | location | procs | threads | points | iters | flop/gp | time[μs]/gp |
---|---|---|---|---|---|---|---|
Mike | LSU | 8 | 2 | 179^3 | 87 | 70667.6 | 3.39748 |
Shelob | LSU | 8 | 2 | 179^3 | 87 | 70809.9 | 3.40432 |
Stampede | TACC | 8 | 2 | 179^3 | 90 | 77383.6 | 3.58258 |
Gordon | SDSC | 8 | 2 | 179^3 | 87 | 77075.6 | 3.70556 |
Philip | LSU | 4 | 4 | 143^3 | 48 | 52681.0 | 4.49496 |
Datura | AEI | 12 | 1 | 163^3 | 44 | 51019.6 | 4.78428 |
Zwicky | Caltech | 6 | 2 | 163^3 | 44 | 55116.6 | 5.18013 |
Blue Waters | NCSA | 32 | 1 | 226^3 | 40 | 74101.8 | 7.56141 |
Hopper | NERSC | 24 | 1 | 205^3 | 35 | 75829.6 | 9.02733 |
Trestles | SDSC | 32 | 1 | 226^3 | 39 | 86707.5 | 9.03204 |
SHC | Caltech | 2 | 4 | 143^3 | 19 | 55291.1 | 11.519 |
Stampede-MIC | TACC | 60 | 4 | 140^3 | 57 | 501478 | 28.493 |
Note: Stampede-MIC runs on Stampede's Intel Xeon Phi (MIC) accelerators. Our code has been optimized for CPUs but not yet for accelerators, which explains the poor performance there.
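Because core counts differ between machines, the per-core ranking in the table is not the same as a per-node ranking. This Python sketch divides each per-core time by the core count from the corresponding raw result; the data dictionary and function name are illustrative.

```python
# machine: (time[us]/gp per core from the table, cores from the raw results)
ROWS = {
    "Mike": (3.39748, 16),
    "Shelob": (3.40432, 16),
    "Stampede": (3.58258, 16),
    "Gordon": (3.70556, 16),
    "Philip": (4.49496, 8),
    "Datura": (4.78428, 12),
    "Zwicky": (5.18013, 12),
    "Blue Waters": (7.56141, 32),
    "Hopper": (9.02733, 24),
    "Trestles": (9.03204, 32),
    "SHC": (11.519, 8),
    "Stampede-MIC": (28.493, 60),
}


def per_node_ranking(rows):
    """Return (machine, per-node time[us]/gp) pairs, fastest node first."""
    per_node = {m: t / cores for m, (t, cores) in rows.items()}
    return sorted(per_node.items(), key=lambda item: item[1])


for machine, t in per_node_ranking(ROWS):
    print(f"{machine:13s} {t:.4f} us/gp per node")
```

With these inputs, Stampede-MIC ends up about a factor of two behind Stampede's CPUs per node, compared with about a factor of eight per core, while Philip and SHC drop to the bottom.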
Raw results
[2014-12-07 09:52:39]: benchmarks-mike-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.39748e-06 sec, 70667.6 flop
[2014-12-07 15:04:09]: benchmarks-shelob-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.40432e-06 sec, 70809.9 flop
[2014-12-07 15:20:52]: benchmarks-trestles-Cbeta-sim-minkowski-amr-lev01-grid000226-iter019969-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 9.03204e-06 sec, 86707.5 flop
[2014-12-07 17:57:56]: benchmarks-shc-Cbeta-sim-minkowski-amr-lev01-grid000143-iter009729-nodes0001-cores000008-procs000002-threads0004-smt01-run0000: 1.1519e-05 sec, 55291.1 flop
[2014-12-08 14:13:33]: benchmarks-gordon-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.70556e-06 sec, 77075.6 flop
[2014-12-08 19:47:39]: benchmarks-bluewaters-Cbeta-sim-minkowski-amr-lev01-grid000226-iter020481-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 7.56141e-06 sec, 74101.8 flop
[2014-12-08 22:00:05]: benchmarks-stampede-Cbeta-sim-minkowski-amr-lev01-grid000179-iter046081-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.58258e-06 sec, 77383.6 flop
[2014-12-09 12:02:39]: benchmarks-stampede-mic-Cbeta-sim-minkowski-amr-lev01-grid000140-iter029185-nodes0001-cores000060-procs000060-threads0004-smt04-run0000: 2.8493e-05 sec, 501478 flop
[2014-12-09 14:22:26]: benchmarks-datura-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000012-threads0001-smt01-run0000: 4.78428e-06 sec, 51019.6 flop
[2014-12-10 11:58:10]: benchmarks-zwicky-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000006-threads0002-smt01-run0000: 5.18013e-06 sec, 55116.6 flop
[2014-12-11 17:54:39]: benchmarks-hopper-Cbeta-sim-minkowski-amr-lev01-grid000205-iter017921-nodes0001-cores000024-procs000024-threads0001-smt01-run0000: 9.02733e-06 sec, 75829.6 flop
[2014-12-20 18:15:31]: benchmarks-philip-Cbeta-sim-minkowski-amr-lev01-grid000143-iter024577-nodes0001-cores000008-procs000004-threads0004-smt02-run0000: 4.49496e-06 sec, 52681 flop
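The raw result lines encode the run configuration in their names. This Python sketch parses one line with a regular expression; the field interpretations are our reading of the names (for example, grid matches the points column and smt is presumably the number of hardware threads per core), not a documented format, and `parse_line` is a hypothetical helper. In all twelve lines above, cores equals procs * threads / smt.

```python
import re

# Hypothetical pattern for the raw result lines; field meanings are inferred
# from the run names and the table, not from a documented format.
LINE_RE = re.compile(
    r"\[(?P<date>[\d-]+ [\d:]+)\]: benchmarks-(?P<machine>.+?)-Cbeta-sim-"
    r"minkowski-amr-lev01-grid(?P<grid>\d+)-iter(?P<iter>\d+)-"
    r"nodes(?P<nodes>\d+)-cores(?P<cores>\d+)-procs(?P<procs>\d+)-"
    r"threads(?P<threads>\d+)-smt(?P<smt>\d+)-run(?P<run>\d+): "
    r"(?P<time>[\d.e+-]+) sec, (?P<flop>[\d.e+-]+) flop"
)


def parse_line(line):
    """Parse one raw result line into a dict, or return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    d = m.groupdict()
    rec = {"date": d["date"], "machine": d["machine"]}
    for key in ("grid", "iter", "nodes", "cores", "procs", "threads", "smt", "run"):
        rec[key] = int(d[key])  # zero-padded decimal fields
    rec["time_us"] = float(d["time"]) * 1e6  # seconds -> microseconds, as in the table
    rec["flop"] = float(d["flop"])
    return rec


sample = ("[2014-12-20 18:15:31]: benchmarks-philip-Cbeta-sim-minkowski-amr-lev01-"
          "grid000143-iter024577-nodes0001-cores000008-procs000004-threads0004-"
          "smt02-run0000: 4.49496e-06 sec, 52681 flop")
rec = parse_line(sample)
# Consistency check observed in the raw results: cores == procs * threads / smt.
print(rec["machine"], rec["grid"], rec["cores"], rec["procs"] * rec["threads"] // rec["smt"])
```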