Difference between revisions of "Single-node benchmark results"

Latest revision as of 23:52, 20 December 2014

Benchmark: Evolve Einstein equations on unigrid, single-node performance. We measure amortized time per grid point update (gp), per core. Smaller is better.

To obtain per-node timings, divide by the number of cores on each node. To obtain the performance/cost ratio, multiply the per-node timing by the per-node-time cost (e.g. 16 SU/h for Blue Waters).
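As a rough sketch of the two conversions above, the following Python snippet applies them to the Blue Waters row. The function names are illustrative only, the core count (32) is taken from the `cores000032` field of the raw result line, and the 16 SU/h figure is assumed to mean 16 SU per node-hour.

```python
def per_node_time_us(time_per_gp_per_core_us, cores_per_node):
    """Per-node time per grid point update: per-core time divided by cores."""
    return time_per_gp_per_core_us / cores_per_node


def cost_per_gp_su(node_time_us, su_per_node_hour):
    """Cost of one grid point update in SU, given the per-node-hour price."""
    node_time_hours = node_time_us * 1e-6 / 3600.0
    return node_time_hours * su_per_node_hour


# Blue Waters: 7.56141 us per gp per core, 32 cores per node,
# 16 SU per node-hour (assumed interpretation of "16 SU/h").
bw_node_time = per_node_time_us(7.56141, 32)
bw_cost = cost_per_gp_su(bw_node_time, 16.0)
print(f"per-node time: {bw_node_time:.4f} us/gp, cost: {bw_cost:.3e} SU/gp")
```

Because both steps are simple scalings, machines can be ranked by cost as long as each one's per-node-hour price is known.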

The procs, threads, points, and iters columns describe the particular benchmarking configuration, which is optimized for each machine.

The flop column measures how many floating-point operations (flop) a core could theoretically have performed in the time it takes to evaluate one grid point update. This measures the floating point efficiency of the code; large numbers indicate that further optimizations may be possible. The actual flop count is about 5000; this number is not exactly defined because various more or less obvious optimizations are possible.
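To make the flop column concrete, here is a minimal sketch that inverts it: dividing flop/gp by the time per gp gives the per-core peak rate that the column implies, and dividing the approximate actual count (about 5000) by flop/gp gives a rough floating-point efficiency. The function names are illustrative, and the 5000 figure is only the approximate value quoted above.

```python
ACTUAL_FLOP_PER_GP = 5000.0  # approximate value stated in the text


def implied_peak_gflops(flop_per_gp, time_us_per_gp):
    """Per-core peak rate (Gflop/s) implied by the flop and time columns."""
    return flop_per_gp / (time_us_per_gp * 1e-6) / 1e9


def flop_efficiency(flop_per_gp, actual_flop=ACTUAL_FLOP_PER_GP):
    """Fraction of the theoretical peak used, given the approximate flop count."""
    return actual_flop / flop_per_gp


# Mike row of the results table: 70667.6 flop/gp, 3.39748 us/gp.
peak = implied_peak_gflops(70667.6, 3.39748)
eff = flop_efficiency(70667.6)
print(f"implied peak: {peak:.1f} Gflop/s per core, efficiency: {eff:.1%}")
```

Since the actual flop count is not exactly defined, the efficiency figure should be read as an order-of-magnitude estimate rather than a precise measurement.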

Benchmark Results

Date: 2014-12-10

machine	location	procs	threads	points	iters	flop/gp	time[μs]/gp
Mike	LSU	8	2	179^3	87	70667.6	3.39748
Shelob	LSU	8	2	179^3	87	70809.9	3.40432
Stampede	TACC	8	2	179^3	90	77383.6	3.58258
Gordon	SDSC	8	2	179^3	87	77075.6	3.70556
Philip	LSU	4	4	143^3	48	52681.0	4.49496
Datura	AEI	12	1	163^3	44	51019.6	4.78428
Zwicky	Caltech	6	2	163^3	44	55116.6	5.18013
Blue Waters	NCSA	32	1	226^3	40	74101.8	7.56141
Hopper	NERSC	24	1	205^3	35	75829.6	9.02733
Trestles	SDSC	32	1	226^3	39	86707.5	9.03204
SHC	Caltech	2	4	143^3	19	55291.1	11.519
Stampede-MIC	TACC	60	4	140^3	57	501478	28.493

Note: Stampede-MIC runs on the MIC (Intel Xeon Phi) accelerators of Stampede. Our code has been optimized for CPUs, but not yet for accelerators, which explains the poor performance there.

Raw results

[2014-12-07 09:52:39]: benchmarks-mike-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.39748e-06 sec, 70667.6 flop
[2014-12-07 15:04:09]: benchmarks-shelob-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.40432e-06 sec, 70809.9 flop
[2014-12-07 15:20:52]: benchmarks-trestles-Cbeta-sim-minkowski-amr-lev01-grid000226-iter019969-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 9.03204e-06 sec, 86707.5 flop
[2014-12-07 17:57:56]: benchmarks-shc-Cbeta-sim-minkowski-amr-lev01-grid000143-iter009729-nodes0001-cores000008-procs000002-threads0004-smt01-run0000: 1.1519e-05 sec, 55291.1 flop
[2014-12-08 14:13:33]: benchmarks-gordon-Cbeta-sim-minkowski-amr-lev01-grid000179-iter044545-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.70556e-06 sec, 77075.6 flop
[2014-12-08 19:47:39]: benchmarks-bluewaters-Cbeta-sim-minkowski-amr-lev01-grid000226-iter020481-nodes0001-cores000032-procs000032-threads0001-smt01-run0000: 7.56141e-06 sec, 74101.8 flop
[2014-12-08 22:00:05]: benchmarks-stampede-Cbeta-sim-minkowski-amr-lev01-grid000179-iter046081-nodes0001-cores000016-procs000008-threads0002-smt01-run0000: 3.58258e-06 sec, 77383.6 flop
[2014-12-09 12:02:39]: benchmarks-stampede-mic-Cbeta-sim-minkowski-amr-lev01-grid000140-iter029185-nodes0001-cores000060-procs000060-threads0004-smt04-run0000: 2.8493e-05 sec, 501478 flop
[2014-12-09 14:22:26]: benchmarks-datura-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000012-threads0001-smt01-run0000: 4.78428e-06 sec, 51019.6 flop
[2014-12-10 11:58:10]: benchmarks-zwicky-Cbeta-sim-minkowski-amr-lev01-grid000163-iter022529-nodes0001-cores000012-procs000006-threads0002-smt01-run0000: 5.18013e-06 sec, 55116.6 flop
[2014-12-11 17:54:39]: benchmarks-hopper-Cbeta-sim-minkowski-amr-lev01-grid000205-iter017921-nodes0001-cores000024-procs000024-threads0001-smt01-run0000: 9.02733e-06 sec, 75829.6 flop
[2014-12-20 18:15:31]: benchmarks-philip-Cbeta-sim-minkowski-amr-lev01-grid000143-iter024577-nodes0001-cores000008-procs000004-threads0004-smt02-run0000: 4.49496e-06 sec, 52681 flop
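The raw result lines encode the configuration in the run name, so the table columns can be recovered from them mechanically. The snippet below is a minimal sketch of such a parser; the regular expression and the `parse_result` helper are assumptions inferred from the lines shown here, not part of the benchmark tooling.

```python
import re

# Pattern inferred from the raw result lines above (an assumption, not a spec).
RESULT_RE = re.compile(
    r"\[(?P<timestamp>[^\]]+)\]: benchmarks-(?P<machine>.+?)-Cbeta-.*?"
    r"-grid(?P<grid>\d+)-iter(?P<iter>\d+)-nodes(?P<nodes>\d+)"
    r"-cores(?P<cores>\d+)-procs(?P<procs>\d+)-threads(?P<threads>\d+)"
    r"-smt(?P<smt>\d+)-run(?P<run>\d+): "
    r"(?P<seconds>[0-9.eE+-]+) sec, (?P<flop>[0-9.eE+-]+) flop"
)

INT_FIELDS = ("grid", "iter", "nodes", "cores", "procs", "threads", "smt", "run")


def parse_result(line):
    """Parse one raw result line into a dict, or return None if it does not match."""
    match = RESULT_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in INT_FIELDS:
        record[key] = int(record[key])
    record["seconds"] = float(record["seconds"])
    record["flop"] = float(record["flop"])
    return record


example = (
    "[2014-12-09 12:02:39]: benchmarks-stampede-mic-Cbeta-sim-minkowski-amr-"
    "lev01-grid000140-iter029185-nodes0001-cores000060-procs000060-"
    "threads0004-smt04-run0000: 2.8493e-05 sec, 501478 flop"
)
record = parse_result(example)
print(record["machine"], record["grid"], record["seconds"] * 1e6)
```

The machine name is matched non-greedily up to `-Cbeta-` so that hyphenated names such as `stampede-mic` are kept whole.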

