04.04.2013 Views

Performance report primergy rx300 s7 - fujitsu global

WHITE PAPER
FUJITSU PRIMERGY SERVERS
PERFORMANCE REPORT PRIMERGY RX300 S7

This document contains a summary of the benchmarks executed for the PRIMERGY RX300 S7.

The PRIMERGY RX300 S7 performance data are compared with the data of other PRIMERGY models and discussed. In addition to the benchmark results, an explanation has been included for each benchmark and for the benchmark environment.

Version 1.3
2012-10-09

© Fujitsu Technology Solutions 2012 Page 1 (59)



Contents

Document history
Technical data
SPECcpu2006
SPECjbb2005
SPECpower_ssj2008
Disk-I/O
SAP SD
OLTP-2
TPC-E with TPC-Energy
vServCon
VMmark V2
STREAM
LINPACK
Literature
Contact


Document history

Version 1.0

New:
Technical data
SPECcpu2006: Measurements with processors of Xeon series E5-2600
SPECjbb2005: Measurement with Xeon E5-2690
SPECpower_ssj2008: Measurement with Oracle Java HotSpot VM
SAP SD: Certification number 2012008
OLTP-2: Results for Xeon E5-2600 processor series
vServCon: Results for Xeon E5-2600 processor series
STREAM: Measurements with Xeon E5-2600 processor series
LINPACK: Measurements with Xeon E5-2600 processor series

Version 1.1

New:
VMmark V2: Measurement with Xeon E5-2690

Version 1.2

New:
TPC-E with TPC-Energy: Measurement with Xeon E5-2690

Version 1.3

New:
Disk I/O: Measurements with "LSI SW RAID on Intel C600 (Onboard SATA)", "LSI SW RAID on Intel C600 (Onboard SAS)", "RAID Ctrl SAS 6G 0/1", "RAID Ctrl SAS 5/6 512MB (D2616)" and "RAID Ctrl SAS 6G 5/6 1GB (D3116)" controllers

Updated:
SPECpower_ssj2008: Measurement with IBM J9 VM


Technical data

Decimal prefixes according to the SI standard are used for measurement units in this white paper (e.g. 1 GB = 10^9 bytes). In contrast, these prefixes should be interpreted as binary prefixes (e.g. 1 GB = 2^30 bytes) for the capacities of caches and memory modules. Separate reference will be made to any further exceptions where applicable.
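The difference between the two prefix conventions can be made concrete with a short calculation. The following is only an illustrative sketch; the helper name `to_bytes` is hypothetical and not part of any benchmark tooling.

```python
# Decimal (SI) versus binary interpretation of the "GB" prefix.
GB_DECIMAL = 10**9   # general measurement units in this white paper
GB_BINARY = 2**30    # capacities of caches and memory modules


def to_bytes(gigabytes, binary):
    """Convert a capacity in GB to bytes under the chosen convention."""
    return gigabytes * (GB_BINARY if binary else GB_DECIMAL)


# The maximum memory configuration of 768 GB is a memory capacity,
# so the binary interpretation applies.
max_memory_bytes = to_bytes(768, binary=True)
print(max_memory_bytes)          # 824633720832
print(GB_BINARY / GB_DECIMAL)    # binary GB is about 7.4% larger
```

For a memory capacity the binary reading is therefore roughly 7.4% larger than the decimal reading of the same number.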

Model: PRIMERGY RX300 S7
Model versions: Basic unit with 6 3.5" HDD bays fixed; Basic unit with 2.5" HDD bays expandable; Basic unit with 8 2.5" HDD bays fixed; Basic unit with 12 2.5" HDD bays fixed
Form factor: Rack server
Chipset: Intel C600 series
Number of sockets: 2
Number of processors orderable: 1 or 2
Processor type: Intel Xeon series E5-2600
Number of memory slots: 24 (12 per processor)
Maximum memory configuration: 768 GB
Onboard LAN controller: 2 × 1 Gbit/s
Onboard HDD controller: Controller with RAID 0, RAID 1 or RAID 10 for up to 4 × 2.5" SATA HDDs. Optional for basic unit with 2.5" HDD bays expandable: SAS Enabling Key for Onboard Ports for up to 4 × 2.5" SAS HDDs
PCI slots: 5 PCI-Express 3.0 x8; 2 PCI-Express 3.0 x16
Max. number of internal hard disks: Basic unit with 3.5" HDD bays: 6; Basic unit with 2.5" HDD bays expandable: 16

[Photos: PRIMERGY RX300 S7, basic unit with 6 3.5" HDDs fixed; PRIMERGY RX300 S7, basic unit with 2.5" HDD bays expandable]


Processors (since system release)

Processor      Cores  Threads  Cache  QPI     Processor  Max. Turbo    Max. Turbo  Max. Memory  TDP
                               [MB]   Speed   Frequency  Frequency at  Frequency   Frequency    [Watt]
                                      [GT/s]  [GHz]      full load     [GHz]       [MHz]
                                                         [GHz]
Xeon E5-2637   2      4        5      8.00    3.00       3.50          3.50        1600         80
Xeon E5-2603   4      4        10     6.40    1.80       n/a           n/a         1066         80
Xeon E5-2609   4      4        10     6.40    2.40       n/a           n/a         1066         80
Xeon E5-2643   4      8        10     8.00    3.30       3.40          3.50        1600         130
Xeon E5-2630L  6      12       15     7.20    2.00       2.30          2.50        1333         60
Xeon E5-2620   6      12       15     7.20    2.00       2.30          2.50        1333         95
Xeon E5-2630   6      12       15     7.20    2.30       2.60          2.80        1333         95
Xeon E5-2640   6      12       15     7.20    2.50       2.80          3.00        1333         95
Xeon E5-2667   6      12       15     8.00    2.90       3.20          3.50        1600         130
Xeon E5-2650L  8      16       20     8.00    1.80       2.00          2.30        1600         70
Xeon E5-2650   8      16       20     8.00    2.00       2.40          2.80        1600         95
Xeon E5-2660   8      16       20     8.00    2.20       2.70          3.00        1600         95
Xeon E5-2665   8      16       20     8.00    2.40       2.80          3.10        1600         115
Xeon E5-2670   8      16       20     8.00    2.60       3.00          3.30        1600         115
Xeon E5-2680   8      16       20     8.00    2.70       3.10          3.50        1600         130
Xeon E5-2690   8      16       20     8.00    2.90       3.30          3.80        1600         135

Memory modules (since system release)

Memory module                           Capacity  Ranks  Bit width     Frequency  Low      Load     Registered  ECC
                                        [GB]             of the        [MHz]      voltage  reduced
                                                         memory chips
2GB (1x2GB) 1Rx8 L DDR3-1600 U ECC      2         1      8             1600       ✓        -        -           ✓
(2 GB 1Rx8 PC3L-12800E)
4GB (1x4GB) 2Rx8 L DDR3-1600 U ECC      4         2      8             1600       ✓        -        -           ✓
(4 GB 2Rx8 PC3L-12800E)
4GB (1x4GB) 1Rx4 L DDR3-1333 R ECC      4         1      4             1333       ✓        -        ✓           ✓
(4 GB 1Rx4 PC3L-10600R)
4GB (1x4GB) 1Rx4 L DDR3-1600 R ECC      4         1      4             1600       ✓        -        ✓           ✓
(4 GB 1Rx4 PC3L-12800R)
4GB (1x4GB) 2Rx8 L DDR3-1600 R ECC      4         2      8             1600       ✓        -        ✓           ✓
(4 GB 2Rx8 PC3L-12800R)
8GB (1x8GB) 2Rx4 L DDR3-1333 R ECC      8         2      4             1333       ✓        -        ✓           ✓
(8 GB 2Rx4 PC3L-10600R)
8GB (1x8GB) 2Rx4 L DDR3-1600 R ECC      8         2      4             1600       ✓        -        ✓           ✓
(8 GB 2Rx4 PC3L-12800R)
16GB (1x16GB) 4Rx4 L DDR3-1333 LR ECC   16        4      4             1333       ✓        ✓        -           ✓
(16 GB 4Rx4 PC3L-10600L)
16GB (1x16GB) 2Rx4 L DDR3-1600 R ECC    16        2      4             1600       ✓        -        ✓           ✓
(16 GB 2Rx4 PC3L-12800R)
32GB (1x32GB) 4Rx4 L DDR3-1333 LR ECC   32        4      4             1333       ✓        ✓        -           ✓
(32 GB 4Rx4 PC3L-10600L)



Power supplies (since system release)   Max. number
Power supply 450W (hot-plug)            2
Power supply 800W (hot-plug)            2

Some components may not be available in all countries or sales regions.

Detailed technical information is available in the data sheet PRIMERGY RX300 S7.


SPECcpu2006

Benchmark description

SPECcpu2006 is a benchmark which measures the system efficiency with integer and floating-point operations. It consists of an integer test suite (SPECint2006) containing 12 applications and a floating-point test suite (SPECfp2006) containing 17 applications. Both test suites are extremely computing-intensive and concentrate on the CPU and the memory. Other components, such as disk I/O and network, are not measured by this benchmark.

SPECcpu2006 is not tied to a specific operating system. The benchmark is available as source code and is compiled before the actual measurement. The compiler version used and its optimization settings therefore also affect the measurement result.

SPECcpu2006 contains two different performance measurement methods: the first method (SPECint2006 or SPECfp2006) determines the time required to process a single task. The second method (SPECint_rate2006 or SPECfp_rate2006) determines the throughput, i.e. the number of tasks that can be handled in parallel. Both methods are also divided into two measurement runs, "base" and "peak", which differ in the use of compiler optimization. When publishing the results the base values are always used; the peak values are optional.

Benchmark               Arithmetics     Type  Compiler optimization  Measurement result  Application
SPECint2006             integer         peak  aggressive             Speed               single-threaded
SPECint_base2006        integer         base  conservative           Speed               single-threaded
SPECint_rate2006        integer         peak  aggressive             Throughput          multi-threaded
SPECint_rate_base2006   integer         base  conservative           Throughput          multi-threaded
SPECfp2006              floating point  peak  aggressive             Speed               single-threaded
SPECfp_base2006         floating point  base  conservative           Speed               single-threaded
SPECfp_rate2006         floating point  peak  aggressive             Throughput          multi-threaded
SPECfp_rate_base2006    floating point  base  conservative           Throughput          multi-threaded

The measurement results are the geometric average of normalized ratio values which have been determined for the individual benchmarks. The geometric average, in contrast to the arithmetic average, means that there is a weighting in favour of the lower individual results. Normalized means that the measurement expresses how fast the test system is compared to a reference system. The value "1" was defined for the SPECint_base2006, SPECint_rate_base2006, SPECfp_base2006 and SPECfp_rate_base2006 results of the reference system. For example, a SPECint_base2006 value of 2 means that the measured system has handled this benchmark twice as fast as the reference system. A SPECfp_rate_base2006 value of 4 means that the measured system has handled this benchmark some 4/[# base copies] times faster than the reference system. "# base copies" specifies how many parallel instances of the benchmark have been executed.
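The effect of the geometric average can be illustrated with a few lines of Python. This is a minimal sketch only: the ratio values are invented for illustration and are not SPECcpu2006 results, and the function names are hypothetical.

```python
import math


def geometric_mean(ratios):
    """Geometric average: the n-th root of the product of n ratios."""
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


def arithmetic_mean(ratios):
    """Ordinary arithmetic average, shown for comparison."""
    return sum(ratios) / len(ratios)


# Hypothetical normalized ratios of three individual benchmarks
# (test system run time relative to the reference system).
ratios = [1.0, 4.0, 16.0]

geo = geometric_mean(ratios)    # approximately 4.0
ari = arithmetic_mean(ratios)   # 7.0
print(f"geometric: {geo:.2f}, arithmetic: {ari:.2f}")
```

With these sample values the geometric average is well below the arithmetic average, which shows how a single high ratio cannot mask weak results in the other benchmarks.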

Not every SPECcpu2006 measurement is submitted by us for publication at SPEC. This is why the SPEC web pages do not have every result. As we archive the log files for all measurements, we can prove the correct implementation of the measurements at any time.


Benchmark environment

System Under Test (SUT)

Hardware
Model: PRIMERGY RX300 S7
Processor: Xeon E5-2600 processor series
Memory: 1 processor: 8 × 8GB (1x8GB) 2Rx4 L DDR3-1600 R ECC; 2 processors: 16 × 8GB (1x8GB) 2Rx4 L DDR3-1600 R ECC
Power Supply Unit: 2 × Power supply 450W (hot-plug)

Software
BIOS settings: SPECint_base2006, SPECint2006, SPECfp_base2006, SPECfp2006: Processors other than Xeon E5-2603, E5-2609: Hyper-Threading = Disabled
Operating system: Red Hat Enterprise Linux Server release 6.2
Operating system settings: echo always > /sys/kernel/mm/redhat_transparent_hugepage/enabled
Compiler: Intel C++/Fortran Compiler 12.1

Some components may not be available in all countries or sales regions.


Benchmark results

In terms of processors the benchmark result depends primarily on the size of the processor cache, the support for Hyper-Threading, the number of processor cores and on the processor frequency. In the case of processors with Turbo mode the number of cores which are loaded by the benchmark determines the maximum processor frequency that can be achieved. In the case of single-threaded benchmarks, which largely load one core only, the maximum processor frequency that can be achieved is higher than with multi-threaded benchmarks (see the processor table in the section "Technical data").

               2 processors                   1 processor                              2 processors
Processor      SPECint_base2006  SPECint2006  SPECint_rate_base2006  SPECint_rate2006  SPECint_rate_base2006  SPECint_rate2006
Xeon E5-2637   45.1              47.5         96.7                   101               187                    196
Xeon E5-2603   26.6              27.8         86.1                   89.6              168                    175
Xeon E5-2609   34.6              36.3         111                    116               217                    226
Xeon E5-2643   49.3              52.0         185                    194               361                    378
Xeon E5-2630L  36.8              39.1         193                    201               377                    394
Xeon E5-2620   37.0              39.3         193                    202               376                    393
Xeon E5-2630   41.2              43.8         213                    223               417                    436
Xeon E5-2640   43.8              46.6         227                    237               444                    463
Xeon E5-2667   50.8              54.2         258                    269               504                    526
Xeon E5-2650L  35.2              37.7         225                    236               441                    461
Xeon E5-2650   42.1              45.4         265                    276               517                    540
Xeon E5-2660   45.2              48.3         291                    302               568                    593
Xeon E5-2665   47.0              50.3         300                    313               587                    613
Xeon E5-2670   49.4              52.7         317                    330               618                    644
Xeon E5-2680   52.2              56.0         326                    339               638                    664
Xeon E5-2690   56.3              60.5         339                    354               669                    697



               2 processors                 1 processor                            2 processors
Processor      SPECfp_base2006  SPECfp2006  SPECfp_rate_base2006  SPECfp_rate2006  SPECfp_rate_base2006  SPECfp_rate2006
Xeon E5-2637   65.9             67.9        89.5                  92.4             175                   181
Xeon E5-2603   45.2             47.2        91.1                  93.9             179                   184
Xeon E5-2609   56.7             59.1        111                   114              219                   225
Xeon E5-2643   78.4             82.0        165                   170              327                   336
Xeon E5-2630L  61.6             64.9        166                   170              328                   336
Xeon E5-2620   61.6             64.9        166                   170              329                   337
Xeon E5-2630   67.8             71.0        178                   183              352                   361
Xeon E5-2640   70.9             74.6        185                   190              367                   376
Xeon E5-2667   81.0             85.4        211                   217              418                   429
Xeon E5-2650L  59.8             63.1        191                   196              377                   386
Xeon E5-2650   66.9             71.0        212                   218              421                   432
Xeon E5-2660   73.4             77.6        225                   231              446                   459
Xeon E5-2665   75.3             79.7        230                   237              456                   469
Xeon E5-2670   76.8             81.1        237                   245              469                   484
Xeon E5-2680   81.7             86.5        242                   249              479                   493
Xeon E5-2690   86.8             91.5        248                   256              495                   509

On 6th March 2012 the PRIMERGY RX300 S7 with two Xeon E5-2690 processors was ranked first in the 2-socket systems category for the benchmark SPECint_base2006.

On 6th March 2012 the PRIMERGY RX300 S7 with two Xeon E5-2690 processors was ranked first in the Intel-based 2-socket systems category for the benchmark SPECfp_rate_base2006.

On 13th March 2012 the PRIMERGY RX300 S7 with two Xeon E5-2690 processors was ranked first in the 2-socket systems category for the benchmark SPECint_rate_base2006.

On 13th March 2012 the PRIMERGY RX300 S7 with two Xeon E5-2690 processors was ranked first in the 2-socket systems category for the benchmark SPECint_rate2006.

On 13th March 2012 the PRIMERGY RX300 S7 with two Xeon E5-2690 processors was ranked first in the Intel-based 2-socket systems category for the benchmark SPECfp_rate2006.

The current results can be found at http://www.spec.org/cpu2006/results.



The following four diagrams illustrate the throughput of the PRIMERGY RX300 S7 in comparison to its predecessor PRIMERGY RX300 S6, in their respective most performant configuration.

[Diagram] SPECcpu2006: integer performance, PRIMERGY RX300 S7 vs. PRIMERGY RX300 S6
PRIMERGY RX300 S6 (2 x Xeon X5687): SPECint_base2006 = 45.3, SPECint2006 = 47.9
PRIMERGY RX300 S7 (2 x Xeon E5-2690): SPECint_base2006 = 56.3, SPECint2006 = 60.5

[Diagram] SPECcpu2006: integer performance, PRIMERGY RX300 S7 vs. PRIMERGY RX300 S6
PRIMERGY RX300 S6 (2 x Xeon X5690): SPECint_rate_base2006 = 389, SPECint_rate2006 = 416
PRIMERGY RX300 S7 (2 x Xeon E5-2690): SPECint_rate_base2006 = 669, SPECint_rate2006 = 697


[Diagram] SPECcpu2006: floating-point performance, PRIMERGY RX300 S7 vs. PRIMERGY RX300 S6
PRIMERGY RX300 S6 (2 x Xeon X5687): SPECfp_base2006 = 62.0, SPECfp2006 = 65.7
PRIMERGY RX300 S7 (2 x Xeon E5-2690): SPECfp_base2006 = 86.8, SPECfp2006 = 91.5

[Diagram] SPECcpu2006: floating-point performance, PRIMERGY RX300 S7 vs. PRIMERGY RX300 S6
PRIMERGY RX300 S6 (2 x Xeon X5690): SPECfp_rate_base2006 = 266, SPECfp_rate2006 = 273
PRIMERGY RX300 S7 (2 x Xeon E5-2690): SPECfp_rate_base2006 = 495, SPECfp_rate2006 = 509


The two diagrams below reflect how the performance of the PRIMERGY RX300 S7 scales from one to two processors when using the Xeon E5-2690.

[Diagram] SPECcpu2006: integer performance, PRIMERGY RX300 S7 (2 sockets vs. 1 socket)
1 x Xeon E5-2690: SPECint_rate_base2006 = 339, SPECint_rate2006 = 354
2 x Xeon E5-2690: SPECint_rate_base2006 = 669, SPECint_rate2006 = 697

[Diagram] SPECcpu2006: floating-point performance, PRIMERGY RX300 S7 (2 sockets vs. 1 socket)
1 x Xeon E5-2690: SPECfp_rate_base2006 = 248, SPECfp_rate2006 = 256
2 x Xeon E5-2690: SPECfp_rate_base2006 = 495, SPECfp_rate2006 = 509
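The scaling from one to two processors can be expressed as a simple ratio of the throughput results reported for the Xeon E5-2690 in the result tables. The sketch below is illustrative only; the dictionary layout and the function name `scaling_factor` are assumptions made for this example.

```python
# Throughput results for the Xeon E5-2690 as (1 processor, 2 processors),
# taken from the SPECcpu2006 result tables of this report.
results = {
    "SPECint_rate_base2006": (339, 669),
    "SPECint_rate2006": (354, 697),
    "SPECfp_rate_base2006": (248, 495),
    "SPECfp_rate2006": (256, 509),
}


def scaling_factor(one_processor, two_processors):
    """Throughput gain when a second processor is added (ideal value: 2.0)."""
    return two_processors / one_processor


for metric, (one, two) in results.items():
    print(f"{metric}: {scaling_factor(one, two):.2f}")
```

All four ratios lie between about 1.97 and 2.00, i.e. close to the ideal factor of two.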


SPECjbb2005

Benchmark description

SPECjbb2005 is a Java business benchmark that focuses on the performance of Java server platforms. SPECjbb2005 is essentially a modernized SPECjbb2000. The main differences are:

- The transactions have become more complex in order to cover a greater functional scope.
- The working set of the benchmark has been enlarged to the extent that the total system load has increased.
- SPECjbb2000 allows only one active Java Virtual Machine (JVM) instance whereas SPECjbb2005 permits several instances, which in turn comes closer to real-world conditions, particularly with large systems.

On the software side SPECjbb2005 primarily measures the performance of the JVM used, with its just-in-time compiler as well as its thread and garbage collection implementation. Some aspects of the operating system used also play a role. As far as hardware is concerned, it measures the efficiency of the CPUs and caches, the memory subsystem and the scalability of shared memory systems (SMP). Disk and network I/O are irrelevant.

SPECjbb2005 emulates a 3-tier client/server system that is typical for modern business process applications, with the emphasis on the middle-tier system:

- Clients generate the load, consisting of driver threads, which on the basis of the TPC-C benchmark generate OLTP accesses to a database without think times.
- The middle-tier system implements the business processes and the updating of the database.
- The database takes on the data management and is emulated by Java objects that are held in memory. Transaction logging is implemented on an XML basis.

The major advantage of this benchmark is that it includes all three tiers, which run together on a single host. The performance of the middle tier is measured. Large-scale hardware installations are thus avoided and direct comparisons between the SPECjbb2005 results from the various systems are possible. Client and database emulation are also written in Java.

SPECjbb2005 only needs the operating system as well as a Java Virtual Machine with J2SE 5.0 features.

The scaling unit is a warehouse with approx. 25 MB of Java objects. Precisely one Java thread per warehouse executes the operations on these objects. The business operations are adopted from TPC-C:

- New Order Entry
- Payment
- Order Status Inquiry
- Delivery
- Stock Level Supervision
- Customer Report

However, these are the only features SPECjbb2005 and TPC-C have in common. The results of the two benchmarks are not comparable.

SPECjbb2005 has two performance metrics:

- bops (business operations per second) is the overall rate of all business operations performed per second.
- bops/JVM is the ratio of the first metric and the number of active JVM instances.

In comparisons of various SPECjbb2005 results, both metrics must be specified.

The following rules, according to which a compliant benchmark run has to be performed, are the basis for these metrics:

A compliant benchmark run consists of a sequence of measuring points with an increasing number of warehouses (and thus of threads), with the number being increased by one warehouse at each point. The run starts at one warehouse and continues up through 2*MaxWh, but not less than 8 warehouses. MaxWh is the number of warehouses at which the benchmark expects the highest operation rate per second. By default the benchmark equates MaxWh with the number of CPUs visible to the operating system.

The metric bops is the arithmetic average of all measured operation rates with MaxWh warehouses up to 2*MaxWh warehouses.
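The calculation of the two metrics can be sketched as follows. This is a simplified illustration for a single JVM instance, not the benchmark's own reporting code: the function names are hypothetical and the operation rates are invented sample values.

```python
def specjbb_bops(rates_by_warehouses, max_wh):
    """Arithmetic average of the measured operation rates for
    MaxWh up to 2*MaxWh warehouses (both ends included)."""
    points = [rates_by_warehouses[w] for w in range(max_wh, 2 * max_wh + 1)]
    return sum(points) / len(points)


def bops_per_jvm(bops, jvm_instances):
    """Second metric: bops divided by the number of active JVM instances."""
    return bops / jvm_instances


# Hypothetical measuring points: warehouses -> operations per second.
# The run covers 1 to 2*MaxWh = 8 warehouses for an assumed MaxWh of 4.
rates = {1: 100, 2: 200, 3: 300, 4: 400, 5: 390, 6: 380, 7: 370, 8: 360}

bops = specjbb_bops(rates, max_wh=4)
print(bops)                    # 380.0
print(bops_per_jvm(bops, 1))   # 380.0
```

Only the measuring points from MaxWh upwards enter the average, so the ramp-up points with fewer warehouses do not lower the reported value.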


Benchmark environment

System Under Test (SUT)

Hardware
Model: PRIMERGY RX300 S7
Power Supply Unit: 2 × Power supply 800W (hot-plug)
Processor: 2 × Xeon E5-2690
Memory: 16 × 8GB (1x8GB) 2Rx4 L DDR3-1600 R ECC

Software
BIOS settings: Hardware Prefetch = Disable; Adjacent Sector Prefetch = Disable; DCU Streamer Prefetch = Disable; SAS/SATA OpROM = LSI MegaRAID
Operating system: Microsoft Windows Server 2008 R2 Enterprise SP1
Operating system settings: Using the local security settings console, "lock pages in memory" was enabled for the user running the benchmark.
JVM: Oracle Java HotSpot(TM) 64-Bit Server VM on Windows, version 1.6.0_31
JVM settings: start /HIGH /AFFINITY [0xFFFF,0xFFFF0000] /B java -server -Xmx29g -Xms29g -Xmn24g -XX:BiasedLockingStartupDelay=200 -XX:ParallelGCThreads=16 -XX:SurvivorRatio=60 -XX:TargetSurvivorRatio=90 -XX:InlineSmallCode=3900 -XX:MaxInlineSize=270 -XX:FreqInlineSize=2500 -XX:AllocatePrefetchDistance=256 -XX:AllocatePrefetchLines=4 -XX:InitialTenuringThreshold=12 -XX:MaxTenuringThreshold=15 -XX:LoopUnrollLimit=45 -XX:+UseCompressedStrings -XX:+AggressiveOpts -XX:+UseLargePages -XX:+UseParallelOldGC -XX:-UseAdaptiveSizePolicy
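The two hexadecimal values given for /AFFINITY in the JVM settings can be read as processor bitmasks. The sketch below decodes them under the assumption that bit i of a mask stands for logical processor i and that each mask in the bracketed list belongs to one JVM instance; the function name `cpus_in_mask` is hypothetical.

```python
def cpus_in_mask(mask):
    """Return the logical processor numbers whose bits are set in the mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Affinity masks as listed in the JVM settings above.
masks = [0xFFFF, 0xFFFF0000]

for mask in masks:
    cpus = cpus_in_mask(mask)
    print(f"{mask:#x}: {len(cpus)} logical processors, {cpus[0]} to {cpus[-1]}")
```

Under this reading each mask covers 16 logical processors (0 to 15 and 16 to 31), which matches the 16 threads of one Xeon E5-2690 and the setting -XX:ParallelGCThreads=16.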

Some components may not be available in all countries or sales regions.

Benchmark results

SPECjbb2005 bops = 1536588
SPECjbb2005 bops/JVM = 768294

The following diagrams illustrate the throughput of the PRIMERGY RX300 S7 in comparison to its predecessor PRIMERGY RX300 S6, in their respective most performant configuration.

[Two diagrams, each titled "SPECjbb2005 bops: PRIMERGY RX300 S7 vs. RX300 S6"]


SPECpower_ssj2008

Benchmark description

SPECpower_ssj2008 is the first industry-standard SPEC benchmark that evaluates the power and performance characteristics of a server. With SPECpower_ssj2008 SPEC has defined standards for server power measurements in the same way they have done for performance.

The benchmark workload represents typical server-side Java business applications. The workload is scalable, multi-threaded, portable across a wide range of platforms and easy to run. The benchmark tests CPUs, caches, the memory hierarchy and scalability of symmetric multiprocessor systems (SMPs), as well as the implementation of Java Virtual Machine (JVM), Just In Time (JIT) compilers, garbage collection, threads and some aspects of the operating system.

SPECpower_ssj2008 reports power consumption for servers at different performance levels, from 100% to "active idle" in 10% segments, over a set period of time. The graduated workload recognizes the fact that processing loads and power consumption on servers vary substantially over the course of days or weeks. To compute a power-performance metric across all levels, measured transaction throughputs for each segment are added together and then divided by the sum of the average power consumed for each segment. The result is a figure of merit called "overall ssj_ops/watt". This ratio provides information about the energy efficiency of the measured server. The defined measurement standard enables customers to compare it with other configurations and servers measured with SPECpower_ssj2008. The diagram shows a typical graph of a SPECpower_ssj2008 result.
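The calculation of the figure of merit can be sketched in a few lines. This is an illustrative example only: the throughput and power values are invented, the linear power model is an assumption made to get round numbers, and the function name is hypothetical.

```python
def overall_ssj_ops_per_watt(levels):
    """Sum of the throughputs of all load levels divided by the sum of
    the average power values of all load levels."""
    total_ops = sum(ops for ops, _ in levels)
    total_watts = sum(watts for _, watts in levels)
    return total_ops / total_watts


# Hypothetical (ssj_ops, average watts) pairs for the target loads
# 100%, 90%, ..., 10% and active idle (0% load, no throughput).
levels = [(load * 10_000, 100 + 2 * load) for load in range(100, -1, -10)]

print(len(levels))                        # 11 load levels
print(overall_ssj_ops_per_watt(levels))   # 2500.0
```

Because the power drawn at active idle enters the denominator without adding any throughput, a low idle power consumption directly improves the overall result.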

The benchmark runs on a wide variety of operating systems and hardware architectures and does not require extensive client or storage infrastructure. The minimum equipment for SPEC-compliant testing is two networked computers, plus a power analyzer and a temperature sensor. One computer is the System Under Test (SUT) which runs one of the supported operating systems and the JVM. The JVM provides the environment required to run the SPECpower_ssj2008 workload which is implemented in Java. The other computer is a "Control & Collection System" (CCS) which controls the operation of the benchmark and captures the power, performance and temperature readings for reporting. The diagram provides an overview of the basic structure of the benchmark configuration and the various components.


Benchmark environment

System Under Test (SUT)

Hardware
Model: PRIMERGY RX300 S7
Model version: Basic unit with 2.5" HDD bays expandable
Processor: 2 × Xeon E5-2660
Memory:
  Measurement with Oracle Java HotSpot VM: 8 × 4GB (1x4GB) 2Rx8 L DDR3-1600 U ECC
  Measurement with IBM J9 VM: 6 × 4GB (1x4GB) 2Rx8 L DDR3-1600 U ECC
Network-Interface: Onboard LAN-Controller (1 port used)
Disk-Subsystem: Onboard HDD-Controller
  Measurement with Oracle Java HotSpot VM: 1 × SSD SATA 3G 32GB SLC HOT PLUG 2.5" EP
  Measurement with IBM J9 VM: 1 × HD SATA 6G 250GB 7.2K HOT PL 2.5" BC
Power Supply Unit: 1 × Power supply 450W (hot-plug)

Software
BIOS:
  Measurement with Oracle Java HotSpot VM: R1.1.0
  Measurement with IBM J9 VM: R1.13.0
BIOS settings:
  Adjacent Sector Prefetch = Disabled
  Hardware Prefetch = Disabled
  DCU Streamer Prefetch = Disabled
  DDR Performance = Low-Voltage optimized
  USB Port Control = Enable internal ports only
  QPI Link Speed = 6.4GT/s
  P-State coordination = SW_ANY
  Intel Virtualization Technology = Disabled
  SAS/SATA OpROM = LSI MegaRAID
  ASPM Support = Auto
  LAN Controller = LAN 1
Firmware:
  Measurement with Oracle Java HotSpot VM: 6.45
  Measurement with IBM J9 VM: 6.53A
Operating system: Microsoft Windows Server 2008 R2 Enterprise SP1
Operating system settings:
  Using the local security settings console, "lock pages in memory" was enabled for the user running the benchmark.
  Power Management: Enabled ("Fujitsu Enhanced Power Settings" power plan)
  Set "Turn off hard disk after = 1 Minute" in OS.
  Benchmark was started via Windows Remote Desktop Connection.
JVM:
  Measurement with Oracle Java HotSpot VM: Oracle Java HotSpot(TM) 64-Bit Server VM on Windows, version 1.6.0_30
  Measurement with IBM J9 VM: IBM J9 VM (build 2.6, JRE 1.7.0 Windows Server 2008 R2 amd64-64 20120322_106209 (JIT enabled, AOT enabled)
JVM settings: start /NODE [0,1] /AFFINITY [0x3,0xC,0x30,0xC0,0x300,0xC00,0x3000,0xC000]
  Measurement with Oracle Java HotSpot VM: -server -Xmx1024m -Xms1024m -Xmn853m -XX:ParallelGCThreads=2 -XX:SurvivorRatio=60 -XX:TargetSurvivorRatio=90 -XX:InlineSmallCode=3900 -XX:MaxInlineSize=270 -XX:FreqInlineSize=2500 -XX:AllocatePrefetchDistance=256 -XX:AllocatePrefetchLines=4 -XX:InitialTenuringThreshold=12 -XX:MaxTenuringThreshold=15 -XX:LoopUnrollLimit=45 -XX:+UseCompressedStrings -XX:+AggressiveOpts -XX:+UseLargePages -XX:+UseParallelOldGC
  Measurement with IBM J9 VM: -Xaggressive -Xcompressedrefs -Xgcpolicy:gencon -Xmn800m -Xms1024m -Xmx1024m -XlockReservation -Xnoloa -XtlhPrefetch -Xlp -Xconcurrentlevel0


Other software Measurement with Oracle Java HotSpot VM: none<br />

Measurement with IBM J9 VM:<br />

IBM SDK Java Technology Edition Version 7.0 for Windows x64<br />

ServerView Agent for Windows<br />

ServerView RAID Manager<br />

Some components may not be available in all countries or sales regions.<br />
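The `/AFFINITY` values in the JVM settings listed above are hexadecimal bitmasks in which each set bit selects one logical processor. According to the help text of the Windows `start` command, the mask is interpreted relative to the NUMA node chosen with `/NODE` when both options are combined. The following minimal sketch decodes the eight masks; the helper name `cpus_in_mask` is hypothetical and not part of the benchmark kit.

```python
def cpus_in_mask(mask: int) -> list[int]:
    """Return the bit positions (logical processor indices) set in an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Masks copied from the "JVM settings" row of the table above.
AFFINITY_MASKS = [0x3, 0xC, 0x30, 0xC0, 0x300, 0xC00, 0x3000, 0xC000]

for mask in AFFINITY_MASKS:
    print(f"{mask:#06x} -> logical processors {cpus_in_mask(mask)}")
```

Each mask selects two adjacent logical processors, and together the eight masks cover the indices 0 to 15 without overlap.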

Benchmark results<br />

Measurement with Oracle Java HotSpot VM<br />

The PRIMERGY RX300 S7 achieved the following result:<br />

SPECpower_ssj2008 = 5,032 overall ssj_ops/watt<br />

The adjoining diagram shows the result of the configuration described above. The red horizontal bars show the performance-to-power ratio in ssj_ops/watt (upper x-axis) for each target load level on the y-axis of the diagram. The blue line shows the average power consumption (bottom x-axis) at each target load level, each marked with a small diamond. The black vertical line shows the benchmark result of 5,032 overall ssj_ops/watt for the PRIMERGY RX300 S7. This is the quotient of the sum of the transaction throughputs for each load level and the sum of the average power consumed for each measurement interval.<br />


The following table shows the benchmark results for the throughput in ssj_ops, the power consumption in<br />

watts and the resulting energy efficiency for each load level.<br />

Performance Power Energy Efficiency<br />

Target Load ssj_ops Average Power (W) ssj_ops/watt<br />

100% 1,343,300 245 5,483<br />

90% 1,209,714 217 5,563<br />

80% 1,078,110 187 5,777<br />

70% 938,069 156 6,030<br />

60% 808,997 134 6,024<br />

50% 673,417 117 5,740<br />

40% 537,643 105 5,109<br />

30% 403,053 94.9 4,249<br />

20% 269,431 85.3 3,160<br />

10% 133,103 74.9 1,777<br />

Active Idle 0 53.1 0<br />

∑ssj_ops / ∑power = 5,032<br />
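The overall value is the sum of the throughputs divided by the sum of the average power values, with the active idle interval included in the power sum. The sketch below recomputes it from the rounded figures printed in the table above; the function name is illustrative only. Because the table shows rounded power values, the recomputed figures can differ slightly from the published ones.

```python
# (target load, ssj_ops, average power in W), copied from the table above.
LEVELS = [
    ("100%", 1_343_300, 245.0),
    ("90%", 1_209_714, 217.0),
    ("80%", 1_078_110, 187.0),
    ("70%", 938_069, 156.0),
    ("60%", 808_997, 134.0),
    ("50%", 673_417, 117.0),
    ("40%", 537_643, 105.0),
    ("30%", 403_053, 94.9),
    ("20%", 269_431, 85.3),
    ("10%", 133_103, 74.9),
    ("Active Idle", 0, 53.1),
]


def overall_ssj_ops_per_watt(levels):
    """Sum of ssj_ops over all load levels divided by the sum of average power."""
    total_ops = sum(ops for _, ops, _ in levels)
    total_power = sum(power for _, _, power in levels)
    return total_ops / total_power


for load, ops, power in LEVELS:
    print(f"{load:>11}: {ops / power:8.0f} ssj_ops/watt")
print(f"    overall: {overall_ssj_ops_per_watt(LEVELS):8.0f} ssj_ops/watt")
```

With these rounded inputs the overall figure comes out within a fraction of a percent of the published 5,032 ssj_ops/watt.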

The PRIMERGY RX300 S7 achieved a new world record with this result, thus surpassing the<br />

best result of the competition by 6.4% (date: March 21, 2012). Thus, the PRIMERGY RX300 S7<br />

proves itself to be the most energy-efficient single-node server in the world. For the latest<br />

SPECpower_ssj2008 benchmark results, visit: http://www.spec.org/power_ssj2008/results.<br />

SPECpower_ssj2008: PRIMERGY RX300 S7 vs. competition<br />

The comparison with the competition makes<br />

the advantage of the PRIMERGY RX300 S7 in<br />

the field of energy efficiency evident. With<br />

6.4% more energy efficiency than the best<br />

result of the competition in the single-node<br />

server category, the Dell PowerEdge T620<br />

server, and 8% more energy efficiency than<br />

the IBM system x3650, which just like the<br />

PRIMERGY RX300 S7 belongs to the category<br />

of 2U 2-socket rack servers, the<br />

PRIMERGY RX300 S7 is setting new standards.<br />


Measurement with IBM J9 VM<br />

The PRIMERGY RX300 S7 achieved the following result:<br />

SPECpower_ssj2008 = 5,406 overall ssj_ops/watt<br />

The adjoining diagram shows the result of the configuration described above. The red horizontal bars show the performance-to-power ratio in ssj_ops/watt (upper x-axis) for each target load level on the y-axis of the diagram. The blue line shows the average power consumption (bottom x-axis) at each target load level, each marked with a small diamond. The black vertical line shows the benchmark result of 5,406 overall ssj_ops/watt for the PRIMERGY RX300 S7. This is the quotient of the sum of the transaction throughputs for each load level and the sum of the average power consumed for each measurement interval.<br />

The following table shows the benchmark results for the throughput in ssj_ops, the power consumption in<br />

watts and the resulting energy efficiency for each load level.<br />

Performance Power Energy Efficiency<br />

Target Load ssj_ops Average Power (W) ssj_ops/watt<br />

100% 1,432,829 245 5,859<br />

90% 1,291,012 216 5,988<br />

80% 1,149,959 183 6,289<br />

70% 1,003,836 153 6,555<br />

60% 863,137 132 6,516<br />

50% 720,232 117 6,173<br />

40% 573,470 106 5,435<br />

30% 431,904 95.2 4,535<br />

20% 287,140 85.4 3,361<br />

10% 143,632 75.3 1,906<br />

Active Idle 0 54.0 0<br />

∑ssj_ops / ∑power = 5,406<br />

The PRIMERGY RX300 S7 achieved a new class record with this result, thus surpassing the<br />

best result of the competition by 0.6% (date: September 19, 2012). Thus, the PRIMERGY<br />

RX300 S7 proves itself to be the most energy-efficient 2-socket rack server in the world. The<br />

current results can be found at http://www.spec.org/power_ssj2008/results.<br />


SPECpower_ssj2008: PRIMERGY RX300 S7 vs. competition<br />

The comparison with the competition makes<br />

the advantage of the PRIMERGY RX300 S7 in<br />

the field of energy efficiency evident. With<br />

0.6% more energy efficiency than the best<br />

result of the competition in the 2-socket rack<br />

server category, the Dell PowerEdge R720<br />

server, the PRIMERGY RX300 S7 is setting<br />

new standards.<br />

The following diagram shows, for each load level, the power consumption (on the right y-axis) and the throughput (on the left y-axis) of the PRIMERGY RX300 S7 compared to its predecessor, the PRIMERGY RX300 S6.<br />

SPECpower_ssj2008: PRIMERGY RX300 S7 vs. PRIMERGY RX300 S6<br />


Thanks to the new Sandy Bridge microarchitecture and the 7% higher-performing IBM J9 VM, the PRIMERGY RX300 S7 has a substantially higher throughput and considerably lower power consumption than the PRIMERGY RX300 S6. Together, these result in an overall increase in energy efficiency of 86% for the PRIMERGY RX300 S7.<br />

SPECpower_ssj2008 overall ssj_ops/watt:<br />

PRIMERGY RX300 S7 vs. PRIMERGY RX300 S6<br />


Disk-I/O<br />

Benchmark description<br />

Performance measurements of disk subsystems for PRIMERGY servers are used to assess their<br />

performance and enable a comparison of the different storage connections for PRIMERGY servers. As<br />

standard, these performance measurements are carried out with a defined measurement method, which<br />

models the hard disk accesses of real application scenarios on the basis of specifications.<br />

The essential specifications are:<br />

• Share of random accesses / sequential accesses<br />

• Share of read / write access types<br />

• Block size (kB)<br />

• Number of parallel accesses (# of outstanding I/Os)<br />

A given value combination of these specifications is known as a "load profile". The following five standard load profiles can be allocated to typical application scenarios:<br />

Standard load profile | Access | Type of access: read | Type of access: write | Block size [kB] | Application<br />
File copy | random | 50% | 50% | 64 | Copying of files<br />
File server | random | 67% | 33% | 64 | File server<br />
Database | random | 67% | 33% | 8 | Database (data transfer), Mail server<br />
Streaming | sequential | 100% | 0% | 64 | Database (log file), Data backup; Video streaming (partial)<br />
Restore | sequential | 0% | 100% | 64 | Restoring of files<br />

In order to model applications that access in parallel with a different load intensity, the "# of Outstanding I/Os" is increased, starting with 1, 3, 8 and going up to 512 (from 8 onwards in powers of two).<br />

The measurements of this document are based on these standard load profiles.<br />

The main results of a measurement are:<br />

Throughput [MB/s] | Throughput in megabytes per second<br />
Transactions [IO/s] | Transaction rate in I/O operations per second<br />
Latency [ms] | Average response time in ms<br />

The data throughput has established itself as the normal measurement variable for sequential load profiles, whereas the measurement variable "transaction rate" is mostly used for random load profiles with their small block sizes. Data throughput and transaction rate are directly proportional to each other and can be converted into each other according to the formulas<br />

Data throughput [MB/s] = Transaction rate [IO/s] × Block size [MB]<br />

Transaction rate [IO/s] = Data throughput [MB/s] / Block size [MB]<br />

This section specifies hard disk capacities on a basis of 10 (1 TB = 10^12 bytes) while all other capacities, file sizes, block sizes and throughputs are specified on a basis of 2 (1 MB/s = 2^20 bytes/s).<br />
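The conversion between data throughput and transaction rate can be sketched as follows. Block sizes are given in kB in this report, so the sketch first converts them to MB on a basis of 2 (1 MB = 1024 kB), in line with the unit convention stated above. The function names and the sample inputs are illustrative assumptions, not measured values.

```python
KB_PER_MB = 1024  # base-2 units: 1 MB = 2**20 bytes = 1024 kB


def throughput_mb_s(transaction_rate_io_s: float, block_size_kb: float) -> float:
    """Data throughput [MB/s] = transaction rate [IO/s] x block size [MB]."""
    return transaction_rate_io_s * block_size_kb / KB_PER_MB


def transaction_rate_io_s(throughput: float, block_size_kb: float) -> float:
    """Transaction rate [IO/s] = data throughput [MB/s] / block size [MB]."""
    return throughput * KB_PER_MB / block_size_kb


# Illustrative inputs only: 10,000 IO/s at 8 kB blocks and 100 MB/s at 64 kB blocks.
print(throughput_mb_s(10_000, 8))       # 78.125
print(transaction_rate_io_s(100, 64))   # 1600.0
```

The same transaction rate therefore corresponds to an eight times higher data throughput with 64 kB blocks than with 8 kB blocks.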

All the details of the measurement method and the basics of disk I/O performance are described in the white paper "Basics of Disk I/O Performance".<br />


Benchmark environment<br />

All the measurement results were determined using the hardware and software components listed below.<br />

System Under Test (SUT)<br />

Hardware<br />

Controller 1 × "LSI SW RAID on Intel C600 (Onboard SATA)"<br />

1 × "LSI SW RAID on Intel C600 (Onboard SAS)"<br />

1 × "RAID Ctrl SAS 6G 0/1"<br />

1 × "RAID Ctrl SAS 5/6 512MB (D2616)"<br />

1 × "RAID Ctrl SAS 6G 5/6 1GB (D3116)"<br />

Drive 16 × EP HDD SAS 6 Gbit/s 2.5" 15000 rpm 146 GB<br />

6 × EP HDD SAS 6 Gbit/s 3.5" 15000 rpm 300 GB<br />

16 × EP SSD SAS 6 Gbit/s 2.5" 200 GB MLC<br />

4 × BC HDD SATA 6 Gbit/s 2.5" 7200 rpm 1 TB<br />

Software<br />

Operating system Microsoft Windows Server 2008 Enterprise x64 Edition SP2<br />

Administration software ServerView RAID Manager 5.0.2<br />

Initialization of RAID arrays RAID arrays are initialized before the measurement with an elementary block size of 64 kB ("stripe size")<br />

File system NTFS<br />

Measuring tool Iometer 27.07.2006<br />

Measurement data Measurement files of 32 GB with 1 – 8 hard disks; 64 GB with 9 – 16 hard disks; 128 GB with 17 or more hard disks<br />

Some components may not be available in all countries / sales regions.<br />
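The rule for the size of the measurement files given under "Measurement data" above can be written as a small lookup. This is only a sketch of the stated thresholds; the function name is hypothetical and the input validation is an added assumption.

```python
def measurement_file_size_gb(num_disks: int) -> int:
    """Return the measurement file size in GB for a RAID array of num_disks hard disks."""
    if num_disks < 1:
        raise ValueError("an array needs at least one hard disk")
    if num_disks <= 8:
        return 32
    if num_disks <= 16:
        return 64
    return 128


for disks in (2, 8, 9, 16, 17):
    print(disks, "hard disks ->", measurement_file_size_gb(disks), "GB")
```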


Benchmark results<br />

The results presented here are designed to help you choose the right solution from the various configuration<br />

options of the PRIMERGY RX300 S7 in the light of disk-I/O performance. The selection of suitable<br />

components and the right settings of their parameters is important here. These two aspects should therefore<br />

be dealt with as preparation for the discussion of the performance values.<br />

Components<br />

The hard disks are the first essential component. If there is a reference below to "hard disks", this is meant as the generic term for HDDs ("hard disk drives", in other words conventional hard disks) and SSDs ("solid state drives", i.e. non-volatile electronic storage media). When selecting the type of hard disk and number of hard disks you can move the weighting in the direction of storage capacity, performance, security or price. In order to enable a pre-selection of the hard disk types – depending on the required weighting – the hard disk types for PRIMERGY servers are divided into three classes:<br />

• "Economic" (ECO): low-priced hard disks<br />

• "Business Critical" (BC): very failsafe hard disks<br />

• "Enterprise" (EP): very failsafe and very high-performance hard disks<br />

The following table is a list of the hard disk types that have been available for the PRIMERGY RX300 S7<br />

since system release.<br />

Drive class | Data medium type | Interface | Form factor | krpm<br />
Business Critical | HDD | SATA 6G | 2.5" | 7.2<br />
Business Critical | HDD | SATA 6G | 3.5" | 7.2<br />
Enterprise | HDD | SAS 6G | 3.5" | 15<br />
Enterprise | HDD | SAS 6G | 2.5" | 10, 15<br />
Enterprise | SSD | SATA 6G | 2.5" | -<br />
Enterprise | SSD | SAS 6G | 2.5" | -<br />

Mixed drive configurations of SAS and SATA hard disks in one system are permitted, unless they are<br />

excluded in the configurator for special hard disk types.<br />

The SATA-HDDs offer high capacities right up into the terabyte range at a very low cost. The SAS-HDDs have shorter access times and achieve higher throughputs than the SATA-HDDs because of their higher rotational speed. SAS-HDDs with a rotational speed of 15 krpm have better access times and throughputs than comparable HDDs with a rotational speed of 10 krpm. The 6G interface has meanwhile established itself as the standard among the SAS-HDDs.<br />

Of all the hard disk types SSDs offer on the one hand by far the highest transaction rates for random load<br />

profiles, and on the other hand the shortest access times. In return, however, the price per gigabyte of<br />

storage capacity is substantially higher.<br />

Using 2.5" hard disks instead of 3.5" hard disks allows more hard disks per system. Consequently, the load that each individual hard disk has to handle decreases and the maximum overall performance of the system increases.<br />

More detailed performance statements about hard disk types are available in the white paper "Single Disk Performance".<br />

The maximum number of hard disks in the system depends on the system configuration. The following table<br />

lists the essential cases.<br />

Form factor | Interface | Connection type | Number of PCIe controllers | Maximum number of hard disks<br />
2.5" | SATA 3G, SAS 3G | direct | 0 | 4<br />
3.5" | SATA 3G/6G, SAS 6G | direct | 1 | 6<br />
2.5" | SATA 3G/6G, SAS 6G | direct | 1 | 16<br />

After the hard disks, the RAID controller is the second performance-determining key component. In the case of these controllers the "modular RAID" concept of the PRIMERGY servers offers a plethora of options to meet the various requirements of a wide range of different application scenarios.<br />

The following table summarizes the most important features of the available RAID controllers of the system.<br />

A short alias is specified here for each controller, which is used in the subsequent list of the performance<br />

values.<br />

Controller name | Alias | Cache | Supported interfaces (drives) | Supported interfaces (host) | Max. # disks in the system | RAID levels | BBU/FBU<br />
LSI SW RAID on Intel C600 (Onboard SATA) | Patsburg A | - | SATA 3G | - | 4 × 2.5" | 0, 1, 10 | -/-<br />
LSI SW RAID on Intel C600 (Onboard SAS) | Patsburg B | - | SATA 3G, SAS 3G | - | 4 × 2.5" | 0, 1, 10 | -/-<br />
RAID Ctrl SAS 6G 0/1 (D2607) | LSI2008 | - | SATA 3G/6G, SAS 3G/6G | PCIe 2.0 x8 | 8 × 2.5", 6 × 3.5" | 0, 1, 1E, 10 | -/-<br />
RAID Ctrl SAS 6G 5/6 512 MB (D2616) | LSI2108 | 512 MB | SATA 3G/6G, SAS 3G/6G | PCIe 2.0 x8 | 16 × 2.5", 6 × 3.5" | 0, 1, 5, 6, 10, 50, 60 | ✓/-<br />
RAID Ctrl SAS 6G 5/6 1GB (D3116) | LSI2208-1G | 1 GB | SATA 3G/6G, SAS 3G/6G | PCIe 2.0 x8 | 16 × 2.5", 6 × 3.5" | 0, 1, 1E, 5, 6, 10, 50, 60 | -/✓<br />

The onboard RAID controller is implemented in the Intel C600 chipset on the motherboard of the server and uses the CPU of the server for the RAID functionality. This controller is a simple solution that does not require a PCIe slot. SATA hard disks can always be connected; the additional SAS functionality can be activated via an "SAS enabling key".<br />

System-specific interfaces<br />

The interfaces of a controller to the motherboard and to the hard disks have in each case specific limits for<br />

data throughput. These limits are listed in the following table. The minimum of these two values is a definite<br />

limit, which cannot be exceeded. This value is highlighted in bold in the following table.<br />

Effective in the configuration<br />
Controller alias | # Disk channels | Limit for throughput of disk interface | PCIe version | PCIe width | Limit for throughput of PCIe interface | Connection via expander<br />
Patsburg A | 4 × SATA 3G | 973 MB/s | - | - | - | -<br />
Patsburg B | 4 × SAS 3G | 973 MB/s | - | - | - | -<br />
LSI2008 | 8 × SAS 6G | 3890 MB/s | 2.0 | x8 | 3433 MB/s | -<br />
LSI2108 | 8 × SAS 6G | 3890 MB/s | 2.0 | x8 | 3433 MB/s | ✓<br />
LSI2208-1G | 8 × SAS 6G | 3890 MB/s | 2.0 | x8 | 3433 MB/s | ✓<br />

An expander makes it possible to connect more hard disks in a system than the SAS channels that the<br />

controller has. An expander cannot increase the possible maximum throughput of a controller, but makes it<br />

available in total to all connected hard disks.<br />
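The rule that the smaller of the two interface limits bounds the achievable throughput of a controller can be sketched as follows, using the values from the table above. Treating the "-" entries of the onboard controllers as "no PCIe limit applies" is an assumption made for this illustration, as is the function name.

```python
from typing import Optional

# (disk-interface limit, PCIe-interface limit) in MB/s, copied from the table above.
# None stands for the "-" entries of the onboard controllers (assumption: no PCIe limit).
INTERFACE_LIMITS = {
    "Patsburg A": (973, None),
    "Patsburg B": (973, None),
    "LSI2008": (3890, 3433),
    "LSI2108": (3890, 3433),
    "LSI2208-1G": (3890, 3433),
}


def effective_limit(disk_limit: int, pcie_limit: Optional[int]) -> int:
    """Return the smaller of the two limits; a missing PCIe limit is ignored."""
    if pcie_limit is None:
        return disk_limit
    return min(disk_limit, pcie_limit)


for alias, (disk, pcie) in INTERFACE_LIMITS.items():
    print(f"{alias}: {effective_limit(disk, pcie)} MB/s")
```

For the three PCIe controllers the PCIe interface is the tighter bound, and connecting more hard disks through an expander does not raise it.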

More details about the RAID controllers of the PRIMERGY systems are available in the white paper "RAID Controller Performance".<br />


Settings<br />

In most cases, the cache of the hard disks has a great influence on disk-I/O performance. It is frequently regarded as a security problem in case of power failure and is thus switched off. On the other hand, it was integrated by hard disk manufacturers for the good reason of increasing the write performance. For performance reasons it is therefore advisable to enable the hard disk cache. This is particularly valid for SATA-HDDs. The performance can as a result increase more than tenfold for specific access patterns and hard disk types. More information about the performance impact of the hard disk cache is available in the document "Single Disk Performance". To prevent data loss in case of power failure, it is advisable to equip the system with a UPS.<br />

In the case of controllers with a cache there are several parameters that can be set. The optimal settings can<br />

depend on the RAID level, the application scenario and the type of data medium. In the case of RAID levels<br />

5 and 6 in particular (and the more complex RAID level combinations 50 and 60) it is obligatory to enable the<br />

controller cache for application scenarios with write share. If the controller cache is enabled, the data<br />

temporarily stored in the cache should be safeguarded against loss in case of power failure. Suitable<br />

accessories are available for this purpose (e.g. a BBU or FBU).<br />

For easy and reliable handling of the settings for RAID controllers and hard disks it is advisable to use the RAID-Manager software "ServerView RAID" that is supplied for PRIMERGY servers. All the cache settings for controllers and hard disks can usually be made en bloc – specifically for the application – by using the pre-defined modes "Performance" or "Data Protection". The "Performance" mode ensures the best possible performance settings for the majority of the application scenarios.<br />

More information about the setting options of the controller cache is available in the white paper "RAID Controller Performance".<br />

Performance values<br />

In general, disk-I/O performance of a RAID array depends on the type and number of hard disks, on the RAID level and on the RAID controller. If the limits of the system-specific interfaces are not exceeded, the statements on disk-I/O performance are therefore valid for all PRIMERGY systems. This is why all the performance statements of the document "RAID Controller Performance" also apply for the PRIMERGY RX300 S7 if the configurations measured there are also supported by this system.<br />

The performance values of the system are listed in table form below, specifically for different RAID levels,<br />

access types and block sizes. Substantially different configuration versions are dealt with separately.<br />

The performance values in the following tables use the established measurement variables, as already<br />

mentioned in the subsection Benchmark description. Thus, transaction rate is specified for random accesses<br />

and data throughput for sequential accesses. To avoid any confusion among the measurement units the<br />

tables have been separated for the two access types.<br />

The table cells contain the maximum achievable values. This has three implications: On the one hand hard<br />

disks with optimal performance were used (the components used are described in more detail in the<br />

subsection Benchmark environment). Furthermore, cache settings of controllers and hard disks, which are<br />

optimal for the respective access scenario and the RAID level, are used as a basis. And ultimately each<br />

value is the maximum value for the entire load intensity range (# of outstanding I/Os).<br />

In order to also visualize the numerical values each table cell is highlighted with a horizontal bar, the length<br />

of which is proportional to the numerical value in the table cell. All bars shown in the same scale of length<br />

have the same color. In other words, a visual comparison only makes sense for table cells with the same<br />

colored bars.<br />

Since the horizontal bars in the table cells depict the maximum achievable performance values, they are<br />

shown by the color getting lighter as you move from left to right. The light shade of color at the right end of<br />

the bar tells you that the value is a maximum value and can only be achieved under optimal prerequisites.<br />

The darker the shade becomes as you move to the left, the more frequently it will be possible to achieve the<br />

corresponding value in practice.<br />


Random accesses (performance values in IO/s):<br />

RAID controller | Configuration version (interface, form factor) | # Disks | RAID level | HDDs random, 8 kB blocks, 67% read [IO/s] | HDDs random, 64 kB blocks, 67% read [IO/s] | SSDs random, 8 kB blocks, 67% read [IO/s] | SSDs random, 64 kB blocks, 67% read [IO/s]<br />
Patsburg A | SATA 2.5" | 2 | RAID 1 | 550 | 447 | N/A | N/A<br />
Patsburg A | SATA 2.5" | 4 | RAID 0 | 1073 | 583 | N/A | N/A<br />
Patsburg A | SATA 2.5" | 4 | RAID 10 | 828 | 446 | N/A | N/A<br />
Patsburg B | SAS 2.5" | 2 | RAID 1 | 804 | 694 | 17736 | 3916<br />
Patsburg B | SAS 2.5" | 4 | RAID 0 | 1830 | 1015 | 37028 | 8333<br />
Patsburg B | SAS 2.5" | 4 | RAID 10 | 1347 | 744 | 29082 | 6779<br />
LSI2008 | SAS 2.5" | 2 | RAID 1 | 820 | 702 | 17649 | 4117<br />
LSI2008 | SAS 2.5" | 8 | RAID 0 | 3491 | 1980 | 40766 | 12706<br />
LSI2008 | SAS 2.5" | 8 | RAID 10 | 2716 | 1516 | 28692 | 10539<br />
LSI2008 | SAS 3.5" | 2 | RAID 1 | 868 | 729 | N/A | N/A<br />
LSI2008 | SAS 3.5" | 6 | RAID 0 | 2708 | 1548 | N/A | N/A<br />
LSI2008 | SAS 3.5" | 6 | RAID 10 | 2090 | 1160 | N/A | N/A<br />
LSI2108 | SAS 2.5" | 2 | RAID 1 | 859 | 679 | 19002 | 4400<br />
LSI2108 | SAS 2.5" | 16 | RAID 10 | 7944 | 4124 | 25172 | 15894<br />
LSI2108 | SAS 2.5" | 16 | RAID 0 | 10460 | 5606 | 77421 | 25486<br />
LSI2108 | SAS 2.5" | 16 | RAID 5 | 6324 | 3555 | 19675 | 12245<br />
LSI2108 | SAS 3.5" | 2 | RAID 1 | 1042 | 730 | N/A | N/A<br />
LSI2108 | SAS 3.5" | 6 | RAID 10 | 3110 | 1600 | N/A | N/A<br />
LSI2108 | SAS 3.5" | 6 | RAID 0 | 4216 | 2149 | N/A | N/A<br />
LSI2108 | SAS 3.5" | 6 | RAID 5 | 2241 | 1138 | N/A | N/A<br />
LSI2208-1G | SAS 2.5" | 2 | RAID 1 | 1109 | 863 | 20201 | 4362<br />
LSI2208-1G | SAS 2.5" | 16 | RAID 10 | 8135 | 4232 | 59199 | 31605<br />
LSI2208-1G | SAS 2.5" | 16 | RAID 0 | 10460 | 5606 | 182054 | 44447<br />
LSI2208-1G | SAS 2.5" | 16 | RAID 5 | 5835 | 3257 | 41271 | 21162<br />
LSI2208-1G | SAS 3.5" | 2 | RAID 1 | 1105 | 746 | N/A | N/A<br />
LSI2208-1G | SAS 3.5" | 6 | RAID 10 | 3162 | 1632 | N/A | N/A<br />
LSI2208-1G | SAS 3.5" | 6 | RAID 0 | 4384 | 2246 | N/A | N/A<br />
LSI2208-1G | SAS 3.5" | 6 | RAID 5 | 2316 | 1259 | N/A | N/A<br />


Sequential accesses (performance values in MB/s):<br />

RAID controller | Configuration version (interface, form factor) | # Disks | RAID level | HDDs sequential, 64 kB blocks, 100% read [MB/s] | HDDs sequential, 64 kB blocks, 100% write [MB/s] | SSDs sequential, 64 kB blocks, 100% read [MB/s] | SSDs sequential, 64 kB blocks, 100% write [MB/s]<br />
Patsburg A | SATA 2.5" | 2 | RAID 1 | 112 | 108 | N/A | N/A<br />
Patsburg A | SATA 2.5" | 4 | RAID 0 | 422 | 419 | N/A | N/A<br />
Patsburg A | SATA 2.5" | 4 | RAID 10 | 226 | 213 | N/A | N/A<br />
Patsburg B | SAS 2.5" | 2 | RAID 1 | 199 | 192 | 504 | 180<br />
Patsburg B | SAS 2.5" | 4 | RAID 0 | 780 | 770 | 953 | 642<br />
Patsburg B | SAS 2.5" | 4 | RAID 10 | 399 | 384 | 662 | 337<br />
LSI2008 | SAS 2.5" | 2 | RAID 1 | 287 | 190 | 338 | 199<br />
LSI2008 | SAS 2.5" | 8 | RAID 0 | 1492 | 1264 | 2470 | 1322<br />
LSI2008 | SAS 2.5" | 8 | RAID 10 | 745 | 728 | 1101 | 634<br />
LSI2008 | SAS 3.5" | 2 | RAID 1 | 283 | 184 | N/A | N/A<br />
LSI2008 | SAS 3.5" | 6 | RAID 0 | 964 | 986 | N/A | N/A<br />
LSI2008 | SAS 3.5" | 6 | RAID 10 | 528 | 517 | N/A | N/A<br />
LSI2108 | SAS 2.5" | 2 | RAID 1 | 371 | 192 | 679 | 176<br />
LSI2108 | SAS 2.5" | 16 | RAID 10 | 1886 | 864 | 1953 | 843<br />
LSI2108 | SAS 2.5" | 16 | RAID 0 | 2750 | 2483 | 2327 | 2177<br />
LSI2108 | SAS 2.5" | 16 | RAID 5 | 1808 | 1203 | 1870 | 1225<br />
LSI2108 | SAS 3.5" | 2 | RAID 1 | 342 | 183 | N/A | N/A<br />
LSI2108 | SAS 3.5" | 6 | RAID 10 | 881 | 540 | N/A | N/A<br />
LSI2108 | SAS 3.5" | 6 | RAID 0 | 1068 | 1077 | N/A | N/A<br />
LSI2108 | SAS 3.5" | 6 | RAID 5 | 903 | 898 | N/A | N/A<br />
LSI2208-1G | SAS 2.5" | 2 | RAID 1 | 355 | 194 | 680 | 169<br />
LSI2208-1G | SAS 2.5" | 16 | RAID 10 | 1678 | 1549 | 2654 | 1583<br />
LSI2208-1G | SAS 2.5" | 16 | RAID 0 | 2575 | 2898 | 2564 | 2828<br />
LSI2208-1G | SAS 2.5" | 16 | RAID 5 | 2573 | 2166 | 2584 | 2144<br />
LSI2208-1G | SAS 3.5" | 2 | RAID 1 | 357 | 183 | N/A | N/A<br />
LSI2208-1G | SAS 3.5" | 6 | RAID 10 | 648 | 548 | N/A | N/A<br />
LSI2208-1G | SAS 3.5" | 6 | RAID 0 | 1080 | 1077 | N/A | N/A<br />
LSI2208-1G | SAS 3.5" | 6 | RAID 5 | 901 | 897 | N/A | N/A<br />

The use of one controller at its maximum configuration with powerful hard disks (configured as RAID 0)<br />

enables the PRIMERGY RX300 S7 to achieve a throughput of up to 2828 MB/s for sequential load profiles<br />

and a transaction rate of up to 182054 IO/s for typical, random application scenarios.<br />


SAP SD<br />

Benchmark description<br />

The SAP application software consists of modules used to manage all standard business processes. These<br />

include modules for ERP (Enterprise Resource Planning), such as Assemble-to-Order (ATO), Financial<br />

Accounting (FI), Human Resources (HR), Materials Management (MM), Production Planning (PP) plus Sales<br />

and Distribution (SD), as well as modules for SCM (Supply Chain Management), Retail, Banking, Utilities, BI<br />

(Business Intelligence), CRM (Customer Relationship Management) or PLM (Product Lifecycle Management).<br />

The application software is always based on a database so that an SAP configuration consists of the<br />

hardware and the software components: the operating system, the database and the SAP software itself.<br />

SAP AG has developed SAP Standard Application Benchmarks in order to verify the performance, stability<br />

and scaling of a SAP application system. The benchmarks, of which SD Benchmark is the most commonly<br />

used and most important, analyze the performance of the entire system and thus measure the quality of the<br />

integrated individual components.<br />

The benchmark differentiates between a 2-tier and a 3-tier configuration. The 2-tier configuration has the<br />

SAP application and database installed on one server. With a 3-tier configuration the individual components<br />

of the SAP application can be distributed via several servers and an additional server handles the database.<br />

The entire specification of the benchmark developed by SAP AG, Walldorf, Germany can be found at:<br />

http://www.sap.com/benchmark.<br />

Benchmark environment<br />

The measurement set-up is symbolically illustrated below:<br />

[Figure: 2-tier environment. Components shown: benchmark driver, network, server and disk subsystem (System Under Test, SUT)]<br />


System Under Test (SUT)<br />

Hardware<br />

Model PRIMERGY RX300 S7<br />

Processor 2 × Xeon E5-2690<br />

Memory 16 × 8GB (1x8GB) 2Rx4 L DDR3-1600 R ECC<br />

Network interface 1Gbit/s LAN<br />

Disk subsystem PRIMERGY RX300 S7:<br />

1 × RAID Ctrl SAS 6G 5/6 512MB (D2616)<br />

3 × HD SATA 6G 250GB 7.2K HOT PLUG 2.5" BC<br />

1 × FC Ctrl 8Gb/s 2 Chan LPe12002<br />

1 × FibreCAT CX4-480 Storage Unit<br />

Power Supply Unit 2 × Power supply 450W (hot-plug)<br />

Software<br />

BIOS settings DDR Performance = Performance Optimized<br />

Operating system Microsoft Windows Server 2008 R2 Enterprise SP1<br />

Database Microsoft SQL Server 2008 Enterprise x64 Edition<br />

SAP Business Suite Software SAP enhancement package 4 for SAP ERP 6.0<br />

Benchmark driver<br />

Hardware<br />

Model PRIMERGY RX300 S4<br />

Processor 2 × Xeon X5460<br />

Memory 32 GB<br />

Network interface 1Gbit/s LAN<br />

Software<br />

Operating system SUSE Linux Enterprise Server 11 SP1<br />

Some components may not be available in all countries or sales regions.<br />

Benchmark results<br />

Certification number 2012008<br />

Number of SAP SD benchmark users 7570<br />

Average dialog response time 0.99 seconds<br />

Throughput<br />

Fully processed order line items/hour 826,330<br />

Dialog steps/hour 2,479,000<br />

SAPS 41,320<br />

Average database request time (dialog/update) 0.019 sec / 0.014 sec<br />

CPU utilization of central server 99%<br />

Operating system, central server Windows Server 2008 R2 Enterprise Edition<br />

RDBMS SQL Server 2008<br />

SAP Business Suite software SAP enhancement package 4 for SAP ERP 6.0<br />

Configuration<br />

Central Server<br />

Fujitsu PRIMERGY RX300 S7<br />

2 processors / 16 cores / 32 threads<br />

Intel Xeon E5-2690, 2.9GHz, 64KB L1 cache and 256KB L2<br />

cache per core, 20 MB L3 cache per processor<br />

128 GB main memory<br />
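The SAPS figure in the results above can be related to the two throughput figures. This report does not state the conversion; the sketch below assumes SAP's published definition that 100 SAPS correspond to 2,000 fully processed order line items per hour, or equivalently 6,000 dialog steps per hour. The function names are illustrative only.

```python
# Assumption (SAP's definition, not stated in this report):
# 100 SAPS = 2,000 fully processed order line items/hour = 6,000 dialog steps/hour.
ITEMS_PER_HOUR_PER_100_SAPS = 2_000
DIALOG_STEPS_PER_HOUR_PER_100_SAPS = 6_000


def saps_from_order_line_items(items_per_hour: float) -> float:
    """Convert fully processed order line items per hour into SAPS."""
    return items_per_hour * 100 / ITEMS_PER_HOUR_PER_100_SAPS


def saps_from_dialog_steps(steps_per_hour: float) -> float:
    """Convert dialog steps per hour into SAPS."""
    return steps_per_hour * 100 / DIALOG_STEPS_PER_HOUR_PER_100_SAPS


# Throughput figures copied from the benchmark results above.
print(saps_from_order_line_items(826_330))
print(saps_from_dialog_steps(2_479_000))
```

Both conversions land within a few units of the certified 41,320 SAPS, which suggests that the published values are rounded.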


OLTP-2<br />

Benchmark description<br />

OLTP stands for Online Transaction Processing. The OLTP-2 benchmark is based on the typical application<br />

scenario of a database solution. In OLTP-2, database access is simulated and the number of transactions<br />

achieved per second (tps) is determined as the unit of measurement for the system.<br />

In contrast to benchmarks such as SPECint and TPC-E, which were standardized by independent bodies<br />

and for which adherence to the respective rules and regulations is monitored, OLTP-2 is an internal<br />

benchmark of Fujitsu. OLTP-2 is based on the well-known database benchmark TPC-E. OLTP-2 was<br />

designed in such a way that a wide range of configurations can be measured to present the scaling of a<br />

system with regard to the CPU and memory configuration.<br />

Even though the two benchmarks OLTP-2 and TPC-E simulate similar application scenarios using the same load<br />

profiles, the results cannot be compared or even treated as equal, as the two benchmarks use different<br />

methods to simulate user load. OLTP-2 values are typically similar to TPC-E values. A direct comparison, or<br />

even referring to the OLTP-2 result as TPC-E, is not permitted, especially because there is no<br />

price-performance calculation.<br />

Further information can be found in the document Benchmark Overview OLTP-2.<br />

Benchmark environment<br />

The measurement set-up is symbolically illustrated below:<br />

[Diagram: Driver (Clients), Network, Tier A (Application Server), Network, Tier B (Database Server, Disk subsystem); System Under Test (SUT)]<br />

All results were determined by way of example on a PRIMERGY RX300 S7.<br />


Database Server (Tier B)<br />

Hardware<br />

Model PRIMERGY RX300 S7<br />

Processor Xeon E5-2600 processor series<br />

Memory 1 processor: 8 × 32GB (1x32GB) 4Rx4 L DDR3-1333 LR ECC<br />

2 processors: 16 × 32GB (1x32GB) 4Rx4 L DDR3-1333 LR ECC<br />

Network interface 2 × onboard LAN 1 Gb/s<br />

Disk subsystem RX300 S7: Onboard RAID Ctrl SAS 6G 5/6 1024MB (D3116)<br />

2 × 73 GB 15k rpm SAS Drive, RAID1 (OS),<br />

6 × 147 GB 15k rpm SAS Drive, RAID10 (LOG)<br />

3 × LSI MegaRAID SAS 9286CV-8e<br />

6 × JX40: 24 × 64 GB SSD Drive each, RAID5 (data)<br />

Software<br />

BIOS Version V4.6.5.1 R1.0.5<br />

Operating system Microsoft Windows Server 2008 R2 Enterprise SP1<br />

Database Microsoft SQL Server 2008 R2 Enterprise SP1<br />

Application Server (Tier A)<br />

Hardware<br />

Model 1 × PRIMERGY RX200 S6<br />

Processor 2 × Xeon X5647<br />

Memory 12 GB, 1333 MHz registered ECC DDR3<br />

Network interface 2 × onboard LAN 1 Gb/s<br />

2 × Dual Port LAN 1Gb/s<br />

Disk subsystem 1 × 73 GB 15k rpm SAS Drive<br />

Software<br />

Operating system Microsoft Windows Server 2008 R2 Standard<br />

Client<br />

Hardware<br />

Model 1 × PRIMERGY RX200 S5<br />

Processor 2 × Xeon X5570<br />

Memory 24 GB, 1333 MHz registered ECC DDR3<br />

Network interface 2 × onboard LAN 1 Gb/s<br />

Disk subsystem 1 × 73 GB 15k rpm SAS Drive<br />

Software<br />

Operating system Microsoft Windows Server 2008 R2 Standard<br />

Benchmark OLTP-2 Software EGen version 1.12.0<br />

Some components may not be available in all countries / sales regions.<br />


Benchmark results<br />

Database performance greatly depends on the configuration options for CPU and memory and on the<br />
connectivity of an adequate disk subsystem for the database. In the following scaling considerations for the<br />
processors we assume that both the memory and the disk subsystem have been adequately chosen and are<br />
not a bottleneck.<br />

A guideline in the database environment for selecting main memory is that sufficient quantity is more<br />
important than the speed of the memory accesses. This is why a configuration with a total memory of 512 GB<br />
was considered for the measurements with two processors and a configuration with a total memory of<br />
256 GB for the measurements with one processor. Both memory configurations operate at a memory frequency of<br />
1333 MHz. Further information about memory performance can be found in the White Paper Memory<br />
Performance of Xeon E5-2600 (Sandy Bridge-EP) Based Systems.<br />

The following diagram shows the OLTP-2 transaction rates that can be achieved with one and two<br />

processors of the Intel Xeon E5-2600 series.<br />

[Diagram: OLTP-2 tps (axis 0 to 1800) for one and two processors of the Xeon E5-2600 series; series: 2CPUs 512GB RAM and 1CPU 256GB RAM; bold: measured, cursive: calculated; HT: Hyper-Threading]<br />

Processors shown: E5-2690, E5-2680, E5-2670, E5-2665, E5-2660, E5-2650, E5-2650L (8 Core, HT); E5-2667, E5-2640, E5-2630, E5-2630L, E5-2620 (6 Core, HT); E5-2643 (4 Core, HT); E5-2609, E5-2603 (4 Core); E5-2637 (2 Core, HT)<br />

Data labels in extraction order (tps): 520.27, 528.49, 287.16, 428.08, 232.60, 261.81, 598.36, 538.20, 538.76, 487.33, 635.64, 638.47, 745.09, 718.68, 845.64, 795.37, 921.05, 895.92, 971.33, 975.50, 979.75, 935.41, 1144.99, 1153.27, 1082.16, 1315.76, 1295.48, 1484.74, 1400.25, 1611.48, 1569.23, 1695.97<br />



It is evident that a wide performance range is covered by the variety of released processors. If you compare<br />

the OLTP-2 value of the processor with the lowest performance (Xeon E5-2603) with the value of the<br />

processor with the highest performance (Xeon E5-2690), the result is a 4-fold increase in performance.<br />

Based on the results achieved the processors can be divided into different performance groups:<br />

The entry level is formed by the Xeon E5-2603 and E5-2609, processors with four cores but without<br />
Hyper-Threading and without turbo mode. Although the Xeon E5-2637 has only two cores, it is<br />
Hyper-Threading-capable and, on account of its clock frequency, lies between these two processors as far as<br />
performance is concerned. Thanks to its high clock frequency and the high QPI speed of 8.00 GT/s, the<br />
performance-optimized 4-core processor Xeon E5-2643 almost achieves the throughput rates of the 6-core<br />
processors with the lowest frequencies (Xeon E5-2620 and E5-2630L). However, these two processors, with<br />
95 Watt and 60 Watt respectively, also have distinctly lower power consumption than the Xeon E5-2643 with<br />
130 Watt.<br />

The 6-core processors are all Hyper-Threading-capable, have a higher QPI speed (7.20 GT/s) than the<br />
group of 4-core processors with 6.40 GT/s, and have a 50% larger L3 cache of 15 MB. At the upper end<br />
of the performance scale of the 6-core processors is the Xeon E5-2667 (130 Watt) with its especially high<br />
frequency, which achieves an OLTP performance slightly above that of the 8-core<br />
processor with the lowest performance, the Xeon E5-2650L (70 Watt).<br />

The group of processors with eight cores, a QPI speed of 8.00 GT/s and a 20 MB L3 cache is to be found at<br />

the upper end of the performance scale. Due to the graduated CPU clock frequencies an OLTP performance<br />

of between 1145 tps (2 × Xeon E5-2650L) and 1696 tps (2 × Xeon E5-2690) is achieved.<br />
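As a quick worked example of the range quoted for the 8-core group, the following Python sketch computes the relative gain between the two rounded values given in the text. Only those two values come from the report; the helper name is illustrative.

```python
# Relative OLTP-2 gain across the 8-core group, using the two rounded
# values quoted in the text (2 x Xeon E5-2650L and 2 x Xeon E5-2690).

TPS_2X_E5_2650L = 1145.0
TPS_2X_E5_2690 = 1696.0


def relative_gain(low: float, high: float) -> float:
    """Return the gain of `high` over `low` as a fraction (0.25 == +25%)."""
    return high / low - 1.0


if __name__ == "__main__":
    gain = relative_gain(TPS_2X_E5_2650L, TPS_2X_E5_2690)
    print(f"Spread within the 8-core group: +{gain:.0%}")
```

Within the 8-core group the fastest processor therefore delivers roughly one and a half times the OLTP-2 throughput of the slowest one.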

If you compare the maximum achievable OLTP-2 values of the current system generation with the values<br />

that were achieved on the predecessor systems, the result is an increase of about 34%.<br />

[Diagram: Maximum OLTP-2 tps, comparison of system generations (axis 0 to 2000 tps): predecessor system with 2 × X5690 and 192 GB versus current system with 2 × E5-2690 and 512 GB, + ~ 34%]<br />

Current System: TX300 S7, RX200 S7, RX300 S7, RX350 S7, BX924 S3<br />
Predecessor System: TX300 S6, RX200 S6, RX300 S6, TX300 S6, BX924 S2<br />


TPC-E with TPC-Energy<br />

Benchmark description<br />

The TPC-E benchmark measures the performance of online transaction processing systems (OLTP) and is<br />

based on a complex database and a number of different transaction types that are carried out on it. TPC-E is<br />

not only a hardware-independent but also a software-independent benchmark and can thus be run on every<br />

test platform, i.e. proprietary or open. In addition to the results of the measurement, all the details of the<br />

systems measured and the measuring method must also be explained in a measurement report (Full<br />

Disclosure Report or FDR). Consequently, this ensures that the measurement meets all benchmark<br />

requirements and is reproducible. TPC-E does not just measure an individual server, but a rather extensive<br />

system configuration. Keys to performance in this respect are the database server, disk I/O and network<br />

communication.<br />

The performance metric is tpsE, where tps means transactions per second. tpsE is the average number of<br />

Trade-Result-Transactions that are performed within a second. The TPC-E standard defines a result as the<br />

tpsE rate, the price per performance value (e.g. $/tpsE) and the availability date of the measured<br />

configuration.<br />

TPC-Energy is an augmentation to the existing TPC benchmarks (e.g. TPC-C, TPC-E, TPC-H). Energy<br />

consumption of the systems is measured while the TPC benchmark is performed. For this TPC has defined a<br />

set of rules on how to measure these values. As the result of this benchmark a metric in the form of<br />

"Energy / Performance" is calculated from the measured values. The result for TPC-E is the metric Watts/tpsE.<br />

Further information about TPC-E and TPC-Energy can be found in the overview document Benchmark<br />

Overview TPC-E.<br />


Benchmark results<br />

In July 2012 Fujitsu submitted a TPC-E benchmark result for the PRIMERGY RX300 S7 with the 8-core<br />

processor Intel Xeon E5-2690 and 512 GB memory. This publication also revealed TPC-Energy values for<br />

the PRIMERGY RX300 S7.<br />

The results show an enormous increase in performance compared with the PRIMERGY RX300 S6 with a<br />

simultaneous reduction in costs and lower energy consumption.<br />

TPC-E Throughput 1,871.81 tpsE<br />
Price/Performance $ 175.57 USD per tpsE<br />
Availability Date August 17, 2012<br />
Total System Cost $ 328,623<br />
TPC-Energy Metric 0.69 Watts/tpsE<br />
TPC-E 1.12.0, TPC Pricing 1.7.0, TPC-Energy 1.4.2<br />
Report Date July 5, 2012<br />

SUT PRIMERGY RX300 S7<br />

Database Server Configuration<br />
Operating System Microsoft Windows Server 2008 R2 Enterprise Edition SP1<br />
Database Manager Microsoft SQL Server 2012 Enterprise Edition<br />
Processors/Cores/Threads 2/16/32<br />
Memory 512 GB<br />
Initial Database Size 7,704 GB<br />
Redundancy Level 1, RAID-5 data and RAID-10 log<br />
Storage 60 x 200 GB SSD, 2 x 1 TB 7.2k rpm HDD, 6 x 146 GB 15k rpm HDD<br />

Tier A<br />
PRIMERGY RX200 S7<br />
1x Intel Xeon E5-2660 2.20 GHz<br />
16 GB Memory<br />
1x 250 GB 7.2k rpm SATA Drive<br />
2x onboard LAN 1 Gb/s<br />
1x Dual Port LAN 1 Gb/s<br />

Tier B<br />
PRIMERGY RX300 S7<br />
2x Intel Xeon E5-2690 2.90 GHz<br />
512 GB Memory<br />
8x 146 GB 15k rpm SAS Drives<br />
2x onboard LAN 1 Gb/s<br />
5x SAS RAID Controller<br />

Storage<br />
1x PRIMECENTER Rack<br />
4x ETERNUS JX40<br />
60x 200 GB SSD Drives<br />
2x 1 TB 7.2k rpm SATA Drives<br />

Some components may not be available in all countries / sales regions.<br />


PRIMERGY RX300 S7<br />
TPC-E 1.12.0, TPC Pricing 1.7.0, TPC-Energy 1.4.2<br />
Report Date July 5, 2012<br />
TPC-E Throughput 1,871.81 tpsE<br />
Price/Performance $ 175.57 USD per tpsE<br />
Availability Date August 17, 2012<br />
Total System Cost $ 328,623<br />
TPC-Energy Metric 0.69 Watts/tpsE<br />

Energy Summary<br />

Numerical Quantities For Reported Energy Configuration:<br />
REC Idle Power: 843.88 Watts<br />
Average Power of REC: 1288.82 Watts<br />

Subsystem Reporting:<br />
Columns: subsystem; watts/tpsE (Secondary Metrics); Full Load Avg Watts, Full Load % of REC, Idle Avg Watts, Idle % of REC (Additional Numerical Quantities)<br />
Database Server *) 0.32 592.41 45.97% 239.56 28.39%<br />
Storage *) 0.31 578.42 44.88% 544.80 64.56%<br />
Application Server *) 0.05 100.99 7.84% 59.12 7.01%<br />
Miscellaneous *) 0.01 17.00 1.32% 0.40 0.05%<br />
Total REC 0.69 1,288.82 100.00% 843.88 100.00%<br />

*) see pricing for list of components<br />

Lowest ambient temperature at air inlet: 20.13 Degrees Celsius<br />

Items in Priced Configuration not in the Reported Energy Configuration<br />

None<br />

Items in the Reported Energy Configuration not in the Measured Energy Configuration<br />

Fujitsu Display B20T-6 LED<br />

More details about this TPC-E result, in particular the Full Disclosure Report, can be found via the TPC web<br />

page http://www.tpc.org/tpce/results/tpce_result_detail.asp?id=112070501.<br />
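The headline ratios in the summary above can be recomputed from the reported totals. The following Python sketch does this for price/performance, the TPC-Energy metric and the subsystem shares of the Reported Energy Configuration (REC). All inputs are taken from the summary; the helper names are illustrative, and because the published metrics are rounded, differences in the last digit are to be expected.

```python
# Recompute the headline TPC-E / TPC-Energy ratios from the reported figures.
# Inputs are copied from the summary; helper names are illustrative only.

TPSE = 1871.81
TOTAL_SYSTEM_COST_USD = 328_623
FULL_LOAD_AVG_WATTS = {          # full-load average power per subsystem
    "Database Server": 592.41,
    "Storage": 578.42,
    "Application Server": 100.99,
    "Miscellaneous": 17.00,
}


def price_per_tpse(cost_usd: float, tpse: float) -> float:
    """Price/performance in $ per tpsE."""
    return cost_usd / tpse


def watts_per_tpse(watts: float, tpse: float) -> float:
    """Energy metric in Watts per tpsE."""
    return watts / tpse


def shares_of_total(power: dict[str, float]) -> dict[str, float]:
    """Percentage of the total (REC) power drawn by each subsystem."""
    total = sum(power.values())
    return {name: 100.0 * watts / total for name, watts in power.items()}


if __name__ == "__main__":
    total_watts = sum(FULL_LOAD_AVG_WATTS.values())
    print(f"Price/performance:    {price_per_tpse(TOTAL_SYSTEM_COST_USD, TPSE):.2f} $/tpsE")
    print(f"Average power of REC: {total_watts:.2f} W")
    print(f"TPC-Energy metric:    {watts_per_tpse(total_watts, TPSE):.2f} Watts/tpsE")
    for name, share in shares_of_total(FULL_LOAD_AVG_WATTS).items():
        print(f"{name}: {share:.2f}% of REC")
```

The subsystem values add up to the reported average REC power, and dividing that total by the throughput reproduces the reported energy metric to two decimals.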


As of July 2012, Fujitsu is represented with ten PRIMERGY results in the TPC-E list.<br />

System and Processors Throughput Price/Performance Watts/tpsE Availability Date<br />

TX300 S4 with 2 × Xeon X5460 317.45 tpsE $523.49 per tpsE - August 30, 2008<br />

RX600 S4 with 4 × Xeon X7350 492.34 tpsE $559.88 per tpsE - January 1, 2009<br />

RX600 S4 with 4 × Xeon X7460 721.40 tpsE $459.71 per tpsE - January 1, 2009<br />

RX300 S5 with 2 × Xeon X5570 800.00 tpsE $343.91 per tpsE - April 1, 2009<br />

RX600 S5 with 4 × Xeon X7560 2046.96 tpsE $193.68 per tpsE - September 1, 2010<br />

RX900 S1 with 8 × Xeon X7560 3800.00 tpsE $245.82 per tpsE - October 1, 2010<br />

RX300 S6 with 2 × Xeon X5680 1246.13 tpsE $191.48 per tpsE - November 1, 2010<br />

RX300 S6 with 2 × Xeon X5690 1268.30 tpsE $183.94 per tpsE 0.93 March 1, 2011<br />

RX900 S2 with 8 × Xeon E7-8870 4555.54 tpsE $217.27 per tpsE 1.00 July 1, 2011<br />

RX300 S7 with 2 × Xeon E5-2690 1871.81 tpsE $175.57 per tpsE 0.69 August 17, 2012<br />

See the TPC web site for more information and all the TPC-E results (http://www.tpc.org/tpce).<br />

The following diagram for 2-socket PRIMERGY systems with different processor types shows the good<br />

performance of the 2-socket system PRIMERGY RX300 S7.<br />

[Diagram: tpsE (axis 0 to 2500, higher is better) and $/tpsE (axis 0 to 500, lower is better): PRIMERGY TX300 S4, 2 × X5460, 64 GB: 317.45 tpsE, 523.49 $/tpsE; PRIMERGY RX300 S5, 2 × X5570, 96 GB: 800.00 tpsE, 343.91 $/tpsE; PRIMERGY RX300 S6, 2 × X5680, 96 GB: 1,246.13 tpsE, 191.48 $/tpsE; PRIMERGY RX300 S6, 2 × X5690, 96 GB: 1,268.30 tpsE, 183.94 $/tpsE; PRIMERGY RX300 S7, 2 × E5-2690, 512 GB: 1,871.81 tpsE, 175.57 $/tpsE]<br />

In comparison with the PRIMERGY RX300 S6 the increase in performance is +48% and in comparison with<br />
the PRIMERGY RX300 S5 +134%. The price per performance is $175.57/tpsE. Compared with the<br />
PRIMERGY RX300 S6 the costs are reduced to 95% and with the PRIMERGY RX300 S5 to 51%.<br />
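The percentages in the generation comparison follow directly from the published TPC-E results. The sketch below recomputes them in Python. It assumes that the RX300 S6 figures used for the comparison are those of the 2 × X5690 result, since that is the entry that reproduces the quoted +48% and 95%; the function name is illustrative.

```python
# Reproduce the generation comparison from the published TPC-E results.
# Assumption: the RX300 S6 reference is the 2 x X5690 result.

RESULTS = {  # system: (tpsE, $/tpsE)
    "RX300 S5": (800.00, 343.91),
    "RX300 S6": (1268.30, 183.94),
    "RX300 S7": (1871.81, 175.57),
}


def compare(new: str, old: str) -> tuple[float, float]:
    """Return (performance gain in %, new $/tpsE as % of old $/tpsE)."""
    new_tpse, new_price = RESULTS[new]
    old_tpse, old_price = RESULTS[old]
    gain_percent = 100.0 * (new_tpse / old_tpse - 1.0)
    cost_percent = 100.0 * new_price / old_price
    return gain_percent, cost_percent


if __name__ == "__main__":
    for old in ("RX300 S6", "RX300 S5"):
        gain, cost = compare("RX300 S7", old)
        print(f"RX300 S7 vs {old}: +{gain:.0f}% tpsE, costs at {cost:.0f}%")
```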



The following overview shows the best TPC-E results (as of July 5, 2012) and the corresponding price per<br />
performance ratios for configurations using two processors. PRIMERGY RX300 S7 with 1871.81 tpsE is best<br />
in class with the highest performance value. The price per performance ratio of $175.57/tpsE is the<br />
second-best value of the TPC-E publications considered here.<br />

System Processors tpsE (higher is better) $/tpsE (lower is better) availability date<br />
Fujitsu PRIMERGY RX300 S7 2×E5-2690 1871.81 175.57 2012-08-17<br />
IBM System x3650 M4 2×E5-2690 1863.23 207.85 2012-05-31<br />
IBM System x3690 X5 2×E7-2870 1560.70 143.32 2011-05-27<br />
HP ProLiant DL380 G7 Server 2×X5690 1284.14 250.00 2011-05-04<br />
Fujitsu PRIMERGY RX300 S6 12x2.5 2×X5690 1268.30 183.94 2011-03-01<br />
Fujitsu PRIMERGY RX300 S6 2×X5680 1246.13 191.48 2010-11-01<br />
HP ProLiant DL385 G7 Server 2×6282 SE 1232.84 257.00 2011-12-31<br />
HP ProLiant DL380G7 2×X5680 1110.10 294.00 2010-05-11<br />
Dell PowerEdge T710 2×X5680 1074.14 264.32 2010-06-21<br />
HP ProLiant DL385G7 2×6176 SE 887.38 296.00 2010-05-06<br />

See the TPC web site for more information and all the TPC-E results (http://www.tpc.org/tpce).<br />


The TPC-E configuration with the PRIMERGY RX300 S7 as database server has the best TPC-E<br />

TPC-Energy result of all TPC-E TPC-Energy publications with 0.69 Watts/tpsE.<br />

Compared with the TPC-E configuration with the predecessor system PRIMERGY RX300 S6 as database<br />

server, the energy efficiency of the overall configuration, documented in Watts/tpsE, has increased by 25%.<br />

All published Fujitsu TPC-E TPC-Energy results are well ahead of the two previously published results of its<br />

competitors.<br />

[Diagram: TPC-E TPC-Energy Primary Metric in Watts/tpsE (axis 0 to 8, lower is better): Fujitsu 1) PRIMERGY RX300 S7 0.69; Fujitsu 2) PRIMERGY RX300 S6 0.93; Fujitsu 3) PRIMERGY RX900 S2 1.00; Fujitsu 4) PRIMEQUEST 1800E2 1.09; HP 5) ProLiant DL580 G7 5.84; HP 6) ProLiant DL585 G7 6.72]<br />

See the TPC web site for more information as well as the TPC-E and TPC-Energy results<br />
(http://www.tpc.org/tpce).<br />

1) Fujitsu PRIMERGY RX300 S7 1871.81 tpsE, $175.57/tpsE, 0.69 Watts/tpsE, availability date 08/17/2012<br />
2) Fujitsu PRIMERGY RX300 S6 1268.30 tpsE, $183.94/tpsE, 0.93 Watts/tpsE, availability date 03/01/2011<br />
3) Fujitsu PRIMERGY RX900 S2 4555.54 tpsE, $217.27/tpsE, 1.00 Watts/tpsE, availability date 07/01/2011<br />
4) Fujitsu PRIMEQUEST 1800E2 4414.79 tpsE, $226.19/tpsE, 1.09 Watts/tpsE, availability date 07/01/2011<br />
5) HP ProLiant DL580 G7 2001.12 tpsE, $347.00/tpsE, 5.84 Watts/tpsE, availability date 06/21/2010<br />
6) HP ProLiant DL585 G7 1400.14 tpsE, $330.00/tpsE, 6.72 Watts/tpsE, availability date 06/21/2010<br />
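A change in Watts/tpsE can be expressed in two ways, either as the reduction in energy needed per unit of throughput or as the gain in throughput per watt. The following Python sketch computes both for the RX300 S6 and RX300 S7 from the published metrics. The inputs are rounded to two decimals in the publications, so the results are approximate, and the function names are illustrative. The roughly 25% quoted in the text is closest to the first reading.

```python
# Two readings of the change between the published TPC-Energy metrics.
# Inputs are the rounded Watts/tpsE values from the footnotes.

WATTS_PER_TPSE_RX300_S6 = 0.93
WATTS_PER_TPSE_RX300_S7 = 0.69


def reduction_in_watts_per_tpse(old: float, new: float) -> float:
    """Fraction by which the energy per unit of throughput has dropped."""
    return 1.0 - new / old


def gain_in_tpse_per_watt(old: float, new: float) -> float:
    """Fraction by which the throughput per watt has risen."""
    return old / new - 1.0


if __name__ == "__main__":
    old, new = WATTS_PER_TPSE_RX300_S6, WATTS_PER_TPSE_RX300_S7
    print(f"Watts per tpsE reduced by: {reduction_in_watts_per_tpse(old, new):.1%}")
    print(f"tpsE per watt increased by: {gain_in_tpse_per_watt(old, new):.1%}")
```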



vServCon<br />

Benchmark description<br />

vServCon is a benchmark used by Fujitsu Technology Solutions to compare server configurations with<br />

hypervisor with regard to their suitability for server consolidation. This allows both the comparison of<br />

systems, processors and I/O technologies as well as the comparison of hypervisors, virtualization forms and<br />

additional drivers for virtual machines.<br />

vServCon is not a new benchmark in the true sense of the word. It is more a framework that combines<br />

already established benchmarks (in some cases in modified form) as workloads in order to reproduce the load of a<br />

consolidated and virtualized server environment. Three proven benchmarks are used which cover the<br />

application scenarios database, application server and web server.<br />

Application scenario Benchmark No. of logical CPU cores Memory<br />

Database Sysbench (adapted) 2 1.5 GB<br />

Java application server SPECjbb (adapted, with 50% - 60% load) 2 2 GB<br />

Web server WebBench 1 1.5 GB<br />

Each of the three application scenarios is allocated to a dedicated virtual machine (VM). A fourth machine,<br />
the so-called idle VM, is added to these. These four VMs make up a "tile". Depending on the performance<br />
capability of the underlying server hardware, you may as part of a measurement also have to start several<br />
identical tiles in parallel in order to achieve a maximum performance score.<br />

[Diagram: System Under Test running Tile 1, Tile 2, Tile 3 … Tile n, each tile consisting of a database VM, a Java VM, a web VM and an idle VM]<br />

Each of the three vServCon application scenarios provides a specific benchmark result in the form of<br />

application-specific transaction rates for the respective VM. In order to derive a normalized score, the<br />

individual benchmark results for one tile are put in relation to the respective results of a reference system.<br />

The resulting relative performance values are then suitably weighted and finally added up for all VMs and<br />

tiles. The outcome is a score for this tile number.<br />

Starting as a rule with one tile, this procedure is performed for an increasing number of tiles until no further<br />

significant increase in this vServCon score occurs. The final vServCon score is then the maximum of the<br />

vServCon scores for all tile numbers. This score thus reflects the maximum total throughput that can be<br />

achieved by running the mix defined in vServCon that consists of numerous VMs up to the possible full<br />

utilization of CPU resources. This is why the measurement environment for vServCon measurements is<br />

designed in such a way that only the CPU is the limiting factor and that no limitations occur as a result of<br />

other resources.<br />
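The scoring procedure described above can be sketched in a few lines of Python. The reference results, the weights and the sample measurements below are hypothetical placeholders, since the report does not state them; only the structure (normalize against a reference system, weight, sum over all VMs and tiles, take the maximum over the tile numbers) follows the description.

```python
# Sketch of the vServCon scoring scheme described above.
# REFERENCE, WEIGHTS and the sample measurements are hypothetical.

REFERENCE = {"database": 100.0, "java": 200.0, "web": 50.0}   # hypothetical
WEIGHTS = {"database": 1 / 3, "java": 1 / 3, "web": 1 / 3}     # hypothetical


def tile_score(results: dict[str, float]) -> float:
    """Weighted sum of one tile's results relative to the reference system."""
    return sum(WEIGHTS[vm] * results[vm] / REFERENCE[vm] for vm in REFERENCE)


def score_for_tile_count(tiles: list[dict[str, float]]) -> float:
    """Score of one measurement: sum over all tiles running in parallel."""
    return sum(tile_score(tile) for tile in tiles)


def final_score(measurements: dict[int, list[dict[str, float]]]) -> float:
    """Final vServCon score: maximum over all measured tile numbers."""
    return max(score_for_tile_count(tiles) for tiles in measurements.values())


def scaled(factor: float) -> dict[str, float]:
    """Hypothetical tile whose results are `factor` times the reference."""
    return {vm: factor * ref for vm, ref in REFERENCE.items()}


if __name__ == "__main__":
    # Made-up series in which per-tile throughput drops as tiles are added.
    measurements = {1: [scaled(1.0)], 2: [scaled(0.9)] * 2, 3: [scaled(0.55)] * 3}
    for count, tiles in measurements.items():
        print(f"{count} tile(s): score {score_for_tile_count(tiles):.2f}")
    print(f"final vServCon score: {final_score(measurements):.2f}")
```

In this made-up series the score peaks at two tiles, which illustrates why the final score is taken as the maximum rather than the value at the highest tile number.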

The progression of the vServCon scores for the tile numbers provides useful information about the scaling<br />

behavior of the "System under Test".<br />

Moreover, vServCon also documents the total CPU load of the host (VMs and all other CPU activities) and, if<br />

possible, electrical power consumption.<br />

A detailed description of vServCon is in the document: Benchmark Overview vServCon.<br />


Benchmark environment<br />

The measurement set-up is symbolically illustrated below:<br />

All results were determined by way of example on a PRIMERGY RX350 S7.<br />

System Under Test (SUT)<br />

Hardware<br />

Model PRIMERGY RX350 S7<br />

Processor Xeon E5-2600 processor series<br />

Memory 1 processor: 8 × 8GB (1x8GB) 2Rx4 L DDR3-1600 R ECC<br />

2 processors: 16 × 8GB (1x8GB) 2Rx4 L DDR3-1600 R ECC<br />

Network interface 1 × dual port 1GbE adapter<br />

1 × dual port 10GbE server adapter<br />

Disk subsystem 1 × dual-channel FC controller Emulex LPe12002<br />

ETERNUS DX80 storage systems:<br />

Each tile: 50 GB LUN<br />

Each LUN: RAID 0 with 2 × Seagate ST3300657SS disks (15 krpm)<br />

Software<br />

Operating system VMware ESX 5.0.0 Build 469512<br />

Load generator (incl. Framework controller)<br />

Hardware (Shared)<br />

Enclosure PRIMERGY BX900<br />

Hardware<br />

Model 18 × PRIMERGY BX920 S1 server blades<br />

Processor 2 × Xeon X5570<br />

Memory 12 GB<br />

Network interface 3 × 1 Gbit/s LAN<br />

Software<br />

Operating system Microsoft Windows Server 2003 R2 Enterprise with Hyper-V<br />

[Diagram of the measurement set-up: framework controller and load generators, multiple 1Gb or 10Gb networks, System Under Test (SUT) with server and disk subsystem]<br />


Load generator VM (per tile 3 load generator VMs on various server blades)<br />

Hardware<br />

Processor 1 × logical CPU<br />

Memory 512 MB<br />

Network interface 2 × 1 Gbit/s LAN<br />

Software<br />

Operating system Microsoft Windows Server 2003 R2 Enterprise Edition<br />

Some components may not be available in all countries or sales regions.<br />


Benchmark results<br />

The PRIMERGY dual-socket systems dealt with here are based on Intel Xeon series E5-2600 processors.<br />

The features of the processors are summarized in the section "Technical data".<br />

The available processors of these systems with their results can be seen in the following table.<br />

Xeon E5-2600 Series (systems: RX200 S7, RX300 S7, RX350 S7, TX300 S7, BX924 S3, CX250 S1, CX270 S1)<br />

Processor #Tiles Score<br />
2 Cores, HT, TM: E5-2637 4 3.58<br />
4 Cores: E5-2603 4 3.18<br />
4 Cores: E5-2609 4 4.09<br />
4 Cores, HT, TM: E5-2643 4 7.02<br />
6 Cores, HT, TM: E5-2620 7 7.44<br />
6 Cores, HT, TM: E5-2630L 7 7.45<br />
6 Cores, HT, TM: E5-2630 7 8.30<br />
6 Cores, HT, TM: E5-2640 7 8.80<br />
6 Cores, HT, TM: E5-2667 7 9.93<br />
8 Cores, HT, TM: E5-2650L 8 8.77<br />
8 Cores, HT, TM: E5-2650 8 10.4<br />
8 Cores, HT, TM: E5-2660 8 11.4<br />
8 Cores, HT, TM: E5-2665 8 11.7<br />
8 Cores, HT, TM: E5-2670 8 12.5<br />
8 Cores, HT, TM: E5-2680 8 12.8<br />
8 Cores, HT, TM: E5-2690 8 13.5<br />

HT = Hyper-Threading, TM = Turbo Mode<br />

These PRIMERGY dual-socket systems are very suitable for application virtualization thanks to the progress<br />

made in processor technology. Compared with a system based on the previous processor generation,<br />

approximately 40% higher virtualization performance can be achieved (measured in vServCon score in their<br />

maximum configuration).<br />

The relatively large performance differences between the processors can be explained by their features. The<br />

values scale on the basis of the number of cores, the size of the L3 cache and the CPU clock frequency and<br />

as a result of the features of Hyper-Threading and turbo mode, which are available in most processor types.<br />

Furthermore, the data transfer rate between processors ("QPI Speed") also determines performance. As a<br />

matter of principle, the memory access speed also influences performance. A guideline in the virtualization<br />

environment for selecting main memory is that sufficient quantity is more important than the speed of the<br />

memory accesses.<br />

More information about the topic "Memory Performance" and QPI architecture can be found in the White<br />

Paper Memory Performance of Xeon E5-2600 (Sandy Bridge-EP) Based Systems.



The first diagram compares the virtualization performance values that can be achieved with the processors<br />

reviewed here.<br />

[Diagram: final vServCon score (axis 0 to 14) and #Tiles for each processor of the Xeon E5-2600 series, grouped by 2, 4, 6 and 8 cores]<br />

The series starts with the Xeon E5-2637, the only processor with just two cores. A similarly low performance can be<br />

seen in the Xeon E5-2603 and E5-2609 processors, as they have to manage without Hyper-Threading (HT)<br />

and turbo mode (TM). In principle, these weakest processors are suitable for the<br />

virtualization environment only to a limited extent.<br />

A further increase in performance is achieved by the processor with four cores, which supports both Hyper-<br />

Threading and the turbo mode (Xeon E5-2643).<br />

In addition to the number of cores, the L3 cache and the data transfer rate make a considerable contribution<br />

to the respective increase in performance in the 8-core versions compared with the 6-core versions.<br />

Within a group of processors with the same number of cores scaling can be seen via the CPU clock<br />

frequency.<br />

[Diagram: final vServCon score (axis 0 to 15) for 1 x E5-2690 (6.95 @ 4 tiles) and 2 x E5-2690 (13.50 @ 8 tiles), scaling factor × 1.94]<br />

Until now we have looked at the virtualization performance of a fully<br />

configured system. However, with a server with two sockets the<br />

question also arises as to how good performance scaling is from<br />

one to two processors. The better the scaling, the lower the<br />

overhead usually caused by the shared use of resources within a<br />

server. The scaling factor also depends on the application. If the<br />

server is used as a virtualization platform for server consolidation,<br />

the system scales with a factor of 1.94. When operated with two<br />

processors, the system thus almost achieves twice the performance<br />

as with one processor, as is illustrated in the diagram opposite<br />

using the processor version Xeon E5-2690 as an example.<br />
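The scaling factor quoted above can be recomputed from the two scores shown in the diagram (6.95 with one Xeon E5-2690 at 4 tiles and 13.50 with two at 8 tiles). The short Python sketch below does so and also expresses the result as a share of ideal linear scaling; the function names are illustrative.

```python
# Scaling from one to two processors, using the two vServCon scores
# shown in the diagram for the Xeon E5-2690.

SCORE_1_CPU = 6.95    # 1 x E5-2690 at 4 tiles
SCORE_2_CPU = 13.50   # 2 x E5-2690 at 8 tiles


def scaling_factor(score_one: float, score_two: float) -> float:
    """Ratio of the two-processor score to the one-processor score."""
    return score_two / score_one


def scaling_efficiency(score_one: float, score_two: float, sockets: int = 2) -> float:
    """Share of ideal linear scaling that is actually achieved."""
    return score_two / (sockets * score_one)


if __name__ == "__main__":
    print(f"Scaling factor: {scaling_factor(SCORE_1_CPU, SCORE_2_CPU):.2f}")
    print(f"Efficiency:     {scaling_efficiency(SCORE_1_CPU, SCORE_2_CPU):.0%}")
```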


The next diagram illustrates the virtualization performance for increasing numbers of VMs based on the<br />
Xeon E5-2620 (6-Core) and E5-2650 (8-Core) processors. The respective CPU loads of the host have also<br />
been entered. The optimal number of tiles is typically reached at a CPU load of about 90%; beyond that you have<br />
overload, which is where virtualization performance no longer increases, or sinks again.<br />

[Diagram: vServCon score (axis 0 to 12) and CPU utilization in % (axis up to 100%) over the number of tiles for E5-2620 and E5-2650; visible data labels: 1.97, 3.83]<br />

In addition to the increased number of physical cores, Hyper-Threading, which is supported by almost all Xeon<br />
processors of the E5-2600 series, is an additional reason for the high number of VMs that can be operated.<br />
As is known, with Hyper-Threading a physical processor core is divided into two logical cores so that the number of<br />
cores available for the hypervisor is doubled. This standard feature thus generally increases the virtualization<br />
performance of a system.<br />

The scaling curves for the number of tiles as seen in the previous diagram apply specifically to systems with Hyper-Threading. With the Xeon E5-2650 processors, 16 physical and thus 32 logical cores are available; approximately four of them are used per tile (see Benchmark description). This means that parallel use of the same physical cores by several VMs is avoided up to a maximum of about four tiles. That is why the performance curve scales almost ideally in this range. For larger numbers of tiles the growth is flatter until the CPUs are fully utilized.
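The core arithmetic behind the "about four tiles" limit can be written out as a short sketch. It assumes that the hypervisor places each logical CPU of a tile on its own physical core for as long as free physical cores remain; the function and variable names are illustrative only.

```python
def tiles_without_core_sharing(physical_cores: int, logical_cpus_per_tile: int) -> int:
    """Number of tiles that fit before tiles must share physical cores."""
    return physical_cores // logical_cpus_per_tile


sockets = 2
cores_per_socket = 8                         # Xeon E5-2650
physical_cores = sockets * cores_per_socket  # 16 physical cores
logical_cores = 2 * physical_cores           # Hyper-Threading doubles the count
# About four logical CPUs are used per tile (value taken from the text above).
tiles = tiles_without_core_sharing(physical_cores, 4)
print(physical_cores, logical_cores, tiles)
```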

The previous diagram examined the total performance of all application VMs of a host. However, it is also interesting to study the performance from the viewpoint of an individual application VM. This information can also be derived from the previous diagram. For example, in the Xeon E5-2650 configuration above the total optimum is reached with 24 application VMs (eight tiles, not including the idle VMs); the low-load case is represented by three application VMs (one tile, not including the idle VM). Remember: the vServCon score for one tile is an average value across the three application scenarios in vServCon. This average performance of one tile drops when changing from the low-load case to the total optimum of the vServCon score, from 2.02 to 10.4/8 = 1.3, i.e. to 64%. The individual types of application VMs can react very differently in the high-load situation. It is thus clear that in a specific situation the performance requirements of an individual application must be balanced against the overall requirements regarding the number of VMs on a virtualization host.
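The per-tile calculation can be repeated for every tile count of the Xeon E5-2650 measurement. The following sketch uses the vServCon scores from the diagram; the helper name is an assumption made for this example.

```python
# vServCon scores for the Xeon E5-2650 configuration, keyed by number of tiles
# (values as shown in the diagram).
scores = {1: 2.02, 2: 4.22, 3: 5.96, 4: 7.46, 5: 8.64, 6: 9.59, 7: 10.1, 8: 10.4}


def per_tile_score(total_score: float, tiles: int) -> float:
    """Average vServCon score contributed by a single tile."""
    return total_score / tiles


low_load = per_tile_score(scores[1], 1)
for tiles, total in scores.items():
    average = per_tile_score(total, tiles)
    print(f"{tiles} tiles: {average:.2f} per tile ({average / low_load:.0%} of low load)")
```

The last line of the output corresponds to the 1.3 and 64% figures discussed in the text.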

[Diagram: vServCon score and host CPU utilization in % by number of tiles, for Xeon E5-2620 and Xeon E5-2650]

#Tiles    vServCon score (E5-2620)    vServCon score (E5-2650)
1         1.97                        2.02
2         3.83                        4.22
3         5.35                        5.96
4         6.39                        7.46
5         7.20                        8.64
6         7.38                        9.59
7         7.44                        10.1
8         -                           10.4

The virtualization-relevant progress in processor technology since 2008 affects, on the one hand, an individual VM and, on the other hand, the maximum possible number of VMs up to full CPU utilization. The following comparison shows the proportions of both types of improvement. Four systems with approximately the same processor frequency are compared: a system from 2008 with 2 × Xeon E5420, a system from 2009 with 2 × Xeon E5540, a system from 2011 with 2 × Xeon E5649 and a current system with 2 × Xeon E5-2670.

[Diagram: Virtualization relevant improvements. vServCon score for “Few VMs (1 Tile)” and “Score at optimum Tile count”; annotated factors: × 1.30 (few VMs) and × 2.02, × 1.47, × 1.64 (optimum tile count)]

Year    CPU        Freq.       #Cores
2008    E5420      2.50 GHz    4
2009    E5540      2.53 GHz    4
2011    E5649      2.53 GHz    6
2012    E5-2670    2.60 GHz    8

Year    PRIMERGY systems
2012    TX300 S7    RX200 S7    RX300 S7    RX350 S7    -           -           BX924 S3    CX250 S1    CX270 S1
2011    TX300 S6    RX200 S6    RX300 S6    TX300 S6    BX620 S6    BX922 S2    BX924 S2    -           -
2009    TX300 S5    RX200 S5    RX300 S5    -           BX620 S5    -           -           -           -
2008    TX300 S4    RX200 S4    RX300 S4    -           BX620 S4    -           -           -           -

The clearest performance improvements arose from 2008 to 2009 with the introduction of the Xeon 5500 processor generation (e.g. via the feature “Extended Page Tables” (EPT)¹). With a few VMs (one tile), the vServCon score increased by a factor of 1.30.

With full utilization of the systems with VMs there was an increase by a factor of 2.02. One reason was the performance increase that could be achieved for an individual VM (see score for a few VMs). The other reason was that more VMs were possible at the total optimum (via Hyper-Threading). However, it can be seen that the optimum was “bought” with three times the number of VMs, each with reduced individual performance.

Where exactly is the technology progress between 2009 and 2012? The performance for an individual VM in low-load situations has basically remained the same for the processors compared here, which have approximately the same clock frequency but different cache sizes and memory connection speeds. The decisive progress is in the higher number of physical cores and, associated with it, in the increased values of pure performance (factors 1.47 and 1.64 in the diagram).

We must explicitly point out that the increased virtualization performance as seen in the score cannot be attributed entirely to an improvement for one individual VM. An increase in throughput of more than approximately 30% to 50% compared to an identically clocked processor of the Xeon 5400 generation from 2008 is not possible here. Performance increases in the virtualization environment since 2009 are mainly achieved by increased VM numbers due to the increased number of available logical or physical cores.
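To make the two contributions from 2008 to 2009 more tangible, the factor at the optimum can be split roughly into a per-VM part and a part that comes from running more VMs. The multiplicative split below is an assumption made for illustration only; it uses the low-load gain as a stand-in for the per-VM improvement and ignores that individual VMs run slower at the optimum.

```python
few_vms_factor = 1.30   # vServCon score gain with one tile, 2008 to 2009
optimum_factor = 2.02   # vServCon score gain at the optimum tile count, 2008 to 2009

# Assumption: total gain = per-VM gain x gain from operating more VMs.
more_vms_factor = optimum_factor / few_vms_factor
print(f"gain attributable to more VMs: x {more_vms_factor:.2f}")
```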

¹ EPT accelerates memory virtualization via hardware support for the mapping between host and guest memory addresses.


VMmark V2<br />

Benchmark description<br />

VMmark V2 is a benchmark developed by VMware to compare server configurations with hypervisor solutions from VMware regarding their suitability for server consolidation. In addition to the software for load generation, the benchmark consists of a defined load profile and binding regulations. The benchmark results can be submitted to VMware and are published on their Internet site after a successful review process. The proven benchmark “VMmark V1” was discontinued in October 2010 and succeeded by “VMmark V2”, which requires a cluster of at least two servers and covers data center functions such as cloning and deployment of virtual machines (VMs), load balancing, and the moving of VMs with vMotion and Storage vMotion.

VMmark V2 is not a new benchmark in the actual sense. It is in fact a framework that consolidates already established benchmarks as workloads in order to simulate the load of a virtualized consolidated server environment. Three proven benchmarks, which cover the application scenarios mail server, Web 2.0, and e-commerce, were integrated in VMmark V2.

Application scenario    Load tool             # VMs
Mail server             LoadGen               1
Web 2.0                 Olio client           2
E-commerce              DVD Store 2 client    4
Standby server          (IdleVMTest)          1

The three application scenarios are assigned to a total of seven dedicated virtual machines. An eighth VM, called the “standby server”, is added to these. These eight VMs form a “tile”. Because of the performance capability of the underlying server hardware, it is usually necessary to start several identical tiles in parallel as part of a measurement in order to achieve maximum overall performance.
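The composition of a tile, and the number of VMs that results from running several tiles, can be tallied with a few lines. The dictionary keys mirror the table of application scenarios; the function name is illustrative.

```python
# Number of VMs per application scenario within one VMmark V2 tile.
TILE_VMS = {"Mail server": 1, "Web 2.0": 2, "E-commerce": 4, "Standby server": 1}


def total_vms(tiles: int) -> int:
    """Total number of VMs running on the cluster for a given tile count."""
    return tiles * sum(TILE_VMS.values())


print(sum(TILE_VMS.values()))  # VMs in one tile
print(total_vms(10))           # VMs for a measurement with ten tiles
```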

A new feature of VMmark V2 is an infrastructure component, which is present once for every two hosts. It<br />

measures the efficiency levels of data center consolidation through VM Cloning and Deployment, vMotion<br />

and Storage vMotion. The Load Balancing capacity of the data center is also used (DRS, Distributed<br />

Resource Scheduler).<br />

The result of VMmark V2 is a number, known as a “score”, which provides information about the performance of the measured virtualization solution. The score reflects the maximum total consolidation benefit of all VMs for a server configuration with hypervisor and is used as a comparison criterion of various hardware platforms.

This score is determined from the individual results of the VMs and an infrastructure result. Each of the five VMmark V2 application or front-end VMs provides a specific benchmark result in the form of application-specific transaction rates for each VM. In order to derive a normalized score, the individual benchmark results for one tile are put in relation to the respective results of a reference system. The resulting dimensionless performance values are then averaged geometrically and finally added up for all VMs. This value is included in the overall score with a weighting of 80%. The infrastructure workload is only present in the benchmark once for every two hosts; it determines 20% of the result. For the score of the infrastructure workload components, the number of transactions per hour and the average duration in seconds, respectively, are determined.
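The calculation described above can be sketched in a few lines. This is not VMware's scoring code: the workload names, the reference rates, the measured values and the already normalized infrastructure score are all hypothetical, and the sketch assumes that the geometric means are summed per tile.

```python
from math import prod


def tile_score(results: dict[str, float], reference: dict[str, float]) -> float:
    """Geometric mean of one tile's results, normalized to the reference system."""
    ratios = [results[name] / reference[name] for name in reference]
    return prod(ratios) ** (1.0 / len(ratios))


def overall_score(tiles: list[dict[str, float]],
                  reference: dict[str, float],
                  infrastructure_score: float) -> float:
    """Weight the summed tile scores with 80% and the infrastructure score with 20%."""
    application_score = sum(tile_score(tile, reference) for tile in tiles)
    return 0.8 * application_score + 0.2 * infrastructure_score


# Hypothetical transaction rates of the five front-end VMs on a reference system.
reference = {"vm1": 300.0, "vm2": 4000.0, "vm3": 4000.0, "vm4": 2000.0, "vm5": 2000.0}
# Hypothetical measurement: two tiles, each 10% faster than the reference.
tile = {name: rate * 1.1 for name, rate in reference.items()}
score = overall_score([tile, tile], reference, infrastructure_score=1.0)
print(f"{score:.2f}")
```

Each tile contributes roughly 1.1 here, so the weighted result is 0.8 × 2.2 + 0.2 × 1.0.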

In addition to the actual score, the number of VMmark V2 tiles is always specified with each VMmark V2 score. The result is thus given in the form “Score@Number of Tiles”, for example “4.20@5 tiles”.
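When results in this notation are processed automatically, a small parser is sufficient. The sketch below assumes the exact “Score@Number of Tiles” layout shown above; the function name and the regular expression are illustrative and not part of any VMware tool.

```python
import re


def parse_vmmark_result(text: str) -> tuple[float, int]:
    """Split a result such as '4.20@5 tiles' into (score, number of tiles)."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)@(\d+)\s+tiles?\s*", text)
    if match is None:
        raise ValueError(f"not a VMmark V2 result: {text!r}")
    return float(match.group(1)), int(match.group(2))


score, tiles = parse_vmmark_result("4.20@5 tiles")
print(score, tiles)
```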

A detailed description of VMmark V2 is available in the document Benchmark Overview VMmark V2.<br />


Benchmark environment<br />

The measurement set-up is symbolically illustrated below:

[Diagram: measurement set-up. Labels: Clients & Management; Load Generators incl. Prime Client and Datacenter Management Server; Multiple 1Gb or 10Gb networks; vMotion network; Server(s); Storage System; System under Test (SUT)]

System Under Test (SUT)

Hardware
  Number of servers            2
  Model                        PRIMERGY RX300 S7
  Processor                    2 × Xeon E5-2690
  Memory                       256 GB: 16 × 16 GB (1x16GB) 2Rx4 L DDR3-1600 R ECC
  Network interface            1 × dual port 1GbE adapter
                               1 × dual port 10GbE server adapter
                               1 × quad port 1GbE adapter
  Disk subsystem               1 × dual-channel FC controller Emulex LPe12002
                               ETERNUS DX80 S1 and S2 storage systems:
                               Each tile: 241 GB
                               Each DX80: RAID 0 with several LUNs
                               Total: 118 disks (incl. SSDs)
Software
  BIOS                         Version V4.6.5.1 R1.4.0
  BIOS settings                See details
  Operating system             VMware ESX 4.1.0 U2 Build 502767
  Operating system settings    ESX settings: see details

Prime Client/Datacenter Management Server (DMS)

Hardware (Shared)
  Enclosure            PRIMERGY BX600
  Network Switch       1 × PRIMERGY BX600 GbE Switch Blade 30/12
Hardware
  Model                1 × server blade PRIMERGY BX620 S4
  Processor            2 × Xeon X5470
  Memory               4 GB
  Network interface    2 × 1 Gbit/s LAN
Software
  Operating system     Prime Client: Microsoft Windows Server 2003 R2 Enterprise Edition SP2, KB955839
                       DMS: Microsoft Windows Server 2003 R2 Enterprise x64 Edition SP2, KB955839

Load generator

Hardware
  Model                1 × PRIMERGY RX600 S6
  Processor            4 × Xeon E7-4870
  Memory               512 GB
  Network interface    1 × 1 Gbit/s LAN
                       2 × 10 Gbit/s LAN
Software
  Operating system     VMware ESX 4.1.0 U2 Build 502767

Load generator VM (per tile 1 load generator VM)

Hardware
  Processor            4 × logical CPU
  Memory               4 GB
  Network interface    1 × 1 Gbit/s LAN
Software
  Operating system     Microsoft Windows Server 2008 Enterprise x64 Edition SP2

Details
  See disclosure       http://www.vmware.com/a/assets/vmmark/pdf/2012-05-01-Fujitsu-RX300S7.pdf

Some components may not be available in all countries or sales regions.

Benchmark results<br />

On May 1, 2012, Fujitsu achieved a VMmark V2 score of “11.02@10 tiles” with a PRIMERGY RX300 S7 equipped with Xeon E5-2690 processors and VMware ESX 4.1.0 U2. The system configuration comprised a total of 2 × 16 processor cores, with two identical servers in the “System under Test” (SUT). With this result the PRIMERGY RX300 S7 is one of the most powerful 2-socket servers in the official VMmark V2 ranking for a “matched pair” configuration consisting of two identical hosts (valid as of the benchmark results publication date).

The current VMmark V2 results as well as the detailed results and configuration data are available at http://www.vmware.com/a/vmmark/.

[Diagram: Comparison of system generations, VMmark V2 score. 2 × Fujitsu PRIMERGY RX300 S7 (2 × Xeon E5-2690): 11.02@10 tiles; 2 × Fujitsu PRIMERGY RX300 S6 (2 × Xeon X5690): 7.59@7 tiles; factor × 1.45]

In comparison with a PRIMERGY system of the predecessor generation with Xeon X5690 processors, an increase in performance of about 45% is achieved with VMmark V2. The diagram shows the result of the PRIMERGY RX300 S7 in comparison with the predecessor system PRIMERGY RX300 S6.
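The 45% figure follows directly from the two published scores. The short sketch below also shows the score per tile for both generations; the function name is an assumption made for this example.

```python
def relative_gain(new: float, old: float) -> float:
    """Relative improvement of a new result over an old one."""
    return new / old - 1.0


s7_score, s7_tiles = 11.02, 10   # PRIMERGY RX300 S7, 2 x Xeon E5-2690
s6_score, s6_tiles = 7.59, 7     # PRIMERGY RX300 S6, 2 x Xeon X5690

gain = relative_gain(s7_score, s6_score)
print(f"score gain: {gain:.0%}")
print(f"score per tile: S7 {s7_score / s7_tiles:.2f}, S6 {s6_score / s6_tiles:.2f}")
```

The score per tile is almost the same for both systems, so the gain comes mainly from the higher number of tiles that can be operated.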

The processors used, which with a good hypervisor setting could make optimal use of their processor<br />

features, were the essential prerequisites for achieving the PRIMERGY RX300 S7 result. These features<br />

include Hyper-Threading. All this has a particularly positive effect during virtualization.<br />

All VMs, their application data, the host operating system as well as additionally required data were on a<br />

powerful fibre channel disk subsystem from ETERNUS DX80 systems. As far as possible, the configuration<br />

of the disk subsystem takes the specific requirements of the benchmark into account. The use of SSDs<br />

(Solid State Disk) in the powerful ETERNUS DX80 S2 resulted in further advantages in response times of<br />

the hard disks used.<br />

The network connection of the load generators and the infrastructure workload connection between the hosts<br />

were implemented with the 10Gb LAN ports.<br />

All the components used were optimally attuned to each other.<br />


STREAM<br />

Benchmark description<br />

STREAM is a synthetic benchmark that has been used for many years to determine memory throughput and<br />

which was developed by John McCalpin during his professorship at the University of Delaware. Today<br />

STREAM is supported at the University of Virginia, where the source code can be downloaded in either<br />

Fortran or C. STREAM continues to play an important role in the HPC environment in particular. It is for<br />

example an integral part of the HPC Challenge benchmark suite.<br />

The benchmark is designed in such a way that it can be used both on PCs and on server systems. The unit<br />

of measurement of the benchmark is GB/s, i.e. the number of gigabytes that can be read and written per<br />

second.<br />

STREAM measures the memory throughput for sequential accesses. These can generally be performed<br />

more efficiently than accesses that are randomly distributed on the memory, because the CPU caches are<br />

used for sequential access.<br />

Before execution the source code is adapted to the environment to be measured. In particular, the size of the data area must be at least four times the total size of all CPU caches so that these have as little influence as possible on the result. The OpenMP program library is used to enable selected parts of the program to be executed in parallel during the runtime of the benchmark, thus achieving optimal load distribution across the available processor cores.
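The sizing rule can be turned into a small calculation. In the sketch below the cache size of 20 MiB per processor is an assumption chosen for illustration, and the function name is hypothetical; only the factor of four and the element size of 8 bytes come from the benchmark description.

```python
def min_stream_elements(total_cache_bytes: int,
                        element_bytes: int = 8,
                        factor: int = 4) -> int:
    """Smallest number of elements whose data area is `factor` times the cache size."""
    required_bytes = factor * total_cache_bytes
    return -(-required_bytes // element_bytes)  # ceiling division


# Assumption: two processors with 20 MiB of last-level cache each.
total_cache = 2 * 20 * 1024 * 1024
print(min_stream_elements(total_cache))
```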

During execution the defined data area, consisting of 8-byte elements, is processed successively with four types of operation, some of which also perform arithmetic calculations.

Type     Execution                 Bytes per step    Floating-point calculations per step
COPY     a(i) = b(i)               16                0
SCALE    a(i) = q × b(i)           16                1
SUM      a(i) = b(i) + c(i)        24                1
TRIAD    a(i) = b(i) + q × c(i)    24                2
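The four operations and the way a throughput value follows from the bytes moved per step can be illustrated as follows. This is a plain Python sketch for readability, not the STREAM source code (which is written in C or Fortran and parallelized with OpenMP); the function names are chosen for this example.

```python
# Bytes moved per loop step for each operation (8-byte elements, see table above).
BYTES_PER_STEP = {"COPY": 16, "SCALE": 16, "SUM": 24, "TRIAD": 24}


def stream_copy(a, b):
    for i in range(len(a)):
        a[i] = b[i]


def stream_scale(a, b, q):
    for i in range(len(a)):
        a[i] = q * b[i]


def stream_sum(a, b, c):
    for i in range(len(a)):
        a[i] = b[i] + c[i]


def stream_triad(a, b, c, q):
    for i in range(len(a)):
        a[i] = b[i] + q * c[i]


def throughput_gb_per_s(kernel: str, elements: int, seconds: float) -> float:
    """Throughput in GB/s (1 GB = 10**9 bytes) for one pass over the arrays."""
    return BYTES_PER_STEP[kernel] * elements / seconds / 1e9


n = 1000
a, b, c = [0.0] * n, [1.0] * n, [2.0] * n
stream_triad(a, b, c, 3.0)  # every element becomes 1.0 + 3.0 * 2.0
print(a[0], throughput_gb_per_s("TRIAD", 10**9, 1.0))
```

In the usage example, a hypothetical pass over 10^9 elements in one second corresponds to 24 GB/s for TRIAD, because each step reads two elements and writes one.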

The throughput is output in GB/s for each type of calculation. The differences between the various values are<br />

usually only minor on modern systems. In general, only the determined TRIAD value is used as a<br />

comparison.<br />

The measured results primarily depend on the clock frequency of the memory modules; the CPUs influence<br />

the arithmetic calculations. The accuracy of the results is approximately 5%.<br />

This chapter specifies throughputs on a basis of 10 (1 GB/s = $10^9$ Byte/s).

Benchmark environment<br />

System Under Test (SUT)

Hardware
  Model                        PRIMERGY RX300 S7
  Processor                    2 processors of Xeon E5-2600 processor series
  Memory                       16 × 8GB (1x8GB) 2Rx4 L DDR3-1600 R ECC
Software
  BIOS settings                Hyper-Threading = Disabled
  Operating system             Red Hat Enterprise Linux Server release 6.2
  Operating system settings    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled
  Compiler                     Intel C Compiler 12.1
  Benchmark                    Stream.c Version 5.9

Some components may not be available in all countries or sales regions.

Benchmark results<br />

Processor             Cores    Processor frequency [GHz]    Max. memory frequency [MHz]    TRIAD [GB/s]
2 × Xeon E5-2637      2        3.00                         1600                           41.1
2 × Xeon E5-2603      4        1.80                         1067                           48.1
2 × Xeon E5-2609      4        2.40                         1067                           53.9
2 × Xeon E5-2643      4        3.30                         1600                           75.4
2 × Xeon E5-2630L     6        2.00                         1333                           68.7
2 × Xeon E5-2620      6        2.00                         1333                           68.7
2 × Xeon E5-2630      6        2.30                         1333                           69.8
2 × Xeon E5-2640      6        2.50                         1333                           70.3
2 × Xeon E5-2667      6        2.90                         1600                           81.5
2 × Xeon E5-2650L     8        1.80                         1600                           71.4
2 × Xeon E5-2650      8        2.00                         1600                           77.0
2 × Xeon E5-2660      8        2.20                         1600                           78.5
2 × Xeon E5-2665      8        2.40                         1600                           79.3
2 × Xeon E5-2670      8        2.60                         1600                           80.0
2 × Xeon E5-2680      8        2.70                         1600                           79.5
2 × Xeon E5-2690      8        2.90                         1600                           80.7

The results depend primarily on the maximum memory frequency. The exception is the Xeon E5-2637, which with only 2 cores does not use all 4 channels of the memory controller in the STREAM benchmark. The smaller differences between processors with the same maximum memory frequency result from the different processor frequencies, which affect the arithmetic calculations.
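The dependence on the memory frequency becomes clearer when the measured TRIAD values are set against a theoretical peak bandwidth. The sketch below assumes that each of the 4 memory channels transfers 8 bytes per transfer (64-bit DDR3 channel) and that the memory frequency in the table is the transfer rate in millions of transfers per second; these assumptions and the function name are not taken from the report.

```python
def peak_bandwidth_gb_per_s(memory_mhz: float,
                            channels: int = 4,
                            bytes_per_transfer: int = 8,
                            sockets: int = 2) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return memory_mhz * 1e6 * bytes_per_transfer * channels * sockets / 1e9


# (maximum memory frequency, measured TRIAD in GB/s) taken from the table above.
measured = {"E5-2690": (1600, 80.7), "E5-2640": (1333, 70.3), "E5-2609": (1067, 53.9)}
for name, (mhz, triad) in measured.items():
    peak = peak_bandwidth_gb_per_s(mhz)
    print(f"{name}: peak {peak:.1f} GB/s, TRIAD reaches {triad / peak:.0%}")
```

Under these assumptions, the measured values reach a similar share of the peak for all three memory frequencies.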

The following diagram illustrates the throughput of the PRIMERGY RX300 S7 in comparison to its<br />

predecessor, the PRIMERGY RX300 S6, in their most performant configuration.<br />

[Diagram: STREAM TRIAD in GB/s, PRIMERGY RX300 S7 vs. PRIMERGY RX300 S6. PRIMERGY RX300 S6 (2 × Xeon X5667): 41.4; PRIMERGY RX300 S7 (2 × Xeon E5-2667): 81.5]

LINPACK<br />

Benchmark description<br />

LINPACK was developed in the 1970s by Jack Dongarra and others to show the performance of supercomputers. The benchmark consists of a collection of library functions for the analysis and solution of systems of linear equations. A description can be found in the document http://www.netlib.org/utk/people/JackDongarra/PAPERS/hplpaper.pdf.

LINPACK can be used to measure the speed of a computer when solving an N-dimensional system of linear equations. The result is specified in GFlops (Giga Floating Point Operations per Second), a measure of how many floating-point operations can be carried out per second. The number of floating-point operations required for the solution is determined by the formula

$\frac{2}{3} \times N^3 + 2 \times N^2$.
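The operation count can be evaluated for the problem size used in this report. The sketch below uses N = 120000 and the 352 GFlops measured with 2 × Xeon E5-2690 from the benchmark results; the derived run time is only an estimate of the pure solution phase, and the function name is chosen for this example.

```python
def linpack_flops(n: int) -> float:
    """Floating-point operations needed to solve an n-dimensional linear system."""
    return 2.0 / 3.0 * n**3 + 2.0 * n**2


n = 120_000                 # dimension reported in the benchmark results
measured_gflops = 352.0     # LINPACK result for 2 x Xeon E5-2690

operations = linpack_flops(n)
seconds = operations / (measured_gflops * 1e9)
print(f"{operations:.4e} operations, about {seconds / 60:.0f} minutes at {measured_gflops} GFlops")
```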

For the calculation LINPACK requires a matrix of size N × N in the main memory, with the value N standing for the number of equations to be solved. Maximum performance is achieved if this value is chosen such that the available main memory is fully used. However, determining this limit is very time-consuming and the expected increase in the result is only minor. The memory bandwidth of the system also has hardly any impact on the result, because mainly floating-point calculations are carried out during the run and data exchange between the parallel processes takes place only seldom. The benchmark result is therefore determined for a value of N that is somewhat below the maximum value.
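How the choice of N relates to the available memory can be estimated with a short calculation. The sketch assumes 8-byte (double-precision) matrix elements and interprets the 128 GB of the test system as 128 × 2^30 bytes; both are assumptions, as are the function names. N = 120000 is the value used in the benchmark results of this report.

```python
from math import isqrt


def matrix_bytes(n: int, element_bytes: int = 8) -> int:
    """Memory needed for an n x n matrix."""
    return n * n * element_bytes


def max_dimension(memory_bytes: int, element_bytes: int = 8) -> int:
    """Largest n whose n x n matrix still fits into the given memory."""
    return isqrt(memory_bytes // element_bytes)


memory = 128 * 2**30        # assumption: 128 GB counted as 128 GiB
n = 120_000
print(max_dimension(memory))
print(f"N = {n} uses {matrix_bytes(n) / memory:.0%} of the main memory")
```

The chosen dimension stays below the theoretical limit, which leaves room for the operating system and the benchmark itself.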

LINPACK is classed as one of the leading benchmarks in the field of high performance computing (HPC).<br />

LINPACK is one of the seven benchmarks currently included in the HPC Challenge benchmark suite, which<br />

takes other performance aspects in the HPC environment into account.<br />

Intel offers a LINPACK version that has been highly optimized for individual systems with Intel processors. The optimal parameter values are determined autonomously by the software on the basis of the current processor architecture. Another version provided by Intel is based on hpl (High-Performance Linpack) for use on distributed systems, with the communication between the servers taking place via Message Passing Interface (MPI). In this version the parameter values are set via a configuration file. Both versions can be downloaded from http://software.intel.com/en-us/articles/intel-math-kernel-library-linpack-download/.

It is possible to publish LINPACK results at http://www.top500.org/. A prerequisite for this is the use of an MPI-based (Message Passing Interface) version (see http://www.netlib.org/benchmark/hpl).

The maximum theoretical performance of a processor core follows from the number of floating-point<br />

operations that are performed within a clock cycle. Thus e.g. a single processor core with a clock frequency<br />

of 2.4 GHz and 4 floating-point operations per cycle would achieve a maximum performance of 9.6 GFlops.<br />

The ratio of the measured result to the maximum value shows the efficiency of the system for floating-point<br />

calculations. The fewer memory accesses required during the calculation, the better the ratio.<br />
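The example from the text can be reproduced in code. The measured value used for the efficiency below is hypothetical and only serves to show how the ratio is formed; the function names are chosen for this sketch.

```python
def peak_gflops_per_core(frequency_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical maximum of one processor core in GFlops."""
    return frequency_ghz * flops_per_cycle


def efficiency(measured_gflops: float, peak_gflops: float) -> float:
    """Ratio of the measured result to the theoretical maximum."""
    return measured_gflops / peak_gflops


peak = peak_gflops_per_core(2.4, 4)      # example from the text: 9.6 GFlops
hypothetical_measurement = 8.6           # hypothetical value, not a benchmark result
print(f"peak {peak:.1f} GFlops, efficiency {efficiency(hypothetical_measurement, peak):.0%}")
```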

Manufacturer-specific LINPACK versions are also used when graphics cards are used for general purpose<br />

computation on a graphics processing unit (GPGPU). They are based on hpl and contain extensions which<br />

are needed for communication with the graphics cards. During runtime the compute load is distributed over<br />

the system processors and the processors of the graphics cards according to a ratio specified by the user.<br />

The LINPACK result accordingly consists of the total performance of the system processors and graphics<br />

cards, with the system processors not achieving the result that would be possible without a graphics card on<br />

account of the data transfer between main memory and graphics card.<br />


Benchmark environment<br />

System Under Test (SUT)

Hardware
  Model               PRIMERGY RX300 S7
  Processor           2 processors of Xeon E5-2600 processor series
  Memory              16 × 8GB (1x8GB) 2Rx4 L DDR3-1600 R ECC
Software
  BIOS settings       Hyper-Threading = Disabled
  Operating system    Red Hat Enterprise Linux Server release 6.2
  Benchmark           xlinpack_xeon64 from Intel Compiler 12.1

Some components may not be available in all countries or sales regions.

Benchmark results<br />

The available main memory of 128 GB permits a dimension of N = 120000.<br />

Processor             Cores    Processor frequency [GHz]    Maximum turbo frequency at full load [GHz]    Theoretical maximum [GFlops]    LINPACK [GFlops]    Efficiency [%]
2 × Xeon E5-2637      2        3.00                         3.50                                          112                             101                 90
2 × Xeon E5-2603      4        1.80                         n/a                                           115                             106                 92
2 × Xeon E5-2609      4        2.40                         n/a                                           154                             140                 91
2 × Xeon E5-2643      4        3.30                         3.40                                          218                             198                 91
2 × Xeon E5-2630L     6        2.00                         2.30                                          221                             192                 87
2 × Xeon E5-2620      6        2.00                         2.30                                          221                             204                 92
2 × Xeon E5-2630      6        2.30                         2.60                                          250                             229                 92
2 × Xeon E5-2640      6        2.50                         2.80                                          269                             247                 92
2 × Xeon E5-2667      6        2.90                         3.20                                          307                             282                 92
2 × Xeon E5-2650L     8        1.80                         2.00                                          256                             232                 91
2 × Xeon E5-2650      8        2.00                         2.40                                          307                             280                 91
2 × Xeon E5-2660      8        2.20                         2.70                                          346                             285                 82
2 × Xeon E5-2665      8        2.40                         2.80                                          358                             314                 88
2 × Xeon E5-2670      8        2.60                         3.00                                          384                             318                 83
2 × Xeon E5-2680      8        2.70                         3.10                                          397                             347                 87
2 × Xeon E5-2690      8        2.90                         3.30                                          422                             352                 83

A theoretical maximum value can be calculated for processors without Turbo mode with the formula

$\mathrm{GFlops}_{\max} = \text{Number of floating-point operations per clock cycle} \times \text{Number of processor cores} \times \text{Processor frequency [GHz]}$

Processors that have Turbo mode are not limited by the nominal processor frequency and therefore do not provide a constant processor frequency. In this case, the actual processor frequency lies between the nominal processor frequency and the maximum turbo frequency at full load. To calculate the theoretical maximum the following formula is used for these processors:

$\mathrm{GFlops}_{\max} = \text{Number of floating-point operations per clock cycle} \times \text{Number of processor cores} \times \text{Maximum turbo frequency at full load [GHz]}$
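Both formulas can be checked against rows of the results table. The value of 8 floating-point operations per clock cycle is not stated in the text; it is an assumption that reproduces the theoretical maxima listed in the table. The function name is chosen for this sketch.

```python
from typing import Optional

# Assumption: 8 floating-point operations per clock cycle and core,
# chosen because it reproduces the theoretical maxima in the table.
FLOPS_PER_CYCLE = 8


def theoretical_max_gflops(cores_per_processor: int,
                           processors: int,
                           nominal_ghz: float,
                           turbo_ghz: Optional[float] = None) -> float:
    """Use the turbo frequency at full load if the processor has Turbo mode."""
    frequency = turbo_ghz if turbo_ghz is not None else nominal_ghz
    return FLOPS_PER_CYCLE * cores_per_processor * processors * frequency


# (processor, cores, nominal GHz, turbo GHz at full load, measured LINPACK GFlops)
rows = [
    ("Xeon E5-2603", 4, 1.80, None, 106),
    ("Xeon E5-2690", 8, 2.90, 3.30, 352),
]
for name, cores, nominal, turbo, measured in rows:
    maximum = theoretical_max_gflops(cores, 2, nominal, turbo)
    print(f"2 x {name}: maximum {maximum:.0f} GFlops, efficiency {measured / maximum:.0%}")
```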

The following diagram illustrates the throughput of the PRIMERGY RX300 S7 in comparison to its<br />

predecessor, the PRIMERGY RX300 S6, in their most performant configuration.<br />

[Diagram: LINPACK in GFlops, PRIMERGY RX300 S7 vs. PRIMERGY RX300 S6. PRIMERGY RX300 S6 (2 × Xeon X5690): 160; PRIMERGY RX300 S7 (2 × Xeon E5-2690): 352]

Literature

PRIMERGY Systems
  http://primergy.com/

PRIMERGY RX300 S7
  Data sheet
  http://docs.ts.fujitsu.com/dl.aspx?id=9ee3857c-e1e6-44b5-b872-babd34b11188
  Memory performance of Xeon E5-2600/4600 (Sandy Bridge-EP)-based systems
  http://docs.ts.fujitsu.com/dl.aspx?id=a17dbb55-c43f-4ac8-886a-7950cb27ec2a

PRIMERGY Performance
  http://www.fujitsu.com/fts/products/computing/servers/primergy/benchmarks/

Disk I/O
  Basics of Disk I/O Performance
  http://docs.ts.fujitsu.com/dl.aspx?id=65781a00-556f-4a98-90a7-7022feacc602
  Single Disk Performance
  http://docs.ts.fujitsu.com/dl.aspx?id=0e30cb69-44db-4cd5-92a7-d38bacec6a99
  RAID Controller Performance
  http://docs.ts.fujitsu.com/dl.aspx?id=e2489893-cab7-44f6-bff2-7aeea97c5aef
  Information about Iometer
  http://www.iometer.org

LINPACK
  http://www.netlib.org/linpack/

OLTP-2
  Benchmark Overview OLTP-2
  http://docs.ts.fujitsu.com/dl.aspx?id=e6f7a4c9-aff6-4598-b199-836053214d3f

SAP SD
  http://www.sap.com/benchmark
  Benchmark overview SAP SD
  http://docs.ts.fujitsu.com/dl.aspx?id=0a1e69a6-e366-4fd1-a1a6-0dd93148ea10

SPECcpu2006
  http://www.spec.org/osg/cpu2006
  Benchmark overview SPECcpu2006
  http://docs.ts.fujitsu.com/dl.aspx?id=1a427c16-12bf-41b0-9ca3-4cc360ef14ce

SPECjbb2005
  http://www.spec.org/jbb2005
  Benchmark overview SPECjbb2005
  http://docs.ts.fujitsu.com/dl.aspx?id=5411e8f9-8c56-4ee9-9b3b-98981ab3e820

SPECpower_ssj2008
  http://www.spec.org/power_ssj2008
  Benchmark Overview SPECpower_ssj2008
  http://docs.ts.fujitsu.com/dl.aspx?id=166f8497-4bf0-4190-91a1-884b90850ee0

STREAM
  http://www.cs.virginia.edu/stream/

TPC-E with TPC-Energy
  http://www.tpc.org/tpce
  Benchmark Overview TPC-E
  http://docs.ts.fujitsu.com/dl.aspx?id=da0ce7b7-3d80-48cd-9b3a-d12e0b40ed6d

VMmark V2
  Benchmark Overview VMmark V2
  http://docs.ts.fujitsu.com/dl.aspx?id=2b61a08f-52f4-4067-bbbf-dc0b58bee1bd
  VMmark V2
  http://www.vmmark.com
  VMmark V2 Results
  http://www.vmware.com/a/vmmark/

vServCon
  Benchmark Overview vServCon
  http://docs.ts.fujitsu.com/dl.aspx?id=b953d1f3-6f98-4b93-95f5-8c8ba3db4e59

Contact

FUJITSU
  Website: http://www.fujitsu.com/

PRIMERGY Product Marketing
  mailto:Primergy-PM@ts.fujitsu.com

PRIMERGY Performance and Benchmarks
  mailto:primergy.benchmark@ts.fujitsu.com

All rights reserved, including intellectual property rights. Technical data subject to modifications and delivery subject to availability. Any liability that the data and illustrations are complete, actual or correct is excluded. Designations may be trademarks and/or copyrights of the respective manufacturer, the use of which by third parties for their own purposes may infringe the rights of such owner.

For further information see http://www.fujitsu.com/fts/resources/navigation/terms-of-use.html

2012-10-09 WW EN Copyright © Fujitsu Technology Solutions 2012
