Intel recently launched the 5th generation of Intel® Xeon® Scalable processors (Intel Xeon SP), code-named Emerald Rapids, a family of high-end, enterprise-focused processors targeted at a diverse range of workloads. To explore how Intel’s new chips measure up, we’ve worked with Intel and others to run benchmarks with Red Hat Enterprise Linux 8.8, 9.2, and later releases.
Intel’s 5th Gen Xeon Scalable processors are drop-in compatible with existing 4th Gen Xeon Scalable motherboards. Compared with the prior generation, they support up to 64 cores per socket (up from 60), DDR5-5600 memory speeds (up from DDR5-4800), up to 3x the last-level cache (LLC), and UPI 2.0 speeds of up to 20 GT/s. The Red Hat Performance Engineering team configured a peak prototype system from Intel for both of these models to conduct performance measurements.
SAP Performance
RHEL 8.8 SAP HANA Leadership on 5th Generation Intel Xeon Scalable Processor
Leaning on our long history of collaboration, Red Hat and Intel once again worked together to deliver state-of-the-art performance to enterprise data centers and beyond. Red Hat’s development and performance engineering teams have been working on hardware enablement and validation of these new scalable processors for more than a year, running a variety of benchmarks prior to the GA release of Red Hat Enterprise Linux.
Higher per-core performance, larger last level cache, faster memory, and storage combined with workload-optimized cores benefit overall system performance. To demonstrate performance and provide additional scalability and sizing information for SAP HANA applications and workloads, SAP introduced the Business Warehouse (BWH) edition of SAP HANA Standard Application Benchmark [1]. Presently on version 3, this benchmark simulates a variety of users with different analytical requirements and measures the key performance indicator (KPI) relevant to each of the three benchmark phases, which are defined below:
- Data load phase, testing data latency and load performance (lower is better)
- Query throughput phase, testing query throughput with moderately complex queries (higher is better)
- Query runtime phase, testing the performance of running very complex queries (lower is better)
Red Hat Enterprise Linux (RHEL) was used in several recent publications of the above benchmark. Specifically, results for two initial record sizes (1.3 and 2.6 billion records) on a Dell PowerEdge R760 server with 5th Gen Intel Xeon Scalable processors demonstrated that running the workload on Red Hat Enterprise Linux could deliver a significant performance boost over the previous generation of Intel servers (see Table 1).
Table 1. Results in scale-up category running SAP BW Edition for SAP HANA Standard Application Benchmark, Version 3 on SAP NetWeaver 7.50 and SAP HANA 2.0
Operating system | Initial Records (Billions) | Phase 1 (lower is better) | Phase 2 (higher is better) | Phase 3 (lower is better) |
Red Hat Enterprise Linux 8.8 [2] | 2.6 | 7,083 sec | 13,410 | 68 sec |
SUSE Linux Enterprise Server 15 [3] | 2.6 | 10,404 sec | 9,917 | 76 sec |
5th generation Intel Xeon / Red Hat Enterprise Linux advantage | | 31.9% | 35.2% | 10.5% |
[1] SAP Results as of March 1, 2023. SAP and SAP HANA are the registered trademarks of SAP AG in Germany and in several other countries. See www.sap.com/benchmark for more information.
[2] Dell PowerEdge R760 (2 processors / 128 cores / 256 threads, Intel Xeon Platinum 8592+ processor, 1.9 GHz, 80 KB L1 cache and 2048 KB L2 cache per core, 320 MB L3 cache per processor, 1536 GB main memory). Certification number #2023076
[3] Atos BullSequana SH20 (2 processors / 120 cores / 240 threads, Intel Xeon Platinum 8490H processor, 1.9 GHz, 80 KB L1 cache and 2048 KB L2 cache per core, 112.5 MB L3 cache per processor, 1024 GB main memory). Certification number #2023028
Additionally, using a dataset size of 1.3 billion initial records, a Dell PowerEdge R760 server running Red Hat Enterprise Linux also outscored the previous-generation server on two out of three benchmark KPIs, demonstrating better data load time and query throughput, while trailing by 6.6% on complex query runtime (see Table 2).
Table 2. Results in scale-up category running SAP BW Edition for SAP HANA Standard Application Benchmark, Version 3 on SAP NetWeaver 7.50 and SAP HANA 2.0
Operating system | Initial Records (Billions) | Phase 1 (lower is better) | Phase 2 (higher is better) | Phase 3 (lower is better) |
Red Hat Enterprise Linux 8.8 [4] | 1.3 | 6,069 sec | 17,846 | 65 sec |
SUSE Linux Enterprise Server 15 [5] | 1.3 | 8,041 sec | 14,288 | 61 sec |
5th generation Intel Xeon / Red Hat Enterprise Linux advantage | | 24.5% | 24.9% | -6.6% |
[4] Dell PowerEdge R760 (2 processors / 128 cores / 256 threads, Intel Xeon Platinum 8592+ processor, 1.9 GHz, 80 KB L1 cache and 2048 KB L2 cache per core, 320 MB L3 cache per processor, 1536 GB main memory). Certification number #2023075
[5] Atos BullSequana SH20 (2 processors / 120 cores / 240 threads, Intel Xeon Platinum 8490H processor, 1.9 GHz, 80 KB L1 cache and 2048 KB L2 cache per core, 112.5 MB L3 cache per processor, 1024 GB main memory). Certification number #2023026
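The "advantage" rows in Tables 1 and 2 can be reproduced from the published phase results. The sketch below assumes the advantage is the relative change against the SUSE Linux Enterprise Server baseline: time saved for the lower-is-better phases and throughput gained for the higher-is-better phase. The function names are illustrative and not part of any SAP tooling.

```shell
#!/usr/bin/env bash
# Advantage for "lower is better" metrics (Phases 1 and 3):
# the share of the baseline time that was saved.
pct_lower_is_better() {   # args: baseline new
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (b - n) / b * 100 }'
}

# Advantage for "higher is better" metrics (Phase 2):
# the relative throughput gain over the baseline.
pct_higher_is_better() {  # args: baseline new
    awk -v b="$1" -v n="$2" 'BEGIN { printf "%.1f\n", (n - b) / b * 100 }'
}

# Table 1 (2.6 billion initial records): baseline first, RHEL result second
pct_lower_is_better 10404 7083     # Phase 1 -> 31.9
pct_higher_is_better 9917 13410    # Phase 2 -> 35.2
pct_lower_is_better 76 68          # Phase 3 -> 10.5

# Table 2 (1.3 billion initial records)
pct_lower_is_better 8041 6069      # Phase 1 -> 24.5
pct_higher_is_better 14288 17846   # Phase 2 -> 24.9
pct_lower_is_better 61 65          # Phase 3 -> -6.6 (RHEL run was slower)
```

A negative value means the baseline was ahead on that phase, which is how the -6.6% in Table 2 should be read.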
These results demonstrate Red Hat’s commitment to helping OEM partners and ISVs deliver high-performing solutions to our mutual customers and showcase close alignment between Red Hat and Dell that, in collaboration with SAP, led to the creation of certified, single-source solutions for SAP HANA. Available in both single-server and larger, scale-out configurations, Dell’s solution is optimized with Red Hat Enterprise Linux for SAP Solutions.
TPC-H @ SF = 10000
Another industry-standard benchmark is the TPC-H decision support benchmark from the Transaction Processing Performance Council (TPC).
The results show strong performance of HPE ProLiant DL380 class machines on the TPC-H benchmark at SF = 10000, scoring a 17.9% improvement in Queries/Hour (QphH) and a 31.4% price/performance gain (Price/kQphH). The audited TPC-H results were run by HPE using Microsoft SQL Server 2022 64-bit on 5th Gen Intel Xeon SP running RHEL 9.3, and are compared with 4th Gen Intel Xeon SP results using the same SQL Server 2022 on Microsoft Windows Server 2022 Standard Edition. The combination of RHEL 9.3 and 5th Gen Intel Xeon SP helps show the value of upgrading both the server and the OS to a solution that achieved the #1 non-clustered 10,000 GB TPC-H performance result [6].
TPC-H w/ HPE DB @ 10 TB, SF = 10000
System | Performance (QphH) | Price/kQphH | System Availability | Date Submitted | DB Software Name | OS Software Name |
Prior 4th Gen Intel Xeon Processor | 2,028,444 | 821.80 USD | 5/1/2023 | 2/8/2023 | Microsoft SQL Server 2022 Enterprise Edition 64 bit | Microsoft Windows Server 2022 Standard Edition |
NEW 5th Gen Intel Xeon Processor | 2,391,511 | 625.77 USD | 6/30/2024 | 1/25/2024 | Microsoft SQL Server 2022 Enterprise Edition 64 bit | Red Hat Enterprise Linux Server Release 9.3 |
Speedup Gen5/Gen4 | 17.9% | 31.4% |
RHEL 9.4 (beta) AI/ML and computing performance with Intel® AMX
Here we explore the AI/ML capabilities of the 5th Gen Intel Xeon processor [7] by comparing its performance to the previous 4th Gen Intel Xeon processor [8], using some of the Phoronix Test Suite (PTS) benchmarks for PyTorch and TensorFlow, and the Neural Magic DeepSparse and Intel® OpenVINO™ test suites. These four benchmark suites have more than 100 subtests between them. See [9] to reproduce these results.
We also ran general CPU computing benchmarks, such as SPEC CPU Base Rate (estimated) and some two-dimensional FFTW tests, on our lab systems to compare apples to apples on beta RHEL 9.4 systems.
(Our SPEC CPU Base Rate results are not an official run. We used Intel binaries with the ic2024.0.2-lin-sapphirerapids-rate-20231213.cfg config.)
The results reflect out-of-the-box performance gains. None of the benchmarks have any 5th Gen Intel Xeon SP specific tunings or optimizations beyond what the compiler can detect automatically. Our results show that average speedup factors for 5th Gen Intel Xeon SP range from 1.07 to 1.22, and maximum speedups range from 1.19 to 1.89, relative to 4th Gen Intel Xeon SP.
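Before running AMX-oriented workloads, it can be useful to confirm that the kernel actually exposes the AMX CPU flags. The flag listings in [7] and [8] include amx_tile, amx_bf16 and amx_int8. The following is a minimal sketch, assuming a Linux host with a readable /proc/cpuinfo; the helper name list_amx_flags is hypothetical and is not a tool shipped with RHEL.

```shell
#!/usr/bin/env bash
# List the distinct AMX-related CPU flags found in a cpuinfo-style file.
# Defaults to /proc/cpuinfo; pass another path to inspect saved output.
list_amx_flags() {
    local src="${1:-/proc/cpuinfo}"
    # -o prints only the matching flag names; sort -u removes the
    # per-CPU duplicates that /proc/cpuinfo contains.
    grep -o 'amx_[a-z0-9_]*' "$src" 2>/dev/null | sort -u
}

flags=$(list_amx_flags)
if [ -n "$flags" ]; then
    echo "$flags"
else
    echo "no AMX flags reported"
fi
```

On a system like the ones described in [7] and [8], this would be expected to list the three AMX flags shown there; on hardware or kernels without AMX support it prints the fallback message.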
Summary
The Red Hat Performance Engineering team works with Intel to ensure the performance capabilities of Red Hat Enterprise Linux on new systems before hardware vendors ship them in production. This blog reviewed a number of features of Intel’s 5th Gen Xeon Scalable processors, including higher core counts, faster DDR5 memory, larger L3 caches, and improved interprocessor bandwidth. All of these features are supported in shipping versions of RHEL 8.8 and RHEL 9.2. We shared how OEMs used these features to produce leading results on the SAP [1] and TPC [6] industry-standard benchmarks. We also ran tests on RHEL 9.4 beta showing significant speedups for CPU workloads and AI/ML benchmarks when comparing 5th Gen Intel Xeon SP to 4th Gen Intel Xeon SP.
The collaboration between Intel and Red Hat helps expand our capabilities and we will continue delivering innovative features in future versions of RHEL, where we hope to continue being the trusted OS for customers and partners.
Learn more
[6] TPC and TPC-H are trademarks of the Transaction Processing Performance Council. All third-party marks are property of their respective owners. See https://www.tpc.org/tpch/results. All comparisons and claims as of March 15, 2024. Filtered by 10,000 GB results: https://www.tpc.org/tpch/results/tpch_perf_results5.asp?resulttype=nonc…
[7] 5th Gen Intel Xeon SP Hardware Configuration
Processor: 2 x Intel Xeon Platinum 8592+ @ 3.90GHz (128 Cores / 256 Threads)
Motherboard: Intel D50DNP1SBB (SE5C7411.86B.9533.D01.2310110651 BIOS)
Memory: 1008 GB @ 5800 MT/s
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 52 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 256
On-line CPU(s) list: 0-255
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel(R) Corporation
Model name: INTEL(R) XEON(R) PLATINUM 8592+
BIOS Model name: INTEL(R) XEON(R) PLATINUM 8592+
CPU family: 6
Model: 207
Thread(s) per core: 2
Core(s) per socket: 64
Socket(s): 2
Stepping: 2
CPU(s) scaling MHz: 100%
CPU max MHz: 3900.0000
CPU min MHz: 800.0000
BogoMIPS: 3800.00
Flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
cpuid_fault epb cat_l3 cat_l2 cdp_l3 cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid
ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma
clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window
hwp_epp hwp_pkg_req vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg
tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk
pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 6 MiB (128 instances)
L1i: 4 MiB (128 instances)
L2: 256 MiB (128 instances)
L3: 640 MiB (2 instances)
NUMA:
NUMA node(s): 4
NUMA node0 CPU(s): 0-31,128-159
NUMA node1 CPU(s): 32-63,160-191
NUMA node2 CPU(s): 64-95,192-223
NUMA node3 CPU(s): 96-127,224-255
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Not affected
[8] 4th Gen Intel Xeon SP Hardware Configuration
Processor: 2 x Intel Xeon Platinum 8480+ @ 3.80GHz (112 Cores / 224 Threads)
Motherboard: Dell 0VRV9X (1.3.2 BIOS)
Memory: 2016 GB @ 4800 MT/s
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 57 bits virtual
Byte Order: Little Endian
CPU(s): 224
On-line CPU(s) list: 0-223
Vendor ID: GenuineIntel
BIOS Vendor ID: Intel
Model name: Intel(R) Xeon(R) Platinum 8480+
BIOS Model name: Intel(R) Xeon(R) Platinum 8480+
CPU family: 6
Model: 143
Thread(s) per core: 2
Core(s) per socket: 56
Socket(s): 2
Stepping: 8
CPU(s) scaling MHz: 98%
CPU max MHz: 3800.0000
CPU min MHz: 800.0000
BogoMIPS: 4000.00
Flags:
fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht
tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc
cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm
pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch
cpuid_fault epb cat_l3 cat_l2 cdp_l3 cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid
ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma
clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc
cqm_mbm_total cqm_mbm_local split_lock_detect avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window
hwp_epp hwp_pkg_req vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg
tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd fsrm md_clear serialize tsxldtrk
pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 5.3 MiB (112 instances)
L1i: 3.5 MiB (112 instances)
L2: 224 MiB (112 instances)
L3: 210 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8, . . .
NUMA node1 CPU(s): 1,3,5,7,9, . . .
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: Not affected
L1tf: Not affected
Mds: Not affected
Meltdown: Not affected
Mmio stale data: Not affected
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Srbds: Not affected
Tsx async abort: Not affected
[9] Using Phoronix-Test-Suites in Containers
The PTS framework is an extremely convenient way to run performance tests, and it has a large ecosystem with many recorded results available for comparison. For official information, including official instructions explaining how to run PTS tests, see Phoronix Test Suite and OpenBenchmarking.org.
We ran the AI/ML related tests in CentOS Stream 9 containers (on RHEL 9.4 beta hosts) to avoid any accidental modifications to the host system environment and to enforce a clean slate for each repeated trial.
Steps to reproduce the AI/ML related test results on your system:
podman run -it --rm --net=host --privileged centos:stream9 /bin/bash
sed -i "/\[crb\]/,+9s/enabled=0/enabled=1/" /etc/yum.repos.d/centos.repo
dnf -y install https://dl.fedoraproject.org/pub/epel/epel-release-latest-9.noarch.rpm
dnf -y install atlas-devel autoconf automake binutils blas blas-devel boost-devel boost-thread bzip2 cmake expat-devel findutils gcc gcc-c++ gcc-gfortran gflags-devel git glog-devel gmock-devel gzip hdf5-devel iputils leveldb-devel libquadmath-devel libusb-devel libusbx-devel lmdb-devel make meson nfs-utils ninja-build openblas-devel opencv opencv-devel openssl-devel patch pciutils php-cli php-json php-xml procps-ng protobuf-compiler protobuf-devel python3 python3-devel python3-pip python3-yaml snappy-devel tar unzip vim-enhanced wget xz zip
At this point you might mount a shared volume with phoronix-test-suite already installed, or you can just download and unpack it in the container with steps like these:
wget https://phoronix-test-suite.com/releases/phoronix-test-suite-10.8.4.tar.gz
tar xvzf phoronix-test-suite-10.8.4.tar.gz
cd phoronix-test-suite
./phoronix-test-suite install deepsparse openvino pytorch tensorflow
./phoronix-test-suite benchmark deepsparse openvino pytorch tensorflow
About the author
Michey is a member of the Red Hat Performance Engineering team, and works on bare metal/virtualization performance and machine learning performance. His areas of expertise include storage performance, Linux kernel performance, and performance tooling.