We have tested a variety of configurations, object sizes, and client worker counts in order to maximize the throughput of a seven node Ceph cluster for small and large object workloads. As detailed in the first post the Ceph cluster was built using a single OSD (Object Storage Device) configured per HDD, having a total of 112 OSDs per Ceph cluster. In this post, we will understand the top-line performance for different object sizes and workloads.
Note: The terms “read” and HTTP GET is used interchangeably throughout this post, as are the terms HTTP PUT and “write.”
Large-Object workload
Large-object sequential input/output (I/O) workloads are one of the most common use cases for Ceph object storage. These high-throughput workloads include big data analytics, backup, and archival systems, image storage, and streaming audio, and video. For these types of workloads throughput (MB/s or GB/s) is the key metric that defines storage performance.
As shown in Chart 1 Large-object 100% HTTP GET and HTTP PUT workload exhibited sub-linear scalability when incrementing the number of RGW hosts. As such we measured ~5.5 GBps aggregated bandwidth for HTTP GET and HTTP PUT workloads and interestingly we did not notice resource saturation in Ceph cluster nodes.
This cluster can churn out more if we can direct more load to it. So we identified two ways to do that. 1) Add more client nodes 2) Add more RGW nodes. We could not go with option 1 as we were limited by the physical client nodes available in this lab. So we opted for option 2 and ran another round of tests but this time with 14 RGWs.
As shown in Chart 2, compared to the 7 RGW test, 14 RGW test yielded a 14% higher write performance, topping at ~6.3GBps, similarly, the HTTP GET workload showed 16% higher read performance, topping ~6.5GBps. This was the maximum aggregated throughput observed on this cluster after which media (HDD) saturation was noticed as depicted in Figure 1. Based on the results, we believe that have we added more Ceph OSD nodes to this cluster, the performance could have been scaled even further, until limited by resource saturation.
Chart 1: Large Object test
Chart 2: Large Object test with 14 RGWs
Figure 1: Ceph OSD (HDD) media utilization
Small Object workload
As shown in Chart 3 small object 100% HTTP GET and HTTP PUT workloads exhibited sub-linear scalability when incrementing the number of RGW hosts. As such we measured ~6.2K Ops throughput for HTTP PUT at 9ms application write latency and ~3.2K Ops for HTTP GET workloads with 7 RGW instances.
Until 7 RGW instances, we did not notice resource saturation, so we doubled down on RGW instances by scaling those to 14 and observed degraded performance for HTTP PUT workload which is attributed to media saturation, while HTTP GET performance scaled up and topped out at ~4.5K Ops. As such, write performance could have scaled higher, had we added more Ceph OSD nodes. As far as read performance is concerned, we believe that adding more client nodes should have improved it, but we did not have any more physical nodes in the lab to test this hypothesis.
Another interesting observation from Chart 3 is the reduced average response time for HTTP PUT workloads which gauged at 9ms while HTTP GET showed 17ms of average latency measured from the application generating workload. We believe that one of the reasons for single-digit application latency for write workload is the combination of performance improvement coming from the BlueStore OSD backend as well as the high performance Intel Optane NVMe used to back BlueStore metadata device. It's worth noting that achieving single digit write average latency from an Object Storage system is non-trivial. As depicted in Chart 3, Ceph object storage when deployed with BlueStore OSD backend and Intel Optane for metadata can achieve the write throughput at lower response time.
Chart 3: Small Object Test
Summary and up next
The fixed-size cluster used in this testing has delivered ~6.3GBps and ~6.5GBps large object bandwidth for write and read workloads respectively. The same cluster for small object size has delivered ~6.5K Ops and ~4.5K Ops for write and read workload, respectively.
The results have also shown that BlueStore OSD in combination with Intel Optane NVMe has delivered single-digit average application latency, which is non-trivial for object storage systems. In the next post, we will explore the performance associated with bucket dynamic sharding and how the pre-sharding bucket can help in deterministic performance.
Sull'autore
Ricerca per canale
Automazione
Novità sull'automazione IT di tecnologie, team e ambienti
Intelligenza artificiale
Aggiornamenti sulle piattaforme che consentono alle aziende di eseguire carichi di lavoro IA ovunque
Hybrid cloud open source
Scopri come affrontare il futuro in modo più agile grazie al cloud ibrido
Sicurezza
Le ultime novità sulle nostre soluzioni per ridurre i rischi nelle tecnologie e negli ambienti
Edge computing
Aggiornamenti sulle piattaforme che semplificano l'operatività edge
Infrastruttura
Le ultime novità sulla piattaforma Linux aziendale leader a livello mondiale
Applicazioni
Approfondimenti sulle nostre soluzioni alle sfide applicative più difficili
Serie originali
Raccontiamo le interessanti storie di leader e creatori di tecnologie pensate per le aziende
Prodotti
- Red Hat Enterprise Linux
- Red Hat OpenShift
- Red Hat Ansible Automation Platform
- Servizi cloud
- Scopri tutti i prodotti
Strumenti
- Formazione e certificazioni
- Il mio account
- Supporto clienti
- Risorse per sviluppatori
- Trova un partner
- Red Hat Ecosystem Catalog
- Calcola il valore delle soluzioni Red Hat
- Documentazione
Prova, acquista, vendi
Comunica
- Contatta l'ufficio vendite
- Contatta l'assistenza clienti
- Contatta un esperto della formazione
- Social media
Informazioni su Red Hat
Red Hat è leader mondiale nella fornitura di soluzioni open source per le aziende, tra cui Linux, Kubernetes, container e soluzioni cloud. Le nostre soluzioni open source, rese sicure per un uso aziendale, consentono di operare su più piattaforme e ambienti, dal datacenter centrale all'edge della rete.
Seleziona la tua lingua
Red Hat legal and privacy links
- Informazioni su Red Hat
- Opportunità di lavoro
- Eventi
- Sedi
- Contattaci
- Blog di Red Hat
- Diversità, equità e inclusione
- Cool Stuff Store
- Red Hat Summit