We talk a lot about the linear scalability of Red Hat Gluster Storage, and we can generally back that up with empirical data. Indeed, homogeneously scaling out the storage nodes and network infrastructure can result in both capacity and throughput capabilities that are directly proportional. But it's important to note that this is potential scalability, and how you use the volumes plays a vital role in the experience you have.
We architect optimal solution recommendations based on a few expectations:
- Most of the workload falls into a particular category—high throughput, small file, or latency sensitive, for example.
- When your capacity needs grow, so do your concurrent client demands.
- You're using the glusterfs native client.
Let's take a look at these points and how they affect your real scalability.
Architecting for workload
We know through thousands of test cycle results that there is a generally optimal server configuration that applies broadly to a majority of workloads. This compiled knowledge is a huge benefit to you, the user, and it can greatly reduce your own time commitment in designing and testing fundamental system architectures. However, just up the stack from the server and network components are low-level configuration choices that you will make for every deployment. These choices are the big knobs: for your particular workload, there is likely one best choice for peak performance. And these aren't choices you can easily change later. Changes at these layers likely require moving data, potentially more than once, and data has inertia.
When you understand your majority workload, and preferably isolate dissimilar workloads entirely, you will be positioned to make choices about server density (12, 24, or higher drive capacity), block-level configurations (e.g., HDD vs. SSD, RAID vs. JBOD, caching vs. not, block and stripe sizes), and Gluster volume geometry (e.g., replicated vs. dispersed, failure resiliency, arbiter bricks, tiering). Once locked into these choices and the related workload, you'll find it reasonably simple to integrate new nodes and bricks into the volume for predictable capacity and performance expansion.
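To make the geometry trade-off concrete, here is a minimal sketch of the usable-capacity arithmetic for two common choices: a replica 3 volume versus a 4+2 dispersed (erasure-coded) volume. The capacity figures are hypothetical examples, not recommendations:

```python
# Usable-capacity math for two common Gluster volume geometries.
# Hypothetical numbers for illustration only.

def replicated_usable(raw_tb: float, replica_count: int) -> float:
    """A replica-N volume stores N full copies, so usable = raw / N."""
    return raw_tb / replica_count

def dispersed_usable(raw_tb: float, data: int, redundancy: int) -> float:
    """A dispersed volume of data+redundancy bricks yields
    usable = raw * data / (data + redundancy)."""
    return raw_tb * data / (data + redundancy)

raw = 120.0  # e.g., 12 nodes contributing 10 TB of raw brick capacity each
print(f"replica 3:    {replicated_usable(raw, 3):.0f} TB usable")    # 40 TB
print(f"disperse 4+2: {dispersed_usable(raw, 4, 2):.0f} TB usable")  # 80 TB
```

Both layouts in this example can tolerate two failed bricks per set (quorum settings aside), but the dispersed volume buys its extra usable capacity with erasure-coding computation on every I/O, which is one more reason your workload profile should drive the choice.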
Client concurrency
So you've built to your workload and everything is great. That is, unless your expectations aren't aligned with what a scale-out solution can deliver. Any single connection to the storage pool is bound by physics: one client communicates over one network link to one server, to one file system and block stack. Sure, some design options allow for single-client concurrency to multiple stacks, but those come with trade-offs, and each connection is still bound by physics and bottlenecked somewhere along the line. So if your need is to provide expanded throughput to a single client or a small number of clients, you will likely find that horizontal scale-out won't give you much performance benefit. There are some tricks we can use to architect for such a need, but it will never be an efficient solution.
To that end, an optimal design assumes you are operating at an appropriate client:server concurrency ratio. The best ratio will vary with your workload and the architecture decisions you make per the preceding discussion, but for most cases you can expect a ratio in the range of 12:1 to 48:1 to be appropriate for peak or plateau storage throughput. So if you build out a 12-node storage pool based on your capacity needs and then expect 4 client systems to use that storage concurrently, you'll bottleneck on the server node I/O stack long before you saturate the aggregate system capabilities. But with an appropriate concurrent client count of, say, 150+ for your 12 server nodes, you may be operating at the peak capabilities of the system.
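Here's a minimal sketch of that rule-of-thumb arithmetic, using the 12:1 to 48:1 band from above as defaults (the right band for your workload may differ):

```python
# Sanity-check a planned client:server concurrency ratio against the
# 12:1 to 48:1 rule-of-thumb range discussed above. Workload dependent;
# treat the band as a starting point, not a guarantee.

def concurrency_check(clients: int, servers: int,
                      lo: float = 12.0, hi: float = 48.0) -> str:
    ratio = clients / servers
    if ratio < lo:
        return f"{ratio:.1f}:1 -- too few clients to saturate the pool"
    if ratio > hi:
        return f"{ratio:.1f}:1 -- past the plateau; expect queuing"
    return f"{ratio:.1f}:1 -- in the expected peak-throughput range"

print(concurrency_check(4, 12))    # 0.3:1 -- the 4-client example above
print(concurrency_check(150, 12))  # 12.5:1 -- the 150-client example
```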
Client choice
Great! So you're heeding all the advice here, and you're going to deploy 12 Red Hat Gluster Storage nodes in an optimal architecture for 150 NFS clients. Well, hold on there a minute, buckaroo. We're more than happy to support the NFS client, but you should know what you're getting into.
When using the Gluster native client, data placement calculations are made on the client side. This means that each client is fully aware of the volume geometry and all server nodes participating, allowing it to determine how the data protection scheme is applied and which nodes and backend filesystems (bricks) each file will be written to. All client-to-server connections are then made efficiently based on this client-side intelligence. And because data placement across the distributed system is done pseudo-randomly, there is a statistically even distribution of work between the clients and servers and therefore predictable performance scalability.
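To illustrate why client-side placement removes any central lookup step, here is a minimal conceptual sketch of hash-based file placement. This is a simplification of the idea only: Gluster's actual DHT translator maps a hash into per-directory ranges rather than taking a simple modulo, and the server names here are hypothetical:

```python
# Conceptual sketch of client-side data placement: every client hashes
# the file name and derives the target brick independently, with no
# metadata server in the path. NOT Gluster's real algorithm; its DHT
# maps a 32-bit hash into per-directory ranges rather than a modulo.

import hashlib

BRICKS = [f"server{n}:/bricks/brick1" for n in range(1, 13)]  # hypothetical 12-node pool

def place(filename: str, bricks=BRICKS) -> str:
    digest = hashlib.md5(filename.encode()).digest()
    return bricks[int.from_bytes(digest[:4], "big") % len(bricks)]

# Any client running this computes the same brick for the same name,
# and the hash spreads files statistically evenly across bricks.
for name in ("report.pdf", "video.mp4", "app.log"):
    print(f"{name} -> {place(name)}")
```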
When choosing NFS (or SMB), a client will make its connections to a single Gluster server. That server then has to apply the client-side intelligence for data resilience, conversion, and placement, and it will then make secondary network calls out to each participating server node for the file transaction. This inefficiency leads to a concurrency bottleneck far below the capabilities of the native client: you'll still hit peak throughput at about the same client:server ratio, but that throughput will be well below what can be achieved on the same systems with the native client.
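A simplified traffic model makes the gateway bottleneck visible. The accounting below is hypothetical and ignores protocol overhead; it just counts the bytes crossing each server's network interfaces for one write to a replica 3 volume:

```python
# Bytes moved for one 1 GiB file written to a replica-3 volume.
# Hypothetical, simplified accounting for illustration only.

FILE_GIB = 1.0
REPLICA = 3

# Native client: the client connects to every replica server directly,
# so each server's network sees exactly one copy arrive.
native_per_server = FILE_GIB

# NFS/SMB: the client sends one copy to a single gateway server, which
# (we assume) stores one replica locally and forwards copies to the
# others. All of that traffic funnels through the gateway's stack.
gateway_total = FILE_GIB + FILE_GIB * (REPLICA - 1)  # in + forwarded out

print(f"native client, per server: {native_per_server:.0f} GiB")  # 1 GiB
print(f"NFS gateway node:          {gateway_total:.0f} GiB")      # 3 GiB
```

Multiply that by many clients pinned to the same gateway, and that single server's network and I/O stack saturates long before the pool does.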
The one surprise that can come up with the NFS client is that if you do indeed require a lower client:server ratio, NFS can in some conditions outperform the native client at that concurrency level. YMMV on this, and you'll still be far below the peak capabilities of the system, but it's worth testing out if you're absolutely determined to connect your 4 clients to your 12 Gluster nodes (but don't say I didn't warn you).
Oh yeah? Prove it.
Lucky for you, I did that already. Take a look at our published reference architectures and, in particular, our most recent Gluster Performance and Sizing Guide. And keep an eye out here for future publications as we continue to expand and refine our data.
About the author
Dustin has been with Red Hat since 2011 helping customers with open source technologies in business-critical environments. He is passionate about performance, automation, and pushing the boundaries of what people can accomplish. He loves to travel, eat, and experience new cultures with his wife and family. On weekends, you're likely to find him on the paintball field, out for a hike, or settled down for some binge-watching. Someday he'll probably own a boat.