Site Reliability Engineering (SRE), initially popularized by Google, is an operating model for solving the complex operational issues associated with scalable and highly reliable data center sites. As a practice founded in engineering, SRE has helped industries such as banking align business objectives with technical development and operations goals.
As our topic of discussion, we’re introducing the concept of “Service Reliability Engineering” (SvRE), which incorporates financial service regulatory requirements as part of providing a highly scalable and reliable digital banking service.
Why financial institutions should focus on scaling reliable services
Public cloud providers are concerned with site reliability so that they can provide reliable compute and storage services competitively. Site downtime is costly both monetarily and to an organization’s reputation. To minimize site outages and retain accountability for reliability, service deployment and support activities are embedded in the application development role: you build it, you support it.
Financial institutions are in the business of customer trust, which rests principally on the knowledge that funds are safe and available at the time and in the manner the customer chooses. To inspire trust, banks must minimize risk and provide secure, reliable, responsive, resilient, and always-available services. Adoption of digital banking services continues to accelerate, and the need to scale reliable services has never been greater.
While website providers and financial institutions share similar reliability objectives as business goals, financial institutions are held to additional regulatory compliance requirements which mandate the segregation of responsibility—minimizing and eliminating as much operational and financial risk as possible.
Financial institutions must also comply with regulations that require sufficient controls to isolate functions and ensure that no single function has end-to-end responsibility for a process in a way that could compromise financial transactions or cause data loss. So, in a regulatory irony: if you build it, you can’t support it.
A new concept: Service Reliability Engineering (SvRE)
“Service Reliability Engineering” (SvRE) can help bridge this necessary gap between functional development and post-deployment support, incorporating financial service regulatory requirements to separate responsibilities from the start.
As part of DevOps methodology, financial institutions have already implemented controls separating development and operational support as part of continuous delivery pipelines. These pipelines have built-in security, compliance, and segregation of responsibilities. Controls limit production access for developers and access to source code for operations teams.
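The kind of control described above can be sketched as a simple segregation-of-duties check. This is a minimal illustration, not how any particular pipeline implements it: the role names, permission names, and the `find_violations` helper are all hypothetical, chosen only to mirror the rule that developers do not get production access and operations teams do not get source code access.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "developer": {"read_source", "write_source"},
    "operations": {"read_production", "deploy_production"},
}

# Pairs of permissions that no single role should hold together
# (assumed conflict rules; a real institution defines its own).
CONFLICTING_PAIRS = [
    ("write_source", "deploy_production"),
    ("write_source", "read_production"),
]


def find_violations(role_permissions, conflicting_pairs):
    """Return (role, perm_a, perm_b) for every role holding a conflicting pair."""
    violations = []
    for role, granted in sorted(role_permissions.items()):
        for perm_a, perm_b in conflicting_pairs:
            if perm_a in granted and perm_b in granted:
                violations.append((role, perm_a, perm_b))
    return violations


if __name__ == "__main__":
    # The separated roles above pass the check.
    print(find_violations(ROLE_PERMISSIONS, CONFLICTING_PAIRS))

    # A role that both writes code and deploys it is flagged.
    combined = dict(ROLE_PERMISSIONS)
    combined["full_stack"] = {"write_source", "deploy_production"}
    print(find_violations(combined, CONFLICTING_PAIRS))
```

A check like this could run as a pipeline gate or a periodic audit, so that a role accumulating both sides of a conflicting pair is caught before it reaches production.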
Within SvRE, the application (the functional part of the service) needs to be separated from the platform (the set of technologies that the application is dependent on to run) in order to isolate the dual responsibilities of building and supporting services. This becomes more complicated in financial institutions, which typically have four organizations involved in the application delivery process, introducing logistical complexity, namely:
- Application Development.
- Application Support.
- Application Deployment and Release.
- System Support.
Each of these teams has a distinct role. Alas, with manual handovers and complex ticketing systems used in the delivery and maintenance of applications, it becomes hard to identify specific owners associated with the reliability of a particular service.
Often, the System Support team assumes final responsibility for reliability, doing so without a native understanding of the application or of how it interacts with the underlying platform.
In smaller organizations, where a whole system is designed to support only a few functions (or applications), it can be easier to develop deep knowledge of both the applications and the underlying systems, which mitigates some of the complexity. However, in large financial institutions and those with intricate systems and dependencies, it is no small feat for any one group to possess expert knowledge of the platform, the underlying infrastructure, and the application, all of which is required to maintain the division of responsibilities needed for SvRE due diligence and compliance.
A possible SvRE solution
An approach to achieve the twin goals of support and regulatory compliance might be to establish Application Recovery Engineers (ARE) and Platform Recovery Engineers (PRE) practices.
Developing strong expertise in application recovery and platform availability mimics the industry definition of SRE organizational responsibilities, with each role having respective assignments for application reliability and platform reliability. By adopting service level objectives (SLOs) and error budgets as common measures governing service reliability, AREs and PREs can work together to achieve organizational metrics that balance market agility and reliability and establish a framework for measuring and tolerating allowable risk.
Promoting continuous feedback processes across teams (metrics, weekly feedback sessions, joint problem solving, testing, common automation frameworks, etc.) would help mitigate the risk of AREs and PREs diverging from their essential function: securing the reliability of application services.
As a best practice, SvRE should be limited to a small set of critical applications, specifically those visible to customers.
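To make the idea of an error budget concrete, here is a minimal sketch of the arithmetic, assuming a request-based availability SLO. The function names, the 99.9% target, and the request counts are illustrative assumptions, not figures from any real service.

```python
from decimal import Decimal


def error_budget(slo: str, total_requests: int) -> int:
    """Requests allowed to fail in the window while still meeting the SLO.

    The SLO is passed as a decimal string (e.g. "0.999") so the
    arithmetic is exact; fractional failures are truncated.
    """
    allowed = (Decimal(1) - Decimal(slo)) * total_requests
    return int(allowed)


def budget_remaining(slo: str, total_requests: int, failed_requests: int) -> int:
    """Error budget left after subtracting the failures already observed."""
    return error_budget(slo, total_requests) - failed_requests


def release_allowed(slo: str, total_requests: int, failed_requests: int) -> bool:
    """Assumed policy: risky changes proceed only while budget remains."""
    return budget_remaining(slo, total_requests, failed_requests) > 0


if __name__ == "__main__":
    # A 99.9% SLO over 1,000,000 requests tolerates 1,000 failures.
    print(error_budget("0.999", 1_000_000))
    # After 400 failures, 600 remain, so releases may continue.
    print(budget_remaining("0.999", 1_000_000, 400))
    print(release_allowed("0.999", 1_000_000, 400))
```

Under this sketch, AREs and PREs would share one number: while budget remains, teams can favor agility, and once it is spent, the emphasis shifts to reliability work.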
An SvRE platform model: putting it all together
Teams are best supported by technology that enforces the separation of responsibilities.
As illustrated in the figure, a platform that provides a reliable way to address ARE and PRE concerns helps ensure that their commitments to organizational mandates are upheld. This platform isolates capabilities in the application space (where specific projects reside, along with their configurations) from the application nodes (where containers run) and from the control plane of the platform.
Furthermore, when the technology provides the flexibility to run any project or container in any control plane, whether behind the institution’s firewall or in one or more cloud provider sites, confidence is retained in the standardized way of segregating responsibilities because it is built into a consistent platform.
The right platform technology can help address the needs for observability, security, and application and infrastructure immutability, along with a more secure pipeline that includes release capabilities. It can also help manage the manual, repetitive, tactical tasks that provide no enduring value and scale linearly as a service grows.
As in most organizations, an operational shift is happening in financial institutions, one that promotes proactive prevention and will benefit reliability, service deployment, and support activities (all of which can adhere to regulatory and business needs).
Explore our video webinar presentation on utilizing technology as a business strategy in financial services to learn more. Check out more about Red Hat’s approach in applying automation for financial services in hybrid and multi-cloud environments including our hybrid cloud banking checklist.
About the author
A veteran in the financial services industry, Jamil Mina is passionate about the value of open source and how it can help financial institutions be successful in achieving their Digital Transformation objectives. As Chief Architect for Financial Services at Red Hat, his goal is to be a strategic partner and trusted adviser to his clients, which means investing a lot of time listening to their needs and concerns.
Previously Mina was a leader at BMO Financial Group, a large financial institution in Canada and in the top 50 worldwide. During his tenure, he was responsible for enabling continuous deployment, data center automation and container orchestration. Driving complex transformational capabilities required a strong collaboration with the business and application development teams.
Mina has a Master of Business Administration with a specialization in financial services from Dalhousie University.