We have very high expectations of any cloud-native or mode 2 application deployed on Red Hat hybrid cloud solutions.

When running Red Hat technologies in production, we want our new workloads to be running on top of certified products. They should be architected and deployed with help from certified professionals, proactively maintained with the help of world class support services and have the option to enable organizational resources with training and certifications.
No matter how much support is put into place, the customer needs to be able to operate their hybrid clouds.


From log aggregation to IT intelligence

Let’s imagine we are delivering a solution for a customer that is building their Digital Foundations to modernize their application development. It’s based on a private Infrastructure as a Service and leverages a container application platform. We can use the Reference Architecture to help deploy Red Hat OpenShift Container Platform on Red Hat OpenStack Platform, and review compatibility beforehand using the Cloud Deployment Planner. Now, as Alessandro Perilli explains in How to manage the cloud journey?, complexity can grow with scale, so it's better to tackle it from the very beginning.

Now we add management capabilities, starting with a Cloud Management Platform, Red Hat CloudForms, which can handle both the IaaS and the container application platform. It manages all the components deployed and provides insight into the microservice applications built with containers, orchestrated by Kubernetes, running on instances within a tenant, on top of the capabilities provided by the hardware platform.
With this we have covered Day 0 (planning) and Day 1 (deployment), and now we face Day 2, in which we operate the full platform. What would our customers do on Day 2 if they faced a physical issue, such as a failing network cable or network card?
The first step would be to investigate why a particular application is behaving incorrectly. We would review the metrics, the logs, the changes to the application in question and the configuration of the application server, only to realize that the root of the issue is somewhere else. Then we would go to the container application platform to get its logs and configuration changes, plus the logs and metrics of the operating system underneath ... all of them, from all the virtual machines (VMs), only to find that the root of the issue is, again, somewhere else.
Finally, we would go to the IaaS deployment to get all the logs, metrics and configuration changes performed, as well as the ones from the operating system ... all of them, from all the physical machines, to realize that the root of the issue is, yet again, somewhere else: an improperly patched cable or a failed piece of physical hardware. Even with the great tools we have in our management portfolio, finding the root cause of an issue like this requires increasing the situational awareness of our IT. With the number of layers and pieces required for mode 2 deployments and applications, this goes from a "nice to have" to an "absolute need".
You may be asking yourself, how can this be made simpler and easier to trace?
The obvious answer is to start performing log aggregation. We are already working on several fronts. Tushar Katarki presented his thoughts at RHTE APAC 2016, and that work in progress focuses on log aggregation. As Tushar puts it, it is already improving the lives of users by reducing the operational burden and improving the efficiency of keeping the platform running.
To see how we might improve situational awareness based on our current log aggregation process, we first need to fully understand it. We can do that by focusing on the data and its journey from the moment it is generated to the moment it is consumed. I discussed this with Javier Roman Espinar, an architect with a background in High Performance Computing and Big Data deployments, who explained that this can be considered a Big Data problem and should be analyzed from that perspective.
We can use his article Big Data Enterprise Architecture: Overview as a starting point to analyze the issue, and we will see that logs (or other data) must pass through several stages to become useful. These are the stages that the data goes through:


[Image: the stages of the data journey, from generation to consumption]
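To make those stages concrete, here is a minimal sketch in Python of a log record's journey from generation to consumption. The function names (`ingest`, `store`, `query`) and the in-memory list standing in for a search index are illustrative assumptions, not part of any Red Hat product:

```python
from datetime import datetime, timezone

def ingest(raw_line, source):
    """Ingestion stage: parse a raw log line into a structured record."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "message": raw_line.strip(),
    }

def store(record, index):
    """Storage stage: persist the record in a searchable store (a list here)."""
    index.append(record)

def query(index, keyword):
    """Consumption stage: retrieve the records matching a keyword."""
    return [r for r in index if keyword in r["message"]]

# Generation -> ingestion -> storage -> consumption
index = []
store(ingest("eth2: link down\n", "node-3.example.com"), index)
store(ingest("pod web-1 restarted\n", "openshift-node-1"), index)
print(query(index, "link down"))
```

In the real solution the ingestion stage is handled by a log collector and the storage and query stages by a search engine; the point is only that the same record is transformed and moved at each stage.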


And the mapping to our current solution with Elasticsearch + Fluentd + Kibana:


[Image: mapping of those stages to Elasticsearch + Fluentd + Kibana]


... but what other information that is relevant to the customer can be processed this way? Peter Portante, who has been working intensively on data aggregation, has already answered this question. He explains that we need logs, but also metrics (or telemetry, as he describes it) and configuration, to perform a full correlation of all the data.
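One way to picture that full correlation is to put logs, metrics and configuration changes on a common timeline and look at everything that happened around a suspect event. The records and field names below are made up for illustration; this is not how any of the tools mentioned actually store their data:

```python
from datetime import datetime, timedelta

def around(events, center, window_s=60):
    """Return the events whose timestamp falls within +/- window_s of center."""
    delta = timedelta(seconds=window_s)
    return [e for e in events if abs(e["ts"] - center) <= delta]

t0 = datetime(2017, 3, 1, 10, 0, 0)  # the moment the cable failed
logs = [{"ts": t0, "layer": "iaas", "msg": "eth2: carrier lost"}]
metrics = [{"ts": t0 + timedelta(seconds=5), "layer": "paas",
            "name": "pod_restarts", "value": 3}]
configs = [{"ts": t0 - timedelta(hours=2), "layer": "app",
            "change": "scaled web to 4 replicas"}]

# Correlate across layers: the metric spike lines up with the hardware
# event, while the configuration change two hours earlier does not.
correlated = sorted(around(logs + metrics + configs, t0),
                    key=lambda e: e["ts"])
for event in correlated:
    print(event)
```

Lining the three data types up on one clock is what turns three separate troubleshooting sessions into a single query.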
So what’s next?
Next week in part two, the series continues with a deeper look at how we are performing a fuller correlation of all the available data. Stay tuned for more...
