Introduction
Collecting debugging information from a large set of nodes (such as when creating SOS reports) can be a time consuming task to perform manually. Additionally, in the context of Red Hat OpenShift 4.x and Kubernetes, it is considered a bad practice to ssh into a node and perform debugging actions. To better accomplish this type of operation in OpenShift Container Platform 4, there is a new command: oc adm must-gather, which will collect debugging information across the entire cluster (nodes and control plane). More detailed information on the must-gather command can be found in the platform documentation.
While using the must-gather command is fairly straightforward, the full end-to-end process to facilitate all of the available tasks can be time consuming. This process involves issuing the command, waiting for the associated tasks to complete, and then upload the resulting information to the Red Hat case management system.
A way to further streamline the process is to automate these actions.
Must-Gather Operator
The must-gather operator streamlines running the must-gather command and uploading the results to the Red Hat case management system. The must-gather operator is intended to be used only by the cluster administrator as it requires elevated permissions on the cluster. A must-gather run can be started by creating a MustGather custom resource (CR) similar to the following:
apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
name: example
spec:
caseID: 'XXXXXXXX'
caseManagementAccountSecretRef:
name: case-management-creds
serviceAccountRef:
name: must-gather-admin
Within the MustGather CR, three parameters can be defined:
- caseID. Red Hat Support case to which the resulting output will be attached.
- caseManagementAccountSecretRef: secret containing the credentials needed to login and upload files to the Red Hat case management system.
- serviceAccountRef: service account with the cluster-admin role that is used to run the must-gather command. Running as a cluster-admin is a must-gather requirement.
When this CR is created, the operator creates a job that runs must-gather operations, and uploads the resulting information in a compressed file.
The must-gather operator watches only the namespace in which it is deployed. This should make it easier for a cluster administrator to configure limited access to that namespace. This is recommended as that namespace needs to contain a service account with cluster-admin privileges for the reason seen before and therefore needs to be properly protected.
Running Additional Must-Gather Images
The must-gather command supports the option of running multiple must-gather compatible images that can be used for collecting additional information. This option is typically limited to OpenShift addons, such as Kubevirt and OpenShift Container Storage (OCS). The must-gather operator supports this functionality by allowing these images to be specified as in the following example:
apiVersion: redhatcop.redhat.io/v1alpha1
kind: MustGather
metadata:
name: example-more-images
spec:
caseID: 'XXXXXXX'
caseManagementAccountSecretRef:
name: case-management-creds
serviceAccountRef:
name: must-gather-admin
mustGatherImages:
- quay.io/kubevirt/must-gather:latest
- quay.io/ocs-dev/ocs-must-gather
As you can see, the mustGatherImages property is an array of strings representing images. When added to a must-gather CR, all the specified images in addition to the default must gather image will be run.
Installation
The must gather operator can be installed via the OperatorHub or with a Helm chart.
The project GitHub repository contains detailed information on how to install the must-gather operator.
Conclusions
Being able to provide diagnosis information in a consistent fashion makes it easier for Red Hat support to aid in the resolution of issues. A more streamlined and automatic information collecting process makes it more likely for the customer to be able to provide timely debugging information to Red Hat support. The must-gather operator aims to help in this space.
저자 소개
Raffaele is a full-stack enterprise architect with 20+ years of experience. Raffaele started his career in Italy as a Java Architect then gradually moved to Integration Architect and then Enterprise Architect. Later he moved to the United States to eventually become an OpenShift Architect for Red Hat consulting services, acquiring, in the process, knowledge of the infrastructure side of IT.
Currently Raffaele covers a consulting position of cross-portfolio application architect with a focus on OpenShift. Most of his career Raffaele worked with large financial institutions allowing him to acquire an understanding of enterprise processes and security and compliance requirements of large enterprise customers.
Raffaele has become part of the CNCF TAG Storage and contributed to the Cloud Native Disaster Recovery whitepaper.
Recently Raffaele has been focusing on how to improve the developer experience by implementing internal development platforms (IDP).
채널별 검색
오토메이션
기술, 팀, 인프라를 위한 IT 자동화 최신 동향
인공지능
고객이 어디서나 AI 워크로드를 실행할 수 있도록 지원하는 플랫폼 업데이트
오픈 하이브리드 클라우드
하이브리드 클라우드로 더욱 유연한 미래를 구축하는 방법을 알아보세요
보안
환경과 기술 전반에 걸쳐 리스크를 감소하는 방법에 대한 최신 정보
엣지 컴퓨팅
엣지에서의 운영을 단순화하는 플랫폼 업데이트
인프라
세계적으로 인정받은 기업용 Linux 플랫폼에 대한 최신 정보
애플리케이션
복잡한 애플리케이션에 대한 솔루션 더 보기
가상화
온프레미스와 클라우드 환경에서 워크로드를 유연하게 운영하기 위한 엔터프라이즈 가상화의 미래