In the introductory article of this series, we discussed fault management and performance management (FM/PM) and how it's a critical domain for telco operators. An important part of FM/PM is observability, because you can't effectively manage what you can't see.

Observability requires a single pane of glass over the entire network and its services, including hardware and software components. The challenge is integrating diverse monitoring, telemetry, and alerting components into one interface that provides real-time insight across the entire telecom infrastructure. In this article, we look at some of the most sought-after requirements that telco operators have for the observability and monitoring solutions in their networks.

Real-time metrics feedback loop

For a 5G telecom service provider network to function optimally, events and metrics must be correlated in a timely manner across the different layers and protocols of the telco network, including RAN, core, and transport entities. This correlation depends on how quickly faults are propagated to the management systems (local and remote) and how quickly the response and feedback propagate back to the origin.

A good observability solution completes this propagation loop within the prescribed time budget. A real-time (or at least near-real-time) response is essential for the proper functioning and performance of a 5G network. Any degradation in the network must be noticed quickly by the operator monitoring it.
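
The time budget idea can be sketched in a few lines of Python. This is only an illustration: the `LoopTiming` record, its field names, and the 100 ms budget in the usage note are assumptions made for the example, not values taken from any standard or product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoopTiming:
    """Timestamps (in milliseconds) for one fault feedback loop. Hypothetical record."""
    fault_raised_ms: int         # fault occurs at the origin (for example, a RAN node)
    management_notified_ms: int  # management system receives the event
    action_applied_ms: int       # corrective feedback arrives back at the origin


def loop_latency_ms(timing: LoopTiming) -> int:
    """Total round trip: origin -> management -> origin."""
    return timing.action_applied_ms - timing.fault_raised_ms


def within_budget(timing: LoopTiming, budget_ms: int) -> bool:
    """True if the whole loop completed inside the given time budget."""
    return loop_latency_ms(timing) <= budget_ms
```

For example, a loop that notifies management after 40 ms and applies the fix at 90 ms fits an assumed 100 ms budget, while the same loop would miss a 50 ms budget.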

Real-time metrics feedback loops are the backbone of modern observability and resilience strategies, particularly in industries like telecom, where rapid adjustments are key to maintaining service quality.

On-prem long-term log storage

Telco networks generate massive amounts of real-time data, such as call detail records (CDR), packet flows, and signaling data. Collecting, storing, and processing this data in real time to identify and respond to issues is a challenge. Long-term on-prem storage also enables a comprehensive view, because logs, metrics, and traces can be correlated in one cohesive interface, which helps with root cause analysis and speeds up troubleshooting. In addition, regulations in many jurisdictions require telcos to store logs for months, if not years.

Logging helps with network optimization and SLA monitoring. It enriches data for AI/ML models for predictive analysis and anomaly detection. Last but not least, logs play an important role in troubleshooting and finding patterns for rare issues. 

Identifying data retention requirements of various cloud applications can help choose the right storage solution, such as object storage like Ceph. Log management tools like ELK, Searchstack, and Loki need to be able to access, aggregate, ingest, and process logs with indexing or compression and encryption, as required. Telcos must treat long-term log storage as a regulatory requirement and a strategic asset for better observability, security, and optimization.
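
As a rough sketch of how retention requirements might be written down before choosing storage, the Python below maps log categories to retention periods and checks whether a given log has passed its window. The category names and day counts are hypothetical placeholders. Real values come from the applicable regulations and operator policy.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention periods in days; not taken from any regulation.
RETENTION_DAYS = {
    "cdr": 365,
    "signaling": 180,
    "debug": 30,
}


def is_expired(category: str, created_at: datetime, now: datetime) -> bool:
    """Return True if a log of this category is older than its retention window."""
    return now - created_at > timedelta(days=RETENTION_DAYS[category])


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
one_month_old = now - timedelta(days=31)
print(is_expired("debug", one_month_old, now))  # debug logs older than 30 days can go
print(is_expired("cdr", one_month_old, now))    # CDRs of the same age must be kept
```

Writing the policy as data like this makes it easier to estimate storage volume per category and to audit what was deleted and why.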

Metrics-based autoscaling

Horizontal pod autoscaling using custom metrics allows scaling pods based on metrics beyond the default CPU or memory usage, such as application-specific metrics like number of sessions, data throughput, or business metrics.

Consider this use case of dynamic scaling in a 5G network:

  • Metrics collected: CPU and memory usage, throughput, and network-specific traffic patterns
  • Analysis: If traffic exceeds 80% capacity on a specific slice, the system predicts a potential SLA breach
  • Feedback action: Automatically spin up additional virtualized RAN nodes to handle the load

Such use cases need custom metrics to be observed by the telco cloud. Metric collectors, like Prometheus, can be configured to scrape custom metrics data.
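
To make the scaling decision concrete, here is a simplified Python sketch of the replica calculation documented for the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric ÷ target metric), skipped when the ratio is within a tolerance of 1.0. The real autoscaler also handles pod readiness, missing metrics, and stabilization windows, which are omitted here. The metric (active sessions per pod) and the numbers in the usage note are assumptions chosen for illustration.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_value: float,   # observed custom metric, averaged per pod
    target_value: float,    # target for that metric, per pod
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA-style calculation for a single custom metric."""
    ratio = current_value / target_value
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

With 4 pods averaging 1200 active sessions against a target of 800, the function returns 6. At 820 sessions it returns 4, because the ratio falls inside the tolerance band.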

Variety of management interfaces 

A typical telco service provider has its own customized (usually hierarchical) management interfaces.  Management components expect incoming metrics, alerts, or logs to be in a specific format, and through specific protocols and interfaces. 

The challenge for cloud metric servers and their clients is to keep their implementation generic enough to serve a variety of management interfaces, yet easy to customize to each telco operator's needs. Retention times, secure interfaces, packet sizes, and protocol versions are some of the tunable parameters required to support these requirements.
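
One way to picture the "generic core, customizable edge" idea is a single internal alert record with pluggable formatters for each management interface. The sketch below is hypothetical: the `Alert` fields and the two output formats (JSON and a key="value" line) are stand-ins for whatever formats and protocols a given operator's management systems expect.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Alert:
    """Generic internal representation of an alert (hypothetical fields)."""
    source: str
    severity: str
    message: str


def to_json(alert: Alert) -> str:
    """Render for a management system that expects JSON."""
    return json.dumps(asdict(alert), sort_keys=True)


def to_key_value(alert: Alert) -> str:
    """Render for a management system that expects key="value" pairs."""
    return " ".join(f'{key}="{value}"' for key, value in asdict(alert).items())


# New target formats are added here without touching the Alert model.
FORMATTERS = {"json": to_json, "kv": to_key_value}


def render(alert: Alert, fmt: str) -> str:
    return FORMATTERS[fmt](alert)


alert = Alert(source="du-17", severity="major", message="link down")
print(render(alert, "json"))
print(render(alert, "kv"))
```

Keeping the record generic and pushing format differences into small adapters is one design choice among several. Tunables such as protocol version or packet size would be additional adapter parameters.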

Regulatory compliance audit support

Telecom observability and monitoring solutions must align with a variety of regulatory compliance standards to ensure proper auditing, data integrity, and lawful data management. These compliance standards aim to protect sensitive information, enable lawful interception, and provide sufficient transparency to meet legal, business, and operational requirements.

Some common requirements for observability systems include:

  • Enable detailed logging of all intercepted communications
  • Support secure and auditable access pathways for authorized entities
  • Ensure the confidentiality, integrity, and availability of monitoring data
  • Keep audit logs that track system access, modifications, and operational changes
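
The integrity requirement for audit logs can be illustrated with a hash chain, where each entry stores the hash of the previous one so that a later modification becomes detectable. This is a minimal sketch of one common technique, not a description of any particular product or compliance standard. The entry fields (`actor`, `action`) are assumptions for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(actor: str, action: str, prev: str) -> str:
    body = {"actor": actor, "action": action, "prev": prev}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, actor: str, action: str) -> None:
    """Append an entry that is chained to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"actor": actor, "action": action, "prev": prev,
                "hash": _digest(actor, action, prev)})


def verify(log: list) -> bool:
    """Recompute every hash and check that the chain is unbroken."""
    prev = GENESIS
    for entry in log:
        expected = _digest(entry["actor"], entry["action"], entry["prev"])
        if entry["prev"] != prev or entry["hash"] != expected:
            return False
        prev = entry["hash"]
    return True
```

If any stored entry is edited after the fact, `verify` returns `False` from that entry onward. A production system would also need to protect the most recent hash, for example by storing it separately.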

Telco networks generate massive amounts of observability data, making real-time compliance challenging.

Granularity in parameters

The granularity of parameters in a telco monitoring solution significantly impacts its effectiveness, efficiency, and the value it provides. In this context, "granularity" refers to the level of detail or resolution at which data is collected, processed, and analyzed. Choosing the right level of granularity is critical for addressing the unique demands of a telco network.

Detailed metrics, such as per-second packet latency or per-user session throughput, offer deep visibility into network performance. This is useful for diagnosing specific issues, such as jitter in VoIP calls or individual user experience. Aggregated metrics provide a high-level overview, like average network latency per hour. These are suitable for identifying long-term trends but may overlook transient issues.
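
The trade-off is easy to see with a small, made-up data set. In the Python sketch below, one minute of per-second latency samples contains a single 250 ms spike. The sample values and the 100 ms alert threshold are invented for illustration.

```python
from statistics import mean


def aggregate(samples: list, window: int) -> list:
    """Average consecutive samples in fixed-size windows."""
    return [mean(samples[i:i + window]) for i in range(0, len(samples), window)]


# Hypothetical per-second latency in ms: 59 normal seconds and one spike.
per_second = [10] * 59 + [250]
per_minute = aggregate(per_second, 60)

THRESHOLD_MS = 100  # assumed alert threshold
print(any(s > THRESHOLD_MS for s in per_second))  # fine-grained data sees the spike
print(any(a > THRESHOLD_MS for a in per_minute))  # the one-minute average hides it
```

The one-minute average works out to 14 ms, well under the threshold, even though one second was 25 times slower than normal. The aggregated series is 60 times smaller, so the finer visibility comes at a storage and processing cost.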

Observability in telco

Observability is a requirement for telco networks, both for compliance and for quality of service. Getting observability right, however, isn't easy. Red Hat serves the telco industry and provides solutions that meet and exceed expectations. For more information about our telco services, visit our Telco industry page.

About the author

Deepak has been working at Red Hat since 2023 as a Product Manager for Cloud Telco platforms. Prior to this, he worked at Nokia and Ericsson in software development and solution architecture for radio and core network products. His recent interests include telco observability and the AI/ML technology and tooling behind it.