If you’re managing databases, you need to be able to identify performance bottlenecks and figure out how to address them. This often involves pinpointing the exact component in the complex web of interconnected dependencies that is causing the issue.
If you’re a Microsoft Windows administrator, the tool to start with is obvious, it’s called perfmon. However, if you’re one of the nearly 1/3rd of respondents to a recent survey who are running SQL Server on Linux, you probably want to consider using Performance Co-Pilot (PCP) and Grafana on Red Hat Enterprise Linux (RHEL), which brings together SQL Server and RHEL metrics in a single easy-to-use dashboard.
When it comes to measuring performance on RHEL, PCP is Red Hat’s preferred choice for collecting metrics. With Grafana becoming a popular open source analytics and monitoring interface for many use-cases, you can use it in RHEL to visualize metrics collected by PCP. Grafana can be configured to show metrics over a specific time range, and the metrics can be exported if you wish to your favorite spreadsheet.
Grafana also comes ready to handle several data sources, and the RHEL PCP-Grafana integration provides vector, redis, and bpftrace as data sources. These integrations also bring several pre-configured dashboards or allow you to customize your own dashboard based on your particular use case.
Since RHEL 8.2, PCP has had support for SQL Server metrics. You can learn more about how to configure it here. RHEL 8.4 adds several changes:
-
The Python dependency
pyodbc
is now available as an RPM, which is automatically brought in as a dependency when installing thepcp-pmda-mssql
package. This means no more need to use pip to install pyodbc. -
There is a new PCP vector based Microsoft SQL Server dashboard. To enable this, go to Data Sources->PCP Vector->Dashboards as shown in the screenshot below, and click import. This imported dashboard can then be accessed from the dashboards section in Grafana found at http://<server-name>:3000/dashboards
For more information on how to set up PCP with Grafana and PCP Vector, please see: Visualizing system performance with RHEL 8 using Performance Co-Pilot (PCP) and Grafana (Part 1).
Using this dashboard, RHEL administrators and SQL Server DBAs can spot key metrics that might be slowing down the performance of their SQL Server workloads on RHEL. -
If you want to automate the installation of PCP for SQL Server, you can also use the Performance Metrics Role for configuring PCP for SQL Server. More information about it can be found here.
Exploring the new Microsoft SQL Server Dashboard
Figuring out where the performance bottlenecks are within the operating system is a challenging problem by itself, and running a database with layers of abstraction on top of it, adds additional complexities.
Operating system and database performance metrics visualizations are available through tools like the new PCP Vector-based Microsoft SQL Server Dashboard in RHEL 8.4. This provides easier performance data correlation and helps to get to the root cause of performance issues more quickly than before.
The dashboard brings together information from the Linux kernel, SQLOS, and the SQL Server database layer and then displays it in Grafana panels using PCP vector graphs.
Tracking CPU Usage and Memory Utilization
After you bring up the Dashboard, what can you do with it?
For starters, we recommend you start by looking at CPU and memory metrics. Looking at the Linux Run Queue, System: % CPU and CPU Utilization metrics can help you identify runaway processes and unanticipated system activity. If your CPUs are highly saturated, it’s time to start looking at the processes on the system using command-line tools like pcp atop
to identify processes hogging CPU or memory on the system.
From a SQL Server perspective, the SQL wait statistics are pretty important to help you track down CPU and memory utilization issues. A high number of SOS_SCHEDULAR_YIELD or THREADPOOL might indicate that your workload needs additional processing resources.
Similarly, SQL Server: Memory Manager metrics can help you identify how much memory SQL Server is working with overall, as well as how much memory is being “stolen” for purposes other than the SQL page cache. An usually high number for stolen server memory indicates memory pressure in the database system, and fixing it might require a change either to how SQL Server is utilizing its memory or increasing the amount of memory provided by the underlying OS platform.
Tracking Disk Utilization
Memory problems can trigger disk problems, so we need to investigate this area next. Without enough memory, disk activity can rapidly increase because we’re paging memory out to disk (swapping).
You’ll have already determined whether you’re swapping if you’ve gone to use the pcp atop
command we described above.
Perhaps the most interesting thing you can look at with Disk Utilization is whether you are running out of throughput or IOPS.
You can do some basic testing to confirm the maximum capability of your devices. For this, fio
is a great tool for generating synthetic workloads. If you know the max throughput and max input/output operations per second (IOPS) you can then use that knowledge to determine when to add more storage via RAID, move to faster storage, add caching to fast storage tiers such as those using Intel’s Optane DC Persistent Memory, or when to split up your data stores.
SQL statistics can also help you understand storage contention issues. For example, you can look at the SQL OS Wait Stats and examine the PAGE_I/O_LATCH metric. Some number of waits for SQL Server is typically acceptable but an order of magnitude difference in this statistic versus other wait types might indicate a problem that needs fixing by moving to faster storage.
That said, there is also the possibility that you have a database tuning problem such as a poorly designed index that is spiking up disk I/O.
We have now covered some of the key metrics. In this table we include several more useful metrics listed in the order of their appearance in the dashboard panels:
Linux: Run Queue |
This panel displays the number of processes that are marked runnable by the kernel at any given time. |
System: % CPU |
Shows the percentage of CPU being used by sys (kernel) and user processes. |
Per CPU Busy (User) |
Shows the percentage of each CPU busy running processes in user space. |
Per CPU Busy (Sys) |
Shows the percentage of each CPU busy running processes in sys (kernel) space. |
Disk Utilization |
Represents the overall utilization of each disk device on the system. |
Disk Throughput |
Represents read and write throughput. |
Disk IOPS |
Represents read and write IOs per second. |
SQL Server: General Statistics – User Connections |
The number of connected users, Running PCP will require a user connection. A second user is typically consumed for health monitoring if running an Availability Group or Failover Cluster using the Red Hat High Availability add-on. |
SQL Server: Memory Manager |
Total, stolen and reserved server memory as seen by SQL Server. |
SQL Server Statistics: Batch Requests |
Batch requests, compilations and recompilations per second. |
SQL Server: Network IO Waits |
Average network wait time in milliseconds, number of waits in progress, waits started per second and cumulative wait time in milliseconds per second. All statistics relevant to wait on network I/O. |
SQL Server: Access Methods Page Splits |
Number of page splits per second that occur as the result of overflowing index pages. |
SQL Server: Statistics Compilations |
Number of compilations and recompilations. |
SQL Server: Plan Cache (Cache Hit Ratio) |
Cache hit ratio for temporary tables and their variables, extended stored procedures, bound trees, SQL plans and Object plans along with the total cache hit ratio. |
SQL OS Waiting Tasks |
Number of waits for each wait type. This counter is incremented at the start of each wait. |
SQL OS Maximum Wait Stats |
Maximum wait time for each wait type. |
SQL OS Wait Times |
Total wait time for this wait type in milliseconds. |
SQL Server Latch Waits |
Internal SQL Server resource locks known as latches can help you to determine user activity and resource usage and identify performance bottlenecks. This metric tracks the number of latch requests that cannot be granted immediately. |
SQL Server Latch Wait Times |
This metric tracks the total time (in milliseconds) spent waiting for latch requests in the last second. |
Data annotations and sharing
Annotations in Grafana provide a way to mark points on the graph, linking data metrics with stories that you want to tell with your data. You can also easily export and share your PCP metrics using Grafana with other administrators so that they can use the same dashboard to view and troubleshoot performance problems.
Conclusion
Now that you have another performance troubleshooting tool for SQL Server on RHEL in your toolbox, you should start using it. A good place to start in understanding the Red Hat side of the equation is the latest RHEL documentation on Managing System Status and Performance.
You can also learn more about SQL Server performance metrics through Microsoft’s SQL Server Performance Monitoring and Tuning Tools documentation. The new Grafana SQL Server dashboard in RHEL 8.4 is off to a great start, bringing together useful statistics from RHEL and SQL Server for monitoring.
執筆者紹介
Don Pinto is a Technical Product Marketing Manager at Red Hat focused on helping customers understand why Red Hat Enterprise Linux is an ideal operating system platform for modern application workloads. Pinto is passionate about data management and operating systems, having authored several technical blogs and white papers on various tech topics. Pinto holds a Masters degree in Computer Science and a Bachelor's degree in Computer Engineering from the University of Toronto, Canada.
Louis Imershein is a Product Manager at Red Hat focussed on Microsoft SQL Server and database workloads. He is responsible for working with Microsoft and Red Hat engineering to ensure that SQL Server performance, management, and security is optimized for Red Hat platforms. For more than 30 years, Louis has worked in technical support, engineering, software architecture, and product management on a wide range of OS, management, security, and storage software projects. Louis joined Red Hat as part of the acquisition of Permabit Technology Corporation, where he was VP of Product.
チャンネル別に見る
自動化
テクノロジー、チームおよび環境に関する IT 自動化の最新情報
AI (人工知能)
お客様が AI ワークロードをどこでも自由に実行することを可能にするプラットフォームについてのアップデート
オープン・ハイブリッドクラウド
ハイブリッドクラウドで柔軟に未来を築く方法をご確認ください。
セキュリティ
環境やテクノロジー全体に及ぶリスクを軽減する方法に関する最新情報
エッジコンピューティング
エッジでの運用を単純化するプラットフォームのアップデート
インフラストラクチャ
世界有数のエンタープライズ向け Linux プラットフォームの最新情報
アプリケーション
アプリケーションの最も困難な課題に対する Red Hat ソリューションの詳細
オリジナル番組
エンタープライズ向けテクノロジーのメーカーやリーダーによるストーリー
製品
ツール
試用、購入、販売
コミュニケーション
Red Hat について
エンタープライズ・オープンソース・ソリューションのプロバイダーとして世界をリードする Red Hat は、Linux、クラウド、コンテナ、Kubernetes などのテクノロジーを提供しています。Red Hat は強化されたソリューションを提供し、コアデータセンターからネットワークエッジまで、企業が複数のプラットフォームおよび環境間で容易に運用できるようにしています。
言語を選択してください
Red Hat legal and privacy links
- Red Hat について
- 採用情報
- イベント
- 各国のオフィス
- Red Hat へのお問い合わせ
- Red Hat ブログ
- ダイバーシティ、エクイティ、およびインクルージョン
- Cool Stuff Store
- Red Hat Summit