I am writing this blog series because I have noticed that there are not many comprehensive blogs written about SQL Server wait statistics analysis, how it really works, and how it should be analyzed. However, this topic is crucial because a large part of SQL Server symptoms can be diagnosed through wait statistics. Wait stats are like a doctor’s toolkit for analyzing the symptoms of a patient in pain; they help narrow down potential diagnoses, deepen our understanding of the root cause of the problem, and, at best, even directly resolve the issue.
Wait Statistics – What Is It?
When SQL Server is running, it is practically always waiting for some resource—be it a network, disk operation, table lock, processor execution, or something else. This happens because SQL Server’s operation consists, in a highly simplified manner, of numerous parallel tasks allocated to processor execution threads (worker threads). These worker threads strive to ensure that all tasks are executed with a certain priority. Sometimes, congestion occurs on a specific resource, such as the network or storage system, causing delays in the execution of threads. This is precisely the type of statistics that reveal, through a comparison between normal conditions (baseline) and snapshots, which values deviate from the norm. SQL Server collects this data in its system views with millisecond accuracy.
This is directly analogous to blood tests! However, SQL Server’s wait stats diagnostics differ from human blood test diagnostics mainly in that each SQL Server has a unique baseline, unlike humans who have standardized “ideal values,” such as for hemoglobin levels. This is because wait stats are relative: networks, servers, storage configurations, instance configurations, and database settings differ; SQL Server versions and editions vary, and database schemas, contents, and usage patterns are almost always environment-specific.
Although this topic is critical for understanding SQL Server’s inner workings, many developers, Data Engineers, or even DBAs who work with databases daily are not sufficiently familiar with it. Perhaps they should be. A word of caution, however: this is a multifaceted and sometimes complex topic, requiring patience and time to grasp. SQL Server 2022, for instance, has over 1,000 different wait stats, with more being added in every version (SQL Server 2017: 921 wait stats, SQL Server 2022: 1,335 wait stats). However, you do not need to know all of them, as only a few dozen key wait stats are crucial for SQL Server performance diagnostics.
SQL Server Diagnostics in General
Diagnosing SQL Server performance involves considering a broad range of factors, such as:
- Server and virtualization platform
- Network and storage solutions
- Operating system configurations
- SQL Server instances and database settings
- Indexing, statistics, and database queries
- Wait statistics
- Plan cache
- Query tracing
Analyzing wait statistics is an excellent way to start diagnosing SQL Server issues, especially when the root problem is unknown. Of course, system configurations should also be reviewed, as suboptimal settings and misconfigurations can significantly impact overall system performance. These are often root causes rather than “symptoms” that wait stats typically help identify.
While diagnosing wait stats provides a comprehensive overview of the system, there is a catch: since each SQL Server has a unique baseline, it can be difficult to determine what is abnormal once a problem has already surfaced. Additionally, because SQL Server resets wait stats every time the instance restarts, it is advisable—especially for business-critical database servers—to use external SQL Server monitoring/performance optimization software, such as SQL Governor. This allows for real-time wait stats snapshots to be compared against previously established baselines.
A Brief History of Wait Statistics
The first version of SQL Server was released in 1989 for the OS/2 platform. Microsoft collaborated with Sybase on SQL Server development until 1995. When Microsoft and Sybase parted ways with SQL Server 6.0, wait stats became the mechanism by which the SQL Server Operating System (SQLOS) recorded internal process changes over time. Since SQL Server 2005, wait stats have been accessible via Dynamic Management Views (DMVs), replacing the previously used DBCC queries.
To inspect your SQL Server’s wait stats, you can use the following system view:
SELECT * FROM sys.dm_os_wait_stats;
SQLOS – A Peek Under the Hood
Before SQL Server 2005, its underlying platform was quite limited, and many of SQL Server’s operations were performed at the operating system level. This made it difficult for SQL Server to keep up with hardware advancements and led to OS version dependency, restricting the ability to leverage new, more powerful hardware features. To address this, SQL Server 2005 introduced a completely new platform layer called SQLOS, which is highly configurable. Its primary role is to bypass the operating system layer and maximize the utilization of both current and future hardware platforms.
Why SQLOS?
Windows OS uses a “preemptive scheduling” approach. This means that each process requiring CPU time receives a time slice, called a “quantum,” from Windows. A process’s priority is determined by a complex formula factoring in resource usage, expected execution time, activity level, etc. Preemptive scheduling allows Windows to interrupt one process if another higher-priority process needs CPU time.
Such a scheduling mechanism would be detrimental to SQL Server’s performance because SQL Server could easily be deprioritized by another higher-priority application. Therefore, SQL Server uses its own non-preemptive scheduling mechanism within SQLOS, ensuring that the operating system does not interfere when it is not needed.
Conclusion
In this blog entry, we learned that wait statistics play a crucial role in SQL Server performance diagnostics. Additionally, we discovered that SQLOS makes all of this possible. Next, you should understand the essential components of SQLOS, such as Schedulers, Tasks, and Worker Threads. Their functional principles must be grasped before delving deeper into the fascinating world of wait stats. That will be the topic of my next blog entry!
Jani K. Savolainen