Continuing my series on advanced performance troubleshooting – see these two posts for the scripts I’ll be using and an introduction to the series:
In this blog post I’d like to show you an example of SOS_SCHEDULER_YIELD waits occurring and how it can seem like a spinlock is the cause.
(Note: you can read more about SOS_SCHEDULER_YIELD waits in my sqlperformance.com blog post here.)
I originally published this blog post and then had a discussion with good friend Bob Ward from Product Support who questioned my conclusions given what he’s seen (thanks Bob!). After digging in further, I found that my original post was incorrect, so this is the corrected version.
The SOS_SCHEDULER_YIELD wait means that an executing thread voluntarily gave up the CPU to allow other threads to execute. The SQL Server code is sprinkled with “voluntary yields” in places where high CPU usage may occur.
One such place where a thread will sleep but not explicitly yield is when backing off after a spinlock collision waiting to see if it can get access to the spinlock. A spinlock is a very lightweight synchronization mechanism deep inside SQL Server that protects access to a data structure (not database data itself). See the end of the second blog post above for more of an explanation on spinlocks.
When a thread voluntarily yields, it does not go onto the waiting tasks list – as it’s not waiting for anything – but instead goes to the bottom of the Runnable Queue for that scheduler. SOS_SCHEDULER_YIELD waits by themselves are not cause for concern unless they are the majority of waits on the system, and performance is suffering.
To set up the test, I’ll create a simple database and table:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
USE [master];
GO
DROP DATABASE [YieldTest];
GO
 
CREATE DATABASE [YieldTest];
GO
USE [YieldTest];
GO
 
CREATE TABLE [SampleTable] ([c1] INT IDENTITY);
GO
CREATE NONCLUSTERED INDEX [SampleTable_NC] ON [SampleTable] ([c1]);
GO
 
SET NOCOUNT ON;
GO
INSERT INTO [SampleTable] DEFAULT VALUES;
GO 100
Then I’ll clear out wait stats and latch stats:
?
1
2
3
4
5
DBCC SQLPERF (N'sys.dm_os_latch_stats', CLEAR);
GO
 
DBCC SQLPERF (N'sys.dm_os_wait_stats', CLEAR);
GO
And then fire up 50 clients running the following code (I just have a CMD script that fires up 50 CMD windows, each running the T-SQL code):
?
1
2
3
4
5
6
7
8
9
10
11
USE [YieldTest];
GO
SET NOCOUNT ON;
GO
 
DECLARE @a INT;
WHILE (1=1)
BEGIN
    SELECT @a = COUNT (*) FROM [YieldTest]..[SampleTable] WHERE [c1] = 1;
END;
GO
And the CPUs in my laptop are jammed solid (snapped from my desktop and rotated 90 degrees to save space):
Wow!
Looking at perfmon:
The CPU usage is not Windows (% User Time), and of all the counters I usually monitor (I’ve cut them all out here for clarity) I can see a sustained very high Lock Requests/sec for Object and Page locks (almost 950 thousand requests per second for both types! Gotta love my laptop :-)
So let’s dig in. First off, looking at wait stats (using the script in the wait stats post referenced above):
?
WaitType            Wait_S  Resource_S Signal_S WaitCount Percentage AvgWait_S AvgRes_S AvgSig_S
------------------- ------- ---------- -------- --------- ---------- --------- -------- --------
SOS_SCHEDULER_YIELD 4574.77 0.20       4574.57  206473    99.43      0.0222    0.0000   0.0222
SOS_SCHEDULER_YIELD at almost 100% of the waits on the system means that I’ve got CPU pressure – as I saw from the CPU graphs above. The fact that nothing else is showing up makes me suspect this is a spinlock issue.
Checking in the sys.dm_os_waiting_tasks output (see script in the second blog post referenced above), I see nothing waiting, and if I refresh a few times I see the occasional ASYNC_NETWORK_IO and/or PREEMPTIVE_OS_WAITFORSINGLEOBJECT wait type – which I’d expect from the CMD windows running the T-SQL code.
Checking the latch stats (again, see script in the second blog post referenced above) shows no latch waits apart from a few BUFFER latch waits.
So now let’s look at spinlocks. First off I’m going to take a snapshot of the spinlocks on the system:
?
1
2
3
4
5
6
7
8
9
10
-- Baseline
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##TempSpinlockStats1')
    DROP TABLE [##TempSpinlockStats1];
GO
SELECT * INTO [##TempSpinlockStats1]
FROM sys.dm_os_spinlock_stats
WHERE [collisions] > 0
ORDER BY [name];
GO
(On 2005 you’ll need to use DBCC SQLPERF (‘spinlockstats’) and use INSERT/EXEC to get the results into a table – see the post above for example code.)
Then wait 10 seconds or so for the workload to continue… and then grab another snapshot of the spinlocks:
?
1
2
3
4
5
6
7
8
9
10
-- Capture updated stats
IF EXISTS (SELECT * FROM [tempdb].[sys].[objects]
    WHERE [name] = N'##TempSpinlockStats2')
    DROP TABLE [##TempSpinlockStats2];
GO
SELECT * INTO [##TempSpinlockStats2]
FROM sys.dm_os_spinlock_stats
WHERE [collisions] > 0
ORDER BY [name];
GO
Now running the code I came up with to show the difference between the two snapshots:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
-- Diff them
SELECT
    '***' AS [New],
    [ts2].[name] AS [Spinlock],
    [ts2].[collisions] AS [DiffCollisions],
    [ts2].[spins] AS [DiffSpins],
    [ts2].[spins_per_collision] AS [SpinsPerCollision],
    [ts2].[sleep_time] AS [DiffSleepTime],
    [ts2].[backoffs] AS [DiffBackoffs]
FROM [##TempSpinlockStats2] [ts2]
LEFT OUTER JOIN [##TempSpinlockStats1] [ts1]
    ON [ts2].[name] = [ts1].[name]
WHERE [ts1].[name] IS NULL
UNION
SELECT
    '' AS [New],
    [ts2].[name] AS [Spinlock],
    [ts2].[collisions] - [ts1].[collisions] AS [DiffCollisions],
    [ts2].[spins] - [ts1].[spins] AS [DiffSpins],
    CASE ([ts2].[spins] - [ts1].[spins]) WHEN 0 THEN 0
        ELSE ([ts2].[spins] - [ts1].[spins]) /
            ([ts2].[collisions] - [ts1].[collisions]) END
            AS [SpinsPerCollision],
    [ts2].[sleep_time] - [ts1].[sleep_time] AS [DiffSleepTime],
    [ts2].[backoffs] - [ts1].[backoffs] AS [DiffBackoffs]
FROM [##TempSpinlockStats2] [ts2]
LEFT OUTER JOIN [##TempSpinlockStats1] [ts1]
    ON [ts2].[name] = [ts1].[name]
WHERE [ts1].[name] IS NOT NULL
    AND [ts2].[collisions] - [ts1].[collisions] > 0
ORDER BY [New] DESC, [Spinlock] ASC;
GO

?
New  Spinlock                        DiffCollisions DiffSpins  SpinsPerCollision DiffSleepTime DiffBackoffs
---- ------------------------------- -------------- ---------- ----------------- ------------- ------------
     LOCK_HASH                       6191134        4005774890 647               686           1601383
     OPT_IDX_STATS                   1164849        126549245  108               57            7555
     SOS_OBJECT_STORE                73             305        4                 0             0
     SOS_WAITABLE_ADDRESS_HASHBUCKET 115            44495      386               0             3
     XDESMGR                         1              0          0                 0             0
This is telling me that there is massive contention for the LOCK_HASH spinlock, further backed up by theSOS_WAITABLE_ADDRESS_HASHBUCKET spinlock. The LOCK_HASH spinlock protects access to the hash buckets used by the lock manager to efficiently keep track of the lock resources for locks held inside SQL Server and to allow efficient searching for lock hash collisions (i.e. does someone hold the lock we want in an incompatible mode?). In this case the contention is so bad that instead of just spinning, the threads are actually backing off and letting other threads execute to allow progress to be made.
And that makes perfect sense because of what my workload is doing – 50 concurrent connections all trying to read the same row on the same page in the same nonclustered index.
But is that the cause of the SOS_SCHEDULER_YIELD waits? To prove it one way or the other, I created an Extended Event session that would capture call stacks when a wait occurs:
?
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
-- Note that before SQL 2012, the wait_type to use is 120, and
-- on 2014 SP1 the wait_type to use is 123. You MUST verify the
-- map_value to use on your build.
-- On SQL 2012 the target name is 'histogram' but the old name still works.
CREATE EVENT SESSION [MonitorWaits] ON SERVER
ADD EVENT [sqlos].[wait_info]
    (ACTION ([package0].[callstack])
    WHERE [wait_type] = 124) -- SOS_SCHEDULER_YIELD only
ADD TARGET [package0].[asynchronous_bucketizer] (
    SET filtering_event_name = N'sqlos.wait_info',
    source_type = 1, -- source_type = 1 is an action
    source = N'package0.callstack') -- bucketize on the callstack
WITH (MAX_MEMORY = 50MB, max_dispatch_latency = 5 seconds)
GO
 
-- Start the session
ALTER EVENT SESSION [MonitorWaits] ON SERVER STATE = START;
GO
 
-- TF to allow call stack resolution
DBCC TRACEON (3656, -1);
GO
 
-- Let the workload run for a few seconds
 
-- Get the callstacks from the bucketizer target
-- Are they showing calls into the lock manager?
SELECT
    [event_session_address],
    [target_name],
    [execution_count],
    CAST ([target_data] AS XML)
FROM sys.dm_xe_session_targets [xst]
INNER JOIN sys.dm_xe_sessions [xs]
    ON ([xst].[event_session_address] = [xs].[address])
WHERE [xs].[name] = N'MonitorWaits';
GO
 
-- Stop the event session
ALTER EVENT SESSION [MonitorWaits] ON SERVER STATE = STOP;
GO
I also made sure to have the correct symbol files in the \binn directory (see How to download a sqlservr.pdb symbol file). After running the workload and examining the callstacks, I found the majority of the waits were coming from voluntary yields deep in the Access Methods code. An example call stack is:
?
IndexPageManager::GetPageWithKey+ef [ @ 0+0x0
GetRowForKeyValue+146 [ @ 0+0x0
IndexRowScanner::EstablishInitialKeyOrderPosition+10a [ @ 0+0x0
IndexDataSetSession::GetNextRowValuesInternal+7d7 [ @ 0+0x0
RowsetNewSS::FetchNextRow+12a [ @ 0+0x0
CQScanRangeNew::GetRow+6a1 [ @ 0+0x0
CQScanCountStarNew::GetRowHelper+44 [ @ 0+0x0
CQScanStreamAggregateNew::Open+70 [ @ 0+0x0
CQueryScan::Uncache+32f [ @ 0+0x0
CXStmtQuery::SetupQueryScanAndExpression+2a2 [ @ 0+0x0
CXStmtQuery::ErsqExecuteQuery+2f8 [ @ 0+0x0
CMsqlExecContext::ExecuteStmts<1,1>+cca [ @ 0+0x0
CMsqlExecContext::FExecute+58b [ @ 0+0x0
CSQLSource::Execute+319
[Edit: check out how to do this in the spinlock whitepaper]
This is clearly (to me) nothing to do with the LOCK_HASH spinlock, so that’s a red herring. In this case, I’m just CPU bound. When a thread goes to sleep when backing off from a spinlock, it directly calls Windows Sleep() – so it does not show up as a SQL Server wait type at all, even though the sleep call is made from the SQL OS layer. Which is a bummer.
How to get around that and reduce CPU usage? This is really contrived workload, but this can occur for real. Even if I try using WITH (NOLOCK), the NOLOCK seek will take a table SCH_S (schema-stability) lock to make sure the table structure doesn’t change while the seek is occurring, so that only gets rid of the page locks, not the object locks and doesn’t help CPU usage. With this (arguably weird) workload, there are a few things I could do (just off the top of my head):
  • Enable read_commited_snapshot for the database, which will reduce the LOCK_HASH spinlock backoffs
  • Scale-out the connections to a few copies of the database, updated using replication
  • Do some mid-tier or client-side caching, with some notification mechanism of data changes (e.g. a DDL trigger firing a Service Broker message)
So this was an example of where wait stats lead to having to look at spinlock stats, but that the spinlock, on even *deeper* investigation, wasn’t the issue at all. Cool stuff.
Next time we’ll look at another cause of SOS_SCHEDULER_YIELD waits.