Michael J. Swart

January 19, 2022

Measure the Effect of “Cost Threshold for Parallelism”

Filed under: Miscellaneous SQL,SQL Scripts,Technical Articles — Michael J. Swart @ 10:40 am

The configuration setting cost threshold for parallelism has a default value of 5. As a default value, it’s probably too low and should be raised. But what benefit are we hoping for? And how can we measure it?
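For reference, the setting can be inspected and changed with sp_configure. It's an advanced option, so "show advanced options" has to be enabled first. The value 50 below is only an illustrative starting point, not a recommendation:

```sql
-- Check the current value (it's an advanced option)
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'cost threshold for parallelism';

-- Raise it (50 here is just an example value to experiment with)
EXEC sp_configure 'cost threshold for parallelism', 50;
RECONFIGURE;
```

The change takes effect immediately for new compilations; plans already in cache keep their shape until they're recompiled.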

The database that I work with is a busy OLTP system with lots of very frequent, very inexpensive queries and so I don’t like to see any query that needs to go parallel.

What I’d like to do is raise the cost threshold to something larger and look at the queries that have gone from multi-threaded to single-threaded. I want to see those queries become cheaper on average. By cheaper I mean they consume less CPU. I expect the average duration of these queries to increase.

How do I find these queries? I can look in the plan cache. The view sys.dm_exec_query_stats can tell me whether a query plan is parallel, and I can look into the plans for the estimated cost. In my case, I have relatively few parallel queries, only about 300, which means the XML-parsing piece of this query runs pretty quickly.
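If you want to check how many parallel plans are sitting in your own cache before running the bigger query below, a quick count like this gives a feel for how expensive the XML parsing will be:

```sql
-- Rough count of cached statements whose last execution used DOP > 1
SELECT COUNT(*) AS parallel_plans_in_cache
FROM sys.dm_exec_query_stats
WHERE last_dop > 1;
```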

Measure the Cost of Parallel Queries

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT 
	sql_text.[text] as sqltext,
	qp.query_plan,
	xml_values.subtree_cost as estimated_query_cost_in_query_bucks,
	qs.last_dop,
	CAST( qs.total_worker_time / (qs.execution_count + 0.0) as money ) as average_query_cpu_in_microseconds,
	qs.total_worker_time,
	qs.execution_count,
	qs.query_hash,
	qs.query_plan_hash,
	qs.plan_handle,
	qs.sql_handle	
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) qp
CROSS APPLY 
	(
		SELECT SUBSTRING(st.[text],(qs.statement_start_offset + 2) / 2,
		(CASE 
			WHEN qs.statement_end_offset = -1  THEN LEN(CONVERT(NVARCHAR(MAX),st.[text])) * 2
			ELSE qs.statement_end_offset + 2
			END - qs.statement_start_offset) / 2)
	) as sql_text([text])
OUTER APPLY 
	( 
		SELECT 
			n.c.value('@QueryHash', 'NVARCHAR(30)')  as query_hash,
			n.c.value('@StatementSubTreeCost', 'FLOAT')  as subtree_cost
		FROM qp.query_plan.nodes('//StmtSimple') as n(c)
	) xml_values
WHERE qs.last_dop > 1
AND sys.fn_varbintohexstr(qs.query_hash) = xml_values.query_hash
AND execution_count > 10
ORDER BY xml_values.subtree_cost
OPTION (RECOMPILE);

What Next?

Keep track of the queries you see whose estimated subtree cost is below the new threshold you’re considering. Especially keep track of the query_hash and the average_query_cpu_in_microseconds.
Then make the change and compare the average_query_cpu_in_microseconds before and after. Remember to use the query_hash as the key because the query_plan_hash will have changed.
Here’s the query modified to return the “after” results:

Measure the Cost of Those Queries After Config Change

WITH XMLNAMESPACES (DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')
SELECT 
	sql_text.[text] as sqltext,
	qp.query_plan,
	xml_values.subtree_cost as estimated_query_cost_in_query_bucks,
	qs.last_dop,
	CAST( qs.total_worker_time / (qs.execution_count + 0.0) as money ) as average_query_cpu_in_microseconds,
	qs.total_worker_time,
	qs.execution_count,
	qs.query_hash,
	qs.query_plan_hash,
	qs.plan_handle,
	qs.sql_handle	
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) st
CROSS APPLY sys.dm_exec_query_plan (qs.plan_handle) qp
CROSS APPLY 
	(
		SELECT SUBSTRING(st.[text],(qs.statement_start_offset + 2) / 2,
		(CASE 
			WHEN qs.statement_end_offset = -1  THEN LEN(CONVERT(NVARCHAR(MAX),st.[text])) * 2
			ELSE qs.statement_end_offset + 2
			END - qs.statement_start_offset) / 2)
	) as sql_text([text])
OUTER APPLY 
	( 
		SELECT 
			n.c.value('@QueryHash', 'NVARCHAR(30)')  as query_hash,
			n.c.value('@StatementSubTreeCost', 'FLOAT')  as subtree_cost
		FROM qp.query_plan.nodes('//StmtSimple') as n(c)
	) xml_values
WHERE qs.query_hash in ( /* put the list of query_hash values you saw from before the config change here */ )
AND sys.fn_varbintohexstr(qs.query_hash) = xml_values.query_hash
ORDER BY xml_values.subtree_cost
OPTION (RECOMPILE);
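One way to do the before-and-after bookkeeping is to save each run’s results into a table and join on query_hash. This is just a sketch; the table and column names (dbo.BeforeResults, dbo.AfterResults, avg_cpu_microseconds) are hypothetical, not from the queries above:

```sql
-- Hypothetical comparison: each table holds one run of the query above,
-- keyed on query_hash (the plan hash will have changed after the config change).
SELECT
    b.query_hash,
    b.avg_cpu_microseconds AS before_cpu,
    a.avg_cpu_microseconds AS after_cpu,
    a.avg_cpu_microseconds - b.avg_cpu_microseconds AS cpu_delta
FROM dbo.BeforeResults AS b
JOIN dbo.AfterResults AS a
    ON a.query_hash = b.query_hash
ORDER BY cpu_delta;
```

A negative cpu_delta is the hoped-for result: the query got cheaper in CPU terms after going single-threaded, even if its duration went up.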

What I Found

8 Comments »

  1. […] Michael Swart gets out the ruler: […]

    Pingback by Measuring Cost Threshold for Parallelism Effectiveness – Curated SQL — January 20, 2022 @ 8:00 am

  2. Well… This is a bit awkward…

    https://eitanblumin.com/2018/11/06/planning-to-increase-cost-threshold-for-parallelism-like-a-smart-person/

    Comment by Eitan Blumin — January 31, 2022 @ 2:10 am

  3. Any suggestions on how to do this when all the procedures are encrypted?

    Comment by Shaun Austin — January 31, 2022 @ 4:09 am

  4. Brilliant….adding this to my toolkit!

    Comment by Kevin3NF — January 31, 2022 @ 11:14 am

  5. But what benefit are we hoping for? And how can we measure it?

    Increasing the threshold also has another effect.
    Compilation time should be faster. There is no extra optimization step of searching for a faster parallel plan.

    Try to measure it 🙂

    Comment by Wojciech Sawicki — February 1, 2022 @ 2:04 am

  6. Hello Michael great post, thank you!

    But the question that always pops up in my head regarding the cost threshold for parallelism is:
    In which scenario will this really make a difference? Isn’t this just a scenario with very many very small queries running on the server?
    In any other configuration the majority of the workload running on the server is made of bigger queries way above 50 query bucks. At least for me, tuning just a few small queries which are done in no time anyway will not help much, will it?

    So what I really would like to measure first is how the workload of small (and parallel) queries is affecting the total workload. Is there any good way to do this?

    Comment by Rob — February 1, 2022 @ 4:04 am

  7. @Rob
    So for my workload the benefit isn’t necessarily for the end user’s benefit. It’s for the server. For queries that are executed frequently, small improvements have a large impact.

    Changing this setting may not be a benefit for you, but that’s the whole point of the post. You can measure and find out.

    You can modify the query above to focus on `total_worker_time` to get a sense of a query’s proportion of the total workload.
    Also see https://michaeljswart.com/go/top20

    Comment by Michael J. Swart — February 1, 2022 @ 7:41 am

  8. […] My dear friend Michael J Swart has a Great Post about measuring changes to it here. […]

    Pingback by Understand Your Plan: When Queries Go Parallel – Erik Darling Data — March 9, 2022 @ 10:07 am
