Michael J. Swart

October 16, 2020

Maximum Simultaneous User Connections

The beginning of the school year is behind us and what a semester start! 2020 has been tough on many of us, and I’m fortunate to work for a company whose services are in such high demand. In fact, we’ve seen some scaling challenges like we’ve never seen before. I want to talk about some of them.

A Really Busy September

It’s been a whirlwind of a month. The first day of school, or of a new semester, is always a busy time for online education, and in 2020 that’s an understatement. This year, we had to answer the question: how busy can one SQL Server get?

We’ve always approached this question using the usual techniques:

  • Cache like crazy (the cheapest query is the one that doesn’t have to be run)
  • Tackle expensive queries (CPU, IO)
  • Tackle large wait categories (so much easier said than done)
  • Offload reporting queries to other servers

This year we smashed records and faced some interesting challenges along the way. I want to talk about those challenges in the next few blog posts. This first post is about user connections, something I’ve never really paid much attention to before now.

A Metric I Never Thought I’d Have To Worry About

The maximum number of user connections that SQL Server can support is 32,767. That’s it. That’s the end of the line. You can buy faster I/O or a server with more CPUs, but you can’t buy more connections.

I actually mentioned this limit in the post where I introduced Swart’s 10% Rule: “If you’re using over 10% of what SQL Server restricts you to, you’re doing it wrong.” In that post, I was guarded about that statement as it applied to the user connection limit, but I’d now like to upgrade that caution from guarded to elevated.

With such a hard limit, it’s important to watch this metric carefully. You can do that with the performance counter SQLServer:General Statistics – User Connections or with this query:

SELECT   ISNULL(DB_NAME(database_id), 'Total On Server') AS DatabaseName,
         COUNT(*) AS Connections,
         COUNT(DISTINCT host_name) AS ClientMachines
FROM     sys.dm_exec_sessions
WHERE    host_name IS NOT NULL
GROUP BY ROLLUP( database_id )
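
If you prefer the counter, the same number is exposed through a DMV as well. Here’s a minimal sketch (note that on a named instance the counter object is prefixed with MSSQL$InstanceName rather than SQLServer):

SELECT   cntr_value AS UserConnections
FROM     sys.dm_os_performance_counters
WHERE    object_name LIKE '%General Statistics%' -- handles the instance prefix
AND      counter_name = 'User Connections'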

The Maximum is 32,767

If you haven’t changed the maximum number of user connections by some method like sp_configure 'user connections', then the default is 0 and @@MAX_CONNECTIONS will return 32,767. I think the UI for this property is a bit misleading: 0 is absolutely not equivalent to unlimited.
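
You can check both the configured value and the effective limit. A quick sketch:

SELECT   value, value_in_use
FROM     sys.configurations
WHERE    name = 'user connections' -- 0 means "use the default"

SELECT   @@MAX_CONNECTIONS AS EffectiveMaximum -- 32,767 unless configured lower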

What Does Trouble Look Like?

The issue shows up on the client when it’s unable to establish a connection to the server. There are a variety of errors you might see, such as:

  • Client unable to establish connection because an error was encountered during handshakes before login. Common causes include client attempting to connect to an unsupported version of SQL Server, server too busy to accept new connections or a resource limitation (memory or maximum allowed connections) on the server.
  • A connection was successfully established with the server, but then an error occurred during the pre-login handshake. (provider: TCP Provider, error: 0 – An existing connection was forcibly closed by the remote host.)
  • Or simply the generic: A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 – Could not open a connection to SQL Server)

It’s Almost Always a Symptom

It’s often a symptom and not the root cause. For example, imagine you have a busy server with lots of clients and there’s a sudden slowdown.

Perhaps the slowdown is an increase in a certain wait category like blocking (LCK_M_X), or perhaps it’s an increase in CPU because of a bad query plan. Either way, queries suddenly take longer. They can’t complete quickly, so their connections remain open, and any new queries that come along have to make brand new connections (because there are no available connections in the connection pool). Maybe THREADPOOL waits start to pile up. Now even more sessions are waiting and even more connections get created, and in this scenario you can reach the maximum number of connections.

But decent monitoring will often identify the actual root cause here (the blocking or whatever). And it’s actually easy to overlook the user connection limit, because it wasn’t really the bottleneck.
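
If you suspect this pattern, a spot check of waiting tasks can point at the culprit. Here’s a small sketch (during a true worker-thread pile-up you may need the Dedicated Admin Connection to run anything at all):

SELECT   wait_type, COUNT(*) AS waiting_tasks
FROM     sys.dm_os_waiting_tasks
WHERE    wait_type = N'THREADPOOL'
OR       wait_type LIKE N'LCK[_]M[_]%' -- lock waits like LCK_M_X
GROUP BY wait_type
ORDER BY waiting_tasks DESC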

But Sometimes it’s the Root Cause

Once we’ve tackled all the other scaling challenges and the demand gets cranked up to 11, then we can see trouble.

In our case, we have an elastic scaling policy that allows us to spin up client machines in response to demand. Hundreds of web servers, times the number of connection pools, times the number of connections in each pool can really add up fast. Those web servers scale really nicely, but without a decent sharding strategy, SQL Server doesn’t.
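
To make that arithmetic concrete with made-up numbers: 300 web servers, each holding 3 connection pools with up to 100 connections per pool, works out to 300 × 3 × 100 = 90,000 potential connections. That’s nearly three times the 32,767 limit before the server is even busy.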

On one of our servers, this is where we’re sitting now, and the number of connections is uncomfortably high. There’s not a lot of room to tolerate bursts of extra activity here:

(Chart: the red lines indicate 10% and 100% of the maximum connection count.)
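
Here’s a sketch of the kind of threshold check behind a chart like that, comparing the current connection count to the 10% line:

DECLARE @connections int =
    (SELECT COUNT(*) FROM sys.dm_exec_sessions WHERE host_name IS NOT NULL)

SELECT   @connections AS CurrentConnections,
         @@MAX_CONNECTIONS AS MaxConnections,
         CAST(100.0 * @connections / @@MAX_CONNECTIONS AS decimal(5, 2)) AS PercentOfMax,
         CASE WHEN @connections > 0.1 * @@MAX_CONNECTIONS
              THEN 'Over the 10% line'
              ELSE 'OK' END AS Status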

What We Can Do About It

  • Monitor the number of connections that’s typical for your servers: use the query above for a point-in-time check, or the performance counter SQLServer:General Statistics – User Connections for continuous monitoring.
  • Make sure you’re not leaking any connections. This SQLPerformance post tells you how to find that.
  • Use connection pooling efficiently. This Microsoft article SQL Server Connection Pooling (ADO.NET) has some great tips on avoiding pool fragmentation. The article describes pool fragmentation as a web server issue, but the tips are also appropriate for minimizing the total user connections on the server.
  • Keep the variety of connection strings used by your application small. With connection pooling, connections are only made when they’re needed, but there’s still a little bit of overhead: it takes 4 to 8 minutes for an idle connection to be released by the pool, so minimizing the number of connection pools actually does help. (A rough way to count your pools is sketched after this list.)
  • Queries should be as quick as possible. Get in and out.
    • So be quick about reading the data you asked for (i.e. avoid C#’s yield if you can)
    • Offload reads as much as possible to other servers. Availability groups have read-only routing features, but be careful how you implement this. If some of your connection strings use ApplicationIntent=ReadOnly and some don’t, then that’s two different connection pools. And if you want to defer configuring AGs until after the connection strings are done, there can be some tricky overlapping scenarios. It’s complicated, and it just highlights the importance of monitoring.
    • When tuning queries, the metric to focus on is the total duration of queries; try to minimize that number. That’s the total_elapsed_time column in sys.dm_exec_query_stats, or the elapsed time query in http://michaeljswart.com/go/top20 (a minimal version is sketched after this list). This is an interesting one: I’ve always preferred to focus on optimizing CPU or logical reads, but in this case, connection count is most sensitive to long-running queries, no matter the reason.
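
To get a rough sense of how many connection pools your clients are holding open, here’s a sketch that groups sessions by the things a pool is roughly keyed on (an approximation; pools are really keyed per client process and per connection string):

SELECT   host_name, program_name, login_name,
         COUNT(*) AS open_connections
FROM     sys.dm_exec_sessions
WHERE    host_name IS NOT NULL
GROUP BY host_name, program_name, login_name
ORDER BY open_connections DESC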
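
And here’s a minimal sketch of the elapsed-time focus mentioned above (the linked top20 query is more complete):

SELECT TOP (20)
         qs.total_elapsed_time / 1000 AS total_elapsed_ms, -- DMV reports microseconds
         qs.execution_count,
         qs.total_elapsed_time / qs.execution_count / 1000 AS avg_elapsed_ms,
         SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
             ((CASE qs.statement_end_offset
               WHEN -1 THEN DATALENGTH(st.text)
               ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM     sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_elapsed_time DESC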

I’d love to hear about others who have tackled this problem. Let me know what strategies you came up with.
