Engineering

TCP Backlog is a simple concept, yet it may concern you

Some time ago I worked on a legacy project with significant peak loads. During these peaks, some clients experienced “Connection refused” errors while our monitoring showed we still had plenty of CPU time, RAM and network bandwidth.

Our first idea was that we might need more workers in our Gunicorn server (in our case, Gunicorn workers are the threads handling incoming connections). But I believe any optimisation should be measurement-based, and there should be a way to tell whether more workers are actually needed, since it is never clear how many to create - 100, 1000? Suppose we create 1000 workers and it is sufficient right now - how can we be sure it will be enough during peak load? Yes, we could add 999999 workers in auto-scaling mode, but then we would risk running out of RAM at peak load and getting unexpected side effects from the OOM killer randomly killing things - ours mostly preferred PostgreSQL (ask me how I know). Remember, it was a legacy project without many options to improve the architecture right away.

The first thing I found was that Gunicorn has a built-in mechanism for exporting metrics to statsd, but it is quite limited: it shows only the average number and average duration of handled requests, which still makes it difficult to judge overall availability.
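For reference, a minimal sketch of enabling that export, assuming a gunicorn.conf.py file and a statsd agent on localhost:8125 (the host, port and prefix are placeholders, not values from our project):

# gunicorn.conf.py - ship Gunicorn's built-in metrics to statsd
# (host/port and prefix are illustrative assumptions)
statsd_host = "localhost:8125"
statsd_prefix = "myapp"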

There is a more interesting way to find out whether we have enough workers to process all requests: monitoring the TCP backlog. Recall the TCP three-way handshake: the client sends a SYN packet to the server, the server responds with a SYN-ACK packet, and the client sends an ACK packet. At this stage the connection is considered established.

[diagram: TCP three-way handshake - SYN, SYN-ACK, ACK]

The TCP backlog is handled by the operating system's network stack and actually consists of two queues: the SYN queue, which holds half-open connections still in the middle of the handshake and is capped by the net.ipv4.tcp_max_syn_backlog setting, and the accept queue, which holds fully established connections waiting for the application to accept() them and is capped by the backlog argument passed to listen() and by net.core.somaxconn. When workers cannot keep up, it is the accept queue that fills.
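To make the accept queue concrete, here is a minimal sketch of a listening socket in Python (the port 8000 and the backlog value of 128 are arbitrary choices for illustration):

# minimal_listener.py - illustrate the listen() backlog (accept queue size)
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("0.0.0.0", 8000))
# Ask the kernel to queue up to 128 established connections that the
# application has not accept()ed yet; the kernel additionally caps this
# with net.core.somaxconn.
srv.listen(128)

while True:
    conn, addr = srv.accept()   # pops one connection off the accept queue
    conn.sendall(b"hello\n")
    conn.close()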

When all the workers are busy processing current requests, new connections start waiting in this queue. The number of connections waiting on a socket listening on port 8000 can be viewed with the “network statistics” (netstat) or “socket statistics” (ss) commands, in the Recv-Q column:

netstat -l -n | grep 8000

[example netstat output with the Recv-Q column]

or

sudo ss -lt -n | grep 8000

When the maximum queue length (128 by default on many systems) is exhausted, new clients will start getting a “connection refused” error (or their connection attempts will simply be dropped, depending on the net.ipv4.tcp_abort_on_overflow setting).

OK, so now we know exactly how many connections are waiting in the queue, and we can start monitoring and optimising this metric.
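As a rough illustration, here is a sketch of how such a check could be automated: it parses the output of ss for a given port and prints the current queue length (the port, the one-second interval and the plain print() instead of a real statsd/monitoring client are all assumptions):

# backlog_monitor.py - sample the listen-queue length for port 8000
# (hypothetical helper; replace print() with your metrics client)
import subprocess
import time

PORT = 8000  # assumed application port

def backlog_length(port):
    out = subprocess.run(["ss", "-ltn"], capture_output=True, text=True).stdout
    for line in out.splitlines()[1:]:
        fields = line.split()
        # ss -ltn columns: State, Recv-Q, Send-Q, Local Address:Port, Peer Address:Port
        if len(fields) >= 4 and fields[3].endswith(f":{port}"):
            return int(fields[1])   # Recv-Q: connections waiting to be accept()ed
    return None

while True:
    print("backlog:", backlog_length(PORT))
    time.sleep(1)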

It is possible to increase the TCP backlog depth, so these would-be-refused connections will also sit in the queue, but this may lead to even longer response times, gateway timeouts on reverse proxies and client-side timeouts. Even worse, you may start processing requests for clients that have already disconnected by the time you finish, so they will come back and repeat requests you have already processed.
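For completeness, in Gunicorn the queue depth is controlled by the backlog setting; a sketch of raising it (the value 4096 is just an example, and the effective limit is still capped by the kernel):

# gunicorn.conf.py - ask for a deeper listen queue
# (4096 is an illustrative value; Gunicorn's default is 2048, and the
#  effective depth is min(backlog, net.core.somaxconn), so the sysctl
#  may need raising as well)
backlog = 4096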

We tried to find a balance between the number of workers and the backlog depth, but ultimately our key to success was to log the longest-running requests and optimise them, so that average response time dropped and the TCP backlog stopped growing. Most importantly, now we can be sure that all requests are being handled, and we get alerted if the backlog queue starts to grow - before any client experiences a “connection refused” error or a timeout (and while the reverse proxy logs are still free of errors).
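The slow-request logging itself can be as simple as adding request duration to Gunicorn's access log and sorting it later; a sketch, again assuming the gunicorn.conf.py approach (the exact format string is an illustrative simplification of the default):

# gunicorn.conf.py - include request duration in the access log
# %(D)s is the request time in microseconds; the rest is a trimmed-down
# version of Gunicorn's default access log format
accesslog = "-"  # log to stdout; a file path works too
access_log_format = '%(h)s "%(r)s" %(s)s %(b)s %(D)sus'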

Gunicorn here is just an example: this information can be useful for other servers with thread-pool workers, or, in fact, for any server that handles heavy load.

December 07, 2021

Dima Jerlitsyn

Backend engineer