We came to the idea that we might need more workers in our Gunicorn server (in our case, Gunicorn workers are threads handling incoming connections). But I believe any optimisation should be measurement-based, and there should be a way to tell whether more workers are actually needed, since it is never clear how many to create: 100? 1000? Suppose we create 1000 workers and that is sufficient right now; how can we be sure it will still be enough at peak load? Yes, we could add 999999 workers in auto-scaling mode, but then we would risk running out of RAM at peak load and hitting unexpected side effects when the OOM killer starts killing processes at random; ours mostly preferred PostgreSQL (ask me how I know). Remember, this was a legacy project without many options to improve the architecture right away.
I first found that Gunicorn has a built-in mechanism for exporting metrics to StatsD, but it is pretty limited: it reports only the average number and average duration of handled requests, which still makes it difficult to judge overall availability.
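For reference, enabling that built-in export is just a matter of pointing Gunicorn at a StatsD daemon via its `--statsd-host` option (the app module, host, port, and prefix below are placeholders for your own setup):

```shell
# Ship Gunicorn's built-in metrics (request counts/durations, worker events)
# to a StatsD daemon; adjust the host, port, and prefix to your environment.
gunicorn app:application \
    --workers 4 \
    --statsd-host localhost:8125 \
    --statsd-prefix myapp
```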
There is a more interesting way to find out whether we have enough workers to process all requests: monitoring the TCP backlog.
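On Linux, the backlog of a listening socket can be inspected with `ss`: for sockets in the LISTEN state, the Recv-Q column shows how many completed connections are currently waiting to be accept()ed, and Send-Q shows the configured backlog limit (port 8000 below is just an example):

```shell
# For LISTEN sockets: Recv-Q = connections waiting in the accept queue,
# Send-Q = the configured backlog limit for that socket.
ss -ltn 'sport = :8000'
```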
First, let’s recall the TCP 3-way handshake: a client sends a SYN packet to a server, the server responds with a SYN-ACK packet, and the client sends an ACK packet. At this point the connection is considered established.
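A key detail is that the kernel completes this handshake on its own: an established connection then waits in the listening socket's backlog queue until the application calls accept(). A minimal sketch of this behaviour (a toy server that never accepts, not Gunicorn itself):

```python
import socket

# Create a listening socket with a deliberately small accept backlog.
# Connections whose 3-way handshake has completed wait in this queue
# until the application calls accept().
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 0))      # let the OS pick a free port
server.listen(5)                   # backlog: up to 5 unaccepted connections
port = server.getsockname()[1]

# The client connects successfully even though the server never calls
# accept(): the kernel finishes the handshake and parks the connection
# in the backlog queue.
client = socket.create_connection(("127.0.0.1", port))
print("connected without accept():", client.fileno() > 0)

client.close()
server.close()
```

This is exactly why the backlog is a useful signal: if all Gunicorn workers are busy, established connections pile up in this queue before any worker ever sees them.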