How to properly benchmark server performance.

Currently I am working on a new server for our Android application, built on Netty 5. We're using it to keep a silent keep-alive TCP connection to our application at all times so it can act as a chat server. However, we're planning on removing the REST functionality of our application and moving to a complete server/client solution. I've seen no issues with lag while having connections open constantly, and I'm very careful to make sure that we never hold onto any garbage.

Because of the nature of Netty, we're running all of these connections on a total of two threads. They're kept alive, but they're also silent, meaning nothing is being processed for them and the only time the server ever sends data to the client is when it's requested.
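
For context, the bootstrap looks roughly like this. This is a trimmed-down sketch, not our actual code: the class names are from the Netty 4.x API, and the port number and echo handler are placeholders.

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public class ChatServer {

    // Placeholder handler: only ever writes back when the client asked for something.
    static class ChatHandler extends ChannelInboundHandlerAdapter {
        @Override
        public void channelRead(ChannelHandlerContext ctx, Object msg) {
            ctx.writeAndFlush(msg); // echo the request; real chat logic would go here
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // One thread accepts connections, one thread services all of them.
        NioEventLoopGroup bossGroup = new NioEventLoopGroup(1);
        NioEventLoopGroup workerGroup = new NioEventLoopGroup(1);
        try {
            ServerBootstrap b = new ServerBootstrap()
                    .group(bossGroup, workerGroup)
                    .channel(NioServerSocketChannel.class)
                    .childHandler(new ChannelInitializer<SocketChannel>() {
                        @Override
                        protected void initChannel(SocketChannel ch) {
                            ch.pipeline().addLast(new ChatHandler());
                        }
                    });
            b.bind(5222).sync().channel().closeFuture().sync(); // port is arbitrary here
        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
        }
    }
}
```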

What is the best way to test the scalability of the server application? We're aiming to support at least 350,000 silent concurrent connections in preparation for our global launch. Currently we're at 5-10 thousand downloads in the Philippines only (beta testing), and we're about to roll out advertising around the world. I see 200,000 installs as very realistic.

Considering we're not going to be sending/receiving data constantly from all of these connections, a lot of them will be silent at any given time (probably around 3/4 of them, if not more).

What is the proper way to connect a bunch of clients to a server and simulate network activity so I can monitor it? Should I just create a fake abstract client and loop through it, spamming connections? And what about monitoring performance? Task Manager isn't exactly up to par with telling me what I want to know.

A ServerSocket can hold at most 64K connections, so you'll need a lot of NICs to handle 350K connections on a single server. Also keep in mind each socket has at the very least 8K of buffers (you can safely assume 256K), as your calls to socket.setReceiveBufferSize/setSendBufferSize will be ignored by the OS when you pass small values. So let's say 256K (2 × 128K) per socket times 350K connections, which amounts to roughly 90GB of RAM for socket buffers. Even if you can shave off an order of magnitude by tweaking the OS, that's still 9GB of socket buffers, plus the need for NICs to support all those connections… To put this in perspective, Twitter modified (not just tweaked) Linux, put together a monster of a machine with very specific enterprise-level hardware, and managed 1 million concurrent connections, for which the switches had to be modified too.
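
You can check that buffer behaviour yourself: ask for a small buffer and print back what the OS actually granted. A tiny sketch (the 8K request is just an example value; the numbers you see back depend on the OS):

```java
import java.net.Socket;

public class BufferSizeCheck {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket()) {
            socket.setReceiveBufferSize(8 * 1024); // hint: ask for 8K
            socket.setSendBufferSize(8 * 1024);
            // The OS is free to clamp or adjust the hint; print what it actually applied.
            System.out.println("receive buffer: " + socket.getReceiveBufferSize());
            System.out.println("send buffer:    " + socket.getSendBufferSize());
        }
    }
}
```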

My understanding is that with quite some effort, a novice in this area might be able to get to 10,000 connections before it all grinds to a halt.

Long story short: cluster your servers. You need that anyway if you want to create a reliable service; having only one server handle everything is a rookie mistake.

Thank you for your input. While I've talked to a number of people who have done networking work in the past, none of them have worked at such a large scale, so this is definitely helpful information and has made me think twice about replacing REST.

However, a key component of our application is having instant messaging built into it, which I'm running into various issues trying to plan out with REST.

The only thing I can think of is to have the device (phone) constantly poll the server asking for updates on new messages, as there's no way (that I'm aware of) to reliably notify the device that a new message has been received.
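
Just to illustrate how clunky that gets, a naive polling loop on the device would look something like this. The endpoint URL, query parameter and interval are made up for the example:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MessagePoller {
    public static void main(String[] args) throws Exception {
        long lastSeen = 0; // id/timestamp of the newest message we already have
        while (true) {
            // Hypothetical REST endpoint; the real path and auth would differ.
            URL url = new URL("https://example.com/api/messages?since=" + lastSeen);
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("GET");
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(conn.getInputStream()))) {
                String line;
                while ((line = in.readLine()) != null) {
                    // parse the message and update lastSeen here
                    System.out.println(line);
                }
            }
            Thread.sleep(5_000); // poll every 5 seconds: exactly the overhead I want to avoid
        }
    }
}
```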

Push notifications aren't very reliable and can't really be used for instant messaging. Notifying the user of a new message is one thing, but it doesn't always go through, so I can't rely on it for transmitting message data.

Could you recommend a proper way to set up instant messaging in an application that has thousands of users?

Developing for a small game that will only have 1-2k connections maximum is certainly easier than this, it seems. Looking forward to any insights you may have.

Why reinvent the wheel? There are, for instance, open source IRC server and client libraries, and such IRC servers scale almost linearly with cluster size. They were built for this exact purpose.

Anyway, you're in over your head. Hire people with experience on the matter, or it will fail exactly when the damage of an outage would be worst: when people are flooding in… and they won't come back.

Alright, here we go.

First of all, you instantly jumped into the thread (not even answering the original question, mind you), assumed that I was using a buffer-per-client setup, and went to great lengths to explain why said setup wouldn't work. That's great, and I'm sure the information will be useful to someone else in the future who's attempting a buffer-per-client setup.

However, with your assumptions you also ruled out every other networking setup, such as pooled (shared) buffers and stateless connections. While I stated in my thread that we were testing with Netty 5, that does not mean we don't have other alternatives available. Your claim about a server socket only being able to hold 64K connections is also only true by default, as this can be changed through configuration.

Theoretically speaking, we could run this server using NIO2's AsynchronousServerSocketChannel and pooled buffers. For example's sake, say we have 4GB of RAM and 8 CPU cores. We could then process on all 8 cores and give each core its own buffer of 256MB (2GB of pooled buffers in total), operating over splits in a connectionless state. Buffers would be polled in a cycle, and data would be removed from a buffer once it's used. If incoming data won't fit into a given buffer, it gets passed to the next one, and the cycle continues until the data is eventually scrapped. Considering the nature of the application, the largest thing ever sent is a chat message, which has a maximum size of 272 bytes including the packet header and the separator signal.
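
To make the idea concrete, here's a rough sketch of the kind of setup I mean. The pool size, buffer size and port are just example numbers for illustration, not the 8 × 256MB split above:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousChannelGroup;
import java.nio.channels.AsynchronousServerSocketChannel;
import java.nio.channels.AsynchronousSocketChannel;
import java.nio.channels.CompletionHandler;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Executors;

public class PooledBufferServer {
    private static final int POOL_SIZE = 64;          // example pool size
    private static final int BUFFER_SIZE = 64 * 1024; // example buffer size

    private static final BlockingQueue<ByteBuffer> pool = new ArrayBlockingQueue<>(POOL_SIZE);

    public static void main(String[] args) throws Exception {
        for (int i = 0; i < POOL_SIZE; i++) {
            pool.add(ByteBuffer.allocateDirect(BUFFER_SIZE));
        }

        // One channel group backed by a fixed thread pool, e.g. one thread per core.
        AsynchronousChannelGroup group = AsynchronousChannelGroup.withFixedThreadPool(
                Runtime.getRuntime().availableProcessors(), Executors.defaultThreadFactory());

        AsynchronousServerSocketChannel server =
                AsynchronousServerSocketChannel.open(group).bind(new InetSocketAddress(8080));

        server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
            @Override
            public void completed(AsynchronousSocketChannel client, Void att) {
                server.accept(null, this); // keep accepting new connections
                ByteBuffer buf = pool.poll();
                if (buf == null) {         // pool exhausted: drop the connection for now
                    closeQuietly(client);
                    return;
                }
                buf.clear();
                client.read(buf, buf, new CompletionHandler<Integer, ByteBuffer>() {
                    @Override
                    public void completed(Integer bytesRead, ByteBuffer b) {
                        // ... handle the (small) chat message here ...
                        b.clear();
                        pool.offer(b);        // return the buffer to the shared pool
                        closeQuietly(client); // stateless: close once the request is served
                    }
                    @Override
                    public void failed(Throwable exc, ByteBuffer b) {
                        pool.offer(b);
                        closeQuietly(client);
                    }
                });
            }
            @Override
            public void failed(Throwable exc, Void att) { /* log and keep going */ }
        });

        Thread.currentThread().join(); // keep the JVM alive
    }

    private static void closeQuietly(AsynchronousSocketChannel ch) {
        try { ch.close(); } catch (IOException ignored) {}
    }
}
```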

The entire reasoning behind my reply to your initial response was to see how you would recommend setting this up, considering the ridiculous lengths you went to to prove that it wasn't even possible without a monster machine.

Perhaps we can get back on track and work towards the main issue of the topic, which was about ways to benchmark at such a massive scale, instead of making presumptuous claims.

Regards,
Tuck.

I'm not an expert, but I run a small Tomcat server and found that JavaMelody (https://code.google.com/p/javamelody/) was very useful for monitoring connections, RAM, CPU, etc. It might be worth checking out, if only for ideas.
Good luck with the big launch; it sounds like an exciting project.

Edit: to test your server setup, couldn't you have another server with the same specs that just keeps connections open with the one you're actually trying to test?

[quote=“DevTucker,post:5,topic:54489”]
you instantly jumped into the thread
[/quote]

How else can one join a thread but ‘instantly’?

[quote=“DevTucker,post:5,topic:54489”]
you also ruled out every other networking setup, such as pooled (shared) buffers
[/quote]

I'm not talking about application-managed buffers, but OS-managed buffers - see my remark about socket.setReceiveBufferSize(). You cannot pool or recycle those. I'm only talking about OS-level sockets, not even addressing Java, libraries, or business logic like pooling buffers. As I said, it's telling that even routers will not support this use case without modifications: if you have long-term idle TCP connections, a stressed router will drop idle TCP sessions after a minute or so. Look at the big boys - their long-polling sessions rarely idle over 45s. You will need to actively send and receive keep-alive packets, and monitor these connections for timeouts. A rough estimate: 350K / 45s ≈ 7,777 keep-alives per second, meaning at the very least ~30,000 TCP packets per second (2× DATA, 2× ACK), and that's just idling overhead. Add chat functionality, which means one client's packet (chat message) being replicated towards N clients, and you'll easily find yourself dealing with hundreds of thousands to millions of TCP packets per second for the 350,000-concurrent-connection use case. It is unlikely a single server (with the hardware you described) can handle this, not to mention you will easily saturate your network's bandwidth.
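
For reference, with the Netty stack you mentioned, that keep-alive bookkeeping is typically wired up roughly like this. The intervals and ping payload are illustrative, not recommendations:

```java
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.handler.timeout.IdleState;
import io.netty.handler.timeout.IdleStateEvent;
import io.netty.handler.timeout.IdleStateHandler;

// Added to each channel's pipeline, e.g.:
//   pipeline.addLast(new IdleStateHandler(60, 30, 0)); // reader/writer idle times in seconds
//   pipeline.addLast(new KeepAliveHandler());
public class KeepAliveHandler extends ChannelInboundHandlerAdapter {
    private static final byte[] PING = {0x00}; // placeholder keep-alive payload

    @Override
    public void userEventTriggered(ChannelHandlerContext ctx, Object evt) throws Exception {
        if (evt instanceof IdleStateEvent) {
            IdleStateEvent e = (IdleStateEvent) evt;
            if (e.state() == IdleState.WRITER_IDLE) {
                // Nothing sent for 30s: push a tiny ping so routers keep the session alive.
                ctx.writeAndFlush(Unpooled.wrappedBuffer(PING));
            } else if (e.state() == IdleState.READER_IDLE) {
                // Nothing received for 60s: assume the client is gone and free the slot.
                ctx.close();
            }
        } else {
            super.userEventTriggered(ctx, evt);
        }
    }
}
```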

The suggestion to piggyback on existing IRC server software and client libraries is serious: they solve complex problems and let you worry about the actual service you are building, of which chat is probably a small subset of the desired functionality.

To answer your original question: benchmarking is the easiest part. Just open connections from other peers, like Keith said, and simulate some traffic. By pointing out the technical challenges of the setup instead of simply telling you to open connections - thereby sidestepping the original question - I actually tried to be more helpful. Oh well; given your rather bold response, I'll let others inform you about these matters.
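
For what it's worth, opening the connections can be as dumb as the sketch below (host, port and connection counts are placeholders). Remember a single source IP runs out of ephemeral ports around 64K connections, so you would spread this over several load-generator machines or source addresses:

```java
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ConnectionFlood {
    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost"; // server under test
        int port = 5222;          // placeholder port
        int connections = 50_000; // per load-generator machine; stay under ~64K per source IP

        List<Socket> sockets = new ArrayList<>(connections);
        Random rnd = new Random();

        for (int i = 0; i < connections; i++) {
            Socket s = new Socket();
            s.connect(new InetSocketAddress(host, port), 5_000);
            sockets.add(s);
            if (i % 1_000 == 0) {
                System.out.println("open connections: " + i);
            }
        }

        // Keep all sockets open, and have a small fraction send traffic,
        // to mimic mostly-idle chat clients.
        byte[] fakeMessage = new byte[272]; // matches the stated maximum message size
        while (true) {
            Socket s = sockets.get(rnd.nextInt(sockets.size()));
            OutputStream out = s.getOutputStream();
            out.write(fakeMessage);
            out.flush();
            Thread.sleep(10); // ~100 messages/second across the whole pool
        }
    }
}
```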

[quote=“Riven,post:7,topic:54489”]
The suggestion to piggyback on existing IRC server software and client libraries is serious
[/quote]

+1.

[quote=“Riven,post:7,topic:54489”]
if you have long-term idle TCP connections, a stressed router will drop idle TCP sessions after a minute or so
[/quote]

This is why I brought up stateless connections. I won't go into detail, because you seem knowledgeable enough to know what these are. With these, keep-alive packets aren't required and there won't be idle TCP connections, as they are closed after they are used.

This is something I may look into if the benchmarks don't hold up. I personally don't have any fear of that, but it's always a possibility.

I don't believe I ever stated it was hard; hell, I could just use a profiler. The question was asking for people who have benchmarked at much larger scales to talk about how they did it.

Thank you for your responses. They were definitely informative.

Make sure your benchmark code (or any third-party software you use to measure latencies) does not suffer from coordinated omission. It's very easy to miss, and many tools (even commercial ones) suffer from this problem. For solutions, see LatencyUtils and anything Gil Tene has to say on the matter.
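
For instance, with Gil Tene's HdrHistogram (which LatencyUtils builds on) you can correct recorded samples for a known expected interval between requests. The bounds and units below are placeholders:

```java
import org.HdrHistogram.Histogram;
import java.util.concurrent.TimeUnit;

public class LatencyRecorder {
    // Track latencies up to 1 minute with 3 significant digits of precision.
    private static final Histogram HISTOGRAM =
            new Histogram(TimeUnit.MINUTES.toNanos(1), 3);

    // Call this for every request the benchmark issues.
    static void record(long latencyNanos, long expectedIntervalNanos) {
        // recordValueWithExpectedInterval back-fills the samples a stalled
        // load generator would have taken, which is the usual fix for
        // coordinated omission.
        HISTOGRAM.recordValueWithExpectedInterval(latencyNanos, expectedIntervalNanos);
    }

    static void report() {
        HISTOGRAM.outputPercentileDistribution(System.out, 1_000_000.0); // print in milliseconds
    }
}
```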