Socket Listener "dies" with time

I have a problem with the Reign of Rebels server: after a few days running, it simply stops accepting connections, in an unusual way.

The client tries to connect and gets no error at first; instead it takes 50-80 seconds or so for the client to get a connection error, as if the client were still negotiating the connection.
If the server is actually down, the client gets the error message immediately.

I believe there’s something wrong in my code. I’ll post the code I use to accept/close connections, and would appreciate it if somebody spots something weird.

Accepting new connections. I start 10 threads running this:


public void run()
{
    while (true)
    {
        MMTCPConnection c = new MMTCPConnection(welcomeSocket, this);
        addConnection(c); /* this does not block, definitely */
    }
}

This is in the constructor of MMTCPConnection:


Socket connectionSocket;
....
try
{
    connectionSocket = ss.accept();
    inFromClient = new DataInputStream(connectionSocket.getInputStream());
    outToClient = new DataOutputStream(new BufferedOutputStream(connectionSocket.getOutputStream()));
}
catch (Exception e)
{
    e.printStackTrace();
    try {
        connectionSocket.close();
    } catch (IOException e1) {
        e1.printStackTrace();
    }
}

In the class MMTCPConnection, when any error occurs (including the client closing the connection), I call this method:


	Socket connectionSocket;
	DataOutputStream outToClient;
	DataInputStream inFromClient;

	private  void closeConnection()
	{
		server.connectionsV.remove(this.id);
		try 
		{
			inFromClient.close();
		}
		catch (Exception e)
		{
			e.printStackTrace();
		}

		try 
		{
			outToClient.close();
		}
		catch (Exception e)
		{
			e.printStackTrace();
		}

		inFromClient=null;
		outToClient=null;

		try {
			connectionSocket.close();
		} catch (IOException e) {
			e.printStackTrace();
		}

		System.out.println("closed connection . "+(int)id);

	}

Am I missing something ?

You should only .accept() a socket in the thread that is listening for incoming connections. Once you obtain a socket, pass the reference to another thread and handle the socket there. Don’t even request the streams, don’t set the timeout, pass it off right away.

If I understand correctly, I should start only one thread for accepting connections, and handle each new connection in another thread? Like:


ServerSocket welcomeSocket;
Socket connectionSocket;
...

public void run()  /* only one thread will be started */
{
    while (true)
    {
        connectionSocket = welcomeSocket.accept();
        createThreadToHandleNewConnection(connectionSocket);
    }
}

I’d create the thread from a pool or send the connection to a connection-handling actor (I’m a scala guy), but yeah that’s basically the right idea: get it out of the accepting thread ASAP.
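A minimal sketch of that single-accept-thread-plus-pool pattern might look like the following. AcceptLoop and handleConnection are made-up names, and the fixed pool size of 10 is an arbitrary choice:

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AcceptLoop implements Runnable {
    private final ServerSocket welcomeSocket;
    private final ExecutorService pool = Executors.newFixedThreadPool(10);

    AcceptLoop(ServerSocket welcomeSocket) {
        this.welcomeSocket = welcomeSocket;
    }

    public void run() {
        while (true) {
            try {
                // keep the accepted socket local -- never a shared field
                final Socket connectionSocket = welcomeSocket.accept();
                pool.execute(() -> handleConnection(connectionSocket));
            } catch (IOException e) {
                if (welcomeSocket.isClosed()) {
                    return; // listener was shut down, stop accepting
                }
                e.printStackTrace(); // transient failure -- keep looping
            }
        }
    }

    private void handleConnection(Socket s) {
        // create the streams and talk to the client here, off the accept thread
    }
}
```

The accept thread never touches the streams; the pool thread does all the per-connection work, so a slow or hostile client can’t stall new connections.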

Even this loop is not stable enough for a server under heavy load. It takes only one short burst of new connections (thousands) to run out of file descriptors, causing .accept(…) to throw an IOException, and your server is unreachable. Forever.


while (true)
{
   try {
      final Socket socket = serversocket.accept();
      // pass it off
   }
   catch (IOException exc) {
      try {
         Thread.sleep(...); // cool down, maybe take action
      } catch (InterruptedException ignored) {
      }
   }
}

I found out the hard way.

If you’re spawning off new threads for connections, you should definitely use a thread pool in order to throttle bursts. If you run out of threads in the pool, you can shunt the connection off to a “we’re busy, try again” response (I call that kind of handler a “Snubber”). If you control the client, it’s possible the user doesn’t ever see an error, because you can make the client try again. Certainly that’s nicer than a connection refused error.
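One way to sketch that “Snubber” idea is with a bounded ThreadPoolExecutor whose RejectedExecutionHandler sends the busy message. ConnectionTask, Snubber, and the pool/queue sizes are all made-up illustrations, not anything from the thread:

```java
import java.io.IOException;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/* A task that remembers its socket so the rejection handler can reach it. */
class ConnectionTask implements Runnable {
    final Socket socket;

    ConnectionTask(Socket socket) {
        this.socket = socket;
    }

    public void run() {
        // normal request handling would go here
    }
}

/* The "Snubber": politely refuse the connection instead of queueing it. */
class Snubber implements RejectedExecutionHandler {
    public void rejectedExecution(Runnable r, ThreadPoolExecutor pool) {
        if (!(r instanceof ConnectionTask)) {
            return;
        }
        Socket s = ((ConnectionTask) r).socket;
        try {
            s.getOutputStream().write("busy, try again\n".getBytes(StandardCharsets.UTF_8));
        } catch (IOException ignored) {
        } finally {
            try {
                s.close();
            } catch (IOException e) {
            }
        }
    }
}

class SnubberPool {
    /* 10 workers, at most 50 waiting connections; overflow goes to the Snubber. */
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(10, 10, 60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(50), new Snubber());
    }
}
```

The key point is that a burst beyond workers-plus-queue never piles up unbounded: the overflow client gets a quick, explicit answer and a retry can be built into the client.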

Thread-per-connection can scale pretty well these days with NPTL on Linux, but it’s never going to scale as well as NIO, and you still have to stay vigilant about keeping thread growth under control.

I understand this concern, but right now I have like 2, maximum 3 non-simultaneous connections per day, and still this happens. I’m not thinking about scaling well; I just want it to run stably for a long time, which is not happening.

I’ll change the server with the suggestions given here, and increase logging to see exactly where my accept thread stops responding.

Thank you for now.

If you expect the thing to stay up 24/7, I would recommend catching exceptions like Riven suggested and just “resetting” the socket. That means closing it, waiting a bit so the OS is clear that the port is no longer bound, trying to bind it again, rinse and repeat. On a server it also pays to increase the maximum number of file descriptors you have.
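That close-wait-rebind loop could look something like this sketch. Rebinder and rebind are made-up names, and the one-second delay is an arbitrary choice:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class Rebinder {
    /* Close the old listener, then keep trying to bind a fresh one on the
       same port until the OS lets go of it. */
    static ServerSocket rebind(ServerSocket old, int port) throws InterruptedException {
        try {
            old.close();
        } catch (IOException ignored) {
        }
        while (true) {
            try {
                ServerSocket ss = new ServerSocket(); // unbound, so we can set options first
                ss.setReuseAddress(true); // tolerate sockets lingering in TIME_WAIT
                ss.bind(new InetSocketAddress(port));
                return ss;
            } catch (IOException e) {
                Thread.sleep(1000); // wait a bit, then rinse and repeat
            }
        }
    }
}
```

Setting SO_REUSEADDR before bind is what lets the rebind succeed while old connections on that port are still in TIME_WAIT.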

Same goes for open connections. Sure, you can get cases where a TCP connection won’t go dark for days, even weeks, but it is far more likely to go dark on the order of hours.