java.net.SocketTimeoutException: connect timed out

So, Battledroid continues crawling along. I’ve come across an issue which doesn’t bode well and which, I suspect, will cause grave problems as soon as I try to scale.

I get a lot of seemingly random “java.net.SocketTimeoutException: connect timed out” errors when I try to connect to the Battledroid server. It’s a TCP/IP connection to puppygames.net:25000.

95% of the time the connection succeeds just fine, and the whole request completes in about 450ms. I’m polling the server once per second with this tiny request (I think the payload is under 100 bytes, but that’s neither here nor there). But every so often it just times out while connecting. Raising the socket timeout (up to 20 seconds!) doesn’t seem to help… it’s as if the initial connection attempt is simply lost in space.

Probing the firewall logs, I see no evidence of rate limiting or blocking (indeed, from my home machine there are no firewall rules in the way to my Linux server).

This is perplexing. A ping never fails. A browse to www.puppygames.net never fails. Only my TCP/IP connections in-game (made by Java) are failing.

Has anybody got any good ideas on how to go about diagnosing it?

A little further information:
tcpview shows that the client gets as far as the SYN_SENT state. So presumably the SYN genuinely isn’t reaching the server (or the server’s SYN-ACK isn’t making it back); either way, something is vanishing somewhere en route.

Cas :slight_smile:

It sounds like the actual problem is that the process providing the data is blocked. A common error is assuming that sending a small amount of data will always succeed quickly, when in fact it can block.

Outside of the actual data transfer, other kinds of synchronization issues among threads can stop progress; “socket timed out” is only a symptom.

We’re not even getting as far as sending data: the client sends the SYN and then never gets the SYN-ACK back, so the socket never connects.

Cas :slight_smile:

Are these plain old-IO sockets? NIO? URL.openStream()? Presumably no SSL. Does the ServerSocket still return from .accept(), after which no data is received? What happens under stress (try 10 connections/sec)? Do you use thread pools backed by an ExecutorService? Does it work reliably when the server is hosted on localhost or somewhere on the LAN? Is your client on a wifi network? Is your client’s uplink saturated, or your server’s downlink? Do you have the same problems when connecting through other ISPs (think server<->server)?

Can you host a ping-like TCP service on port 25000 (or another port, so as not to ruin your dev efforts) and let the community test it for a bit?
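A ping-like service along those lines might look something like this minimal sketch. The class name `PingServer`, the one-line `PONG` protocol and the default port 25001 are all assumptions made for illustration. It closes every accepted socket and bounds each read with a timeout, so neither leaked descriptors nor a silent client can stall it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical ping-like service: answers each client's first line with "PONG <line>". */
public class PingServer {

    /** Binds the port (0 = any free port) and serves clients on a daemon thread. */
    public static ServerSocket start(int port) throws IOException {
        ServerSocket ss = new ServerSocket(port);
        Thread acceptor = new Thread(() -> {
            while (!ss.isClosed()) {
                // try-with-resources closes each accepted socket, so descriptors never leak
                try (Socket s = ss.accept()) {
                    s.setSoTimeout(5000); // a silent client can't stall the loop forever
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
                    PrintWriter out = new PrintWriter(
                            new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
                    String line = in.readLine();
                    if (line != null) {
                        out.println("PONG " + line);
                    }
                } catch (IOException e) {
                    if (!ss.isClosed()) {
                        e.printStackTrace(); // log it and keep serving
                    }
                }
            }
        }, "ping-acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
        return ss;
    }

    /** Client side: connect, send one line, return the server's reply (null on EOF). */
    public static String ping(String host, int port, String message, int timeoutMs) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            s.setSoTimeout(timeoutMs);
            PrintWriter out = new PrintWriter(
                    new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
            out.println(message);
            return in.readLine();
        }
    }

    public static void main(String[] args) throws Exception {
        int port = args.length > 0 ? Integer.parseInt(args[0]) : 25001; // hypothetical default
        start(port);
        System.out.println("ping service listening on port " + port);
        Thread.currentThread().join(); // park the main thread; the acceptor is a daemon
    }
}
```

Anyone testing would call `PingServer.ping("puppygames.net", 25001, "hello", 5000)` in a loop and count exceptions.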

Server code is thus:


private void run() {
	final AtomicBoolean shutdown = new AtomicBoolean();

	while (!shutdown.get()) {
		Socket socket = null;
		try {
			socket = serverSocket.accept();
		} catch (IOException e) {
			warning("Failed to listen", e);
		} catch (Throwable t) {
			shutdown.set(true);
			severe("Unexpected trouble", t);
		}
	}
}

I’ve completely removed all the actual processing, executor, SSL, etc. That’s literally all it is doing now: accepting sockets.

Client code is:


socket = socketFactory.createSocket();
SocketAddress address = new InetSocketAddress(host, port);
socket.connect(address, timeout);
socket.close();

That is, I simply connect (whatever timeout value I use) and then correctly close the socket, hitting that code every 250ms. It works 95% of the time, then, at random, I get a connect timed out. No exceptions are logged on the server, and nothing shows up in the server’s firewall logs. I can get the timeout immediately upon starting, or after a few tens of attempts. I can run flat out (one call every 250ms or so is as fast as it seems to manage) or once every 10 seconds… the same thing happens.

Cas :slight_smile:
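To put numbers on that 95%, a probe along these lines could log how long each handshake takes and separate timeouts (no SYN-ACK within the limit) from other failures such as a refused connection. This is a sketch only: `ConnectProbe`, its `Outcome` values and the default attempt count, timeout and interval are hypothetical choices, not part of the Battledroid client. Unlike the client snippet above, it also closes the socket when `connect()` fails.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical connect probe: times each TCP handshake and classifies failures. */
public class ConnectProbe {

    public enum Outcome { OK, TIMEOUT, ERROR }

    /** Result of one connect attempt: how it ended and how long it took. */
    public static final class Result {
        public final Outcome outcome;
        public final long millis;

        Result(Outcome outcome, long millis) {
            this.outcome = outcome;
            this.millis = millis;
        }
    }

    /** Opens and immediately closes one TCP connection, timing the handshake. */
    public static Result probe(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        // try-with-resources also closes the socket when connect() fails
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return new Result(Outcome.OK, elapsedMillis(start));
        } catch (SocketTimeoutException e) {
            // no SYN-ACK arrived within timeoutMs: the symptom described in this thread
            return new Result(Outcome.TIMEOUT, elapsedMillis(start));
        } catch (IOException e) {
            // refused (RST), unreachable, unknown host, ...
            return new Result(Outcome.ERROR, elapsedMillis(start));
        }
    }

    private static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    public static void main(String[] args) throws InterruptedException {
        String host = args.length > 0 ? args[0] : "puppygames.net";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 25000;
        int attempts = 100, timeoutMs = 5000, intervalMs = 250;
        int ok = 0, timeouts = 0, errors = 0;
        long worstOk = 0;
        for (int i = 0; i < attempts; i++) {
            Result r = probe(host, port, timeoutMs);
            switch (r.outcome) {
                case OK:
                    ok++;
                    worstOk = Math.max(worstOk, r.millis);
                    break;
                case TIMEOUT:
                    timeouts++;
                    break;
                default:
                    errors++;
                    break;
            }
            System.out.printf("#%d %s %d ms%n", i, r.outcome, r.millis);
            Thread.sleep(intervalMs);
        }
        System.out.printf("ok=%d timeouts=%d errors=%d slowest ok=%d ms%n", ok, timeouts, errors, worstOk);
    }
}
```

Setting `intervalMs` to 100 turns this into the 10 connections/sec stress test suggested earlier in the thread.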

That snippet you posted will run out of server-side file handles pretty quickly, since the accepted sockets are never closed :stuck_out_tongue:
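A minimal leak-free version of the stripped-down accept loop posted above might look like the following sketch. It uses `java.util.logging` in place of the original `warning()`/`severe()` helpers and stops when the `ServerSocket` is closed instead of via an `AtomicBoolean`; the class name `AcceptLoop` and its counter are hypothetical.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical accept loop that releases every socket it accepts. */
public class AcceptLoop {
    private static final Logger LOG = Logger.getLogger(AcceptLoop.class.getName());

    /** Counts accepted connections; handy when watching for leaks. */
    public static final AtomicInteger accepted = new AtomicInteger();

    /** Accepts and immediately closes connections until the ServerSocket is closed. */
    public static void run(ServerSocket serverSocket) {
        while (!serverSocket.isClosed()) {
            // Each accepted socket holds a file descriptor; try-with-resources
            // releases it as soon as this iteration is done with the connection.
            try (Socket socket = serverSocket.accept()) {
                accepted.incrementAndGet();
                LOG.fine(() -> "accepted " + socket);
                // ... real request handling would go here ...
            } catch (IOException e) {
                if (serverSocket.isClosed()) {
                    break; // normal shutdown: close() makes a blocked accept() throw
                }
                LOG.log(Level.WARNING, "Failed to accept", e);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket ss = new ServerSocket(25000)) {
            run(ss);
        }
    }
}
```

Shutting down by closing the `ServerSocket` from another thread avoids needing a separate flag, because a blocked `accept()` returns with an exception as soon as the socket is closed.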

Can you ‘deploy’ this online, tell us the host:port, and let us do some poking? I can host a similar service on JGO’s VPS, so we can compare.

Some answers to your other questions…

This is plain old sockets, with no code other than what you see there (I whittled it back to the bare code and behold, it still occurs!). No wifi involved, and bugger-all bandwidth. It also occurs if I hit port 80 (Apache) instead, so I think I can safely rule out the Java side of the server and iptables.

I’ve opened port 25000 to all (the code is similar to what you see there: all you can do is close the socket; it won’t otherwise respond in any meaningful way). See if you can replicate the problem.

Cas :slight_smile:

This may or may not be the issue, as I am by all definitions a programming newbie. However, I found a Stack Overflow post with the same issue.
The top-rated comment describes it as follows: the socket will time out if no data arrives within the timeout period.

Post: http://stackoverflow.com/questions/21603629/serversocket-accept-throwing-sockettimeoutexception-with-null-message
I could be quite wrong, but I thought I should post the link :slight_smile:

CopyableCougar4

Edit: similar findings @ http://examples.javacodegeeks.com/core-java/net/sockettimeoutexception/java-net-sockettimeoutexception-how-to-solve-sockettimeoutexception/
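The Stack Overflow question concerns `SO_TIMEOUT`, which bounds how long `accept()` or `read()` waits, whereas “connect timed out” comes from the timeout passed to `Socket.connect(address, timeout)` and means the TCP handshake itself never completed. The sketch below (hypothetical class `TimeoutKinds`) reproduces the two `SO_TIMEOUT` cases on localhost. All three cases throw `SocketTimeoutException`, so the exception type alone doesn’t tell them apart; the message and stack trace do.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo of the two SO_TIMEOUT flavours of SocketTimeoutException. */
public class TimeoutKinds {

    /** SO_TIMEOUT on a ServerSocket: accept() gives up if nobody connects in time. */
    public static boolean acceptTimesOut(int soTimeoutMs) throws IOException {
        try (ServerSocket ss = new ServerSocket(0)) {
            ss.setSoTimeout(soTimeoutMs);
            ss.accept().close(); // nobody connects, so this should time out
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        }
    }

    /** SO_TIMEOUT on a connected Socket: read() gives up if no data arrives in time. */
    public static boolean readTimesOut(int soTimeoutMs) throws IOException {
        try (ServerSocket ss = new ServerSocket(0);
             Socket client = new Socket()) {
            // The handshake completes via the listen backlog; the server never writes.
            client.connect(new InetSocketAddress("127.0.0.1", ss.getLocalPort()), 1000);
            client.setSoTimeout(soTimeoutMs);
            client.getInputStream().read();
            return false;
        } catch (SocketTimeoutException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("accept() timed out: " + acceptTimesOut(500));
        System.out.println("read() timed out: " + readTimesOut(500));
        // A connect timeout is separate: it is the timeout argument of
        // Socket.connect(address, timeout) and fires only if the handshake never completes.
    }
}
```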

No, it’s not that :slight_smile:

Cas :slight_smile:

The following is currently running on java-gaming.org:25000:

My client (running at 2Hz) never has I/O trouble when running this code.

[icode]Server.java[/icode]

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class Server {
	public static void main(String[] args) {
		ServerSocket ss = null;

		while (true) {
			if (ss == null) {
				try {
					ss = new ServerSocket(25000);
				} catch (IOException e) {
					e.printStackTrace();
					sleep(1000);
					ss = null;
					continue;
				}
			}

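			// try-with-resources closes the accepted socket even if println throws
			try (Socket s = ss.accept()) {
				System.out.println("accepted[" + s + "].closed");
			} catch (IOException e) {
				e.printStackTrace();
				// close the old listener before dropping it, or the rebind above keeps failing
				try {
					ss.close();
				} catch (IOException ignored) {
					// already broken; nothing more to do
				}
				sleep(1000);
				ss = null;
			}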
		}
	}

	private static void sleep(int ms) {
		try {
			Thread.sleep(ms);
		} catch (InterruptedException exc) {
			// meh
		}
	}
}

[icode]Client.java[/icode]

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketAddress;

public class Client {
	public static void main(String[] args) {
		final String host = "java-gaming.org";
		final int port = 25000;
		final int timeout = 5000;

		while (true) {
			// try-with-resources also closes the socket when connect() times out
			try (Socket socket = new Socket()) {
				SocketAddress address = new InetSocketAddress(host, port);
				socket.connect(address, timeout);
			} catch (IOException e) {
				e.printStackTrace();
			}

			sleep(500);
		}
	}

	private static void sleep(int ms) {
		try {
			Thread.sleep(ms);
		} catch (InterruptedException exc) {
			// meh
		}
	}
}

puppygames.net:25000 is rock solid too.

I’m pretty certain the trouble is with your ISP, BT (it used to be crappy, back in the day), or with your home network, fwiw.

The last unknown is your mystery socketFactory that may be configured… oddly.

Hmm, I’m on PlusNet (I’ve been with PlusNet for the entire time I’ve lived here). I wonder what they could be doing wrong.
I’ll just check that I get similar results from java-gaming…

OK, JGO is rock solid for me too. And vastly faster: 30ms or less per response, compared to 250ms from puppygames!

Would you mind altering the server code so that it doesn’t close the socket?

Cas :slight_smile:

Well, I can’t let the server run out of file handles: that can screw up the OS quite badly, as every single service/process will start to fail in spectacular ways. So… what can I do for you instead? A ping/pong-like service?

Don’t worry, I’m closing the sockets immediately from the client end, and I’ll only run the test for a few seconds… it should survive?

(Or stick in a maximum count and then abort the process after, say, 1000.)

Cas :slight_smile:
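The max-count idea could be sketched like this: accept sockets without closing them (leaving the client to end each connection), stop at a fixed cap, then close everything. `HoldOpenServer` and `holdOpen` are hypothetical names, and 1000 is just the figure suggested above; depending on the server’s per-process descriptor limit, the cap may need to be lower.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical server that holds accepted sockets open, up to a hard cap. */
public class HoldOpenServer {

    /**
     * Accepts up to maxConnections sockets WITHOUT closing them, so the client
     * decides when each connection ends; then closes them all and returns the count.
     */
    public static int holdOpen(ServerSocket ss, int maxConnections) throws IOException {
        List<Socket> open = new ArrayList<>();
        try {
            while (open.size() < maxConnections) {
                open.add(ss.accept());
            }
            return open.size();
        } finally {
            // Release every descriptor, even if accept() failed part-way through.
            for (Socket s : open) {
                try {
                    s.close();
                } catch (IOException ignored) {
                    // nothing useful to do while shutting down
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket ss = new ServerSocket(25000)) {
            int n = holdOpen(ss, 1000); // cap suggested in the thread
            System.out.println("accepted " + n + " connections, exiting");
        }
    }
}
```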

Deployed:

[icode]Server.java[/icode]

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class Server {
	public static void main(String[] args) {
		ServerSocket ss = null;

		List<Socket> open = new ArrayList<>();

		while (true) {
			if (ss == null) {
				try {
					ss = new ServerSocket(25000);
				} catch (IOException e) {
					e.printStackTrace();
					sleep(1000);
					ss = null;
					continue;
				}
			}

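			try {
				Socket s = ss.accept();
				open.add(s);
				System.out.println("accepted[" + s + "] / " + open.size());
			} catch (IOException e) {
				e.printStackTrace();
				// close the old listener before dropping it, or the rebind above keeps failing
				try {
					ss.close();
				} catch (IOException ignored) {
					// already broken; nothing more to do
				}
				sleep(1000);
				ss = null;
			}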

			while (open.size() > 50) {
				try {
					open.remove(0).close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}

	private static void sleep(int ms) {
		try {
			Thread.sleep(ms);
		} catch (InterruptedException exc) {
			// meh
		}
	}
}

Hmm, well that worked perfectly too on JGO. I’ll try that on puppygames.

Cas :slight_smile:

Linode… given they dropped the price of their low-end VPS from $20 to $10 / month, there’s no reason not to make the switch. Dirt cheap, can’t break it (easily). It just rocks. :point:

Think I may migrate to Linode at some point in the near future.

Running that exact code on puppygames.net:25000 now, and it fails almost instantly here with connect timed out etc. How about when you try connecting to puppygames?

Cas :slight_smile:

As said before, puppygames.net:25000 was (and to this very moment is!) equally stable for me, just with some hefty (Atlantic-Ocean-induced) latency.

Why are you using a SocketFactory, and how is it configured?

Right then… so what does this tell us?

Firstly, that it’s not puppygames.net: it works fine for you
Secondly, that it’s not my server code: your code has the same problems for me and also works fine for you
Thirdly, that it’s not my machine (btw, I verified the problem also exists on my laptop): I can run against JGO and the client is fine
Fourthly, that it’s not the client code, since I can run against JGO without problems
Fifthly, that it’s not my ISP, since I can run against JGO without problems
Sixthly, that the port number makes no difference: it happens on port 80 as well
Seventhly, that the rate makes no difference: it happens at full pelt or at one every 10 seconds

It seems to only occur between my computer and puppygames.net.

Sorta running out of options here.

Cas :slight_smile:
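One more way to tighten that comparison is to probe both servers in the same run, alternating between them, so any difference can’t be blamed on the time of day or on the client machine. A sketch under assumptions: `InterleavedProbe` is a hypothetical name, the two targets are the ones used in this thread (the JGO test service may no longer be running), and addresses are resolved once up front so DNS stays out of the loop.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical A/B probe: alternates connects between targets and tallies failures. */
public class InterleavedProbe {

    /** True if a TCP handshake with the target completes within timeoutMs. */
    static boolean connects(InetSocketAddress target, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(target, timeoutMs);
            return true;
        } catch (IOException e) {
            return false; // timeout, refusal and unknown host all count as failures
        }
    }

    /** Probes every target once per round, in the same order; returns failures per target. */
    public static Map<InetSocketAddress, Integer> failuresPerTarget(
            List<InetSocketAddress> targets, int rounds, int timeoutMs, long intervalMs)
            throws InterruptedException {
        Map<InetSocketAddress, Integer> failures = new LinkedHashMap<>();
        for (InetSocketAddress t : targets) {
            failures.put(t, 0);
        }
        for (int round = 0; round < rounds; round++) {
            for (InetSocketAddress t : targets) {
                if (!connects(t, timeoutMs)) {
                    failures.merge(t, 1, Integer::sum);
                }
            }
            Thread.sleep(intervalMs);
        }
        return failures;
    }

    public static void main(String[] args) throws InterruptedException {
        // InetSocketAddress resolves the host name once, here, not on every attempt.
        List<InetSocketAddress> targets = Arrays.asList(
                new InetSocketAddress("puppygames.net", 25000),
                new InetSocketAddress("java-gaming.org", 25000));
        System.out.println(failuresPerTarget(targets, 100, 5000, 500));
    }
}
```

If only the puppygames.net count climbs while both run side by side, that points at the network path between this client and that server rather than at either end’s code.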