java.net.SocketTimeoutException: connect timed out

Riven · November 11, 2014, 8:58pm

I’m pretty certain the trouble is with your ISP, BT (it used to be crappy, back in the day) or your home network, fwiw.

The last unknown is your mystery socketFactory that may be configured… oddly.

princec · November 11, 2014, 9:25pm

Hmm, I’m on PlusNet (I’ve been on PlusNet the entire time I’ve lived here). I wonder what they could be doing wrong.
I’ll just check I get similar results from java-gaming…

OK, JGO is rock solid for me too, and vastly faster: a 30 ms response or less, compared to 250 ms from puppygames!

Would you mind altering the server code so that it doesn’t close the socket?

Cas

Riven · November 11, 2014, 9:37pm

Well, I can’t let the server run out of file handles. That can screw up the OS quite badly, as every single service/process will start to fail in spectacular ways. So… what can I do for you instead? A ping/pong-like service?

princec · November 11, 2014, 9:39pm

Don’t worry, I’m closing the sockets immediately from the client end and I’ll only run the test for a few seconds… it should survive?

(Or stick in a max count and then abort the process after, say, 1000.)
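For reference, the client-side test described here (open a connection, close it straight away, stop after a fixed maximum count) can be sketched as below. This is a minimal sketch assuming plain java.net sockets with an explicit connect timeout; the class name `ConnectProbe`, its parameters and the outcome buckets are hypothetical, not the actual client.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical test harness: repeatedly connects and immediately closes. */
final class ConnectProbe {
    /** Returns counts: [0] = connected, [1] = connect timed out, [2] = other I/O error. */
    static int[] probe(String host, int port, int maxCount,
                       int connectTimeoutMs, long pauseMs) throws InterruptedException {
        int[] counts = new int[3];
        for (int i = 0; i < maxCount; i++) {
            // try-with-resources closes the socket as soon as the connect returns
            try (Socket s = new Socket()) {
                s.connect(new InetSocketAddress(host, port), connectTimeoutMs);
                counts[0]++;
            } catch (SocketTimeoutException e) {
                counts[1]++; // "connect timed out": no answer within connectTimeoutMs
            } catch (IOException e) {
                counts[2]++; // refused, unreachable, unresolved host, ...
            }
            // Optional pause between attempts, e.g. 10_000 for "1 every 10 seconds"
            if (pauseMs > 0 && i < maxCount - 1) {
                Thread.sleep(pauseMs);
            }
        }
        return counts;
    }
}
```

For example, `ConnectProbe.probe("puppygames.net", 25000, 1000, 5000, 0)` would make up to 1000 back-to-back attempts against the test server, while a `pauseMs` of 10000 reproduces the slow-rate case.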

Cas

Riven · November 11, 2014, 9:44pm

Deployed:

Server.java:

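import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class Server {
	public static void main(String[] args) {
		ServerSocket ss = null;

		// Accepted sockets are deliberately left open (nothing is read or
		// written); only the 50 most recent are kept, to avoid running out
		// of file handles.
		List<Socket> open = new ArrayList<>();

		while (true) {
			if (ss == null) {
				try {
					ss = new ServerSocket(25000);
				} catch (IOException e) {
					// Bind failed (e.g. port still in use): wait and retry.
					e.printStackTrace();
					sleep(1000);
					continue;
				}
			}

			try {
				Socket s = ss.accept();
				open.add(s);
				System.out.println("accepted[" + s + "] / " + open.size());
			} catch (IOException e) {
				e.printStackTrace();
				// Close the broken listener before discarding it; otherwise it
				// can stay bound to port 25000 and every re-bind above fails.
				try {
					ss.close();
				} catch (IOException ignored) {
					// nothing more we can do
				}
				ss = null;
				sleep(1000);
			}

			// Close the oldest sockets once more than 50 are open.
			while (open.size() > 50) {
				try {
					open.remove(0).close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
	}

	private static void sleep(int ms) {
		try {
			Thread.sleep(ms);
		} catch (InterruptedException exc) {
			// meh: an interrupted back-off simply ends early
		}
	}
}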

princec · November 11, 2014, 9:49pm

Hmm, well that worked perfectly too on JGO. I’ll try that on puppygames.

Cas

Riven · November 11, 2014, 9:52pm

Linode… given they dropped the price of their low-end VPS from $20 to $10/month, there’s no reason not to make the switch. Dirt cheap, can’t break it (easily). It just rocks. :point:

princec · November 11, 2014, 9:54pm

Think I may migrate to Linode at some point in the near future.

Running that exact code on puppygames.net:25000 now - and it fails almost instantly here with “connect timed out” and the like. What happens when you try connecting to puppygames?

Cas

Riven · November 11, 2014, 9:56pm

As I said before, puppygames.net:25000 was (and to this very moment is!) equally stable for me, just with some hefty (Atlantic-Ocean-induced) latency.

Why are you using a SocketFactory, and how is it configured?

princec · November 11, 2014, 10:01pm

Right then… so what does this tell us?

Firstly, that it’s not puppygames.net: it works fine for you
Secondly, that it’s not my server code: your code has the same problems for me and also works fine for you
Thirdly, that it’s not my machine (I’ve verified the problem exists with my laptop too, btw): I can run against JGO and the client is fine
Fourthly, that it’s not the client code: I can run against JGO without problems
Fifthly, that it’s not my ISP: I can run against JGO without problems
Sixthly, that the port number makes no difference: it happens on port 80 as well
Seventhly, that the rate makes no difference: it happens at full pelt or at 1 every 10 seconds

It seems to only occur between my computer and puppygames.net.

Sorta running out of options here.

Cas

princec · November 11, 2014, 10:03pm

Just the default socket factory.

Also, your “server” code runs fine on JGO and exhibits exactly the same behaviour when run on puppygames.net (i.e. failure), and it does no socket configuration.

Cas

Riven · November 11, 2014, 10:03pm

Your ISP routes (and mangles?) your traffic.

princec · November 11, 2014, 10:04pm


This is possible and one of the few tentative options left… but that would mean it’s only doing this between me and puppygames.net, and not between me and JGO.

Cas

Riven · November 11, 2014, 10:06pm

Indeed. It’s not uncommon for ISPs to route badly. I get emails from people who can ping the IP address of java-gaming.org but cannot connect to port 80. A few weeks later they can, and all is well. Then it starts all over again… Sometimes they use proxies to get in, to work around their crappy ISPs.
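The symptom described here (ping works, port 80 does not) also shows up in the kind of exception a Java connect attempt throws. A hedged sketch, assuming only the standard java.net exception types: a `SocketTimeoutException` means nothing answered before the timeout, which points at packets being silently dropped somewhere on the route, while a `ConnectException` usually means the far end (or a firewall) actively refused the connection. `ConnectDiagnosis` and its `Outcome` values are hypothetical names.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.NoRouteToHostException;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.UnknownHostException;

/** Hypothetical diagnostic helper: classifies how a single TCP connect fails. */
final class ConnectDiagnosis {
    enum Outcome { CONNECTED, TIMED_OUT, REFUSED, NO_ROUTE, UNKNOWN_HOST, OTHER }

    static Outcome diagnose(String host, int port, int timeoutMs) {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), timeoutMs);
            return Outcome.CONNECTED;
        } catch (SocketTimeoutException e) {
            // No reply at all within timeoutMs: packets dropped along the path.
            return Outcome.TIMED_OUT;
        } catch (ConnectException e) {
            // Typically a reset: host reachable, but nothing listening or rejected.
            return Outcome.REFUSED;
        } catch (NoRouteToHostException e) {
            // Typically a router or firewall reporting the host as unreachable.
            return Outcome.NO_ROUTE;
        } catch (UnknownHostException e) {
            return Outcome.UNKNOWN_HOST;
        } catch (IOException e) {
            return Outcome.OTHER;
        }
    }
}
```

Logging these outcomes over many attempts makes it easier to tell a routing problem (timeouts) from a server-side one (refusals).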

princec · November 11, 2014, 10:06pm

Here’s the tracert:


Tracing route to puppygames.net [184.106.147.224]
over a maximum of 30 hops:

  1     5 ms     2 ms     2 ms  srp527w [192.168.15.1]
  2     *        *        *     Request timed out.
  3    30 ms    30 ms    28 ms  lo0-central10.pcl-ag03.plus.net [195.166.128.184]
  4    26 ms    29 ms    27 ms  link-a-central10.pcl-gw01.plus.net [212.159.2.168]
  5    26 ms    29 ms    29 ms  xe-10-2-0.pcl-cr01.plus.net [212.159.0.200]
  6    29 ms    28 ms    30 ms  xe-11-2-0.edge3.London2.Level3.net [212.187.201.213]
  7   124 ms   127 ms   136 ms  ae-210-3610.edge1.Chicago2.Level3.net [4.69.158.229]
  8   125 ms   123 ms   124 ms  ae-210-3610.edge1.Chicago2.Level3.net [4.69.158.229]
  9   125 ms   123 ms   123 ms  4.71.248.54
 10     *        *        *     Request timed out.
 11   124 ms   124 ms   123 ms  czi1-tunnel4.ord1.rackspace.net [50.56.6.163]
 12   127 ms   127 ms   126 ms  core1-CoreB.ord1.rackspace.net [184.106.126.129]
 13   124 ms   124 ms   124 ms  aggr301a-3-core1.ord1.rackspace.net [173.203.0.177]
 14   126 ms   123 ms   123 ms  184-106-147-224.static.cloud-ips.com [184.106.147.224]

Not sure why I’m getting those timeouts.

(For comparison, JGO:)


 1     4 ms     3 ms     5 ms  srp527w [192.168.15.1]
 2     *        *        *     Request timed out.
 3    70 ms    36 ms    34 ms  lo0-central10.pcl-ag03.plus.net [195.166.128.184]
 4    29 ms    34 ms    28 ms  link-b-central10.pcl-gw02.plus.net [212.159.2.170]
 5    26 ms    30 ms    28 ms  xe-10-2-0.pcl-cr02.plus.net [212.159.0.202]
 6    26 ms    31 ms    32 ms  ae1.ptw-cr02.plus.net [195.166.129.2]
 7     *        *        *     Request timed out.
 8    30 ms    29 ms    29 ms  217.20.44.193
 9    31 ms    29 ms    29 ms  212.111.33.234
10    27 ms    29 ms    29 ms  li732-171.members.linode.com [85.159.215.171]

Cas

Riven · November 11, 2014, 10:12pm

puppygames.net

 1    <1 ms    <1 ms    <1 ms  192.168.1.1
 2    20 ms    20 ms    28 ms  ............ ORLY!
 3    25 ms    25 ms    25 ms  ............ ORLY!
 4    25 ms    25 ms    25 ms  ae3.cr1-asd8.nl.euro.net [194.134.161.215]
 5    34 ms    26 ms    26 ms  ae0.br1-asd8.nl.euro.net [194.134.161.171]
 6    26 ms    26 ms    26 ms  er1.ams1.nl.above.net [80.249.208.122]
 7    26 ms    27 ms    26 ms  ae8.cr1.ams5.nl.above.net [64.125.30.205]
 8   112 ms   112 ms   129 ms  xe-0-2-0.cr2.lga5.us.above.net [64.125.27.185]
 9   129 ms   139 ms   139 ms  ae6.cr2.ord2.us.above.net [64.125.24.30]
10   123 ms   124 ms   124 ms  ae10.mpr1.ord11.us.above.net [64.125.24.110]
11   123 ms   124 ms   124 ms  ae4.mpr1.ord5.us.above.net [64.125.24.94]
12   125 ms   125 ms   124 ms  208.185.125.6.IPYX-076520-ZYO.above.net [208.185.125.6]
13   124 ms   134 ms   124 ms  10.25.0.65
14   127 ms   127 ms   127 ms  czi1-tunnel4.ord1.rackspace.net [50.56.6.163]
15   125 ms   124 ms   125 ms  core1-CoreB.ord1.rackspace.net [184.106.126.129]
16   124 ms   124 ms   124 ms  aggr301a-3-core1.ord1.rackspace.net [173.203.0.177]
17   127 ms   128 ms   127 ms  184-106-147-224.static.cloud-ips.com [184.106.147.224]

java-gaming.org

  1    <1 ms    <1 ms    <1 ms  192.168.1.1
  2    22 ms    20 ms    19 ms  ............ ORLY!
  3    26 ms    25 ms    31 ms  ............ ORLY!
  4    26 ms    25 ms    25 ms  ae3.cr1-asd8.nl.euro.net [194.134.161.215]
  5    27 ms    31 ms    25 ms  ae0.br1-asd8.nl.euro.net [194.134.161.171]
  6    26 ms    25 ms    26 ms  er1.ams1.nl.above.net [80.249.208.122]
  7    26 ms    26 ms    26 ms  ae14.cr1.ams10.nl.above.net [64.125.21.77]
  8    31 ms    42 ms    31 ms  ae9.mpr3.lhr3.uk.above.net [64.125.28.242]
  9    31 ms    30 ms    31 ms  ae6.mpr2.lhr3.uk.above.net [64.125.21.22]
 10    31 ms    31 ms    31 ms  94.31.35.186.t01461-01.above.net [94.31.35.186]
 11    34 ms    32 ms    31 ms  212.111.33.234
 12    39 ms    31 ms    32 ms  li732-171.members.linode.com [85.159.215.171]

princec · November 11, 2014, 10:13pm

Right, so… the only difference I can see here is that I have to go via Level3.

Cas

Riven · November 11, 2014, 10:20pm

So… once you’ve established a TCP connection… is it stable? If so, just make N connections on N threads, and close N-1 sockets.
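The workaround suggested here, racing N connection attempts and keeping the first one that succeeds, might look roughly like this. It is a sketch assuming `java.util.concurrent`'s `ExecutorCompletionService`; the class name, the thread-per-attempt pool and the timeout are illustrative choices. Attempts that connect after the winner has been picked are closed by a background thread, so they neither leak nor delay the caller.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical helper: N parallel connects, keep the first, close the other N-1. */
final class RacingConnector {
    static Socket connectFirst(String host, int port, int n, int timeoutMs)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        CompletionService<Socket> cs = new ExecutorCompletionService<>(pool);
        for (int i = 0; i < n; i++) {
            cs.submit(() -> {
                Socket s = new Socket();
                try {
                    s.connect(new InetSocketAddress(host, port), timeoutMs);
                    return s;
                } catch (IOException e) {
                    s.close();
                    throw e;
                }
            });
        }
        Socket winner = null;
        IOException last = null;
        int remaining = n;
        try {
            // Take results in completion order until one attempt succeeds.
            while (winner == null && remaining > 0) {
                Future<Socket> f = cs.take();
                remaining--;
                try {
                    winner = f.get();
                } catch (ExecutionException e) {
                    if (e.getCause() instanceof IOException) last = (IOException) e.getCause();
                }
            }
        } finally {
            pool.shutdown(); // running attempts finish; no new work is accepted
        }
        final int left = remaining;
        if (left > 0) {
            // Close late winners in the background instead of making the caller wait.
            Thread cleaner = new Thread(() -> {
                for (int i = 0; i < left; i++) {
                    try { cs.take().get().close(); } catch (Exception ignored) { }
                }
            });
            cleaner.setDaemon(true);
            cleaner.start();
        }
        if (winner == null) throw last != null ? last : new IOException("all " + n + " attempts failed");
        return winner;
    }
}
```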

princec · November 11, 2014, 10:23pm

I haven’t got as far as testing the stability of the connections yet, but if you remember the protocol we devised, it only transmits a few bytes, reads a small response, and then shuts down, in order to handle thousands of “simultaneous” clients, so stability isn’t really an issue.
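The exchange described here (send a few bytes, read a small reply, shut down) corresponds to one request/response per connection. A minimal sketch assuming blocking sockets; signalling end-of-request with `shutdownOutput()` and reading the reply until the server closes are assumptions for illustration, not the real wire protocol, and `OneShotClient` is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical one-shot client: connect, send, read reply to EOF, close. */
final class OneShotClient {
    static byte[] exchange(String host, int port, byte[] request,
                           int connectTimeoutMs, int readTimeoutMs) throws IOException {
        try (Socket s = new Socket()) {
            s.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            s.setSoTimeout(readTimeoutMs); // a stalled read throws SocketTimeoutException
            OutputStream out = s.getOutputStream();
            out.write(request);
            out.flush();
            s.shutdownOutput(); // half-close: tells the server the request is complete
            InputStream in = s.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            for (int n; (n = in.read(buf)) != -1; ) {
                reply.write(buf, 0, n);
            }
            return reply.toByteArray();
        }
    }
}
```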

I can of course work around it by simply retrying until I get a connection - which is what I’ll actually do - but what’s bugging me is that it fails at all at this stage, most unexpectedly. It doesn’t bode well for stability. But if it’s genuinely just a crazy quirk of my route from home to the server, there’s nothing I’ll be able to do about it anyway, and continually retrying will “patch” over the deficiency. It just sucks not knowing why it’s failing, and this sort of random crap is exactly why network programming is so pointlessly difficult :emo:
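Retrying until a connection succeeds is usually bounded and spaced out rather than done in a tight loop. A sketch assuming exponential backoff between attempts; the class name `RetryingConnector` and all limits are hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: bounded connect retries with exponential backoff. */
final class RetryingConnector {
    static Socket connectWithRetry(String host, int port, int timeoutMs, int maxAttempts,
                                   long initialBackoffMs, long maxBackoffMs)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        long backoff = initialBackoffMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Socket s = new Socket();
            try {
                s.connect(new InetSocketAddress(host, port), timeoutMs);
                return s; // the caller owns (and must close) the connected socket
            } catch (IOException e) {
                s.close();
                last = e;
            }
            if (attempt < maxAttempts) {
                Thread.sleep(backoff);
                backoff = Math.min(backoff * 2, maxBackoffMs); // double, up to the cap
            }
        }
        throw last; // every attempt failed: surface the most recent error
    }
}
```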

Cas

Riven · November 11, 2014, 10:30pm

A (few?) months ago you said you’d rewritten everything to use SSL, and as short-lived connections are truly not a good idea with SSL, given the incredible overhead of the handshake, I presumed you’d also rewritten the protocol to use persistent connections.

Anyway, network I/O is hard, and I should know: I make the ‘big’ bucks in this general area. If your low-level code looks clean, you’re doing it wrong. Put those (self-adjusting) retry-loops behind abstraction layers and you’d be relatively fine.
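One possible reading of “self-adjusting retry-loops behind abstraction layers” is a connector object that callers use instead of raw sockets, which lengthens its connect timeout after a timeout and gradually shortens it after successes. The sketch below assumes that interpretation; the class name, bounds and the doubling/three-quarters factors are hypothetical. Non-timeout failures (e.g. connection refused) are rethrown immediately, since waiting longer would not help.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical abstraction: callers ask for a socket; timeout tuning stays in here. */
final class AdaptiveConnector {
    private final String host;
    private final int port;
    private final int minTimeoutMs;
    private final int maxTimeoutMs;
    private final int maxAttempts;
    private int timeoutMs; // current connect timeout, tuned after every attempt

    AdaptiveConnector(String host, int port, int minTimeoutMs, int maxTimeoutMs, int maxAttempts) {
        if (maxAttempts < 1 || minTimeoutMs < 1 || maxTimeoutMs < minTimeoutMs) {
            throw new IllegalArgumentException("invalid bounds");
        }
        this.host = host;
        this.port = port;
        this.minTimeoutMs = minTimeoutMs;
        this.maxTimeoutMs = maxTimeoutMs;
        this.maxAttempts = maxAttempts;
        this.timeoutMs = minTimeoutMs;
    }

    synchronized int currentTimeoutMs() { return timeoutMs; }

    /** Retries only on connect timeouts; refusals and other errors fail fast. */
    Socket connect() throws IOException {
        SocketTimeoutException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Socket s = new Socket();
            try {
                s.connect(new InetSocketAddress(host, port), currentTimeoutMs());
                adjust(true);
                return s;
            } catch (SocketTimeoutException e) {
                s.close();
                adjust(false); // allow more time on the next attempt
                last = e;
            } catch (IOException e) {
                s.close();
                throw e;
            }
        }
        throw last; // non-null: the loop only continues after a timeout
    }

    private synchronized void adjust(boolean success) {
        timeoutMs = success
                ? Math.max(minTimeoutMs, timeoutMs * 3 / 4) // shrink gently on success
                : Math.min(maxTimeoutMs, timeoutMs * 2);    // grow quickly on timeout
    }
}
```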