Just the default socket factory.
Also, your “server” code runs fine on JGO, exhibits exactly the same behaviour (i.e. failure) when run on puppygames.net, and does no socket configuration.
Cas
[quote=“Riven”]
Your ISP routes (and mangles?) your traffic.
[/quote]
This is possible and one of the few tentative options left… but that would mean that it’s only doing this between me and puppygames.net, and not between me and JGO.
Cas
Indeed. It’s not uncommon for ISPs to route badly. I get mails from people who can ping the IP address of java-gaming.org but cannot connect to port 80. A few weeks later they can, and all is well. Then it starts all over again… sometimes they use proxies to get in, to work around their crappy ISPs.
Here’s the tracert:
Tracing route to puppygames.net [184.106.147.224]
over a maximum of 30 hops:
1 5 ms 2 ms 2 ms srp527w [192.168.15.1]
2 * * * Request timed out.
3 30 ms 30 ms 28 ms lo0-central10.pcl-ag03.plus.net [195.166.128.184]
4 26 ms 29 ms 27 ms link-a-central10.pcl-gw01.plus.net [212.159.2.168]
5 26 ms 29 ms 29 ms xe-10-2-0.pcl-cr01.plus.net [212.159.0.200]
6 29 ms 28 ms 30 ms xe-11-2-0.edge3.London2.Level3.net [212.187.201.213]
7 124 ms 127 ms 136 ms ae-210-3610.edge1.Chicago2.Level3.net [4.69.158.229]
8 125 ms 123 ms 124 ms ae-210-3610.edge1.Chicago2.Level3.net [4.69.158.229]
9 125 ms 123 ms 123 ms 4.71.248.54
10 * * * Request timed out.
11 124 ms 124 ms 123 ms czi1-tunnel4.ord1.rackspace.net [50.56.6.163]
12 127 ms 127 ms 126 ms core1-CoreB.ord1.rackspace.net [184.106.126.129]
13 124 ms 124 ms 124 ms aggr301a-3-core1.ord1.rackspace.net [173.203.0.177]
14 126 ms 123 ms 123 ms 184-106-147-224.static.cloud-ips.com [184.106.147.224]
Not sure why I’m getting those timeouts.
(For comparison, JGO:)
1 4 ms 3 ms 5 ms srp527w [192.168.15.1]
2 * * * Request timed out.
3 70 ms 36 ms 34 ms lo0-central10.pcl-ag03.plus.net [195.166.128.184]
4 29 ms 34 ms 28 ms link-b-central10.pcl-gw02.plus.net [212.159.2.170]
5 26 ms 30 ms 28 ms xe-10-2-0.pcl-cr02.plus.net [212.159.0.202]
6 26 ms 31 ms 32 ms ae1.ptw-cr02.plus.net [195.166.129.2]
7 * * * Request timed out.
8 30 ms 29 ms 29 ms 217.20.44.193
9 31 ms 29 ms 29 ms 212.111.33.234
10 27 ms 29 ms 29 ms li732-171.members.linode.com [85.159.215.171]
Cas
(Mine, to puppygames.net:)
1 <1 ms <1 ms <1 ms 192.168.1.1
2 20 ms 20 ms 28 ms ............ ORLY!
3 25 ms 25 ms 25 ms ............ ORLY!
4 25 ms 25 ms 25 ms ae3.cr1-asd8.nl.euro.net [194.134.161.215]
5 34 ms 26 ms 26 ms ae0.br1-asd8.nl.euro.net [194.134.161.171]
6 26 ms 26 ms 26 ms er1.ams1.nl.above.net [80.249.208.122]
7 26 ms 27 ms 26 ms ae8.cr1.ams5.nl.above.net [64.125.30.205]
8 112 ms 112 ms 129 ms xe-0-2-0.cr2.lga5.us.above.net [64.125.27.185]
9 129 ms 139 ms 139 ms ae6.cr2.ord2.us.above.net [64.125.24.30]
10 123 ms 124 ms 124 ms ae10.mpr1.ord11.us.above.net [64.125.24.110]
11 123 ms 124 ms 124 ms ae4.mpr1.ord5.us.above.net [64.125.24.94]
12 125 ms 125 ms 124 ms 208.185.125.6.IPYX-076520-ZYO.above.net [208.185.125.6]
13 124 ms 134 ms 124 ms 10.25.0.65
14 127 ms 127 ms 127 ms czi1-tunnel4.ord1.rackspace.net [50.56.6.163]
15 125 ms 124 ms 125 ms core1-CoreB.ord1.rackspace.net [184.106.126.129]
16 124 ms 124 ms 124 ms aggr301a-3-core1.ord1.rackspace.net [173.203.0.177]
17 127 ms 128 ms 127 ms 184-106-147-224.static.cloud-ips.com [184.106.147.224]
(And to JGO:)
1 <1 ms <1 ms <1 ms 192.168.1.1
2 22 ms 20 ms 19 ms ............ ORLY!
3 26 ms 25 ms 31 ms ............ ORLY!
4 26 ms 25 ms 25 ms ae3.cr1-asd8.nl.euro.net [194.134.161.215]
5 27 ms 31 ms 25 ms ae0.br1-asd8.nl.euro.net [194.134.161.171]
6 26 ms 25 ms 26 ms er1.ams1.nl.above.net [80.249.208.122]
7 26 ms 26 ms 26 ms ae14.cr1.ams10.nl.above.net [64.125.21.77]
8 31 ms 42 ms 31 ms ae9.mpr3.lhr3.uk.above.net [64.125.28.242]
9 31 ms 30 ms 31 ms ae6.mpr2.lhr3.uk.above.net [64.125.21.22]
10 31 ms 31 ms 31 ms 94.31.35.186.t01461-01.above.net [94.31.35.186]
11 34 ms 32 ms 31 ms 212.111.33.234
12 39 ms 31 ms 32 ms li732-171.members.linode.com [85.159.215.171]
Right, so… the only difference I can see here is that I have to go via Level3.
Cas
So… once you established a TCP connection… is it stable? If so, just make N connections on N threads, and close N-1 sockets.
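A rough sketch of that suggestion - race N plain-socket connection attempts and keep the first one that succeeds, closing the rest. The class name, host, port and timeout here are hypothetical, not anyone’s actual code:
[code]
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.CompletionService;
import java.util.concurrent.ExecutorCompletionService;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RacingConnector {

    // Opens n connection attempts in parallel, keeps the first socket that
    // connects, and closes every other attempt as it completes.
    public static Socket connectFirst(String host, int port, int n, int timeoutMillis)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(n);
        CompletionService<Socket> race = new ExecutorCompletionService<>(pool);
        for (int i = 0; i < n; i++) {
            race.submit(() -> {
                Socket s = new Socket();
                s.connect(new InetSocketAddress(host, port), timeoutMillis);
                return s;
            });
        }
        Socket winner = null;
        // Wait for every attempt, so stragglers can be closed before returning.
        for (int i = 0; i < n; i++) {
            Future<Socket> done = race.take();   // blocks until the next attempt finishes
            try {
                Socket s = done.get();
                if (winner == null) {
                    winner = s;                  // first success wins
                } else {
                    s.close();                   // close the N-1 losers
                }
            } catch (Exception failedAttempt) {
                // this attempt never connected, so there is nothing to close
            }
        }
        pool.shutdown();
        return winner;                           // null if every attempt failed
    }
}
[/code]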
I’ve not got as far as testing the stability of the connections yet, but if you remember from the protocol we devised, it only transmits a few bytes, reads a small response, and then shuts down, in order to handle thousands of “simultaneous” clients - so stability isn’t really an issue.
I can of course work around it by simply retrying until I get a connection - which is what I’ll actually do - but what bugs me is that it fails at all at this stage, most unexpectedly. It doesn’t bode well for stability. But if it’s genuinely just a crazy quirk of my route from home to the server, there’s nothing I’ll be able to do about it anyway, and continually retrying will “patch” over the deficiency. It just sucks not to know why it’s failing, and this sort of random crap is exactly why network programming is so pointlessly difficult :emo:
Cas
A (few?) months ago you said you’d rewritten everything to use SSL, and as short-lived connections are truly not a good idea with SSL, given the incredible overhead of the handshake, I presumed you’d rewritten the protocol to use persistent connections.
Anyway, network I/O is hard, and I should know - I make the ‘big’ bucks in this general area. If your low-level code looks clean, you’re doing it wrong. Put those (self-adjusting) retry-loops behind abstraction layers and you’ll be relatively fine.
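A minimal sketch of such a retry loop behind an abstraction layer, assuming plain blocking sockets - the names, timeouts and backoff constants here are hypothetical:
[code]
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class ReliableConnector {

    // Retries the connect with exponential backoff until it succeeds or the
    // attempt budget runs out; callers just see a Socket or a final exception.
    public static Socket connect(String host, int port, int maxAttempts)
            throws IOException, InterruptedException {
        long backoffMillis = 250;                 // initial delay, doubled after each failure
        IOException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                Socket s = new Socket();
                s.connect(new InetSocketAddress(host, port), 5000);
                return s;                         // connected: the flaky route stays hidden
            } catch (IOException e) {
                lastFailure = e;
                Thread.sleep(backoffMillis);      // the self-adjusting part: back off and retry
                backoffMillis = Math.min(backoffMillis * 2, 8000);
            }
        }
        throw lastFailure;                        // every attempt failed
    }
}
[/code]
The calling code never sees the intermittent connect failures - it either gets a working socket or one final exception after the budget is spent.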
Yeah, it’s all good… it works just fine with an SSL connection too - it seems to add maybe another 100ms of latency, but that’s entirely liveable with for a nice secure protocol. I can actually ditch all the SSL stuff from the server-side Java code if I’m going to use load balancers, as they come with SSL termination built in. So it’s exactly the same as it was before, but it’s now using a nice (and thoroughly tested!) binary protocol and pretty simple client/server code. Telling the client to retry is trivial and already in the abstraction… I was just rather worried about its unreliability with only me testing it, under no load whatsoever. Now it seems it’s just me. Gah.
Time to move to Linode I think.
Cas
100ms of latency doesn’t seem bad, but it’s 100ms of one CPU core on the server doing heavy work. With a dozen handshakes per second your typical VPS will grind to a halt - it doesn’t scale well. You really need load balancers with hardware-accelerated SSL, or you’re just moving the bottleneck from one machine to another - and those SSL load balancers typically aren’t as cheap as a VPS. My initial protocol did a secure handshake without SSL, but it was rather complex - I can understand you preferred the simplicity of SSL, and I hope it works out.
It does actually use that hacked bit of SSL code you wrote, though it’s still sending a fair amount of stuff back and forth. In theory I could change it to use a custom handshaking mechanism, but sticking to SSL means we can really easily just palm the problem off on, say, a $20/mo Linode NodeBalancer, and that’ll handle it. I don’t think we’re really going to get that much traffic any time soon…
Cas
Linode’s NodeBalancers have a pretty crappy reputation and are pricey too. Simply use HAProxy on a basic Linode instance :point:
Though I did come across this tidbit:
https://www.imperialviolet.org/2010/06/25/overclocking-ssl.html
and this:
https://www.imperialviolet.org/2011/02/06/stillinexpensive.html
Cas
As for Java’s default SSL engine: it’s very computationally expensive.
As for my protocol being SSL-based - it used a sliver of SSL to create a ‘token-based session’ that the bulk of the I/O could use without needing SSL, while retaining guarantees about which peers were communicating. That’s where the complexity I referred to was introduced. Anyway, maybe Linode improved their NodeBalancers (back in 2012 - ancient history, I know - they slowly degraded to up to 4s-6s (!!) of latency in handshake overhead). Even Java’s SSL engine beats that hands down! It’s worth a try again. Getting familiar with, and correctly configuring, HAProxy is probably more ‘expensive’ than messing about with a Linode NodeBalancer for a few hours. But you know me - I’d gladly spend a day tinkering to save a few bucks per month. :persecutioncomplex:
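For illustration, a much-simplified sketch of that general shape - not Riven’s actual protocol - where one short SSL exchange hands out a random token and the bulk I/O then runs over plain TCP (no confidentiality on the plain link, just proof of which peer is talking). All names are hypothetical:
[code]
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TokenSession {

    // Client side: one short SSL exchange buys a random session token...
    public static byte[] fetchToken(String host, int sslPort) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket ssl = (SSLSocket) factory.createSocket(host, sslPort)) {
            ssl.startHandshake();                        // the only expensive step
            DataInputStream in = new DataInputStream(ssl.getInputStream());
            byte[] token = new byte[32];
            in.readFully(token);                         // server sends 32 random bytes
            return token;
        }
    }

    // ...and the bulk of the I/O runs over plain TCP, with the token binding
    // the connection to the authenticated SSL exchange. The plain link is not
    // encrypted - the token only proves identity.
    public static Socket openBulkConnection(String host, int port, byte[] token)
            throws Exception {
        Socket s = new Socket(host, port);
        DataOutputStream out = new DataOutputStream(s.getOutputStream());
        out.write(token);                                // server looks the token up
        out.flush();
        return s;
    }
}
[/code]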
Please advise if you were able to find any solution to this problem. I am facing a similar issue.
Implement a reader thread that fills a queue, and a writer thread driven from a queue. You will never see timeouts again, but it won’t fix whatever other bugs there are in the code generating or absorbing data.
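A minimal sketch of that reader/writer-queue pattern, assuming blocking socket streams and a hypothetical length-prefixed byte[] framing; application code blocks on the queues rather than on the socket:
[code]
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.Socket;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueuedConnection {
    private final BlockingQueue<byte[]> inbound  = new LinkedBlockingQueue<>();
    private final BlockingQueue<byte[]> outbound = new LinkedBlockingQueue<>();

    public QueuedConnection(Socket socket) throws Exception {
        DataInputStream  in  = new DataInputStream(socket.getInputStream());
        DataOutputStream out = new DataOutputStream(socket.getOutputStream());

        Thread reader = new Thread(() -> {             // reader thread fills the inbound queue
            try {
                while (true) {
                    int len = in.readInt();            // hypothetical length-prefixed framing
                    byte[] msg = new byte[len];
                    in.readFully(msg);
                    inbound.put(msg);
                }
            } catch (Exception e) { /* socket closed or thread interrupted */ }
        });

        Thread writer = new Thread(() -> {             // writer thread drains the outbound queue
            try {
                while (true) {
                    byte[] msg = outbound.take();
                    out.writeInt(msg.length);
                    out.write(msg);
                    out.flush();
                }
            } catch (Exception e) { /* socket closed or thread interrupted */ }
        });

        reader.setDaemon(true);
        writer.setDaemon(true);
        reader.start();
        writer.start();
    }

    public void send(byte[] msg) throws InterruptedException { outbound.put(msg); }

    public byte[] receive() throws InterruptedException { return inbound.take(); }
}
[/code]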