Java.nio UDP

[quote]e.g.:

“As an example, consider a transfer of a 1 GB file with the average Internet packet loss rate (2%) and global average RTT (just over 200 msec). In this specific case, TCP would take more than 7 hours to transfer the file. With Digital Fountain, on a fully utilized T1 link, the transfer of the 1 GB file will take about 1.5 hours, and on a fully utilized T3 link the transfer time will be a little over 3 minutes. That translates to a 3X to over 100X improvement.”

WTF? Only a marketing droid or an idiot would use an example with 200msec delay - I get less than that to almost everywhere in the world except Asia and Australia from my 56k modems. Everyone with the most basic grasp of mathematics knows there is absolutely no point in taking a global average of latency for measuring protocol performance. The modal average would at least make sense and have some value here, although if they wanted to do it properly (convincingly) they’d cite the 1 SD range, and do 3 figures for min/mid/max.
[/quote]
Look at http://www.internettrafficreport.com/main.htm
The delays they quote seem reasonable based on the measured data.

[quote]In addition, TCP is alleged to take “more than 7 hours” to transfer a file at an unknown bandwidth. They quote figures for their tech on T1 and T3, and compare them to this 7 hours to give a “3X to over 100X improvement”. Huh? Perhaps they were measuring from a 56kbps modem? If not, why not quote figures for TCP on T1 and T3 - it’s deeply unconvincing that they only quote one figure, suggesting that they probably did test at a different bandwidth altogether. This is the kind of statement you frequently see from companies claiming to have invented faster-than-light communications and other such dross.
[/quote]
Actually, I believe elsewhere they prove that with TCP the RTT delay and the percentage of lost packets impose a theoretical limit regardless of the bandwidth. I admit I don’t know much about the theory behind that, other than that I have heard the same from a couple of people.
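
Just to put rough numbers on that limit: the well-known Mathis approximation says steady-state TCP throughput is capped at about (MSS/RTT) * (1.22/sqrt(loss)), independent of link capacity. This is my own back-of-envelope sketch (the MSS of 1460 bytes is an assumption; the 200 ms and 2% figures come from the quote above), not anything from their papers, but it lands at roughly half a megabit per second - i.e. a few hours for 1 GB no matter how fat the pipe is:

[code]
// Rough illustration of the Mathis et al. steady-state TCP throughput bound:
//   throughput <= (MSS / RTT) * (C / sqrt(p)),  with C ~ 1.22
// The 200 ms RTT and 2% loss figures are taken from the quote above;
// the 1460-byte MSS is an assumption (typical Ethernet segment size).
public class TcpThroughputBound {
    public static void main(String[] args) {
        double mss  = 1460;    // bytes per segment (assumption)
        double rtt  = 0.200;   // seconds (global-average figure from the quote)
        double loss = 0.02;    // packet loss rate (2%, from the quote)
        double c    = 1.22;    // constant from the Mathis approximation

        double bytesPerSec = (mss / rtt) * (c / Math.sqrt(loss));
        double fileBytes   = 1024L * 1024L * 1024L;   // 1 GB
        double hours       = fileBytes / bytesPerSec / 3600.0;

        System.out.printf("Bound: %.0f KB/s (~%.2f Mbit/s), 1 GB in ~%.1f hours%n",
                bytesPerSec / 1024, bytesPerSec * 8 / 1e6, hours);
    }
}
[/code]

Note the link speed never appears in the formula - that is the whole point being made.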

[quote]Also, speaking as someone who actually has transferred 1GB files via high speed connections over the internet before - it takes a heck of a lot less than 7 hours.
[/quote]
But of course you also stated that you didn’t have a delay of 200ms. That would make a big difference.

[quote]Since Fountain Pool enables clients to receive data from multiple Transporter Fountains, it is, in effect, a type of load sharing."

i.e. multi-stage caching.
[/quote]
That is completely optional. It is one method they presented for letting the receiver control the rate: by listening to more or fewer broadcasts, the receiver can adjust dynamically and implement a rate-control algorithm.

[quote]If it’s the group-theory encoding described above, then WITHOUT synchronization, X here is significantly greater than the number of packets that a perfect TCP link would use.
[/quote]
No, X is very close to the amount of data in the original file. I think it depends very slightly on which packets were received - but statistically the average would be very good.

I have seen case studies done by a third party that was evaluating this technology. In their example they were transferring large files between North America and Europe, and the gain over standard FTP was huge.

To quote some of the other papers I have from these guys:

“What is not widely understood is that TCP is slow and erratic for large data transfers when the distance between the sender and receiver is large, even when the amount of available end-to-end bandwidth is high. When using FTP over high-speed networks with even modest delay or loss, absolute throughput is limited to a small fraction of the available bandwidth.”

Having spoken with these guys on the phone and read many of their papers I trust that they have “done the math” and understand the intricacies of TCP much better than me.

[quote]do you qualify under blah’s really good reasons to use UDP?

So your process is:
TCP is too slow, so you choose UDP.
UDP doesn’t have retransmission of lost packets so you add that functionality yourself
UDP doesn’t know about the order of packets so you add that functionality yourself
UDP doesn’t do x so you have to add that functionality yourself

By the end of that process it sounds like you will have a re-implementation of the TCP protocol using UDP… Will it still be as fast as UDP? I doubt it. Will it be faster than TCP? Possibly, if you do a really good job of it, but why reinvent the wheel?
[/quote]
I’m solving a different problem than TCP, so my solution to these issues is quite different. For example, UDP doesn’t care about the order in which packets are received, but TCP guarantees it. My system cares about the order of data in the final file, but it doesn’t care at all about the order in which the data is received. So my solution to the out-of-order problem is essentially not to solve it at all, but to make the higher-level transmission immune to it. See the difference?
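
To make that concrete, here is a minimal sketch of the kind of receiver I mean, assuming each datagram starts with an 8-byte file offset followed by payload (the header layout, port, and file name are just illustrative, not any real protocol):

[code]
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Minimal sketch of an order-indifferent receiver: every datagram carries the
// file offset its payload belongs at, so the payload is written straight into
// place no matter what order packets arrive in.
public class BlobReceiver {
    public static void main(String[] args) throws IOException {
        try (DatagramChannel udp = DatagramChannel.open()
                     .bind(new InetSocketAddress(9000));
             FileChannel file = FileChannel.open(Paths.get("received.dat"),
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {

            ByteBuffer packet = ByteBuffer.allocate(1500);
            while (true) {                        // real code would track completion
                packet.clear();
                udp.receive(packet);
                packet.flip();
                long offset = packet.getLong();   // where this chunk belongs in the file
                file.write(packet, offset);       // write in place; arrival order is irrelevant
            }
        }
    }
}
[/code]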

Likewise, this enables a much more relaxed approach to retransmission of lost data. Data sent AFTER a packet that gets lost is never delayed waiting for the lost packet to be recovered, as it has to be with TCP in order to deliver packets in order.
I admit that if I cared about packet transmission order then this would not be worth it at all, because then I would be solving a problem that is much the same as the one solved by TCP.

By looking at the problem from a higher level - “How do I get this big chunk of data from A to B?” - rather than at the lower level of handling a stream of data, the problem space transforms into something that UDP can handle better than TCP.

Aha. Yes, that can be true, e.g. for the most basic form of TCP, when it uses AIMD (additive increase, multiplicative decrease - hence it drops the throughput faster than it increases it). In this scheme RTT limits how fast you can increase, and the percentage of dropped packets defines how fast you are forced to decrease speed. If they actually stated that was the basis for their omission, then the FAQ would be a lot more convincing… :). I’m not trying to disprove them, but they don’t do themselves many favours in the explanations they provide.
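
For anyone not familiar with it, the textbook AIMD rule looks roughly like this - purely an illustrative sketch, not any particular stack’s implementation. The window creeps up by about one segment per RTT and gets halved on loss, which is exactly why RTT and loss rate dominate:

[code]
// Textbook AIMD congestion-window rule (TCP Reno style), purely illustrative:
// the window grows by ~one segment per round trip and is halved on a loss, so
// the achievable rate is capped by how fast RTT lets you grow versus how often
// loss forces you to cut back.
public class Aimd {
    static final double MSS = 1460;   // segment size in bytes (assumption)
    double cwnd = MSS;                // congestion window in bytes

    void onAckReceived() {
        cwnd += MSS * MSS / cwnd;     // ~ +1 segment per RTT (additive increase)
    }

    void onLossDetected() {
        cwnd = Math.max(MSS, cwnd / 2);  // multiplicative decrease
    }

    double rateBytesPerSec(double rttSeconds) {
        return cwnd / rttSeconds;     // at most one window per round trip
    }
}
[/code]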

OTOH, that’s a really rubbish form of TCP that has been improved upon many times. I have to confess I’m not an expert on TCP, and had blindly assumed AIMD wasn’t used any more; the problem may also be present in later forms of TCP, although to a lesser extent. This happens to be one of the things that Vegas largely fixes :).

Yep. My point was that they were using a very unusual scenario as an example. If you pick your micro-benchmarks well…

Well, I quoted a range from about 10% extra upwards; is it much less than 10%? I need to revisit my Information-theory notes :(…

I’m fascinated as to how they could have very little overhead AND no synchronization both at once. More reading ahead for me…

I get the impression the key point here is not so much the distance but the RTT/latency. The website contains mention of quite a few networks that have unusually high latency and/or packet loss as being particularly good places for their tech. Similarly, there have been new/custom protocols for satellite-based IP networks before, specifically designed to get around the ultra-high latency.

[quote]Having spoken with these guys on the phone and read many of their papers I trust that they have “done the math” and understand the intricacies of TCP much better than me.
[/quote]
Sure; equally I’m not claiming they’re making it all up :). But I have seen similar claims before that turned out to be false, so I just have a big dose of scepticism until I’ve unearthed convincing scientific data.

I think the actual data overhead is 5% for the FEC information.

If you look at FEC schemes like Tornado codes, I think their system is similar. They have done some really clever stuff, though, in terms of transmitting the information about how the packets can be recombined without simply sending the encoding tree over verbatim as one big chunk.
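
As a much simpler illustration of the same idea - nowhere near as clever as Tornado or fountain codes - here is a single XOR parity packet per group, which can repair one lost packet in that group without a retransmit. Group and packet sizes are assumptions:

[code]
// Very simple single-parity FEC, far simpler than Tornado/fountain codes but
// showing the basic idea: send one XOR parity packet per group of k packets,
// and the receiver can rebuild any ONE missing packet in that group without
// asking for a retransmission. Assumes fixed-size packets.
public class XorParity {
    /** XOR all data packets in a group into one parity packet (sender side). */
    static byte[] parity(byte[][] group) {
        byte[] p = new byte[group[0].length];
        for (byte[] packet : group)
            for (int i = 0; i < p.length; i++)
                p[i] ^= packet[i];
        return p;
    }

    /** Rebuild the single missing packet from the survivors plus the parity (receiver side). */
    static byte[] recover(byte[][] received, byte[] parity) {
        byte[] missing = parity.clone();
        for (byte[] packet : received)
            for (int i = 0; i < missing.length; i++)
                missing[i] ^= packet[i];
        return missing;
    }
}
[/code]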

Anyway, this thread is straying a bit now… although I do find it interesting.

I think my response above to William describes where I’m coming from with what I want to try. I think that even with my simplistic algorithm I still have an advantage over TCP for transferring a blob of data, whereas TCP’s advantage comes in if I needed to “stream” that data.

I guess it is something like TCP with an almost infinite window/buffer on both ends, since I can use the actual file as the buffer on each end.

UDP also has the benefit of working with multicast. My transmitter could be modified to work in a way that would simply continue broadcasting until all receivers ack’d all packets.
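
Something like this on the receive side, for example - the group address and port are arbitrary examples, and java.net.MulticastSocket is just the simplest way to show it:

[code]
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Minimal sketch of the multicast angle: the sender addresses one multicast
// group instead of N unicast receivers, and every joined receiver sees the
// same datagrams. The group address and port are arbitrary examples.
public class MulticastReceiver {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("239.1.2.3"); // example group address
        try (MulticastSocket socket = new MulticastSocket(9000)) {
            socket.joinGroup(group);
            byte[] buf = new byte[1500];
            DatagramPacket packet = new DatagramPacket(buf, buf.length);
            while (true) {                 // real code would stop once the file is complete
                socket.receive(packet);
                // hand packet.getData()[0 .. packet.getLength()) to the same
                // offset-based writer sketched earlier in the thread
            }
        }
    }
}
[/code]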

Note that I’m bringing acks back into the picture for my scheme… but the acks have a VERY large window of time in which to get back to the transmitter. Since ack packets are so small, and all they need to say is “got packet X”, I can include redundant ack data in every ack response, so the probability of an ack never reaching the transmitter is quite low. In any case my transmitter will loop over the un-ack’d data until it confirms that it got through, so for the very small percentage of acks that don’t make it back even with the redundancy, the data is simply resent and the receiver sees a duplicate packet.
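
A sketch of what such an ack packet might look like - the choice of repeating the last 8 block numbers is just an illustrative figure, not something I’ve settled on:

[code]
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the redundant-ack idea: every ack datagram repeats the last few
// acknowledged block numbers, so losing any single ack packet costs nothing.
// The "8 most recent acks per packet" figure is purely an illustrative choice.
public class RedundantAcks {
    private static final int HISTORY = 8;
    private final Deque<Integer> recentAcks = new ArrayDeque<>();

    /** Build the ack datagram payload for a newly received block. */
    ByteBuffer buildAck(int blockNumber) {
        recentAcks.addFirst(blockNumber);
        while (recentAcks.size() > HISTORY)
            recentAcks.removeLast();

        ByteBuffer ack = ByteBuffer.allocate(4 + 4 * recentAcks.size());
        ack.putInt(recentAcks.size());   // how many block numbers follow
        for (int block : recentAcks)
            ack.putInt(block);           // newest first, then the redundant history
        ack.flip();
        return ack;                      // send this via DatagramChannel.send(...)
    }
}
[/code]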

The initial response from the Prof who is also the chair of Communications at my Alma Mater is basically “TCP is a hell of a lot better than that” (not his literal response, but pretty close!). I’m still researching this actively - I’ve got some interesting new resources from him, relevant papers etc., annoyingly in formats like zipped PS which I can’t read from my PC right now - and he might have misunderstood what I was describing. On the whole, though, it looks like the vendors are citing perf improvements from comparing really bad TCP implementations against a really good implementation of a different protocol.

I’ll followup once I’ve investigated further :).

I would be interested to see what you can dig up as well. As I mentioned in another thread, I’ve now got the Digital Fountain core library and have done comparisons to FTP on Windows. There was a 50% gain in throughput with almost 0% loss and 70-100 ms RTT over a relatively slow link (consumer broadband).

That is significantly lower than the target link speed that we will be dealing with, and as the bandwidth increases, loss and RTT are a larger factor in the slowdowns that will happen with TCP.

FTP isn’t the best protocol, but it is common. Windows likely doesn’t have the best TCP/IP stack, but that is a component that I do not have control over.

I will not be surprised to see that the typical gains are less than what is in their marketing ‘BS’. But on fast links (10Mbps and higher) I would expect to see a significant improvement over TCP.

I have some tests to run in a week on faster pipes, so I will know more then.

And all this doesn’t address the other aspects… the possibility of multicast. Even satellite broadcast where there is no back channel for ACKs. I won’t be needing that in the near term, but it is something that I might visit later on.

Please keep me informed of your findings. I will report back with my own.

FYI I’ve just been quoted TCP working fine at over 3 Gbps (!); the general impression was that low Mbps really shouldn’t be an issue. Yup, I’ll follow up as I can…

Ah, the wonders of the community.

I love it when you guys do all my research for me :) :) :)

(Not to mention coding on the core libs!)

Thanks guys, at this point you, the community, are the majority of what makes this site work!

[quote]I love it when you guys do all my research for me :) :) :)

(Not to mention coding on the core libs!)
[/quote]
Just wait 'till you get the bill ;D