Download Cache and Corruption with Applets

kappa · September 9, 2009, 3:15pm

I am downloading files with a Java Applet and running into problems with caching and file corruption.

I download files using a URLConnection by opening an InputStream to it, something like this:


```java
// getInputStream(...) is the poster's own helper, not shown here.
urlconnection = url.openConnection();
InputStream inputstream = getInputStream(currentFile, urlconnection);
try {
    FileOutputStream fos = new FileOutputStream(path + currentFile);
    try {
        int bufferSize;
        byte[] buffer = new byte[65536];
        while ((bufferSize = inputstream.read(buffer, 0, buffer.length)) != -1) {
            fos.write(buffer, 0, bufferSize);
            // ...
        }
    } finally {
        fos.close(); // closed even if the copy fails part-way
    }
} finally {
    inputstream.close();
}
```

On the rare occasion I download a file it turns out corrupt, I understand this is normal? (Or is the buffer size too big or something?) Anyway, what annoys me is that on refreshing the applet the file should be downloaded again, right? But the file keeps turning out corrupt until the Java cache is cleared from the Java Control Panel. I’ve tried setting urlconnection.setUseCaches(false); and urlconnection.setDefaultUseCaches(false); but the problem still occurs: it keeps downloading a corrupt file, and that only stops when the Java cache is cleared from the Control Panel. This leads me to believe that the file is still cached somewhere and used on refresh instead of being downloaded again, although it does seem to be downloading again.
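
A minimal sketch of the download loop with two defensive changes, assuming the server sends a Content-Length header: setUseCaches(false) is called before the connection is opened (it throws IllegalStateException once connected), and the number of bytes received is compared with Content-Length so a truncated file is detected and deleted rather than kept. The class name VerifiedDownload is hypothetical, and this is not a confirmed fix for the caching behaviour described above.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLConnection;

public class VerifiedDownload {

    /** Downloads url into dest with URLConnection caching disabled, then checks the byte count. */
    public static long download(URL url, File dest) throws IOException {
        URLConnection conn = url.openConnection();
        // Must be set before connecting; setUseCaches throws IllegalStateException once connected.
        conn.setUseCaches(false);
        // -1 when no Content-Length is sent (e.g. chunked responses); then no check is possible.
        long expected = conn.getContentLength();
        long total = 0;
        InputStream in = conn.getInputStream();
        try {
            OutputStream out = new FileOutputStream(dest);
            try {
                byte[] buffer = new byte[65536];
                int n;
                while ((n = in.read(buffer, 0, buffer.length)) != -1) {
                    out.write(buffer, 0, n);
                    total += n;
                }
            } finally {
                out.close();
            }
        } finally {
            in.close();
        }
        if (expected >= 0 && total != expected) {
            // Don't leave a truncated file where it could later be mistaken for a good copy.
            dest.delete();
            throw new IOException("Incomplete download: got " + total + " of " + expected + " bytes");
        }
        return total;
    }
}
```

Counting bytes only catches truncation; a file that arrives complete but with wrong content would need a checksum published alongside it.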

I am all out of ideas now and not sure how or where it’s managing to cache the corrupt file (InputStream? browser cache?). Any ideas or suggestions are welcome.

thanks.

ddyer · September 9, 2009, 4:48pm

Browser caches are a plague if your content is changing. You ought to
be able to defeat caches for individual files by adding meaningless arguments
to the URL to make it unique. For example file.txt?attempt=2
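
A small sketch of the cache-busting trick, assuming the server ignores unknown query parameters (static file servers typically do). The helper name withCacheBuster and its parameters are hypothetical; it also assumes the URL has no #fragment.

```java
public class CacheBust {

    /**
     * Returns url with an extra, meaningless query parameter appended, so that
     * browser, plug-in and proxy caches see a URL they have never stored.
     */
    public static String withCacheBuster(String url, String param, long value) {
        // Use '&' if the URL already has a query string, '?' otherwise.
        char separator = url.indexOf('?') >= 0 ? '&' : '?';
        return url + separator + param + "=" + value;
    }
}
```

For example, withCacheBuster("http://example.com/file.txt", "attempt", 2) gives "http://example.com/file.txt?attempt=2"; passing System.currentTimeMillis() as the value makes every attempt unique.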

In general, updating content reliably is very difficult - there is no file
system operation that will replace file “a” with file “b” as an atomic
operation, such that no outside party will ever see an intermediate
state. So just replacing a file, however carefully, is not adequate.
One way to deal with this is to use a file-reading proxy instead of
direct file system access, e.g. url=file-reader.cgi?file=xxxx, but be
careful you don’t create a URL that will copy any file! The file-reader
proxy doesn’t have to read any actual files; it can generate the appropriate
content on the fly from a database or whatever.
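
One way to honour the "don't create a URL that will copy any file" warning, sketched in Java on the server side: resolve the requested name against a fixed content root and reject anything that ends up outside it. The class and method names are hypothetical, and the check does not follow symbolic links (calling toRealPath() on the result would be needed if the root may contain links).

```java
import java.nio.file.Path;

public class SafeFileReader {

    /**
     * Maps the "file" parameter of a request such as file-reader.cgi?file=xxxx onto a
     * fixed content root. Returns null for anything that would escape the root
     * (e.g. "../../etc/passwd" or an absolute path).
     */
    public static Path resolve(Path root, String requested) {
        if (requested == null || requested.isEmpty()) {
            return null;
        }
        Path base = root.toAbsolutePath().normalize();
        // resolve() returns 'requested' itself if it is absolute; normalize() folds away "..".
        Path target = base.resolve(requested).normalize();
        // startsWith compares whole path elements, so "/srv/content2" is not inside "/srv/content".
        if (!target.startsWith(base) || target.equals(base)) {
            return null;
        }
        return target;
    }
}
```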

Also, when I update my java classes, I never, ever, update them in place.
I create a new root, and create a complete new hierarchy with the modified
content. This is the only way to be sure that no browser sees and caches
a partially updated hierarchy.

Riven · September 9, 2009, 11:07pm

[quote]On the rare occasion I download a file it turns out corrupt, I understand this is normal?
[/quote]
This is definitely NOT normal. How often does a download in your browser fail?? Once a year?

How often does a Java download fail? Every other day? I think this is the root cause of all JavaWebStart problems. I’m starting to get more and more convinced that URLConnection is bugged. I should dive into the code some day.

Some time ago, I had this applet that loaded a bunch of thumbnails from a server. It was quite normal that a few wouldn’t get loaded.

So… on to the problem:

HTTP isn’t that hard. Just do your own HTTP handling. Add support for “Transfer-Encoding: chunked” and you instantly support 99.99% of all webservers. Cache nothing. Worked for me. The guarantee that all bugs are in your code is worth gold.
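
A minimal sketch of decoding a "Transfer-Encoding: chunked" body, assuming the stream is positioned just after the response headers. Each chunk is a hex size line, that many bytes, then CRLF; a size of 0 ends the body, optionally followed by trailer headers. Lines are read one byte at a time so the decoder never consumes bytes past the end of the message (wrapping the socket stream in a BufferedInputStream keeps that cheap). The class name ChunkedDecoder is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedDecoder {

    /** Reads one line terminated by "\n" (a preceding "\r" is dropped); null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                line.append((char) c);
            }
        }
        return (c == -1 && line.length() == 0) ? null : line.toString();
    }

    /** Decodes a chunked body starting at the stream's current position. */
    public static byte[] decode(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            if (sizeLine == null) {
                throw new EOFException("Stream ended before the last chunk");
            }
            // The hex size may be followed by ";extension=value", which is ignored here.
            int semicolon = sizeLine.indexOf(';');
            if (semicolon >= 0) {
                sizeLine = sizeLine.substring(0, semicolon);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16);
            if (size == 0) {
                break; // last chunk
            }
            byte[] chunk = new byte[size];
            int off = 0;
            while (off < size) {
                int n = in.read(chunk, off, size - off);
                if (n == -1) {
                    throw new EOFException("Stream ended inside a chunk");
                }
                off += n;
            }
            body.write(chunk, 0, size);
            readLine(in); // the CRLF that follows every chunk's data
        }
        // Skip optional trailer headers up to the empty line that ends the message.
        String trailer;
        while ((trailer = readLine(in)) != null && !trailer.isEmpty()) {
            // trailers are ignored in this sketch
        }
        return body.toByteArray();
    }
}
```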

DzzD · September 10, 2009, 12:41am

Absolutely right, URLConnection is buuugged and… it acts differently on different JVMs… boring…

Unfortunately it has a useful feature that cannot be replicated with a TCP socket: it is allowed to read the browser configuration, especially the proxy settings, including login, password and IP.

You can go through an HTTP proxy with a TCP socket by using the Proxy-Authorization header, but you will have to ask the user to enter their login/password and the proxy IP; in the end that may not be a big issue.
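
A sketch of what such a proxied request could look like, assuming the proxy accepts the "Basic" scheme (proxies that demand NTLM or Digest are not covered). The socket connects to the proxy's host and port, and the request line carries the full URL instead of just the path. It uses java.util.Base64 (Java 8+); the Java Plug-in JREs of 2009 would need a hand-written encoder. Class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class ProxyAuth {

    /** Value of a Proxy-Authorization header for the Basic scheme: base64("user:password"). */
    public static String basic(String user, String password) {
        byte[] credentials = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(credentials);
    }

    /** Request text for fetching absoluteUrl through an HTTP proxy with user-supplied credentials. */
    public static String proxiedGet(String absoluteUrl, String host, String user, String password) {
        return "GET " + absoluteUrl + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "Proxy-Authorization: " + basic(user, password) + "\r\n"
                + "\r\n";
    }
}
```

With the classic example credentials, basic("Aladdin", "open sesame") yields "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==".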

If you don’t want to implement chunked transfer (or other complex/heavy HTTP 1.1 features), you can make your request with HTTP/1.0 rather than HTTP/1.1.
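
A minimal sketch of the HTTP/1.0 approach over a plain Socket. An HTTP/1.1 server must not send a chunked body to an HTTP/1.0 client, and without keep-alive it closes the connection after the body, so reading to end of stream is enough. The exchange works on any stream pair, which makes it testable without a network. The class name Http10Client is hypothetical, and splitting the header block off the returned bytes is left to the caller.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class Http10Client {

    /** Sends an HTTP/1.0 GET on an open connection; returns the raw response (status, headers, body). */
    public static byte[] exchange(InputStream in, OutputStream out, String host, String path) throws IOException {
        String request = "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n" // optional in 1.0, but name-based virtual hosts need it
                + "\r\n";
        out.write(request.getBytes(StandardCharsets.US_ASCII));
        out.flush();
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) { // the server closes the connection after the body
            response.write(buffer, 0, n);
        }
        return response.toByteArray();
    }

    /** Opens a plain TCP connection to host:port and performs the exchange. */
    public static byte[] get(String host, int port, String path) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            return exchange(socket.getInputStream(), socket.getOutputStream(), host, path);
        } finally {
            socket.close();
        }
    }
}
```

If the response carries a Content-Length, comparing it with the body size is still worthwhile, since a dropped connection also looks like end of stream.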

But… another but… there are several possible issues when implementing the HTTP protocol partially; you should especially take care of the following:

Request headers for HTTP 1.1:
Host (required for 1.1 requests)

Response codes for both 1.0 and 1.1:
407: proxy authentication required
307/301: moved => you must make another request to the URI given in the Location header
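
A sketch of how a hand-rolled client might act on those status codes. Besides 301 and 307, the codes 302, 303 and 308 are also redirects that carry a Location header, so they are treated the same way here. Location may be relative, so it is resolved against the requested URI. The enum and method names are hypothetical; a real client should also cap the number of redirects it follows.

```java
import java.net.URI;

public class StatusHandling {

    public enum Action { SUCCESS, FOLLOW_LOCATION, PROXY_AUTH_REQUIRED, FAIL }

    /** Extracts the code from a status line such as "HTTP/1.1 307 Temporary Redirect". */
    public static int statusCode(String statusLine) {
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/")) {
            throw new IllegalArgumentException("Not an HTTP status line: " + statusLine);
        }
        return Integer.parseInt(parts[1]);
    }

    public static Action classify(int code) {
        if (code >= 200 && code < 300) {
            return Action.SUCCESS;
        }
        switch (code) {
            case 301: case 302: case 303: case 307: case 308:
                return Action.FOLLOW_LOCATION; // repeat the request at the Location URI
            case 407:
                return Action.PROXY_AUTH_REQUIRED; // retry with a Proxy-Authorization header
            default:
                return Action.FAIL;
        }
    }

    /** Resolves a possibly relative Location header against the URI that was requested. */
    public static URI redirectTarget(URI requested, String location) {
        return requested.resolve(location);
    }
}
```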

I have to deal with applet/server communication at work. I chose to write my own HTTP implementation using Socket, as mentioned by Riven (the first main reason was a buggy keep-alive implementation in URLConnection), but my class keeps a fallback to URLConnection and switches to it if something goes wrong with the direct TCP socket connection.

Finally, I would recommend the following: connect using your home-made HTTP connector, and if it fails, retry with URLConnection.
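
A minimal sketch of that fallback strategy. The Fetcher interface is a hypothetical abstraction over "a socket-based connector" and "a URLConnection-based fetcher". After the first failure of the primary connector, the sketch switches to the fallback for good, as described above. Treating every IOException as a reason to switch is a simplification; an HTTP 404 from the primary would also trigger it.

```java
import java.io.IOException;

public class FallbackFetcher {

    /** Hypothetical abstraction: anything that can fetch the bytes at a URL. */
    public interface Fetcher {
        byte[] fetch(String url) throws IOException;
    }

    private final Fetcher primary;   // e.g. a home-made socket-based HTTP connector
    private final Fetcher fallback;  // e.g. a URLConnection-based fetcher
    private volatile boolean primaryFailed = false;

    public FallbackFetcher(Fetcher primary, Fetcher fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    public byte[] fetch(String url) throws IOException {
        if (!primaryFailed) {
            try {
                return primary.fetch(url);
            } catch (IOException | SecurityException e) {
                // A raw socket can be blocked by a proxy or firewall, or refused by the applet
                // sandbox; URLConnection can use the browser's proxy settings, so stay on it.
                primaryFailed = true;
            }
        }
        return fallback.fetch(url);
    }
}
```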

kappa · September 11, 2009, 12:24am

thx for the suggestions.

I’m not really sure a custom HTTP connector is possible here, since I’m limited to a single class and have to keep the size small. Roughly how much code are we talking about to implement a custom HTTP connector?

The corruption is a pretty serious issue because from what I’ve seen so far there is roughly a 1/20 chance that a download will be corrupt, which is pretty high.

Aside from the download becoming corrupt, any ideas where and how the corrupt file is getting cached and stored? If I could at least stop the caching, that should be good enough for now.

Riven · September 11, 2009, 6:11am

The only problem is that HTTP is both text-based and binary. So you can’t simply use BufferedReader.readLine(), because it will buffer beyond the “\r\n” and screw up your binary content (if any).

So just use your average BufferedInputStream, probably throw a PushbackInputStream into the mix, and read up to each “\r\n” until you hit an empty line (end of header). Now parse all header lines, which give you a clue on how to ‘decode’ the content.
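
A sketch of the header-reading part, using a slightly different technique from the PushbackInputStream suggestion: lines are read one byte at a time, so nothing past the empty line is consumed and no pushback is needed. Passed a BufferedInputStream wrapping the socket stream, this stays cheap, and the body remains unread in that same stream. Header names are lower-cased and repeated headers are joined with ", ". The class, enum and method names are hypothetical.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class ResponseHead {

    public enum Framing { CHUNKED, CONTENT_LENGTH, UNTIL_CLOSE }

    public final String statusLine;
    public final Map<String, String> headers; // names lower-cased

    private ResponseHead(String statusLine, Map<String, String> headers) {
        this.statusLine = statusLine;
        this.headers = headers;
    }

    /** Reads one line ending in "\n" (dropping "\r"), byte by byte; null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                line.append((char) c); // header bytes treated as ISO-8859-1
            }
        }
        return (c == -1 && line.length() == 0) ? null : line.toString();
    }

    /** Reads the status line and headers, stopping right after the empty line. */
    public static ResponseHead read(InputStream in) throws IOException {
        String status = readLine(in);
        if (status == null) {
            throw new EOFException("Connection closed before a response arrived");
        }
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // skip malformed lines
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            headers.merge(name, value, (a, b) -> a + ", " + b); // not valid for Set-Cookie
        }
        return new ResponseHead(status, headers);
    }

    /** How the body that follows has to be read, according to the headers. */
    public Framing framing() {
        String te = headers.get("transfer-encoding");
        if (te != null && te.toLowerCase(Locale.ROOT).contains("chunked")) {
            return Framing.CHUNKED;
        }
        return headers.containsKey("content-length") ? Framing.CONTENT_LENGTH : Framing.UNTIL_CLOSE;
    }
}
```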

Maybe a few hundred lines.