ZipInputStream/GZIPInputStream faster than FileInputStream

So I’m writing a .json parser, and I’m giving it support for .json, .zip, and .gz files. Both zip and gz would be compressed versions of the json file. Setting up the input stream looks like this…


public JsonParser(String jsonFile)
{
    builder = new StringBuilder();
    try
    {
        // Determine the file type from the extension
        String fileType = jsonFile.substring(jsonFile.lastIndexOf('.') + 1);
        fileType = fileType.toLowerCase();

        // If it is a .json file...
        if(fileType.equals("json"))
            in = new DataInputStream(new FileInputStream(jsonFile));

        // Otherwise, if it is a .zip file...
        else if(fileType.equals("zip"))
        {
            ZipInputStream zis = new ZipInputStream(new FileInputStream(jsonFile));
            zis.getNextEntry(); // position the stream at the first entry in the archive
            in = new DataInputStream(zis);
        }

        // Otherwise, if it is a .gz file...
        else if(fileType.equals("gz"))
        {
            in = new DataInputStream(new GZIPInputStream(new FileInputStream(jsonFile)));
        }

        // Otherwise...
        else
            throw new RuntimeException("Extension " + fileType + " not accepted by JsonParser");
    }
    catch(IOException e)
    {
        e.printStackTrace();
    }
}

For SOME reason, when parsing the file looking for a particular token, the parser finds it significantly faster when using a GZIPInputStream or a ZipInputStream. In case you are interested, I am parsing a .json file generated by Tiled which is about half a megabyte uncompressed, with many nearly empty layers. Anyway, the file is always read byte by byte with a DataInputStream on the outside. Why would this happen?

;D Wait, wait, I have a theory suddenly! I think it is because reading bytes from a file is slow, and ZipInputStream/GZIPInputStream read in fewer bytes while generating many more of them through decompression. Because it is much faster to generate bytes than to read them in from a file, parsing would be faster.

ZipInputStream and GZIPInputStream do buffer behind the scenes (their internal Inflater works on whole chunks at a time), whereas DataInputStream does not — every read() goes straight through to the underlying FileInputStream.

new DataInputStream(new BufferedInputStream(new FileInputStream(...)))

That would solve your performance problem :slight_smile:
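A minimal sketch of what that wrapping changes (the temp file and its contents are just placeholders for illustration): both pipelines deliver exactly the same bytes, but the buffered one serves most read() calls from an in-memory block instead of making a system call per byte.

```java
import java.io.*;

public class BufferedReadDemo {
    public static void main(String[] args) throws IOException {
        // Stand-in for the real .json file (hypothetical contents)
        File f = File.createTempFile("demo", ".json");
        f.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write("{\"layers\":[]}".getBytes("UTF-8"));
        }

        // Unbuffered: each read() can reach all the way down to the OS
        int slowCount = 0;
        try (DataInputStream in = new DataInputStream(new FileInputStream(f))) {
            while (in.read() != -1) slowCount++;
        }

        // Buffered: the file is pulled in large chunks; read() is a memory access
        int fastCount = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(f)))) {
            while (in.read() != -1) fastCount++;
        }

        // Same bytes either way — only the number of OS calls differs
        System.out.println(slowCount == fastCount);
    }
}
```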

Compressed data also has the advantage of moving less data, and the extra decompression work can be faster than the data movement it saves.

E.g., reading LZO-compressed data is often faster than reading the same data uncompressed.
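A rough illustration of the "moving less data" point, sticking with GZIP since it's in the JDK (the 500 KB of zeros stands in for something like Tiled's nearly empty layers — real JSON won't compress quite this well):

```java
import java.io.*;
import java.util.zip.*;

public class CompressedSizeDemo {
    public static void main(String[] args) throws IOException {
        // Highly repetitive input, similar in spirit to nearly empty map layers
        byte[] raw = new byte[500_000];

        // Compress it in memory
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(raw);
        }
        byte[] packed = bos.toByteArray();

        // Far fewer bytes to move off disk; the CPU inflates the rest
        System.out.println(packed.length < raw.length / 10);

        // Decompressing reproduces every original byte
        int n = 0;
        try (GZIPInputStream in =
                new GZIPInputStream(new ByteArrayInputStream(packed))) {
            while (in.read() != -1) n++;
        }
        System.out.println(n == raw.length);
    }
}
```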