Files on your hard drive can be mapped and passed directly into textures/buffers

EDIT:
Don’t do this! It has some severe problems that can cause your program to grind to a halt after mapping a lot of files. See the discussion below for more details! I recommend reading through a FileChannel into a temporary direct buffer instead of mapping the files!

Hello.

Today I tried out something interesting. I was working on making it possible to source my streamable textures from multiple sources (asset files, texture packs and raw files in a directory) when I stumbled upon an interesting function. FileChannels allow you to get a MappedByteBuffer for a certain range of a file on your hard drive. This byte buffer is direct, meaning that it’s possible to pass it directly into OpenGL functions like glTexImage2D() and glBufferData(). For typical texture data this is pretty worthless: most of the time the data is stored in some kind of image format (PNG or even JPG) which needs to be decompressed before it can be passed into glTexImage2D(). For my streaming system, however, that decompression was way too slow, so my streamable texture files contain the raw image data compressed with S3TC or BPTC, which I simply dump into glCompressedTexImage2D(). To test out the potential gains of using mapped files, I’ve developed a small test program which compares the CPU performance of three different ways of loading raw texture data from a file.

The first way of loading stuff is with old-school input streams. This requires a lot of copies, since FileInputStreams work with byte[]s, not ByteBuffers. We have to read the texture data into a byte[], copy it to a direct ByteBuffer and then pass it to glTexImage2D().


	private static byte[] bytes = new byte[DATA_LENGTH];
	private static ByteBuffer buffer = BufferUtils.createByteBuffer(DATA_LENGTH);

	private static long loadStream() throws Exception {
		long startTime = System.nanoTime();

		FileInputStream fis = new FileInputStream(RAW_FILE);
		
		int read = 0;
		while(read < DATA_LENGTH){
			int r = fis.read(bytes, read, DATA_LENGTH-read);
			if(r == -1){
				throw new IOException();
			}
			read += r;
		}
		buffer.put(bytes).flip();
		glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 512, 512, 0, GL_RGBA, GL_UNSIGNED_BYTE, buffer);
		
		fis.close();
		
		return System.nanoTime() - startTime;
	}

The second way is to use NIO and FileChannels. FileChannels work on ByteBuffers directly, so we can use a direct ByteBuffer from the start!


	private static ByteBuffer buffer = BufferUtils.createByteBuffer(DATA_LENGTH);

	private static long loadChannel() throws Exception{
		long startTime = System.nanoTime();

		FileInputStream fis = new FileInputStream(RAW_FILE);
		FileChannel fc = fis.getChannel();
		
		while(buffer.hasRemaining()){
			if(fc.read(buffer) == -1){
				throw new IOException(); // EOF before the buffer was filled
			}
		}
		buffer.flip();
		glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 512, 512, 0, GL_RGBA, GL_UNSIGNED_BYTE, buffer);
		
		fc.close();
		fis.close();
		
		return System.nanoTime() - startTime;
	}

The last and most awesome way is to use NIO and FileChannels to map part of the file as a MappedByteBuffer. This is just so magical and simple.


	private static long loadMapped() throws Exception {
		long startTime = System.nanoTime();

		FileInputStream fis = new FileInputStream(RAW_FILE);
		FileChannel fc = fis.getChannel();
		
		MappedByteBuffer mbb = fc.map(MapMode.READ_ONLY, 0, fc.size());
		glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, 512, 512, 0, GL_RGBA, GL_UNSIGNED_BYTE, mbb);
		
		fc.close();
		fis.close();

		return System.nanoTime() - startTime;
	}

As you can see, there’s some timing code in each function. The average time taken by these operations over a few thousand runs in a loop (all reading the same file over and over again, so it’s not representative of IO performance, only CPU performance) is listed below:

Stream: 0.657057 ms
Channel: 0.207856 ms
Mapped: 0.169004 ms

So it’s not only simple as hell to do, it’s also faster and doesn’t require any temporary memory!
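For reference, the surrounding benchmark loop looked roughly like this - a sketch, where WARMUP, RUNS and the Runnable workload are my assumptions, not the exact test program:

```java
// Sketch of the averaging harness behind the numbers above.
// WARMUP/RUNS values and the Runnable workload are assumptions.
public class LoadBenchmark {
	static final int WARMUP = 1000;
	static final int RUNS = 5000;

	// Times a single call of the given loader in nanoseconds.
	static long time(Runnable loader) {
		long start = System.nanoTime();
		loader.run();
		return System.nanoTime() - start;
	}

	// Averages the loader's time over RUNS iterations after a warmup,
	// so the JIT has compiled the hot path before measuring starts.
	static double averageMillis(Runnable loader) {
		for (int i = 0; i < WARMUP; i++) {
			loader.run();
		}
		long total = 0;
		for (int i = 0; i < RUNS; i++) {
			total += time(loader);
		}
		return total / (double) RUNS / 1_000_000.0;
	}
}
```

Since all runs hit the same file, the OS file cache serves every read, which is why the numbers reflect CPU overhead rather than disk speed.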

I guess it depends on when you are loading your textures. For faster loading with a huge number of sprites in a game I can see why it would be necessary, but on a small project, unless you are literally streaming sprites into the game continuously, the difference would be negligible.

Still very cool.

Probably not applicable to texture streaming (unless you’re evil), but this is also a neat way to get pretty fast IPC, as multiple processes can map the same file and thus communicate in that chunk of memory.
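A minimal sketch of that IPC idea - both mappings live in one process here just to show the mechanism, but in real IPC each RandomAccessFile would be opened by a different process:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel.MapMode;

public class MappedIpcSketch {
	// Two independent mappings of the same file see each other's writes
	// through the OS page cache, which is what makes mmap-based IPC work.
	public static int roundTrip() throws IOException {
		File file = File.createTempFile("ipc", ".map");
		file.deleteOnExit();
		try (RandomAccessFile a = new RandomAccessFile(file, "rw");
		     RandomAccessFile b = new RandomAccessFile(file, "rw")) {
			a.setLength(4096);
			MappedByteBuffer writer = a.getChannel().map(MapMode.READ_WRITE, 0, 4096);
			MappedByteBuffer reader = b.getChannel().map(MapMode.READ_ONLY, 0, 4096);
			writer.putInt(0, 42);    // "send" through one mapping
			return reader.getInt(0); // "receive" through the other
		}
	}

	public static void main(String[] args) throws IOException {
		System.out.println(roundTrip()); // prints 42
	}
}
```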

Why not? My streamable textures are just each mipmap’s compressed texture data dumped to a big file. Being able to map a whole mipmap and pass it in sounds immensely useful, and decompressing the texture is simply too slow, especially since S3TC and BPTC textures don’t compress very well.
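Mapping one mipmap out of the big file could look something like this - a sketch where the per-level offsets and sizes are hypothetical stand-ins for what a real file header would provide, and the returned buffer is what would be handed to glCompressedTexImage2D():

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileChannel.MapMode;

// Sketch: map only one mipmap's byte range out of a concatenated file.
// The offset/size tables are hypothetical; a real header would supply them.
public class MipmapMapSketch {
	public static MappedByteBuffer mapLevel(FileChannel fc, long[] offsets,
			long[] sizes, int level) throws IOException {
		// fc.map() accepts an arbitrary (offset, size) range, so a single
		// mipmap can be mapped without touching the rest of the file.
		return fc.map(MapMode.READ_ONLY, offsets[level], sizes[level]);
	}

	// Demo: a fake two-level file (4 bytes of level 0, 2 bytes of level 1);
	// returns the first byte of the mapped level-1 region.
	public static byte demo() throws IOException {
		File f = File.createTempFile("mips", ".tex");
		f.deleteOnExit();
		try (RandomAccessFile raf = new RandomAccessFile(f, "rw")) {
			raf.write(new byte[]{1, 1, 1, 1}); // pretend level 0
			raf.write(new byte[]{2, 2});       // pretend level 1
			MappedByteBuffer level1 = mapLevel(raf.getChannel(),
					new long[]{0, 4}, new long[]{4, 2}, 1);
			return level1.get(0);
		}
	}
}
```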

But when are you passing textures between processes? Threads maybe, but not processes.

Although I have seen some talk about spinning up multiple JVMs in this fashion just so that each JIT can focus on a core area of the code, which can pay off if the IPC has low enough overhead (mmap’ed files). Probably only useful in very particular circumstances; maybe texture decompression from a conventional format would be in this category.

I’m not following. I just want to read raw data from the hard drive and pass it to OpenGL in the most efficient way possible. Since mapping the file seems like not only the fastest but also the simplest and most memory efficient way, why is it not fit for streaming? I don’t have multiple processes.

I was just pointing out that this is also a really good way to do IPC, for your purpose it’s also perfectly good. Sorry. [/massivederailment]

Ah, right. =3 No problem.

The only issue with MappedByteBuffers is that you cannot unmap them manually: closing the channel is not enough. You have to rely on the GC to collect them, and by that time you may have exhausted valuable memory.

You can use hackery to unmap a buffer, but as always, it’s risky shit and may break with any JRE release.


	public static void unmap(MappedByteBuffer buffer)
	{
		try
		{
			Method cleanerMethod = buffer.getClass().getMethod("cleaner", new Class[0]);
			cleanerMethod.setAccessible(true);
			Object cleaner = cleanerMethod.invoke(buffer);

			Method cleanMethod = Class.forName("sun.misc.Cleaner").getMethod("clean", new Class[0]);
			cleanMethod.setAccessible(true);
			cleanMethod.invoke(cleaner);
		}
		catch (Throwable t)
		{
			throw new UnsupportedOperationException("arg", t);
		}
	}

You’re seriously better off writing a few JNI methods.

Is there a serious risk of an out-of-memory crash unless I do that?

MappedByteBuffer is rarely used due to this design flaw. There really are no valid use-cases for this class - it’s like LinkedList: those who use it are not yet properly informed :persecutioncomplex:


public class MappedByteBufferStressTest
{
	public static void main(String[] args) throws IOException
	{
		boolean doUnmap = true;

		for(int i = 0; i < 64; i++)
		{
			File file = new File("D:/stress." + i + ".map");
			System.out.println("Writing 1GB to " + file.getName());
			RandomAccessFile raf = new RandomAccessFile(file, "rw");
			raf.setLength(1024 * 1024 * 1024);

			ByteBuffer tmp = ByteBuffer.allocateDirect(64 * 1024);
			while (tmp.hasRemaining())
				tmp.put((byte) 'a');
			tmp.flip();

			FileChannel fc = raf.getChannel();
			MappedByteBuffer mapped = fc.map(MapMode.READ_WRITE, 0, fc.size());
			while (mapped.hasRemaining())
				mapped.put((ByteBuffer) tmp.clear());
			fc.close();
			raf.close();

			if(doUnmap)
			{
				unmap(mapped); // N.B.: async - the OS will be doing disk I/O long after the unmap, potentially long after your process terminates.
			}
		}
	}

	public static void unmap(MappedByteBuffer buffer)
	{
		try
		{
			Method cleanerMethod = buffer.getClass().getMethod("cleaner", new Class[0]);
			cleanerMethod.setAccessible(true);
			Object cleaner = cleanerMethod.invoke(buffer);

			Method cleanMethod = Class.forName("sun.misc.Cleaner").getMethod("clean", new Class[0]);
			cleanMethod.setAccessible(true);
			cleanMethod.invoke(cleaner);
		}
		catch (Throwable t)
		{
			throw new UnsupportedOperationException("arg", t);
		}
	}
}

With [icode]doUnmap[/icode] set to [icode]true[/icode], the OS stresses, yet plows through.

With [icode]doUnmap[/icode] set to [icode]false[/icode], the OS freezes/locks up, processes stop responding, until you manage to kill the java process - which might fail, so save your work prior to running this snippet.

To answer your question: it won’t be an OutOfMemoryError, but… much worse.

Actually I’ve got a particularly good use for it: I use it as my world-database. It’s opened for the duration of the game’s execution and makes for a particularly efficient way to manage the data for 4 million territories. Server does the same.

WRT unmapping… if you’re wise you’re sticking to a specific JVM anyway for deployment: all you need to know is that your hack works for that JVM. One day it may break - but when that day comes you only have to look in this one place and switch back to the FileChannel method.

Cas :slight_smile:

I’ll look into this more and update the first post to reflect this discussion once I’ve gotten rid of the 25cm of snow that fell outside our house…

@Cas: keep in mind that ranges of a MappedByteBuffer can be moved in and out of memory at any time, which means getting or putting a single byte may cause a >100ms hiccup, so I wouldn’t access a MappedByteBuffer in a game loop.

* Riven stating the obvious :persecutioncomplex:

It’s accessed like a database - no realtime access required generally - though when rendering the world screen it does read it every time the view moves. But I can live with hiccups.

Cas :slight_smile:

On a side note: I saw a lengthy YouTube video in which John Carmack said he recently ditched mapped files entirely due to their unpredictable latency. IIRC they were used extensively in Rage.

Sorry for the derail, I’ll try to be better next time :persecutioncomplex:

Some background information: (see: EVALUATION)
http://bugs.java.com/view_bug.do?bug_id=4724038

Riven, you mentioned writing my own file mapping implementation using JNI. That sounds a bit annoying. Isn’t there any simple implementation of it out there that I can use?

As Cas said, if you control the environment, the reflection hack is ‘good enough’. Obviously you’d cache the Method references to squeeze out the last bit of performance - Method.invoke itself is probably fast enough for your (considerable) requirements.
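For illustration, a cached version of the hack might look like this - my sketch, with the same caveats as before: the sun.misc.Cleaner lookup happens once in a static initializer, and the helper degrades to a no-op on JREs where the hack is unavailable:

```java
import java.lang.reflect.Method;
import java.nio.MappedByteBuffer;

public final class Unmapper {
	// Resolve sun.misc.Cleaner.clean() once; the reflective lookup is the
	// expensive part, Method.invoke itself is comparatively cheap.
	private static final Method CLEAN;
	static {
		Method clean = null;
		try {
			clean = Class.forName("sun.misc.Cleaner").getMethod("clean");
			clean.setAccessible(true);
		} catch (Throwable t) {
			// Hack unavailable on this JRE; unmap() becomes a no-op.
		}
		CLEAN = clean;
	}

	/**
	 * Tries to unmap the buffer immediately. Returns false if the hack is
	 * unavailable. NEVER touch the buffer again after a successful unmap -
	 * any access to the freed region can crash the JVM.
	 */
	public static boolean unmap(MappedByteBuffer buffer) {
		if (CLEAN == null) {
			return false;
		}
		try {
			// The per-buffer cleaner() lookup could be cached per class too.
			Method cleaner = buffer.getClass().getMethod("cleaner");
			cleaner.setAccessible(true);
			CLEAN.invoke(cleaner.invoke(buffer));
			return true;
		} catch (Throwable t) {
			return false;
		}
	}
}
```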