Data structures for large arrays

pixelprime · August 30, 2014, 9:38am

I have a large 2D area which is to be represented by graphical tiles.

Each tile needs a few bits of information stored against it - enough that I ended up creating an actual object for each tile - but it doesn’t seem that efficient to me for the actual level size required.

Here’s the sort of information I need to keep an eye on for each tile:

X & Y position (current a short, since its a grid reference and will never go over 5000 in either direction)
Power level (a byte, as its minimum is 0 and maximum is 16)
Requires power (a byte, with a minimum of 0 and maximum of 16)
Static texture ID (a short, because there will never more than 32,767 textures!)
Animated texture ID (again, a short, used like the above but either/or)
Animation frame (a byte, used the control its texture animation)
Frame time in milliseconds (a short, used the control its texture animation)

So my class looks a bit like this:


package com.ak.game;

public class Tile
{
	private short x, y;
	private byte power;
	private byte needsPower;
	private short texID, animTexID;
	private byte frame;
	private short frameTime;
	
	public Tile(byte x, byte y)
	{
		this.x = x;
		this.y = y;
	}
}

My question really is how to best approach the definition of thousands of these tiles.

If I were to want to have a level comprising of, say, a 320 x 3000 tiles, surely an array of Tile[][] objects is really inefficient?

I only ever plan on rendering whatever is on the screen, which is fine (I can reference the leftX / rightX and TopY / BottomY tiles to create a small ‘viewport’ tile loop for this). But updating tiles that are offscreen that need to be tracked? Seems like a hugely inefficient route to have to cycle through all 320 x 3000 (960k) tiles to accomplish this!

I’d be really grateful to get some pointers on how other people may have solved similar situations before. Would I have to look into some kind of aggregating update over multiple frames? Perhaps updating only X number of tiles each frame, thus splitting up the wholesale update leg-work over the course of multiple frames?

I should add that I want to avoid using multi-threading where possible, since I want to try and keep the base code as optimised as possible without giving up and ‘cheating’ on another thread! (something I’m not confident with yet).

Thoughts would be welcome!
Thanks!

lcass · August 30, 2014, 9:46am

Well you wouldnt be rendering it all at one time would you? so you load it in from secondary memory as required and then save the new data , so you have chunks of the map (2d I assume) so say only 256 by 256 tiles are visible but the player can move 5 tiles per tick then you need to have the next 5 tiles outward loaded but the rest can be stored on memory.

pixelprime · August 30, 2014, 10:06am

That does seem like a sensible suggestion.

So what you’re suggesting is that I split the world map into a kind of chunk format (say 16x16 tiles apiece), and have a maximum ‘range’ of chunks that are stored in memory.

Then, if a chunk goes out of range, I store it to disk and flush it from the active list of chunks, and conversely when I attempt to create a ‘new’ chunk (when in range) I first check to see if it exists on disk and load it from there first?

So, working through this in my head, I could do the following:

Start off with a blank ‘active chunks’ array
Determine ‘range’ of chunks around the player, thus giving me a minimum and maximum chunk X & Y grid reference.
Check if these chunks are in the active chunks list, add if not insert prototype (empty) chunks flagged as ‘not loaded’
Check through the active chunks list, and for those that aren’t loaded, check for a file on disk and load its stage from there, else create the chunk from scratch using the required generation rules
Cycling through the active chunks array, determine which chunks are ‘on screen’ and add them to a ‘visible chunks’ list, which will be used to render them

I’d store the current chunk coordinate the player is in, so that if they change chunks I can kick off the next bit of code:

Determine the minimum and maximum chunk coordinates in range
Cycle through the active chunks list and if a chunk is out of range now, save it out to disk and remove it from the active list
Once done, repeat the ‘add new chunk’ code to add chunks that are now in range (that weren’t before), flagging them for loading
Flush the visibility list and rebuild it based on the new active chunks list

This seems to work in my head, but maybe I’ve missed something. Thoughts would be welcome here!

Thanks for the pointer!

xtre · August 30, 2014, 10:14am

I’m doing something similar. I ran it through conditions to check if they are off the screen and need to be loaded or not. You can check it out if you like? It’s not the best thing I’m taking about as much of a guess as you are. I’ll upload the jar also. The code and jar file to test what it runs like.

pixelprime · August 30, 2014, 11:00am

Thanks, it helps to see how other devs approach the same tehnique!

Much appreciated!

lcass · August 30, 2014, 4:38pm

You got it.

richierich · August 30, 2014, 6:06pm

I’m not one of the experts here at all, but FWIW a few things I thought:

1 million x 20-byte objects doesn’t seem a lot of memory really. In what way were you worrying about inefficiency?
If it was me I would definitely take the animation info out and hold it in animation objects, and only manage animations for tiles on the screen.
Do you really need the x and y in there? Maybe most access to Tile objects will be via addressing by x,y into the array anyway, in which case you know x and y.
Only keeping the visible tiles in memory sounds iffy. What if you want to zoom in and out? Also won’t regenerating from disk all the time lose any dynamic information that’s been added by player actions?
For me a Tile object is the way to go though purely from an OO point of view, I mean Map and Tile are the obvious objects that surely any tile-based game has!? Even if you later come back and compress the map data as part of a tuning exercise.

lcass · August 30, 2014, 10:19pm

richierich:

I’m not one of the experts here at all, but FWIW a few things I thought:

1 million x 20-byte objects doesn’t seem a lot of memory really. In what way were you worrying about inefficiency?

If it was me I would definitely take the animation info out and hold it in animation objects, and only manage animations for tiles on the screen.

Do you really need the x and y in there? Maybe most access to Tile objects will be via addressing by x,y into the array anyway, in which case you know x and y.

Only keeping the visible tiles in memory sounds iffy. What if you want to zoom in and out? Also won’t regenerating from disk all the time lose any dynamic information that’s been added by player actions?

For me a Tile object is the way to go though purely from an OO point of view, I mean Map and Tile are the obvious objects that surely any tile-based game has!? Even if you later come back and compress the map data as part of a tuning exercise.

So the issue is that 5000 * 5000 = 25000000 , thats a lot of memory in terms of java , even if it is just 20 bytes thats *searches for phone *cannot find phone , over a quarter of a gig of data, which is a lot considering all we have done is load in a map . If you want to zoom out you simple change the memory read search preferences.
Hes storing an animation id (smart I hadnt thought of that :)) which is a short.
Whats happening is all the information that is in view and a bit thats further out (to avoid tearing etc) is loaded into main memory , what happens is its edited here and then when its out of view its stored in the secondary memory along with the new player data. There is no constant regeneration simply reading and writing whats not seen back to memory.

BurntPizza · August 30, 2014, 10:39pm

Even if you can’t save data per-tile, you can cut memory usage by flattening the tile[][] to tile[]: http://www.java-gaming.org/topics/is-there-a-logical-reason-to-stuff-2d-array-data-into-one/33042/msg/309753/view.html#msg309753

You also theoretically achieve slightly better read performance.

princec · August 30, 2014, 10:47pm

Wait till you’ve actually got a problem here before you try solving it.

Cas

65K · August 31, 2014, 5:36am

Check if all data is really needed for each tile individually.
Like having an animation for each tile, all the time.
If not, extract animation information.

pixelprime · August 31, 2014, 9:04am

Hello again! Thanks for all of your suggestions everybody, it’s been great to get some perspective on this.

I should probably explain why I need to store certain variables - but I’d like to clarify off the bat that I probably don’t need to store the X and Y grid reference - since the inferred position in the array is naturally enough there.

During the update cycle, tiles that have an animTexID (i.e. one that isn’t -1, or ‘none’) will have their frame time incremented by the current engine millisecond delta time. There’s a quick TextureManager call to see whether or not that frame time is longer than the nominated frame time for that particular texture ID, and if so its frame counter is incremented. It’s then either capped (if its a non-looping animation that’s hit its last frame) or looped if it’s hit its last frame.

In answer to the data size question, realistically, I very much doubt my game will go over a potential map size of 800 x 2400 tiles. And even then, I think having all 800 tiles of width is a bit excessive for my needs. But eh, I think it’s worth extending the capability of the engine, just in case….

So, taking some of the options offered, splitting that map size into chunks of 16 x 16 tiles gives me a size of 50 x 150 chunks. If I were to dump each chunk into a file, that’s a grand total of 7,500 files.

I could be clever and use chunk ‘strips’ instead - where I define an entire row of chunks in a file - which would leave me with only 150 files, but the obvious additional overhead of how long it takes to read a chunk row into memory, but swings and roundabouts I guess.

So, when I create a new map, I prototype the creation of all the chunk files:


public static String newChunkString()
	{
		String builder = "";
		
		for (int x = 0; x < CHUNKSIZE; x++)
		{
			for (int y = 0; y < CHUNKSIZE; y++)
			{
				builder += "[";
				builder += "T=0;";		// refers to the enumerated index of the type
				builder += "R=0;";		// how much 'resource' this tile has remaining
				builder += "]\n";
			}
		}
		
		return builder;
	}

The great thing is that I don’t really need to store that much information about a particular tile in the chunk file - its texture IDs I can get from the tile type, the frame timers will be reset on launch (and probably set to a random frame in their maximum frame count to restore the visual ‘randomness’), and the power requirement and current level will be calculated on launch also.

A quick test of a 10 x 20 chunk level gave me a total of 550Kb of files (200).

Running it with the full compliment of 50 x 150 chunks gave me a total of 20Mb of files (7,500). The map took around 10 seconds to generate these resources.

I might opt for the chunk rows simply because I’m concerned about the awful file fragmentation you’d probably end up with on disk with the one-file-per-chunk method. But perhaps I’m over-thinking it? It is a kind of one-shot deal… once the chunks are made on disk, it’ll just be a case of loading and saving a handful of them every now and then.

Finally, in response to the comment suggesting I move to a single dimensional array - I understand the performance improvements gained (I had a look at the thread you mentioned, and did some other digging too) - but I also can’t ignore how much easier it is to get my head around a two dimensional array. Being able to provide simple X and Y coordinates into the arrays just seems, I don’t know, easier to wrap my head around I guess!

BurntPizza · August 31, 2014, 2:54pm

Instead of having a file per chunk, why not many (even all) chunks per file? This could be done easily in (at least) two ways:
Beginning of the file has an index table of byte offsets into the file where the data each chunk is stored, or you simply have chunks serialized in a way that they are all the same size.

I recommend the use of an actual StringBuilder for something like that, that’s exactly the kind of use case for it.
Are those strings the way your data is represented in the file? I would recommend using a DataOutputStream or ByteBuffer + FileChannel and using a binary format.

As far as array flattening:


public static final int index(int x, int y, int width) {
    return x + y * width;
}

// equivalent code
tiles[y][x] = ...;
tiles[index(x, y, MAP_WIDTH)] = ...;

However I agree with Cas, wait until you have a problem to try and fix it. I’d say try to build the system in such a way that your game code doesn’t have to know how the tiles are stored/represented on disk/etc, so that it is back-end agnostic and thus you can swap in a faster optimized system later with no changes to the rest of your code.

basil · August 31, 2014, 3:34pm

just tossing in another idea …

instead of a big ass array which requires to be allocated in advance, even for empty tiles - one could use a hashmap like …

HashMap<PositionKey,Tile> data;

would be not limited by max dimensions of the playfield. the trick tho’ is how to hash the position.

i’m using both, a preallocated array partition and a hashmap partition. ofc there are downsides with both, so as usual it’s up to the use-case. preallocated are better with small amount of data, hashmaps are better with big data and very irregular (big empty spaces in between) distribution. I just like the idea of not dealing with bounds.

BurntPizza · August 31, 2014, 3:39pm

It’s all possible and easy to experiment with if it’s abstracted to a TerrainManager.getTileAt() or whatever.
Could even do something like a B-tree/hashed array tree if straight hashmap loss of locality is a problem.

lcass · September 5, 2014, 6:17pm

Just another way to do this , as pointed out by an ingenious method implemented within retro pixel castles , store the tile ID not the Tile , then have a grouping of static tiles (with different textures and methods) then use a method that detects the id and returns the tile you selected.