Is there a logical reason to stuff 2D array data into one?

I’m working on pathfinding in my game, attempting to come up with my own version of A*, and I’ve found it massively less complicated to use a 2D array to gather data; it seems plenty fast enough to suit my needs. But a lot of other coders seem to want to cram the X/Y coordinates into a single array.

I was curious what you guys’ input is on these two methods of storing X/Y coordinate values, and whether there’s any reason I shouldn’t just go with a 2D array? I can’t even find a real performance difference, and it’s a lot easier to code with an int[xCord][yCord] array.

Storing the data in a small 2D array:

int[][] array = new int[mapWidth][mapHeight]; // indexed as [x][y]
//then store data by:
array[x][y] = value;

Storing the same data in a large 1D array:

int[] array = new int[mapWidth * mapHeight];
//then store data by:
array[x * mapHeight + y] = value; // the same [x][y] cell, flattened

The 1D array is probably technically less memory-intensive, but it’s a pain in the arse to work with. :o
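For what it’s worth, the offset math can at least be inverted if you ever need (x, y) back from a flat index. A quick sketch, assuming the [x][y]-major layout above:

int mapWidth = 256, mapHeight = 256;
int x = 10, y = 20;
int index = x * mapHeight + y; // (x, y) -> flat index
int backX = index / mapHeight; // flat index -> x (10)
int backY = index % mapHeight; // flat index -> y (20)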

Main reason I bring this up is I don’t want to get all the way into the final product of my pathfinding to find out there’s some legitimate serious flaw with using a 2D array.

There isn’t and it’s not. Don’t worry about it.

Good to know, because I can’t find any justifiable reason why I’d want a huge 1D array that’s hard to work with over a 2D array that’s basically just a bunch of small 1D arrays.

That brings an interesting functionality question to mind, though: this array is for a 256x256 tile map. Technically speaking, are int[65536] (aka 256x256) and int[256][256] the same size, or is int[256][256]’s total size actually 512?

If the 1D array’s total size is larger than the 2D’s, I can’t imagine any way that a 1D array would be better at all.

EDIT: I am assuming here that an int[256][256] array is 256 arrays of 256 ints each, meaning it’s the same size as int[65536].

They are the same size. The memory footprint is the product of the individual dimensions times the object size. I would venture to guess that everything is stored in a single-dimension array behind the scenes, with the [] operator handling the offset math for the developer. I remember hearing the advice to use single-dimension arrays many years ago, but have considered it outdated wisdom for general programming for a while now.

Yeah, don’t worry about it. If it’s easier to use, then it’s probably better for you.
I myself have simply gotten used to using a 1D array from various projects where it was actually necessary, so in that case the 1D is easier for me.

The only real performance benefits I can think of are: less memory usage (minimal to decent, depending on circumstances), and better data locality and thus cache friendliness (again, ranging from minimal to algorithm-breaking). Also, less indirection, since there’s only one lookup to make (also a cache-related issue).

http://www.javamex.com/tutorials/memory/array_memory_usage.shtml

The [icode]int[][/icode] uses 16 + 4n bytes, where n is the length of the array. For n = 65536, size = 262,160 bytes.
The [icode]int[][][/icode] uses 16 + 4j + j(16 + 4k) bytes, where j and k are the lengths of the outer and inner arrays, respectively. For j, k = 256, size = 267,280 bytes, or 5 KB more.
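A quick back-of-envelope check of those formulas (plain arithmetic, not a measurement):

int n = 65536, j = 256, k = 256;
int flat   = 16 + 4 * n;                    // 262,160 bytes
int nested = 16 + 4 * j + j * (16 + 4 * k); // 267,280 bytes
System.out.println(nested - flat);          // 5,120 bytes = 5 KB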

This is false. “Multidimensional” arrays in Java are nested arrays; you might be thinking of C, where MD arrays are more or less syntax sugar over a 1D array.

Indeed, I am pulling the information out of my C recollections. Could you elaborate on the “nested array” comment? It would seem to me that a contiguous memory allocation would be the most efficient way to approach a fixed-size array no matter the number of dimensions, so I’m curious as to how/why Java would stray from that type of arrangement. (Not questioning your answer, just genuinely curious.) :slight_smile:

A 1D array is declared as such: [icode]int[] i = new int[] { 1, 3, 5, 7 };[/icode] and sits in memory as such:

| header | 1 | 3 | 5 | 7 |
As with any array, the header is 12 bytes (for plain objects it’s 8 bytes), and the total size is rounded up to the nearest 8 bytes. Since the [icode]int[/icode] payload here is already a multiple of 8 bytes, I just round the header up to 16 from the get-go for calculations.

A 2D array is as such:

int[][] i = new int[][] {
    { 1, 3 },
    { 5, 7 }
};

and sits in memory as such:

Outer array: (rowRef0 and rowRef1 are references to the sub-arrays)
| header | rowRef0 | rowRef1 |

Row 0:
| header | 1 | 3 |

Row 1:
| header | 5 | 7 |

EDIT:
As for contiguity, I don’t believe the JVM makes any guarantees, but it’s likely the allocator attempts to make it as contiguous as possible. The pointer indirection is still there, however, and each “element” in the outer array is an object in its own right that may be anywhere in the heap. You can see how this, coupled with the memory usage increase, can quickly get out of hand at higher dimensions.
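Incidentally, you can see the nested-array nature directly, since each row is an independent object (a quick illustration, nothing more):

// Rows of a Java 2D array are separate objects in their own right:
int[][] grid = new int[2][];
grid[0] = new int[] { 1, 3 };
grid[1] = grid[0];              // both rows alias the SAME object
grid[0][0] = 42;
System.out.println(grid[1][0]); // prints 42

int[][] ragged = new int[2][];
ragged[0] = new int[10];        // rows don't even need the same length
ragged[1] = new int[3];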

That just seems inefficient to me, at least in the case of primitive types. I appreciate the clarification though. :slight_smile:

It is inefficient, which is why people continue to use 1D arrays to represent higher-dimensional structures. It’s not nearly as bad as this example with [icode]int[/icode]s when the objects stored in the array are large, however. (Although then each element is an equally large 32-bit reference to the object you are storing, leading to more indirection and inefficiency.)

Pretty much the worst case, however, is using [icode]boolean[][/icode]s. Each 1-bit [icode]boolean[/icode] is rounded up to a full byte, which is terrible at higher dimensions. Use a [icode]BitSet[/icode] instead. (The exception is when speed is absolutely critical; an array lookup is still usually faster than a BitSet lookup, even with the indirection and poor locality.)
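For reference, a 2D grid of bits over a [icode]BitSet[/icode] is just the same flattening trick again. A minimal sketch:

import java.util.BitSet;

// Sketch: a 2D boolean grid packed into a BitSet, 1 bit per cell
// instead of 1 byte per cell, using the usual row-major flattening.
int width = 256, height = 256;
BitSet grid = new BitSet(width * height);
int x = 3, y = 7;
grid.set(y * width + x);                    // grid[x][y] = true
boolean occupied = grid.get(y * width + x); // read it back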

Edit for quick example:
In terms of % of total space “wasted,” small, multidimensional arrays of small items are the worst. Consider an array of 4 booleans vs. a 2x2 array of booleans:

[icode]boolean[4][/icode]: 12 + 1 * 4 = 16 bytes. (Note that this is still 32x as much memory as the information the booleans actually carry.)
[icode]boolean[2][2][/icode]: (12 + 4 * 2 + 4 rounding) + 2 * (12 + 1 * 2 + 2 rounding) = 56 bytes. That’s 3.5x more, and 112x as much memory as the information the array actually stores; >99% of the memory is “wasted.”

So what I’m getting out of this is: since I’m dealing with such a large array, the ratio of wasted memory is so negligibly small that I shouldn’t care? (Unless I’m loading thousands of these, and I’m not anyway.)

…but on the other hand, if I were loading an int[2][2] instead of an int[4], I should care, because the ratio of wasted memory is much larger. (Not that I can think of many logical reasons I’d load an int[2][2] array, but for argument’s sake. :stuck_out_tongue: )

No, you still shouldn’t really care, because looping over a 4-element array takes nanoseconds either way the array is structured.

Even better. :wink:

I doubt the difference is noticeable. I use 2D arrays for all my tiles; can’t imagine using a 1D array.

Just do whatever is easier; stop worrying about performance/efficiency unless you hit a problem, IMO.

Code code code, refactor later.

Also, my A* algorithm does not iterate over an array of tiles for movement values or anything; that would be pointless and expensive IMO. Instead, when the map loads, I generate a 2D array of ints that can be passed to the A* NodeMap to identify node values when doing a search.

That means if you happen to alter the map, all you do is fetch the x/y of that tile and alter the array of ints in the NodeMap, if that makes sense, lol.
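Roughly like this, anyway ([icode]tiles[/icode], [icode]getMoveCost()[/icode] and [icode]setCost()[/icode] are just hypothetical stand-ins for whatever your map classes actually expose):

// Build the cost grid once at map load, then patch single entries
// when a tile changes. The names below are placeholders, not real API.
int[][] costs = new int[mapWidth][mapHeight];
for (int x = 0; x < mapWidth; x++)
    for (int y = 0; y < mapHeight; y++)
        costs[x][y] = tiles[x][y].getMoveCost();

NodeMap nodeMap = new NodeMap(costs); // handed to the A* search

// later, when the map is altered at (tx, ty):
nodeMap.setCost(tx, ty, tiles[tx][ty].getMoveCost());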

Indeed, do not worry about performance until you have documented reason to worry.

That said, there can be legitimate reasons to worry. In one of my cellular-automata projects a long time ago, I remember that using a 2D array for the main data structure was the main performance bottleneck. Replacing it with a 1D array and using it effectively took me from tens of updates per second (intolerable in that particular case) to hundreds.
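The gain mostly came from walking the flat array strictly sequentially. A simplified sketch of that kind of update pass (not the actual project code):

int w = 512, h = 512;
int[] cur = new int[w * h];
int[] next = new int[w * h];
// One pass over the interior cells; the hot loop reads memory
// sequentially and neighbors are simple index offsets.
for (int y = 1; y < h - 1; y++) {
    int row = y * w;
    for (int x = 1; x < w - 1; x++) {
        int i = row + x;
        next[i] = (cur[i - 1] + cur[i + 1] + cur[i - w] + cur[i + w]) / 4;
    }
}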

Well, how much memory are you using?

1D vs. 2D is not about memory consumption but about access time (the number of reads).

2D case:
using the row offset, read the rowRef
using the column offset, read the data

1D case:
calculate the offset and read
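Roughly, in code ([icode]grid2d[/icode]/[icode]grid1d[/icode] and the sizes are just stand-ins for illustration):

int mapWidth = 256, mapHeight = 256, x = 1, y = 2;
int[][] grid2d = new int[mapWidth][mapHeight];
int[] grid1d = new int[mapWidth * mapHeight];

int a = grid2d[x][y];              // read 1: row reference; read 2: element
int b = grid1d[x * mapHeight + y]; // one computed offset, one read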

Since this is posted in “Performance Tuning”, the answer is: there’s a huge difference between n-D arrays and 1D arrays.

  1. In n-D arrays you walk random memory (insanely slow) everywhere except for linear access in the deepest dimension.
  2. The math above is all wrong…every level except the lowest is an array of object references per entry. Much fatter, and more GC overhead.
  3. The compiler will frequently not be able to determine whether an array object has been changed between accesses, and will need to walk the chain of arrays to get to the element (see the sketch below).

EDIT: Personally, the only time I use an n-D array is for time-unimportant sparse datasets. Use a getter/setter method.
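On point 3, hoisting the row reference out of the inner loop is the usual workaround. A sketch (assuming an [x][y]-indexed grid):

int mapWidth = 256, mapHeight = 256;
int[][] grid = new int[mapWidth][mapHeight];
for (int x = 0; x < mapWidth; x++) {
    int[] row = grid[x];               // walk the outer array once per row
    for (int y = 0; y < mapHeight; y++) {
        row[y] += 1;                   // plain 1D access from here on
    }
}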

Hardly anything, really; I can’t even identify a memory usage change when I completely disable the entire pathfinding class. :stuck_out_tongue:

Oddly enough, I can’t even see the difference in my memory usage with a profiler either, go figure.

Soooo0000000… /thread ? :stuck_out_tongue:

I’d say the point is:

[icode]val = data[x][y][/icode] vs. [icode]val = get(x,y)[/icode]
[icode]data[x][y] = val[/icode] vs. [icode]set(x,y,val)[/icode]

So it’s reasonable to get into the habit of flattening arrays. Maybe they’ll get around to adding this to the language at some point.
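A minimal wrapper in that spirit might look like this (just a sketch; row-major index = y * width + x):

// Sketch: a flat array behind 2D-looking accessors.
public class Grid {
    private final int width, height;
    private final int[] data;

    public Grid(int width, int height) {
        this.width = width;
        this.height = height;
        this.data = new int[width * height];
    }

    public int get(int x, int y) {
        assert x >= 0 && x < width && y >= 0 && y < height;
        return data[y * width + x];
    }

    public void set(int x, int y, int val) {
        assert x >= 0 && x < width && y >= 0 && y < height;
        data[y * width + x] = val;
    }
}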