Is there a logical reason to stuff 2D array data into one?

Some surprising results…


public class CacheGrid {
	private static final int dim = (1 << 12);
	private static final int[] grid = new int[dim * dim];
	private static final int[][] rows = new int[dim][dim];

	public static void main(String[] args) {
		System.out.println(System.getProperty("java.version"));
		System.out.println(System.getProperty("java.vm.version"));

		for(int i = 0; i < 10; i++) {
			System.out.println();
			{
				long t0 = System.nanoTime();
				fill_1d_XY();
				long t1 = System.nanoTime();
				System.out.println("fill_1d_XY took: " + (t1 - t0) / 1000000 + "ms");
			}
			{
				long t0 = System.nanoTime();
				fill_1d_YX();
				long t1 = System.nanoTime();
				System.out.println("fill_1d_YX took: " + (t1 - t0) / 1000000 + "ms");
			}

			{
				long t0 = System.nanoTime();
				fill_2d_XY();
				long t1 = System.nanoTime();
				System.out.println("fill_2d_XY took: " + (t1 - t0) / 1000000 + "ms");
			}
			{
				long t0 = System.nanoTime();
				fill_2d_YX();
				long t1 = System.nanoTime();
				System.out.println("fill_2d_YX took: " + (t1 - t0) / 1000000 + "ms");
			}
		}
	}

	public static void fill_1d_XY() {
		for(int x = 0; x < dim; x++)
			for(int y = 0; y < dim; y++)
				grid[(y << 12) | x] = y * dim + x;
	}

	public static void fill_1d_YX() {
		for(int y = 0; y < dim; y++)
			for(int x = 0; x < dim; x++)
				grid[(y << 12) | x] = y * dim + x;
	}

	public static void fill_2d_XY() {
		for(int x = 0; x < dim; x++)
			for(int y = 0; y < dim; y++)
				rows[y][x] = y * dim + x;
	}

	public static void fill_2d_YX() {
		for(int y = 0; y < dim; y++)
			for(int x = 0; x < dim; x++)
				rows[y][x] = y * dim + x;
	}
}


1.7.0_45
24.45-b08

fill_1d_XY took: 1229ms
fill_1d_YX took: 20ms
fill_2d_XY took: 545ms
fill_2d_YX took: 20ms

fill_1d_XY took: 1227ms
fill_1d_YX took: 16ms
fill_2d_XY took: 539ms
fill_2d_YX took: 15ms

fill_1d_XY took: 1225ms
fill_1d_YX took: 17ms
fill_2d_XY took: 539ms
fill_2d_YX took: 15ms

fill_1d_XY took: 1225ms
fill_1d_YX took: 17ms
fill_2d_XY took: 539ms
fill_2d_YX took: 15ms

fill_1d_XY took: 1229ms
fill_1d_YX took: 17ms
fill_2d_XY took: 538ms
fill_2d_YX took: 15ms

fill_1d_XY took: 1223ms
fill_1d_YX took: 17ms
fill_2d_XY took: 538ms
fill_2d_YX took: 15ms

In this micro-benchmark, int[w][h] is always faster than int[w*h], who would-a-thunk!

(Probably due to HotSpot not able to remove bounds-checks in the 1D version)

Try adding a random-access and “psuedo-random” access couple of benchmarks and see how it compares.

Cas :slight_smile:

I made a quick attempt at the random fill, not sure if precalculating the random indices kills the bench, but I don’t see much of a better way.

EDIT: accounted for extra lookups by adding linearX and linearY, also fixed “<< dim”

import java.util.*;

public class CacheGrid {
   private static final int dim = (1 << 12);
   private static final int[] grid = new int[dim * dim];
   private static final int[][] rows = new int[dim][dim];
   
   private static int[] randX = new int[dim], randY = new int[dim], linearX = new int[dim], linearY = new int[dim];
   
   static {
     Random rand = new Random(35682457L);
     Set<Integer> set = new HashSet<Integer>();
     while(set.size() < dim)
       set.add(rand.nextInt(dim));
     Integer[] ints = set.toArray(new Integer[0]);
     for (int i = 0; i < dim; i++)
       randX[i] = ints[i];
     set.clear();
     
     while(set.size() < dim)
       set.add(rand.nextInt(dim));
     ints = set.toArray(new Integer[0]);
     for (int i = 0; i < dim; i++)
       randY[i] = ints[i];
     
     
     for(int i = 0; i < dim; i++) {
       linearX[i] = i;
       linearY[i] = i;
     }
   }

   public static void main(String[] args) {
      System.out.println(System.getProperty("java.version"));
      System.out.println(System.getProperty("java.vm.version"));

      for(int i = 0; i < 10; i++) {
         System.out.println();
         {
            long t0 = System.nanoTime();
            fill_1d_XY();
            long t1 = System.nanoTime();
            System.out.println("fill_1d_XY took: " + (t1 - t0) / 1000000 + "ms");
         }
         {
            long t0 = System.nanoTime();
            fill_1d_YX();
            long t1 = System.nanoTime();
            System.out.println("fill_1d_YX took: " + (t1 - t0) / 1000000 + "ms");
         }
         {
            long t0 = System.nanoTime();
            fill_2d_XY();
            long t1 = System.nanoTime();
            System.out.println("fill_2d_XY took: " + (t1 - t0) / 1000000 + "ms");
         }
         {
            long t0 = System.nanoTime();
            fill_2d_YX();
            long t1 = System.nanoTime();
            System.out.println("fill_2d_YX took: " + (t1 - t0) / 1000000 + "ms");
         }
         {
            long t0 = System.nanoTime();
            fill_1d_random();
            long t1 = System.nanoTime();
            System.out.println("fill_1d_random took: " + (t1 - t0) / 1000000 + "ms");
         }
         {
            long t0 = System.nanoTime();
            fill_2d_random();
            long t1 = System.nanoTime();
            System.out.println("fill_2d_random took: " + (t1 - t0) / 1000000 + "ms");
         }
      }
   }
   
   public static void fill_1d_random() {
     for(int x = 0; x < dim; x++) {
       for(int y = 0; y < dim; y++) {
         int xi = randX[x];
         int yi = randY[y];
         
         grid[(yi << 12) | xi] = yi * dim + xi;
       }
     }
   }
   
   public static void fill_2d_random() {
     for(int x = 0; x < dim; x++) {
       for(int y = 0; y < dim; y++) {
         int xi = randX[x];
         int yi = randY[y];
         
         rows[yi][xi] = yi * dim + xi;
       }
     }
   }

   public static void fill_1d_XY() {
      for(int x = 0; x < dim; x++)
        for(int y = 0; y < dim; y++) {
         int xi = linearX[x];
         int yi = linearY[y];
         grid[(yi << 12) | xi] = yi * dim + xi;
      }
   }

   public static void fill_1d_YX() {
      for(int y = 0; y < dim; y++)
        for(int x = 0; x < dim; x++) {
         int xi = linearX[x];
         int yi = linearY[y];
         grid[(yi << 12) | xi] = yi * dim + xi;
      }
   }

   public static void fill_2d_XY() {
      for(int x = 0; x < dim; x++)
        for(int y = 0; y < dim; y++) {
         int xi = linearX[x];
         int yi = linearY[y];
         rows[yi][xi] = yi * dim + xi;
      }
   }

   public static void fill_2d_YX() {
      for(int y = 0; y < dim; y++)
        for(int x = 0; x < dim; x++) {
         int xi = linearX[x];
         int yi = linearY[y];
         rows[yi][xi] = yi * dim + xi;
      }
   }
}

Results: (last 6 sets)

[quote]1.7.0_25
1.7.0_25
23.25-b01

fill_1d_XY took: 674ms
fill_1d_YX took: 64ms
fill_2d_XY took: 188ms
fill_2d_YX took: 67ms
fill_1d_random took: 673ms
fill_2d_random took: 198ms

fill_1d_XY took: 673ms
fill_1d_YX took: 65ms
fill_2d_XY took: 189ms
fill_2d_YX took: 68ms
fill_1d_random took: 673ms
fill_2d_random took: 196ms

fill_1d_XY took: 674ms
fill_1d_YX took: 66ms
fill_2d_XY took: 189ms
fill_2d_YX took: 68ms
fill_1d_random took: 671ms
fill_2d_random took: 198ms

fill_1d_XY took: 673ms
fill_1d_YX took: 64ms
fill_2d_XY took: 191ms
fill_2d_YX took: 67ms
fill_1d_random took: 686ms
fill_2d_random took: 197ms

fill_1d_XY took: 672ms
fill_1d_YX took: 64ms
fill_2d_XY took: 187ms
fill_2d_YX took: 68ms
fill_1d_random took: 674ms
fill_2d_random took: 199ms

fill_1d_XY took: 672ms
fill_1d_YX took: 63ms
fill_2d_XY took: 189ms
fill_2d_YX took: 67ms
fill_1d_random took: 673ms
fill_2d_random took: 198ms
[/quote]

You are shifting ‘yi’ by 4096 bits!

This means you are actually shifting it by 4096&31 = 0 bits, which means you’re mostly touching the first ‘row’ of your grid. Suddenly everything first in cache, skewing the results tremendously :point:

Whoops, fixing…


import java.util.*;

public class CacheGrid {
	private static final int dim = (1 << 12);
	private static final int[] grid = new int[dim * dim];
	private static final int[][] rows = new int[dim][dim];

	private static int[] //
			randX = new int[dim], //
			randY = new int[dim], //
			randL = new int[dim * dim], //
			linearX = new int[dim], //
			linearY = new int[dim], //
			linearL = new int[dim * dim];

	static {
		for(int i = 0; i < dim; i++)
			linearX[i] = randX[i] = i;
		for(int i = 0; i < dim; i++)
			linearY[i] = randY[i] = i;
		for(int i = 0; i < dim * dim; i++)
			linearL[i] = randL[i] = i;

		shuffle(randX, new Random(293856L));
		shuffle(randY, new Random(293856L));
		shuffle(randL, new Random(293856L));
	}

	public static void shuffle(int[] arr, Random rndm) {
		for(int i = 0; i < arr.length; i++) {
			int idx1 = rndm.nextInt(arr.length - i);
			int idx2 = arr.length - 1 - i;

			int tmp = arr[idx1];
			arr[idx1] = arr[idx2];
			arr[idx2] = tmp;
		}
	}

	public static void main(String[] args) {
		System.out.println(System.getProperty("java.version"));
		System.out.println(System.getProperty("java.vm.version"));

		for(int i = 0; i < 10; i++) {
			System.out.println();
			{
				long t0 = System.nanoTime();
				fill_1d_LI();
				long t1 = System.nanoTime();
				System.out.println("fill_1d_LI took: " + (t1 - t0) / 1000000 + "ms");
			}
			{
				long t0 = System.nanoTime();
				fill_1d_LL();
				long t1 = System.nanoTime();
				System.out.println("fill_1d_LL took: " + (t1 - t0) / 1000000 + "ms");
			}
			{
				long t0 = System.nanoTime();
				fill_1d_LR();
				long t1 = System.nanoTime();
				System.out.println("fill_1d_LR took: " + (t1 - t0) / 1000000 + "ms");
			}
			{
				long t0 = System.nanoTime();
				fill_1d_XY();
				long t1 = System.nanoTime();
				System.out.println("fill_1d_XY took: " + (t1 - t0) / 1000000 + "ms");
			}
			{
				long t0 = System.nanoTime();
				fill_1d_YX();
				long t1 = System.nanoTime();
				System.out.println("fill_1d_YX took: " + (t1 - t0) / 1000000 + "ms");
			}
			{
				long t0 = System.nanoTime();
				fill_2d_XY();
				long t1 = System.nanoTime();
				System.out.println("fill_2d_XY took: " + (t1 - t0) / 1000000 + "ms");
			}
			{
				long t0 = System.nanoTime();
				fill_2d_YX();
				long t1 = System.nanoTime();
				System.out.println("fill_2d_YX took: " + (t1 - t0) / 1000000 + "ms");
			}
			{
				long t0 = System.nanoTime();
				fill_1d_random();
				long t1 = System.nanoTime();
				System.out.println("fill_1d_random took: " + (t1 - t0) / 1000000 + "ms");
			}
			{
				long t0 = System.nanoTime();
				fill_2d_random();
				long t1 = System.nanoTime();
				System.out.println("fill_2d_random took: " + (t1 - t0) / 1000000 + "ms");
			}
		}
	}

	public static void fill_1d_random() {
		for(int x = 0; x < dim; x++) {
			for(int y = 0; y < dim; y++) {
				int xi = randX[x];
				int yi = randY[y];

				grid[(yi << 12) | xi] = yi * dim + xi;
			}
		}
	}

	public static void fill_2d_random() {
		for(int x = 0; x < dim; x++) {
			for(int y = 0; y < dim; y++) {
				int xi = randX[x];
				int yi = randY[y];

				rows[yi][xi] = yi * dim + xi;
			}
		}
	}

	public static void fill_1d_LI() {
		for(int i = 0; i < dim * dim; i++) {
			grid[i] = i;
		}
	}

	public static void fill_1d_LL() {
		for(int i = 0; i < dim * dim; i++) {
			grid[linearL[i]] = i;
		}
	}

	public static void fill_1d_LR() {
		for(int i = 0; i < dim * dim; i++) {
			grid[randL[i]] = i;
		}
	}

	public static void fill_1d_XY() {
		for(int x = 0; x < dim; x++)
			for(int y = 0; y < dim; y++) {
				int xi = linearX[x];
				int yi = linearY[y];
				grid[(yi << 12) | xi] = yi * dim + xi;
			}
	}

	public static void fill_1d_YX() {
		for(int y = 0; y < dim; y++)
			for(int x = 0; x < dim; x++) {
				int xi = linearX[x];
				int yi = linearY[y];
				grid[(yi << 12) | xi] = yi * dim + xi;
			}
	}

	public static void fill_2d_XY() {
		for(int x = 0; x < dim; x++)
			for(int y = 0; y < dim; y++) {
				int xi = linearX[x];
				int yi = linearY[y];
				rows[yi][xi] = yi * dim + xi;
			}
	}

	public static void fill_2d_YX() {
		for(int y = 0; y < dim; y++)
			for(int x = 0; x < dim; x++) {
				int xi = linearX[x];
				int yi = linearY[y];
				rows[yi][xi] = yi * dim + xi;
			}
	}
}

1.7.0_45
24.45-b08

fill_1d_LI took: 22ms
fill_1d_LL took: 43ms
fill_1d_LR took: 1266ms
fill_1d_XY took: 1425ms
fill_1d_YX took: 50ms
fill_2d_XY took: 920ms
fill_2d_YX took: 45ms
fill_1d_random took: 1476ms
fill_2d_random took: 2010ms

fill_1d_LI took: 19ms
fill_1d_LL took: 41ms
fill_1d_LR took: 1158ms
fill_1d_XY took: 1312ms
fill_1d_YX took: 41ms
fill_2d_XY took: 1037ms
fill_2d_YX took: 39ms
fill_1d_random took: 1457ms
fill_2d_random took: 2010ms

fill_1d_LI took: 18ms
fill_1d_LL took: 37ms
fill_1d_LR took: 1418ms
fill_1d_XY took: 1319ms
fill_1d_YX took: 41ms
fill_2d_XY took: 1048ms
fill_2d_YX took: 41ms
fill_1d_random took: 1417ms
fill_2d_random took: 1850ms

Nice, I was going to add some of those other comparisons, also realized that shuffling would be better than brute forcing a set, but class was ending so I didn’t have time.

Looks about like I expected.

Also comparing 1d_LI to 1d_YX without the linearX, linearY lookups:

Interestingly, LI is slightly faster (barely, but measurable) than YX, indicating that the loops where not flattened to equal code as the single loop. This is a consistent result. Bounds checking?

Some cache optimization ideas for the random fill functions: ::slight_smile:

private static final int POW = 10, DIM = 1 << POW;

private static final int[] grid = new int[DIM*DIM];
private static final int[][] rows = new int[DIM][DIM];

private static int[] randX = new int[DIM], randY = new int[DIM], 
                     linearX = new int[DIM], linearY = new int[DIM];

public static void fill_1d_random() {
  for (int y = 0; y != DIM;) {
    int yi = randY[y++], x = 0;

    while (x != DIM) {
      int xi = randX[x++];
      grid[yi<<POW | xi] = yi^xi;
    }
  }
}

public static void fill_2d_random() {
  for (int y = 0; y != DIM;) {
    int yi = randX[y++], ry[] = rows[yi], x = 0;

    while (x != DIM) {
      int xi = randX[x++];
      ry[xi] = yi^xi;
    }
  }
}

And for the non-random 2D fill too: 8)

public static void fill_2d_YX() {
  for (int y = 0; y != DIM;) {
    int yi = linearY[y++], ry[] = rows[yi], x = 0;

    while (x != DIM) {
      int xi = linearX[x++];
      ry[xi] = yi*DIM + xi;
    }
  }
}

looks up :cranky:

I personally like 1D arrays because the may situations I have where it is easier to reference objects by their (pre-computed) index value directly.

Silly example off the top of my head:


Object[] all_objects_there_are_in_2d_world = new Object[]{ ... };
int[]  selected_objects = new int[]{ ... }; //Indices of specific objects

void doSomethingForAll()
{
   for(int index : selected_objects)
   {
        all_objects_there_are_in_2d_world [index].DoSomething();
   }
}

But it is certainly situation specific.

looks back down :emo:

Have I ever mentioned that table-lookups for random numbers is crazy slow and horrible quality?

Food for thought: What expectation do you have of the compiler doing the right thing before it runs? Are these data set sizes reasonable? Are the tests representative of expected time consuming functions? How does the memory architecture behavior when pounding on writes?

Going back to the original question…

If your game logic can be implemented more simply with a 1D array, then there is most definitely a good reason.

Pixel operations for example, can often be performed irrespective of axis coordinates, thus the algorithm will look a lot neater if the data is manipulated as a flat array (avoiding the need for nested loops).

I quickly hacked Riven’s test bed: http://pastebin.java-gaming.org/8aa10911f92

Data sizes are smaller (tweak pow at top of the file). Removed some test that are basically repeats, tweaked a few and added a couple. Run and look at how numbers evolve. If too fast tweak trialPrintRate up and down if too slow.

Then try again with:

java -server -cp bin -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining CacheGrid

Still you’re not taking into consideration my tiny optimizations: :-X

public static void fill_2d_random() {
  for (int y = 0; y != DIM;) {
    int yi = randX[y++], ry[] = rows[yi], x = 0;

    while (x != DIM) {
      int xi = randX[x++];
      ry[xi] = yi^xi;
    }
  }
}

I’m not in a position to test right now, but do those really do much? Sure you save a couple of array lookups I guess, but I don’t think the ry[] would assist much with caching (it’s still a pointer).

That’s the point, 1 pointer rather than 2! Moreover, values which don’t change later in the inner loop are cached right in the outer loop! :stuck_out_tongue:

Look up tables are slow. You have to be cutting a large number of operations for them to be a win.

I had to sort 2d array tiles, so that was one instance where it was logical for me. As said before, it is also a little bit more memory efficient (RAM and physical). Actually, I find it very hard to use 2d arrays outside of tables or tile based games. :persecutioncomplex:

my, now this has become a thread. :o

Well, as an update, I have used several Array[256][256] arrays in my new pathfinding algorithm, the finalized (for now) pathfinding class can now pathfind over complex/convoluted mazed-terrain on a 256x256 map at a rate of 0-1ms/calc for short, and 8-10ms/calc for long distances. (Average long distances check around 35,000 tiles before finalizing the path, pathing over a complicated maze from around 0,0 on the map to 256,256).

My pathfinding is my own take on A*, that removes almost all of the terrain/heuristic data and actually seeks the path backwards from destination to start. It also finds shortest path A* would have found every time. :wink:

I currently have over 5,000 entities picking random spots all over a 256x256 map and pathfinding to them, and keeping a solid 60FPS.

I’ll share the code later, when I’m done cleaning it up and announce the second project that this is a part of.

That’s rather impressive… I have to max out a 2.6GHz i7 to achieve that :expressionless: (admittedly I’m also drawing 100k sprites)

Cas :slight_smile: