How would you simplify a nested for loop?

I’m wondering how you would simplify a nested for loop for better performance. For example, how would you simplify something like this, which generates tiles?


for (int y = 0; y < height; y++)
{
    for (int x = 0; x < width; y++)
    {
        makeTile(x, y);
    }
}

Is there a way to simplify that into one loop?

Not sure how you define “simpler,” but since x and y are integers:


for (int i = 0; i < width * height; i++) {
    makeTile(i % width, i / width);
}

This transformation is occasionally useful.
It’s simply the inverse of [icode]index = x + y * width[/icode], which you may have seen turn up in a few places.
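To see that the two mappings really are inverses, here is a minimal round-trip check (the width/height values are just arbitrary sample sizes):

```java
public class IndexRoundTrip {
    public static void main(String[] args) {
        int width = 5, height = 3;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int index = x + y * width; // flatten (x, y) into one index
                int x2 = index % width;    // recover x
                int y2 = index / width;    // recover y
                if (x2 != x || y2 != y) throw new AssertionError();
            }
        }
        System.out.println("round-trip ok"); // prints "round-trip ok"
    }
}
```

Every (x, y) pair maps to a distinct index in [0, width * height), and dividing/modding by width gets the pair back.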

First off, you’d probably get an array index out of bounds on the height…

for (int x = 0; x < width; y++)

Second off,

for (int t = 0; t < (width * height); t++) {
    makeTile(t % width, t / width);
}

EDIT: Ninja’d. :cranky:

ah, sorry. i actually wrote the example thing just as an example, wasn’t being used in anything :stuck_out_tongue:

And what BurntPizza posted was exactly what I wanted and was wondering about, so thanks!

I know I’m a bit late, but I just want to add my two cents.

When I do loops like that, I usually write them like this:

for (int y = 0; y < height; y++)
    for (int x = 0; x < width; x++)
        makeTile(x, y);

// And if needed, this is where code after the loop goes

Is this actually going to be any quicker? You have now added a mod operation and a division operation for every i. My gut says it will be slower.

I ran a quick test to prove it


public class Speed
{
    private static int counter = 0;
    
    public static void main(String[] args)
    {
        int width = 800;
        int height = 600;
        
        long a = System.nanoTime();
        
        for (int i = 0; i < width * height; i++)
        {
            makeTile(i % width, i / width);
        }
        
        long b = System.nanoTime();
        
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                makeTile(x, y);
            }
        }                

        long c = System.nanoTime();
        
        for (int i = 0; i < width * height; i++)
        {
            makeTile(i % width, i / width);
        }
        
        long d = System.nanoTime();
        
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width; x++)
            {
                makeTile(x, y);
            }
        }   
        
        long e = System.nanoTime();
        
        System.out.println("Single : " + (b-a));
        System.out.println("Double : " + (c-b));
        System.out.println("Single2 : " + (d-c));
        System.out.println("Double2 : " + (e-d));
    }
    
    public static void makeTile(int x,int y)
    {
        counter++;
    }
}

Output :

Single : 6771960
Double : 1434076
Single2 : 4413039
Double2 : 1309231

Note that I ran both tests twice to allow for the JIT warm-up that happens on the first pass. The times also vary on every run, but they all paint the same picture.

That’s… not how benchmarks work.
It also wasn’t a matter of speed, only “simplicity,” in this case reducing the number of loops.

But tinfoilboy did specify performance

[quote]I’m wondering how you would simplify a nested for loop for better performance
[/quote]
On the benchmark point you are kind of correct, but the results do show that the single loop was more expensive, which was the point I was trying to make.

Fair enough, I didn’t notice the performance consideration in the OP. :persecutioncomplex:
Indeed it most likely won’t be faster, although the cost of the division(s) will be very small compared to most loop bodies. Also bounds-check elision and such effects will probably make a bigger difference in most use cases.
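For what it’s worth, if the division cost ever did matter and the width happened to be a power of two, the mod and division reduce to bit operations. A sketch under that assumption (the summing body is just a stand-in for makeTile):

```java
public class PowerOfTwoLoop {
    public static void main(String[] args) {
        int width = 8;  // must be a power of two for the bit tricks below
        int height = 4;
        int shift = Integer.numberOfTrailingZeros(width); // log2(width) = 3
        int mask = width - 1;                             // low bits select x

        long sum = 0;
        for (int i = 0; i < width * height; i++) {
            int x = i & mask;   // same as i % width when width is a power of two
            int y = i >> shift; // same as i / width (i is non-negative)
            sum += x + y;       // stand-in for makeTile(x, y)
        }
        System.out.println(sum); // prints 160
    }
}
```

That said, the JIT already performs this strength reduction itself when the divisor is a compile-time power-of-two constant, so this mostly matters when width is only known at runtime.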

There is also this:


for (int x = 0, y = 0; y < height;) {
    makeTile(x, y);
    if (++x == width) {
        x = 0;
        y += 1;
    }
}

Although that is the strangest and probably the worst of them all.
Nested loops are often the fastest way to iterate over multiple variables, as long as bad memory access patterns/cache thrashing are avoided.

The question for OP is: Is the loop condition branching your bottleneck? Really?

Without a specific problem it’s hard to optimize anything. But a rule of thumb is to order the loops by count so the biggest loop is the inner one. That way you pay less loop start/end overhead and suffer fewer branch mispredictions.

I’d argue cache thrashing is much more expensive than branch mispredictions.

Design your loops in such a way that a (sub)sequence of [icode]makeTile[/icode] calls tends to touch roughly the same memory area, or moves linearly through it (given pre-fetching), as opposed to hopping around, causing said cache thrashing.

TL;DR: without knowing the memory access patterns of makeTile, there is no way to give you informed advice.

I feel like getting geeky, so to expand on what riven said.
When the CPU reads a 2D array, it loads it first-dimension first: for array[][], the elements array[0][0..length] end up together in cache. If you instead read in the order array[0..length][0], cache thrashing occurs, because every access pulls in a whole cache line’s worth of neighbouring memory that you then barely use, since linear reads are what the cache is optimised for. This is generally why single-dimension arrays are preferable: as long as you know how to handle the indexing, they’re easy to use without mistakes, and (slightly) more efficient, because the storage is contiguous in memory and can be read straight through.

A single flattened array still has the same problem:


array[] // implicitly [x+y*w], otherwise known as row-major order, the same as array[y][x]

for x in [0..w]
    for y in [0..h]
        doStuff(array[x + y * w]); // will thrash, jumping around in intervals of w

More concisely: access in row-major order if your matrix is row-major, and the same for column-major order.
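In Java terms, the fix for that pseudocode is just swapping the loop order so the flattened index moves linearly (the grid contents here are arbitrary sample data):

```java
public class RowMajorAccess {
    public static void main(String[] args) {
        int w = 4, h = 3;
        int[] grid = new int[w * h]; // flattened, row-major: index = x + y * w

        // Cache-friendly: y outer, x inner -> index increases by 1 each step
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                grid[x + y * w] = x + y; // sequential writes, prefetcher-friendly
            }
        }

        // Cache-hostile (on large grids): x outer, y inner -> index jumps by w
        long sum = 0;
        for (int x = 0; x < w; x++) {
            for (int y = 0; y < h; y++) {
                sum += grid[x + y * w]; // strided reads, thrashes on big arrays
            }
        }
        System.out.println(sum); // prints 30
    }
}
```

Both orders visit the same cells and compute the same result; only the memory stride differs, which is invisible on a 4×3 toy grid but dominates on large ones.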

Ordering the loops so the biggest one is innermost usually also benefits the cache. But without a specific use case it’s quite pointless to argue about it. And just as the code can be changed, so can the memory layout.