how to detect occasional stuttering

hello,

my game suffers from stuttering. but not regular stuttering. just occasional. about once every 5 seconds a frame takes extraordinarily long to render, which is extremely noticeable and pretty annoying for the player. the problem is that i have no idea how to detect the cause or how to fix it, since it will not show up in a profile: over all the frames measured it is still a very small portion.

so how can i detect what is causing this occasional long frame?

i don’t think it’s garbage collection, since i almost eliminated garbage collection using object pooling (see other thread)

thanks!

add this to the commandline:

-verbose:gc
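
To line the long frames up with the -verbose:gc output, it can help to log only the frames that take much longer than usual. A minimal sketch, assuming a hand-written game loop; FrameSpikeDetector, the factor of 3 and the smoothing constants are made-up names and values for illustration, not something from this thread:

```java
/** Flags frames that take much longer than the recent average (hypothetical helper). */
final class FrameSpikeDetector
{
    private final double factor;   // how many times the average counts as a spike
    private double averageNanos;   // moving average of the normal frames
    private boolean primed;        // false until the first frame has been seen

    FrameSpikeDetector(double factor)
    {
        this.factor = factor;
    }

    /** Records one frame duration and returns true if it is a spike. */
    boolean record(long frameNanos)
    {
        if (!primed)
        {
            averageNanos = frameNanos;
            primed = true;
            return false;
        }
        boolean spike = frameNanos > averageNanos * factor;
        if (!spike)
        {
            // only normal frames feed the average, so one spike does not hide the next
            averageNanos = averageNanos * 0.9 + frameNanos * 0.1;
        }
        return spike;
    }

    public static void main(String[] args) throws InterruptedException
    {
        FrameSpikeDetector detector = new FrameSpikeDetector(3.0);
        long appStart = System.nanoTime();
        for (int frame = 0; frame < 300; frame++)
        {
            long start = System.nanoTime();
            Thread.sleep(frame == 200 ? 60 : 5); // stand-in for update + render
            long elapsed = System.nanoTime() - start;
            if (detector.record(elapsed))
            {
                System.out.println("long frame " + frame + ": " + elapsed / 1000000
                    + " ms at t=" + (start - appStart) / 1000000 + " ms");
            }
        }
    }
}
```

If a long frame is reported while -verbose:gc prints nothing around the same moment, the collector is probably not the cause and the drawing or event handling code is the next place to look.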

I had the same problem. I first thought it was the gc. It turned out I was wrong; it was the way I managed the AWT event queue.
I ended up switching to SWT and everything is fine now.

By the way, don’t go too fast modifying your code with object pools; I did the same and then had to revert it for the sake of code simplicity…

 Vincent

You guys do know that the java memory model is like one big object pool right? Actually object pooling your stuff in code is only necessary in specific situations…

Just my 2 pence…

DP :slight_smile:

I would have thought a multi-tiered fountain would be a better metaphor for describing generational garbage collector.

Preventing the garbage collector from running might be one of these specific situations :wink:

Don’t forget the allocation-time for new objects. These take an order of magnitude longer than fetching some object from a pool.

I know allocation-time in Java is a LOT faster than in C (with malloc), but it’s far from ‘a pointer shift’, which is what Sun is trying to make us believe.

Construction time can be as little as a pointer-shift. It depends if your objects are expensive to construct or not.

In general:

Only pool objects which are expensive to construct.

and if you find yourself plagued by millions of tiny objects, consider just creating one reusable one and accessing it statically if you can.

Cas :slight_smile:

Cas, you used that argument for the third time (in a few years) now. So I’ll reply like usual:

How expensive is this object-allocation:


class Vec3
{
   public float x,y,z;
}

Pooling gives me a heck of a performance-increase.

It is NOT just a pointer-shift. You gain even more performance when pooling heavy objects, which doesn’t mean tiny objects are nearly free. But well, if it’s not the bottleneck, you might actually think it’s ‘fast enough’.

It’s probably an issue with how you’re drawing the screen or with timer granularity, not an issue with garbage collection.

You might want to post the code for your main loop if it’s short enough for people to read.
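
Timer granularity can be checked directly. A minimal sketch (the class name TimerGranularity is made up for illustration) that busy-waits on System.currentTimeMillis() and reports the smallest step it observes:

```java
/** Estimates the smallest step System.currentTimeMillis() takes on this machine. */
final class TimerGranularity
{
    /** Returns the smallest forward step seen between successive readings, in ms. */
    static long measureMillis(int samples)
    {
        long smallest = Long.MAX_VALUE;
        long last = System.currentTimeMillis();
        int seen = 0;
        while (seen < samples)
        {
            long now = System.currentTimeMillis();
            if (now > last)
            {
                // the clock ticked: remember how far it jumped
                smallest = Math.min(smallest, now - last);
                last = now;
                seen++;
            }
        }
        return smallest;
    }

    public static void main(String[] args)
    {
        System.out.println("currentTimeMillis step: " + measureMillis(20) + " ms");
    }
}
```

If the reported step is a large fraction of one frame, a loop timed with currentTimeMillis() can appear to stutter even when rendering is steady; System.nanoTime() is usually the finer-grained choice for frame timing.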

Constructing that Vec3f and initialising x, y, z in the constructor is roughly the same expense as taking a Vec3f from a pool and initialising x, y, z. The real problem, of course, is destruction, which adds a tiny but perceptible extra bit of time. You might be able to tune it out of the VM using GC parameters.

When escape analysis and stack allocation make it into the VM this year (I believe), all that horribly complicated pooling code dealing with tiny objects is going to look rather messy and complex, and will actually be slower as well. But that’s then and this is now; if profiling shows a performance increase, that can’t really be argued with, I suppose.

Pooling might be overkill for a lot of situations, mind - I tend to just declare private static final Vec3f TEMP = new Vec3f() where I need to keep on constructing and throwing things away, rather than actually using a pool.
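
A minimal sketch of that static scratch-object idea, assuming a Vec3f like the one used in this thread; the Mover class, its integrate method and the set helper are hypothetical and only there to give TEMP something to do:

```java
/** Minimal vector type for the example (assumed, modelled on the Vec3f in this thread). */
final class Vec3f
{
    float x, y, z;

    Vec3f set(float x, float y, float z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
        return this;
    }
}

/** Hypothetical caller that needs a scratch vector on every call. */
final class Mover
{
    // one reusable scratch object instead of a new Vec3f per call;
    // not thread-safe, and it must never be stored or returned
    private static final Vec3f TEMP = new Vec3f();

    /** Adds velocity * dt to position, using TEMP for the intermediate result. */
    static void integrate(Vec3f position, Vec3f velocity, float dt)
    {
        TEMP.set(velocity.x * dt, velocity.y * dt, velocity.z * dt);
        position.x += TEMP.x;
        position.y += TEMP.y;
        position.z += TEMP.z;
    }

    public static void main(String[] args)
    {
        Vec3f position = new Vec3f().set(0, 0, 0);
        Vec3f velocity = new Vec3f().set(2, 4, 6);
        integrate(position, velocity, 0.5f);
        System.out.println(position.x + " " + position.y + " " + position.z);
    }
}
```

The catch is that TEMP is shared: it only works while the value is used up before the method returns and nothing else touches it in between.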

Cas :slight_smile:

Sorry, it’s simply not true. The overhead is in the allocation, not in the destructors.

I tested this by increasing the heap and adding -verbose:gc to monitor the GC, and it was NOT run during the benchmark. Still the performance was poor, compared to pooling the Vec3’s.
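
The same check can be done from inside the benchmark with the standard java.lang.management API. A minimal sketch; GcCounter is a made-up helper name and the loop in main is only a placeholder for the code under test:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that reports how many collections the VM has run so far. */
final class GcCounter
{
    /** Sums the collection counts of all collectors the VM reports. */
    static long totalCollections()
    {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans())
        {
            long count = gc.getCollectionCount(); // -1 means "not available"
            if (count > 0)
            {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args)
    {
        long before = totalCollections();
        long sink = 0;
        for (int i = 0; i < 10000000; i++)
        {
            sink += new int[] { i }.length; // placeholder for the code under test
        }
        long after = totalCollections();
        System.out.println("collections during run: " + (after - before) + " (sink=" + sink + ")");
    }
}
```

A difference of zero between the two readings means no collection finished during the measured section, which is the condition described above.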

Well there are several factors to consider here:
1.) How large was your data-set? If you only generate 2000 objects in the pool you end up having your whole data-set in the L2 cache, whereas allocation in the heap has to move bytes around. However this is not likely to be a real-world result.
2.) Do you set your floats to “0” before you hand out the object of the pool? Java’s allocation code does this.

I created a really simple and stupid micro-benchmark and although I did not expect it - java allocation won without any tuning:


import java.util.*;

public class AllocTester
{
    public static void main(String[] args)
    {
        
        /*warmup*/
        Pooler p = new Pooler();
    
        for(int i=0; i < 1000000; i++)
        {
            new Vec3f();
        }

        for(int i=0; i < 1000000; i++)
        {
           Vec3f v = p.getObject();
            p.releaseObject(v);
        }

    /*measure*/
        long start = System.currentTimeMillis();
        for(int i=0; i < 100000000; i++)
        {
            new Vec3f();
        }
        long end = System.currentTimeMillis();
        System.out.println("Allocation took: "+(end-start));

        start = System.currentTimeMillis();
        for(int i=0; i < 100000000; i++)
        {
           Vec3f v = p.getObject();
            p.releaseObject(v);
        }
        end = System.currentTimeMillis();
        System.out.println("Pool took: "+(end-start));
    }
}

class Vec3f
{
   public float x,y,z;
}

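class Pooler
{
    // free objects; the end of the list is used as the top of a stack
    ArrayList<Vec3f> pool = new ArrayList<Vec3f>();

    public Vec3f getObject()
    {
        if (pool.size() > 0)
        {
            // reuse the most recently released object
            return pool.remove(pool.size() - 1);
        }
        else
        {
            return new Vec3f();
        }
    }

    public void releaseObject(Vec3f v)
    {
        pool.add(v);
    }
}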

My results were:
Allocation took: 2113
Pool took: 4216

lg Clemens

The VM is way too smart for such micro-benchmarks :slight_smile: It’ll realize nothing is happening, and remove the creation of the object.

When I have time, I’ll post a ‘real’ case, turned into a benchmark.

Benchmark classes

Vec.java
VecPool.java
Bench.java

Java Server VM 1.6.0_03


Typical tNew:  20399729ns (20.4ms)
Typical tPool:  3582788ns (3.5ms)

Java Client VM 1.6.0_03


Typical tNew:  21358511ns (21.4ms)
Typical tPool:  3895816ns (3.9ms)

well i guess the reason is that you are not using ArrayList - furthermore all your methods are static.
Here begins the fight between readability and performance :wink:

I modified the benchmark and this way there is no way out for the JIT:


import java.util.*;

public class AllocTester
{
    public static void main(String[] args)
    {
        Vec3f v2 = new Vec3f();
        /*warmup*/
        Pooler p = new Pooler();
    
        for(int i=0; i < 1000000; i++)
        {
            Vec3f v1 = new Vec3f();
            v1.setValues(i);
            v2.add(v1);
        }

        for(int i=0; i < 1000000; i++)
        {
           Vec3f v1 = p.getObject();
            v1.setValues(i);
            v2.add(v1);
            p.releaseObject(v1);
        }

    /*measure*/
        long start = System.currentTimeMillis();
        for(int i=0; i < 100000000; i++)
        {
            Vec3f v1 = new Vec3f();
            v1.setValues(i);
            v2.add(v1);
        }
        long end = System.currentTimeMillis();
        System.out.println("Allocation took: "+(end-start));

        start = System.currentTimeMillis();
        for(int i=0; i < 100000000; i++)
        {
           Vec3f v1 = p.getObject();
            v1.setValues(i);
            v2.add(v1);
            p.releaseObject(v1);
        }
        end = System.currentTimeMillis();
        System.out.println("Pool took: "+(end-start));

         System.out.println(v2.x+v2.y+v2.z);
    }
}

class Vec3f
{
   public float x,y,z;

    public void setValues(float i)
    {
        x = i;
        y = i;
        z = i;
     }
    
    public void add(Vec3f v)
    {
        x += v.x;
        y += v.y;
        z += v.z;
    }
}

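class Pooler
{
    // free objects; the end of the list is used as the top of a stack
    ArrayList<Vec3f> pool = new ArrayList<Vec3f>();

    public Vec3f getObject()
    {
        if (pool.size() > 0)
        {
            // reuse the most recently released object
            return pool.remove(pool.size() - 1);
        }
        else
        {
            return new Vec3f();
        }
    }

    public void releaseObject(Vec3f v)
    {
        pool.add(v);
    }
}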

So if even ArrayList access is slower, although it is heavily inlined and optimized by the JIT (removing and adding the last element are more or less no-ops), I wonder whether this is really worth all the trouble.
Furthermore allocation was not optimized away, as I was able to watch the GC … which was quite busy.

lg Clemens

[quote="Linuxhippy,post:16,topic:31200"]
well i guess the reason is that you are not using ArrayList - furthermore all your methods are static.
Here begins the fight between readability and performance ;)
[/quote]

Uh… static methods are not faster (!), I used them because I was lazy and wanted to have something up and running quickly.

It turns out ArrayLists are indeed slower, but not by much.


Typical tNew:  19647749
Typical tPool:  4967600

Moving all the references around after converting from static to instance-methods, I lost roughly 10%.


Typical tNew:  19236453
Typical tPool:  5485736


/*
 * Created on 16 jan 2008
 */

package eden.pooling;

import java.util.ArrayList;

public class VecPool
{
   private static final int SPACE = 1024;

   private static ArrayList<Vec> cache;

   static
   {
      VecPool.cache = new ArrayList<Vec>(SPACE);

      // fill cache with distinct instances
      for (int i = 0; i < SPACE; i++)
         VecPool.cache.add(new Vec(0, 0, 0));
   }

   public static Vec grab()
   {
      if (VecPool.cache.size() == 0)
         return new Vec(0, 0, 0);
      // remove (not get), so the same instance is never handed out twice
      return VecPool.cache.remove(VecPool.cache.size() - 1);
   }

   public static void dump(Vec v)
   {
      // drop the object if the pool is already full
      if (VecPool.cache.size() != SPACE)
         VecPool.cache.add(v);
   }
}

Riven, your VecPool does not initialize the x,y,z members to zero.

Cas :slight_smile:

So… what? :slight_smile: What’s the use?

I know Java-objects are all zero-filled, and that’s part of the overhead we want to get rid of.

This is about optimising, not copying VM-behaviour.

Anyway, with zero-fill:


   public static Vec grab()
   {
      if (VecPool.size == 0)
         return new Vec(0, 0, 0);
      return VecPool.cache[--VecPool.size].set(0,0,0);
   }


Typical tNew:  19604587
Typical tPool:  3887017
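
The grab() snippet above refers to a size counter and a cache array that are not shown in the thread. A minimal sketch of what such an array-backed pool might look like, assuming a Vec whose set() returns the object itself; ArrayVecPool, SPACE and this Vec are guesses for illustration, not the posted benchmark classes:

```java
/** Assumed vector type; set() returns this so the call can be chained. */
final class Vec
{
    float x, y, z;

    Vec(float x, float y, float z)
    {
        set(x, y, z);
    }

    Vec set(float x, float y, float z)
    {
        this.x = x;
        this.y = y;
        this.z = z;
        return this;
    }
}

/** Hypothetical array-backed pool; the array acts as a stack of free objects. */
final class ArrayVecPool
{
    private static final int SPACE = 1024;
    private static final Vec[] cache = new Vec[SPACE];
    private static int size = 0; // number of free objects currently stored

    static
    {
        // pre-fill with distinct instances
        for (int i = 0; i < SPACE; i++)
            cache[size++] = new Vec(0, 0, 0);
    }

    static Vec grab()
    {
        if (size == 0)
            return new Vec(0, 0, 0);
        // pop the top free object and zero it before handing it out
        return cache[--size].set(0, 0, 0);
    }

    static void dump(Vec v)
    {
        // silently drop the object if the pool is full
        if (size != SPACE)
            cache[size++] = v;
    }
}
```

Compared with the ArrayList version there is no bounds check inside remove() and no modification counter to update, which may account for part of the difference in the numbers; like the original, it is not thread-safe.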

static methods are faster, because you save some fetching and at least one branch (the branch which takes care of de-optimization).

Your benchmark does many different things, so it is quite possible that some cache issues come up.
I can only state that in mine allocation is faster, while in yours the pool performs better :wink:

lg Clemens