Libgdx vs Stumpy: Sprites

That’s not possible. There’s no OGL 2.0 GPU that supports the extension. Going from DX9 to DX10 there was a huge hardware change. Before that, GPUs had different types of cores for vertex processing and fragment processing, which fits traditional rendering well. However, DX9 brought the possibility of deferred lighting, which means that you first render the geometry and store its lighting data, then do the lighting using the extra information (normals, material, etc.) stored in the first pass. The first pass is limited by vertex processing and ROP performance (lots of vertices, low per-pixel math cost (no lighting), a high amount of data stored per pixel (ROP load)). That meant the pixel processors basically sat idle. In the second pass we had the reverse problem: we’re drawing simple light geometry (few vertices), but it covers lots of pixels and requires heavy calculations per pixel. In this case the vertex processors sit idle. For this reason GPUs moved over to unified architectures, where there’s just one type of core in the GPU that can process both vertices and pixels. This allows the cores to be used more efficiently when the pixel-vertex load is unbalanced. It also opened the door to GPU computing and geometry shaders, and removed all the silly limitations on vertex shaders (previously only fragment shaders could do texture reads). Instead of making a whole new core type for geometry shaders they just made the cores flexible enough to do everything.

[quote]There’s no OGL 2.0 GPU that supports the extension.
[/quote]
Are we talking about GL_EXT_geometry_shader4?

Maybe I should have said a GL 2.1+ version. My Mac supports geometry shaders through an extension.

Geometry shaders are only supported by GPUs with a unified shader architecture (Shader Model 4 in DirectX). If your Mac has a GPU that supports OGL 3 but no decent drivers, then that’s a different story of course. I’d have to convert the shader to use an older GLSL version, but other than that it should work fine. It just feels bad to hack together stuff because Apple can’t be bothered to make decent drivers…

I’ll see what I can do, but I’m not really familiar with such old geometry shader versions.

So I got a little excited and am working on making a geometry shader sprite batcher, but all I have to go off of is what theagentd posted. I have it so points go into quads (used your shader, theagentd, sorry), but I do not use premultiplied alpha so I had to change the frag shader.

For color I use bytes 0-255, which makes things more confusing. Now I am not really sure what is going on when it comes to color and textures, as I either get a blank screen or everything just takes on 255/1f (rgba). The performance boost is about 10 fps regardless of how many particles there are. Not as big as theagentd’s implementation, as he used multithreading.

The reason I also get an fps drop when you click is because I normalize a vector in the physics, which is a big performance killer.

Are you using custom shader attributes or the built-in ones? It’s better to use custom ones. Can you post the code that you use to setup your rendering?

Here’s the code I used to set it up:

Initialization code:



private static final int PARTICLE_SIZE = 20;
private static final int POSITION_OFFSET = 0;
private static final int SIZE_OFFSET = 8;
private static final int COLOR_OFFSET = 16;

private ShaderProgram geometryShader; //My custom shader program class

//After the shader has been loaded from file:
int gPositionLocation = geometryShader.getAttribLocation("position"); //Calls GL20.glGetAttribLocation(program, "position");
int gSizeLocation = geometryShader.getAttribLocation("size");
int gColorLocation = geometryShader.getAttribLocation("color");

geometryShader.bind();
glUniform2f(geometryShader.getUniformLocation("screenSize"), Display.getWidth(), Display.getHeight());

Rendering code:


geometryShader.bind();

glBindBuffer(GL_ARRAY_BUFFER, dataVBO);

glVertexAttribPointer(gPositionLocation, 2, GL_FLOAT, false, PARTICLE_SIZE, POSITION_OFFSET);
glVertexAttribPointer(gSizeLocation, 2, GL_FLOAT, false, PARTICLE_SIZE, SIZE_OFFSET);
glVertexAttribPointer(gColorLocation, 4, GL_UNSIGNED_BYTE, true, PARTICLE_SIZE, COLOR_OFFSET);

glEnableVertexAttribArray(gPositionLocation);
glEnableVertexAttribArray(gSizeLocation);
glEnableVertexAttribArray(gColorLocation);

glDrawArrays(GL_POINTS, 0, particles.size());

glDisableVertexAttribArray(gPositionLocation);
glDisableVertexAttribArray(gSizeLocation);
glDisableVertexAttribArray(gColorLocation);
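For reference, here’s roughly how the interleaved buffer matching those offsets (2 floats position, 2 floats size, 4 unsigned bytes color = 20 bytes) could be filled. This is just a sketch; the field values and the `putParticle` helper are made up, not from my actual code:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ParticleBuffer {
    public static final int PARTICLE_SIZE = 20; // 2f pos + 2f size + 4ub color

    // Writes one particle's interleaved data at the buffer's current position.
    // The parameters here are placeholders; map them to your own particle class.
    public static void putParticle(ByteBuffer buf, float x, float y,
                                   float w, float h, int r, int g, int b, int a) {
        buf.putFloat(x).putFloat(y);     // POSITION_OFFSET = 0
        buf.putFloat(w).putFloat(h);     // SIZE_OFFSET = 8
        buf.put((byte) r).put((byte) g)  // COLOR_OFFSET = 16
           .put((byte) b).put((byte) a);
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocateDirect(2 * PARTICLE_SIZE)
                                   .order(ByteOrder.nativeOrder());
        putParticle(buf, 10f, 20f, 6f, 6f, 255, 128, 0, 255);
        putParticle(buf, 30f, 40f, 6f, 6f, 0, 255, 0, 128);
        buf.flip(); // ready to hand to glBufferData()
        System.out.println(buf.remaining()); // 40
    }
}
```

The stride (PARTICLE_SIZE) and the offsets in the glVertexAttribPointer() calls have to match this layout exactly, or the attributes will read each other’s bytes.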

Sorry for not responding sooner, but RL can take time away (got an 80% in discrete math).

I decompiled your program just to see how you were doing everything, and let’s just say I had an epiphany. I think I finally grasp how shaders and such things work in actual code (you know, after seeing some real code). I used your shader and ShaderProgram classes.

This is how I set up the attributes:

private void render()
{
    vertBuff.put(vertArray);
    vertBuff.flip();
    colBuff.put(colorArray);
    colBuff.flip();
    texBuff.put(sizeArray);
    texBuff.flip();

    //glVertexPointer(2, 0, vertBuff);
    glVertexAttribPointer(gPositionLocation, 2, false, 0, vertBuff);
    //glTexCoordPointer(2, 0, texBuff);
    glVertexAttribPointer(gSizeLocation, 2, false, 0, texBuff);
    //glColorPointer(4, true, 0, colBuff);
    glVertexAttribPointer(gColorLocation, 4, true, false, 0, colBuff);

    glEnableVertexAttribArray(gPositionLocation);
    glEnableVertexAttribArray(gSizeLocation);
    glEnableVertexAttribArray(gColorLocation);

    glDrawArrays(GL_POINTS, 0, draws);
    vertBuff.clear();
    colBuff.clear();
    texBuff.clear();
    vertIndex = 0;
    colIndex = 0;
    sizeIndex = 0;
    draws = 0;
}

I figured out why the color was messed up. I am using 0-255 bytes for color, but the shaders expect 0-1 floats; by dividing the colors in the shader by 255 I get correct colors, but the alpha is still off. I completely got rid of the texture stuff as the textures were never showing up.
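A quick sketch of that 0-255 to 0-1 mapping, in case it helps someone else debugging this: Java bytes are signed, so a component of 255 is actually stored as -1, and you have to mask with 0xFF before dividing (the class name here is made up):

```java
public class ColorConvert {
    // Convert an unsigned byte color component (stored in a signed Java byte)
    // to the 0..1 float range the shader expects. This mirrors what the
    // "normalized" flag of glVertexAttribPointer does for you on the GPU.
    public static float toFloat(byte component) {
        return (component & 0xFF) / 255f;
    }
}
```

If the attribute is uploaded as GL_UNSIGNED_BYTE with normalized = true, the GPU does exactly this conversion and no division in the shader is needed at all.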

I also got different sized images working in the geometry shader by using the x and y from the size passed in.

Here are the shaders.

vert :point: same as yours

#version 330

#define POSITION 0
#define SIZE 1
#define COLOR 2

layout(location = POSITION) in vec2 position;
layout(location = SIZE) in vec2 size;
layout(location = COLOR) in vec4 color;

out vec2 vPosition;
out vec2 vSize;
out vec4 vColor;

void main(){
	vPosition = position;
	vSize = size;
	vColor = color;
}

geometry :point: changed a bit

#version 330

layout(points) in;
layout(triangle_strip, max_vertices = 4) out;

uniform vec2 screenSize;

in vec2[] vPosition;
in vec2[] vSize;
in vec4[] vColor;

out vec2 texCoords;
out vec4 color;

vec4 toScreen(vec2 pos){
	vec4 result = vec4(pos * 2 / screenSize - 1, 0, 1); 
	result.y = -result.y; //Flip y ftw
    return result;
}

void main() {
	//if i divide everything by 255 I get correct colors but not alpha.  
	color = vColor[0];
	
	gl_Position = toScreen(vec2(vPosition[0].x - vSize[0].x/3, vPosition[0].y - vSize[0].y/3));
	texCoords = vec2(0, 0);
	EmitVertex();
	
	gl_Position = toScreen(vec2(vPosition[0].x + vSize[0].x/3, vPosition[0].y - vSize[0].y/3));
	texCoords = vec2(1, 0);
	EmitVertex();
	
	gl_Position = toScreen(vec2(vPosition[0].x - vSize[0].x/3, vPosition[0].y + vSize[0].y/3));
	texCoords = vec2(0, 1);
	EmitVertex();
	
	gl_Position = toScreen(vec2(vPosition[0].x + vSize[0].x/3, vPosition[0].y + vSize[0].y/3));
	texCoords = vec2(1, 1);
	EmitVertex();
	
	
    EndPrimitive();
}

frag :point: changed

#version 330

uniform sampler2D color_texture;

in vec4 color;

out vec4 fragColor;

void main()
{
        //just playing with color here
	fragColor =  vec4(.3,.6,color.z*.04,0.1);
}

I am not using VBOs, just vertex arrays: three of them, one each for position, size, and color. (It used to be position, texture coords, and color.)

Here is the full GeomBatch source; maybe I am doing something wrong.

http://pastebin.java-gaming.org/c0d2f129c22

I am not sure if I should continue making a geometry shader sprite batcher, as right now, using just vertex arrays, it is rather fast and works on just about everything. I will say that the performance boost is nice (about 20+ fps at 50k+ sprites), but yours uses multithreading, which makes the results look much better than they are without it, and I kinda…ok, really want to try and implement it in my batcher for giggles. 8)

FYI, once I get everything a bit more matured I am going to make a tutorial on how to make a sprite batcher, first with just VAs, then VBOs, and if all goes well, geometry shaders. I think it is a big step going from glBegin/glEnd to VBOs, but once you get there, everything is nice.

PS: How did you learn all this OpenGL stuff? Did you start with C++? Or is there some special, secret cave around here where they keep all the scrolls on this stuff? ???

I’ve never looked into geometry shaders; that looks hilariously simple and fun! Too bad we can’t add that to libgdx due to GLES 2.0 being a bit sucky.

Stumpy, a tutorial would be rather neat :slight_smile:

@StumpyStrust

Looks great. You don’t really need VBOs, it’s still a batcher since you’re drawing all sprites with just one draw call. VBOs are mainly used to store data on the GPU, but if you’re reuploading it every frame anyway you don’t really need to use them.

There’s a parameter for glVertexAttribPointer() that controls whether the data should be normalized or not. It’s the boolean value in the middle.


glVertexAttribPointer(gPositionLocation, 2, GL_FLOAT, false, PARTICLE_SIZE, POSITION_OFFSET);
glVertexAttribPointer(gSizeLocation, 2, GL_FLOAT, false, PARTICLE_SIZE, SIZE_OFFSET);
glVertexAttribPointer(gColorLocation, 4, GL_UNSIGNED_BYTE, true, PARTICLE_SIZE, COLOR_OFFSET);

Notice that only gColorLocation is normalized. You might have yet another problem somewhere if your alpha values aren’t working out.

I pretty much naturally started moving over to LWJGL from Java2D once I started scaling my 2D tiles and FPS dropped to around 4. After going over to LWJGL I got a few hundred. I bought an OpenGL book (the OpenGL SuperBible) to get me started. Sadly it was a bit outdated when it comes to shaders and OpenGL 3+, but the latest edition seems to have been updated with OpenGL 3 code. Anyway, after getting the basics I just did some 2D lighting stuff with FBOs for a while and learned about shaders, and THEN things got really interesting! I like books to get me started, but once you’re up and running the internet is a much better source of up-to-date information.

Well, got everything working… ;D ;D ;D ;D textures and all. The color was all right; that boolean was for whether it was an unsigned byte. Turns out I had been doing something to the color in one of my shaders. The ones I posted were not the ones being used… :cranky:

Got to say, though, that now I almost instantly get fill-rate limited before CPU-limited if the sprites are larger than 8+ pixels. So I think I will stay away from multithreading this, as most people will probably get fill-rate limited first.

Here are a bunch of screens from testing. The FPS drops because of the mass of particles and particle physics. FYI, if you have dual monitors and have set them up properly, particle physics across them is simply delicious.

I will try writing a tutorial on all this once I clean things up and have time. (Probably next weekend.)

This is for all the buzz from Roquen and DrHalfway about random number generation.
Here is generating 50k particles at 6*6 pixels in size, using Java’s Random class for randomizing velocity. If you want me to try using your PRNG algorithms, just ask. ;D

And thanks a bunch, theagentd. 8) +9000!!!

You’re right about being fill rate limited, and by simply batching you’re winning a lot of performance. However, although your particle rendering code is GPU limited, it still uses a fair amount of CPU time, often just waiting for data to be written to or read from RAM, since that’s one of the biggest bottlenecks here. It’s possible to thread an actual game implementation in a much simpler way. For example, a very simple particle engine without any dynamic geometry collision detection doesn’t depend on anything else in the game, so it’s easy to just put the particle updating and buffer loading in its own thread and forget about it. Just trigger an update when you start a frame and wait for it to finish at the end of the frame in case you have too many particles, then just draw them all on the normal thread. The win here is that in most cases this makes your particles 100% free when it comes to CPU time, since the particles are updated in parallel with the main game and a huge number of computers have dual cores. It’s also dead simple to implement, since you’re just calling particleSystem.update() from a different thread (or something like that) plus some basic syncing logic between the main thread and the particle thread.
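That syncing logic can be tiny. A minimal sketch of the idea, with made-up names (the Runnable stands in for whatever your particleSystem.update() is):

```java
import java.util.concurrent.Semaphore;

// One extra thread: the particle update runs in parallel with the game's
// frame, and the main thread waits for it before drawing the particles.
public class ParticleThread extends Thread {
    private final Semaphore start = new Semaphore(0);
    private final Semaphore done = new Semaphore(0);
    private final Runnable update; // hypothetical: particleSystem::update

    public ParticleThread(Runnable update) {
        this.update = update;
        setDaemon(true); // don't keep the JVM alive when the game exits
    }

    @Override public void run() {
        while (true) {
            start.acquireUninterruptibly(); // wait for frame start
            update.run();                   // runs in parallel with game logic
            done.release();                 // signal: particle buffers ready
        }
    }

    public void beginFrame()    { start.release(); }              // call at frame start
    public void waitForUpdate() { done.acquireUninterruptibly(); } // call before drawing
}
```

In the main loop you’d call beginFrame(), run your game logic, call waitForUpdate(), then render the particle buffers; the update cost disappears behind the rest of the frame.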

Bonus info: even hyperthreading helps a lot in this case, since we’re so memory limited. To simplify things a lot: hyperthreading allows the CPU to have two threads loaded at the same time and to quickly switch between them in case one of them hits a cache miss and needs to wait for data from RAM. It’s not two cores, but it allows your one core to be utilized better, and it makes a lot of difference in this specific use case.

There’s also another real performance problem for a practical particle engine: creating and removing particles. If you create a lot of particles you might produce a lot of garbage, which can cause hitches when the garbage collector kicks in and has to release megabytes of built-up garbage. For a smooth experience it’s better to pool particles. DO NOT use a LinkedList for this; use an ArrayList and use list.remove(list.size() - 1) to pick the last recycled particle, to avoid shifting the whole array like list.remove(0) would. Pooling actually hurts performance very slightly, but a smooth experience is more important than a slightly higher average FPS with 100ms spikes now and then.

Another huge performance problem appears if you’re using an ArrayList to hold your (alive) particles and a random particle in the middle of the list dies. A list.remove(index) causes all following particles to be shifted one step to the left. It’s done with a very fast System.arraycopy() call, but it’s still really slow if you have thousands of particles and a high number of them die in a short period of time, which might cause FPS spikes or drops at inconvenient times. Instead, use two ArrayLists and copy all surviving particles between them each frame. See the excellent example by Cas on how to implement such a system. It’s meant for game objects, since they suffer from the same problem, but the problem is actually even bigger for particles, since we usually have a lot more particles than objects.
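The double-list trick boils down to something like this (a sketch; the Particle class here is a placeholder with a fake lifetime so the example is self-contained):

```java
import java.util.ArrayList;

// Instead of remove(index) on dead particles (which shifts the tail of the
// list), copy survivors into a second list and swap the two lists.
public class ParticleList {
    private ArrayList<Particle> alive = new ArrayList<>();
    private ArrayList<Particle> next  = new ArrayList<>();

    public void add(Particle p) { alive.add(p); }
    public int size() { return alive.size(); }

    public void update() {
        for (Particle p : alive) {
            p.update();
            if (!p.isDead()) next.add(p); // survivors only, no shifting
        }
        ArrayList<Particle> tmp = alive;  // swap the lists...
        alive = next;
        next = tmp;
        next.clear();                     // ...and reuse the old one next frame
    }

    // Minimal placeholder particle: dies after a fixed number of updates.
    public static class Particle {
        private int life;
        public Particle(int life) { this.life = life; }
        public void update() { life--; }
        public boolean isDead() { return life <= 0; }
    }
}
```

Each frame does one linear pass and zero mid-list removals, so the cost is stable no matter how many particles die at once.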

That’s pretty much all I have for now. Well, you could throw in real multithreading (with X threads instead of just one extra), but it gets awfully complicated for no good reason due to fill rate limitations. As long as you solve the problem of creating and removing particles you should be fine without any threading at all, and pretty much 100% foolproof with the simple threading idea above. This should give you a particle system with very stable CPU performance in a real game.

In a real game it’s fine right now.

I got threading working using your lib for the updating and all, but got no fps boost (other than not dropping on mouse click), as it spends almost all of its time filling arrays. This means that anything you do on the CPU takes time away from filling arrays, but at this point you get so fill-rate limited that going through the ugliness may not be worth it.
I will probably be fill-rate limited at 50k sprites if they are 64*64 pixels. I don’t drop fps on my shitty old PC till 100k+ now with the geobatcher.

I know about hyperthreading and all that jazz, and I know about pooling (even though I have yet to implement it). I also use the ArrayList trick right now and have avoided linked lists like the plague ever since CS 2…

I need to add in different tex coords, as it needs to work with the calls you would use with the old batcher, and it is still set up for atlases and texture regions. I also need to add rotation on the GPU…not sure how I want to do that. I like the idea of an if to check whether you even need to do rotation, but I know you should stay the hell away from ifs in shaders. I also know you don’t want to use too much trig in shaders.

A question on pooling in Java: if you have a reference to an object and you use new to assign a new object of that type to it, does the old object get GC’d? I think it does. I just don’t know if I should have one massive method for resetting or many methods that each set one thing.


MehObject b = new MehObject(); //create
b = new MehObject(); //create again?
b.reset();
b.set(Super, long, set, method, to, set, everything, in, object); //proper pooling?
b.setThing1();
b.setThing2();
b.setThing3();
//do mass method calls slow things down much?

You could do that in the geometry shader. Just pass in a rotation; you could use vec3s and use the z component as the rotation (or something like that).

Eh.


Object o = null; // Allocates a 32/64-bit reference.
o = new Object(); // Allocates the memory needed for the attributes of the "Object" class.
o = new Object(); // Allocates a bunch of memory again...
// The problem with the above statement is that
// the first "Object" which was created won't simply be thrown away instantly;
// it needs to be gc'ed (you could have given another object a reference to it;
// the purpose of the gc is to collect objects which aren't needed anymore...
// (I'm just explaining why, the second time "Object" is created, the old
// Object doesn't get thrown away...))
// Take this as an example:
o = null;
o = new Object();
anotherObject.setObject(o);
o = null;
// (If this weren't garbage-collected:) the "new Object()" would get thrown
// away when we re-assign "null" to "o", even though anotherObject still
// holds a reference to it.

How one usually does pooling:


// For example with a vector. Can simply be changed to be a particle:
// Init-Function or constructor of whatever (your game):
Vec2 vec = new Vec2();
// Getting a vector to be used:
public Vec2 get(float x, float y) {
    return vec.set(x, y); // Vec2.set(float, float) returns itself.
}

The thing about pooling is to avoid “new” statements, because they allocate new memory and probably throw away an old object.

So why don’t we always use pooling and only have one instance of every class?

Let’s dig up the old example, but with vectors, again:


Vec2 vec = new Vec2(10f, 10f);
// You expect the enemy to be placed at (10|10), which is right...
enemy.setPosition(vec);
// ... until now. We change the values for vec, and since we 
// change the values in the same object, which also got passed
// to the enemy, the enemy will now be at (100|100)...
vec.set(100f, 100f);

So beware of that when pooling objects.

EDIT:
One more thing…
There are languages out there where EVERY object is immutable, which means nothing can be changed. No values. Nothing. (Haskell.)
The enemy example above is a “side effect”, and it shows why side effects are bad. Many functional languages prefer immutability; some allow side effects but don’t recommend using them (like Scheme/Lisp), and some don’t allow them at all (as said: Haskell).

Oh, sorry, I completely missed your response, Stumpy! T_T

Yeah, that’s what I kinda thought about pooling. So basically you have to have methods with mass args for all attributes, or set them with different method calls, since the chance of the dead particle being of the type you want is slim.

I was going to do exactly that for rotation: use a vec3 for location and have z be the rotation. How do I rotate in the shader without dropping performance?

My idea of pooling:


public class Pool{
	
	private ArrayList<Particle> pool;
	
	public Pool(){
		pool = new ArrayList<>();
	}
	
	public Particle get(){
		if(!pool.isEmpty()){
			return pool.remove(pool.size() - 1);
		}
		//Tell the particle which pool it belongs to
		//so it can recycle itself when it dies. 
		return new Particle(pool);
		//(Note: the caller of get() is responsible for initializing the particle)
	}
	
	public void recycle(Particle p){
		pool.add(p);
	}
}

Rotation can be done in multiple ways but the best way to do it in a shader is with a 2D rotation matrix. You simply generate a rotation matrix from an angle (the rotation variable) you pass to the shader per particle and rotate the generated coordinates with this matrix. It doesn’t affect fill rate at all, so it won’t cost anything in a real game. However, it does cost some memory bandwidth for the extra rotation variable per particle plus some geometry shader performance, but this is completely irrelevant in this case since fill rate will outweigh it by far.

You can generate the rotation matrix in your geometry shader like this:


    float sin = sin(vRotation[0]);
    float cos = cos(vRotation[0]);
    mat2 rotationMatrix = mat2(
        cos, -sin,
        sin, cos
    );

Then rotate the local coordinates by multiplying them with this matrix. See the full shader source below.
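For reference, the same math on the CPU side (a sketch, not from my shader; note GLSL mat2 constructors are column-major, so the sign placement looks flipped compared to row-major notation, and swapping the -sin just reverses the rotation direction):

```java
// CPU-side version of the 2D rotation used in the geometry shader: rotate a
// corner offset (local coordinates relative to the sprite center) by angle,
// using the standard counter-clockwise rotation matrix.
public class Rotate2D {
    public static float[] rotate(float x, float y, float angle) {
        float sin = (float) Math.sin(angle);
        float cos = (float) Math.cos(angle);
        // [ cos -sin ] [x]
        // [ sin  cos ] [y]
        return new float[] { cos * x - sin * y, sin * x + cos * y };
    }
}
```

Since the matrix is built once per particle and applied to all four emitted corners, the per-particle cost is just one sin/cos pair, which is why it doesn’t touch fill rate at all.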

There’s no reason to pack the rotation into a vec3. Just add another float variable to the shader and treat it as a completely different attribute, since that’s what it is. Packing is soooo fixed function pipeline.

Shader source code

Java source code
(The only relevant stuff is the attribute setup at start and before/after rendering, but I don’t have time to pick out the relevant parts… >_<)

Performance took a slight hit, since my particles/sprites/whatever are so small and numerous (= not fill-rate limited): I now only get around 1.0 million particles at 60 FPS, down from 1.1 million. I strongly suspect it’s the additional memory footprint of the particles: 20 bytes --> 24 bytes = a 20% increase in memory usage. The additional GPU load shouldn’t be significant.

EDIT: Did some more benchmarking. Turns out the GPU impact was higher than I thought and that seems to be the main reason for the performance loss. However, doing that math on the CPU instead and uploading all 4 coordinates is of course a lot more expensive, so it’s obviously worth it.

Just posted a tut on the sprite batcher. Would love for some of you OpenGL/coding gods to go over it and rip it apart…you know…so I can make it better.

I know that the rotation explanation is lacking but I suck at maths so… :persecutioncomplex: