Not really, more like drawing all sprites with a single glDrawArrays() call. It's your transforms that are the problem, since they're what generate so many OpenGL calls. This isn't something that can be solved easily, to be honest, so make sure your current performance actually isn't good enough before you try this out, as it makes the code a lot less readable and more error-prone.
The main point of batching is to get a number of OpenGL calls that is independent of the number of actual instances of each different 3D model (or in this case: sprites). Using a constant x draw calls should obviously be faster than y * n calls, where n is the number of sprites: each of the x calls might be more expensive, but the y * n cost is paid per sprite, so it will be slower once you have lots of them. For 3D models this is usually accomplished with instancing, which draws the same model multiple times with a single draw call. There are a few ways of getting the instance-specific data (often just a model matrix to move them to different positions) to the vertex shader so the instances can differ in attributes like position, color, rotation, etc.
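For completeness, here's a minimal sketch of what instancing could look like with LWJGL (quadVbo, instanceVbo and numSprites are placeholder names for whatever your setup uses, and it assumes GL 3.3 for glVertexAttribDivisor):

```java
import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL20.*;
import static org.lwjgl.opengl.GL31.*;
import static org.lwjgl.opengl.GL33.*;

void drawAllSprites(int quadVbo, int instanceVbo, int numSprites) {
    // Per-vertex data: the 4 corners of a single quad.
    glBindBuffer(GL_ARRAY_BUFFER, quadVbo);
    glVertexAttribPointer(0, 2, GL_FLOAT, false, 0, 0);
    glEnableVertexAttribArray(0);

    // Per-instance data: one vec2 position per sprite.
    glBindBuffer(GL_ARRAY_BUFFER, instanceVbo);
    glVertexAttribPointer(1, 2, GL_FLOAT, false, 0, 0);
    glEnableVertexAttribArray(1);
    glVertexAttribDivisor(1, 1); // advance attribute 1 once per instance, not per vertex

    // One call draws every sprite: 4 vertices each, numSprites instances.
    glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, numSprites);
}
```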
I think this can be solved a lot more cleanly with newer OpenGL versions. To be honest, I don't recommend that you do this, but if you THEORETICALLY were to, you could use a technique called point sprites to draw each sprite as a GL_POINT which is then expanded to a quad (triangle strip) in a geometry shader. For some unfathomable reason the built-in point sprites (ARB_point_sprite) are limited to screen-aligned squares with an implementation-dependent maximum size, often as small as 64x64 pixels, so you'll have to do the expansion yourself with a geometry shader.
Now stop again and question if you really need this. If it sounds useful or maybe even just interesting, continue. Otherwise just move along with your game. xD
An easy way to batch all your sprites together into a single draw call is to put them all in the same texture. A texture atlas (having the sprites next to each other in one large texture) can create artifacts with mipmaps, though, as the sprites bleed into each other when the texture is minified. To avoid this you can use a texture array, which is basically an array of 2D textures. You choose which layer to access using a third texture coordinate in your shader (there's a specific sampler type for them). The point of texture arrays is that no sampling is done between the layers, yet each layer can still have mipmaps; with a plain 3D texture, mipmapping would also blend the layers together. All of this is just to avoid having to do multiple texture binds when drawing different sprites, so if you only have 20 different sprites or so you don't have to do it.
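Sampling a texture array from a fragment shader looks roughly like this (GLSL 1.50; spriteSheet and texCoord are names I made up for the sketch):

```glsl
#version 150

uniform sampler2DArray spriteSheet;

in vec3 texCoord; // xy = normal texture coordinates, z = layer index
out vec4 fragColor;

void main() {
    // Filtering only happens within the selected layer, never between layers.
    fragColor = texture(spriteSheet, texCoord);
}
```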
Now for the shader. The first thing you need to figure out is which attributes stay constant across multiple sprites. If you're drawing 50 identical sprites at different positions, the only thing your shader needs per drawn sprite is the position. Size, color, rotation (as a 2D rotation matrix), etc. are all constant for a given draw call, so you can keep those as uniforms. You want as few attributes as possible, while still being able to achieve whatever you want to achieve with as few draw calls as possible. For example, in a very simple 2D particle system the only things that vary between points are position and color. Size is constant and can in that case be either a uniform or just a plain constant if you only have a single type of particle.
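For that particle example, the split between attributes and uniforms would look something like this in the vertex shader (just a sketch with made-up names; gl_PointSize also requires GL_PROGRAM_POINT_SIZE to be enabled in a core profile):

```glsl
#version 150

in vec2 position; // varies per particle -> attribute
in vec4 color;    // varies per particle -> attribute

uniform float size; // same for the whole draw call -> uniform

out vec4 vColor;

void main() {
    vColor = color;
    gl_Position = vec4(position, 0.0, 1.0);
    gl_PointSize = size;
}
```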
Your vertex shader should just pass the vertex attributes through to the geometry shader; the real "magic" happens in there. The geometry shader uses whatever data you give it to output the 4 corners of your sprite, basically expanding your point to a quad. You want to do as little work as possible here, basically just mapping uniforms, constants and attributes to the fragment shader inputs.

Going back to your cube implementation: the only thing that varies per cube is position. I assume you don't have any rotation, so we'll remove that, and you're not using a color either. So your vertex shader takes in a vec2 position and outputs it to gl_Position. Your geometry shader reads this position and outputs 4 unique corners: (x, y), (x + width, y), (x, y + height), (x + width, y + height). Also keep a vec2 uniform holding 2 times the inverse screen size (2.0 / screenSize). To get clip-space positions in the -1 to 1 range you can then do (position * twoOverScreenSize - 1.0), removing the need for a matrix completely. You can also generate texture coordinates for these corners in the geometry shader: (x, y) gets (0, 0), (x + width, y) gets (1, 0), etc. Finally, your fragment shader is pretty much identical to how it is now.
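Here's a sketch of what that vertex + geometry shader pair could look like (GLSL 1.50; spriteSize and twoOverScreenSize are my assumed uniform names, and I'm assuming pixel coordinates with the origin in the bottom-left corner):

```glsl
// --- vertex shader: pure passthrough of the per-sprite position ---
#version 150

in vec2 position; // pixel position, the only per-sprite attribute

void main() {
    gl_Position = vec4(position, 0.0, 1.0);
}
```

```glsl
// --- geometry shader: expands each point to a textured quad ---
#version 150

layout(points) in;
layout(triangle_strip, max_vertices = 4) out;

uniform vec2 spriteSize;        // constant for the whole draw call
uniform vec2 twoOverScreenSize; // 2.0 / screenSize

out vec2 texCoord;

void main() {
    vec2 pos = gl_in[0].gl_Position.xy; // the sprite's pixel position

    // Emit the 4 corners as a triangle strip, mapping pixels to the
    // -1 to 1 range on the fly (no matrix needed).
    gl_Position = vec4(pos * twoOverScreenSize - 1.0, 0.0, 1.0);
    texCoord = vec2(0.0, 0.0);
    EmitVertex();

    gl_Position = vec4((pos + vec2(spriteSize.x, 0.0)) * twoOverScreenSize - 1.0, 0.0, 1.0);
    texCoord = vec2(1.0, 0.0);
    EmitVertex();

    gl_Position = vec4((pos + vec2(0.0, spriteSize.y)) * twoOverScreenSize - 1.0, 0.0, 1.0);
    texCoord = vec2(0.0, 1.0);
    EmitVertex();

    gl_Position = vec4((pos + spriteSize) * twoOverScreenSize - 1.0, 0.0, 1.0);
    texCoord = vec2(1.0, 1.0);
    EmitVertex();

    EndPrimitive();
}
```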
The result is that you've reduced the attribute data from 5 floats to 2 floats per sprite instance, which is 20 bytes -> 8 bytes. The real win, though, is in the number of draw calls. To draw a cube, just add (batch) its position to a large FloatBuffer, then draw all the cubes with a single call to glDrawArrays() at the end. You'll be able to draw as many sprites as you want while the number of draw calls remains constant. In theory this should be a lot faster, but it depends a lot on what your current bottleneck is. I'm 99% sure it's your CPU at the moment; it could be fill-rate limited if you have an old graphics card, but I seriously doubt it. Even so, your sprites are pretty big, and your goal is basically to shift the bottleneck from the CPU to the GPU's fill rate, which with sprites that big isn't very hard.
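A bare-bones batcher along those lines might look like this in LWJGL (everything here is a sketch; SpriteBatch, MAX_SPRITES and so on are names I made up, and I'm assuming the shaders above are bound when flush() is called):

```java
import java.nio.FloatBuffer;
import org.lwjgl.BufferUtils;

import static org.lwjgl.opengl.GL11.*;
import static org.lwjgl.opengl.GL15.*;
import static org.lwjgl.opengl.GL20.*;

public class SpriteBatch {
    private static final int MAX_SPRITES = 10000;

    private final FloatBuffer data = BufferUtils.createFloatBuffer(MAX_SPRITES * 2);
    private final int vbo = glGenBuffers();
    private int spriteCount = 0;

    // Called once per sprite per frame: just stores 2 floats, no GL calls.
    public void draw(float x, float y) {
        data.put(x).put(y);
        spriteCount++;
    }

    // Called once per frame: uploads everything and issues a single draw call.
    public void flush() {
        data.flip();
        glBindBuffer(GL_ARRAY_BUFFER, vbo);
        glBufferData(GL_ARRAY_BUFFER, data, GL_STREAM_DRAW);
        glVertexAttribPointer(0, 2, GL_FLOAT, false, 0, 0);
        glEnableVertexAttribArray(0);

        // The geometry shader expands each point to a full sprite.
        glDrawArrays(GL_POINTS, 0, spriteCount);

        data.clear();
        spriteCount = 0;
    }
}
```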
I know that this represents a best-case scenario for point sprites, as the only thing that varies is the position. You'll have to find a balance between the number of draw calls and the amount of attribute data you send, and that balance depends on your definition of "different" when it comes to sprites. You could basically draw every possible sprite in one call using a texture array and lots of attributes per sprite instance, but it might be a lot faster to split it into two draw calls if that lets you eliminate some attributes.
Let's say you want to draw 1000 arbitrarily sized sprites. You have the following attributes: a position (x, y) in floats, and a size (width, height) also in floats. This works fine and is really fast. However, if your sprites only come in 2 different sizes, it might be worth moving the size into a uniform and splitting the drawing into two calls to glDrawArrays() with a call to glUniform in between.
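Something like this (a sketch; spriteSize is my assumed uniform name, and I'm assuming the small sprites were batched into the buffer before the big ones):

```java
// Small sprites first, then big sprites, from the same batched buffer.
int sizeLocation = glGetUniformLocation(program, "spriteSize");

glUniform2f(sizeLocation, 16.0f, 16.0f);
glDrawArrays(GL_POINTS, 0, smallCount);

glUniform2f(sizeLocation, 64.0f, 64.0f);
glDrawArrays(GL_POINTS, smallCount, bigCount);
```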
Note that geometry shaders have horrible performance if used wrongly. Your GPU has several hundred or even thousands of small processors, so it relies on being able to run things in parallel; a few very expensive geometry shader invocations are therefore very bad performance-wise compared to many cheap ones. The same is true for any shader stage. This is basically the source of the criticism of geometry shaders: people expected them to be able to do heavy tessellation, which they were never really intended for, as you might have figured out. Used correctly, geometry shaders are FAST and open up lots of possibilities. They can reduce the bandwidth and computation needed for point sprites, and they can see all the vertices in a primitive, which is important for some algorithms. The performance hit is also exaggerated; a passthrough geometry shader is free on my NVidia card.
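For reference, this is all a passthrough geometry shader does (a sketch in GLSL 1.50); it just re-emits each incoming triangle unchanged. In a real program you'd forward your other outputs too:

```glsl
#version 150

layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

void main() {
    // Re-emit the triangle exactly as it came in.
    for (int i = 0; i < 3; i++) {
        gl_Position = gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
```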
Now for the last time: question if you really need this. My experience comes from optimizing PARTICLE SYSTEMS, where I basically had millions of pixel-sized particles. In that case, drawing each particle with its own draw call is completely out of the question for more than, say, 5000 particles. However, when batching up particles which only had varying positions and colors, I quickly hit another bottleneck: the speed at which I could fill the ByteBuffer with my particles' vertex data. Minimizing the submitted data by using appropriate variable types was therefore my first step, and using bytes for the color instead of floats gave me a nice improvement. You could do the same with your current code (for no noticeable speed increase though xD), since you're just submitting either 0 or 1, which is clearly nothing a byte can't represent.

Note that you should still keep a vertex stride that is a multiple of 4, as OpenGL can read from the buffer more efficiently when it is. In my case, replacing the 4 RGBA floats with 4 bytes still came out as a multiple of 4 (24 bytes -> 12 bytes per vertex). In your case you'd have 5 bytes, so you'll want to pad with 3 bytes so it totals 8 bytes per vertex.
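In LWJGL, the interleaved layout from my particle case would be set up something like this (a sketch; attribute locations 0 and 1 are assumed):

```java
// Per vertex: 2 position floats (8 bytes) + 4 color bytes (4 bytes) = 12 bytes,
// which is already a multiple of 4, so no padding is needed here.
int stride = 12;
glVertexAttribPointer(0, 2, GL_FLOAT, false, stride, 0);        // position
glVertexAttribPointer(1, 4, GL_UNSIGNED_BYTE, true, stride, 8); // color, normalized to 0..1
glEnableVertexAttribArray(0);
glEnableVertexAttribArray(1);
```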
I then added lots of other optimizations like multithreading the updating, Riven's MappedObject implementation, etc. In the end I had a particle engine capable of 64 FPS with 1,000,000 particles on a dual-core LAPTOP. The funny thing is that my program would become GPU-bottlenecked as soon as the points were bigger than 4 pixels. Basically, if you have sprites in the 200x200 pixel range you will hit a GPU bottleneck much, much sooner, which is why you really shouldn't waste time optimizing something that isn't the bottleneck of your game.
HOLY SHIT I DID IT AGAIN PLEASE DON’T HIT ME T__________T
EDIT: TL;DR: Batch your sprites so that you can draw them in a constant number of draw calls regardless of how many instances of them you have. This can be done with a geometry shader, but it's not always worth implementing if the sprites are big, since drawing will be fill-rate limited anyway. The rest is just specifics on how to implement it with a geometry shader. Also try to keep the amount of data you send each frame as low as possible, as it costs bandwidth but also lots of CPU power to fill the data buffer.
EDIT2: And ra4king is mean to me. T___T