FloatBuffer and Batching

After a few weeks, I have finally created my basic sprite batcher!
It uses a VBO (DYNAMIC usage hint) and an IBO (STATIC usage hint), where the IBO is pre-filled with the needed index data.
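For reference, pre-filling the index data for quads can be sketched roughly like this. The class name QuadIndices, the (0,1,2) / (2,3,0) triangle layout and the 16-bit index type are assumptions for illustration, not necessarily what this batcher does.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

final class QuadIndices {
    // Hypothetical helper: builds indices for maxSprites quads, two triangles each.
    // Assumes each quad's vertices are stored consecutively (4 per quad) and that
    // indices are 16-bit, which caps one buffer at 16384 quads (65536 / 4).
    static ShortBuffer build(int maxSprites) {
        if (maxSprites < 0 || maxSprites > 16384) {
            throw new IllegalArgumentException("16-bit indices allow at most 16384 quads");
        }
        ShortBuffer indices = ByteBuffer
                .allocateDirect(maxSprites * 6 * 2) // 6 indices per quad, 2 bytes each
                .order(ByteOrder.nativeOrder())
                .asShortBuffer();
        for (int i = 0; i < maxSprites; i++) {
            int v = i * 4; // first vertex of this quad
            // First triangle
            indices.put((short) v);
            indices.put((short) (v + 1));
            indices.put((short) (v + 2));
            // Second triangle
            indices.put((short) (v + 2));
            indices.put((short) (v + 3));
            indices.put((short) v);
        }
        indices.flip(); // ready to be read by the upload call
        return indices;
    }
}
```

Since the pattern never changes, this only has to be built and uploaded once, which is why a STATIC hint fits the IBO.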
Whenever I need to draw a sprite, I have a method that places the vertex and texture data into a giant float array.
Then when I reach my sprite limit or need to change textures, I shoot everything to the GPU!
Basically, I call a single put method on a FloatBuffer, with my giant float array as the data source, and then I draw everything needed.



mySuperFloatBuffer.put(myGiantFloatArray);



So what’s left? The only thing left… performance tests… and unfortunately I don’t like the numbers too much :(

I’m testing my batcher against its STATIC counterpart, meaning the VBO is created as STATIC, the vertex data (X Y Z) is pregenerated, and the FloatBuffer used by the VBO is prefilled once.
In the dynamic version, I overwrite the FloatBuffer every frame with random vertex data from my giant float array, using a single put call.

Now here are the numbers.
This is using 20,000 textured quads (32 x 32 each) over an 800x480 screen space.

Dynamic (Normal Way): Delta Time = 0.048 seconds or 22.72 FPS
Static (Testing Purpose way): Delta Time = 0.020 seconds or 50 FPS

As you can see, that is a pretty large difference, so what gives?
I know I won’t be able to get them to match completely, since one way I’m changing the data every frame and the other way is prefilled and never changes.
But I feel like my numbers should be a bit better.

My vertex and fragment shaders are extremely simple. The vertex shader just receives pre-transformed vertex positions. The fragment shader just samples the texture, with no manipulation of any kind.
I believe my problem lies on the CPU side, specifically in how I handle vertex data. This is what I am doing:

Raw vertex data goes into the giant float array, then one big copy with the put method places it in the FloatBuffer, and finally it is sent to the GPU with glBufferSubData.



// In my draw method: append one sprite's data to the giant float array
myGiantFloatArray[currentSize] = 0.0f;
myGiantFloatArray[currentSize + 1] = 0.0f;
myGiantFloatArray[currentSize + 2] = 0.0f;
myGiantFloatArray[currentSize + 3] = 1.0f;
/* other vertex data */

//------------------------------------------------------

// In my render (flush batch) method
mySuperFloatBuffer.put(myGiantFloatArray); // bulk copy of the entire array
mySuperFloatBuffer.position(0);            // rewind so the upload reads from the start

/* Other OpenGL stuff */
glBufferSubData(/* params */);

/* Any remaining stuff and indexed draw call */
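
One detail that may be worth checking in the flush above: put(float[]) copies the whole array, even when the batch is only partly filled (for example, after a texture change). A minimal sketch of copying only the filled part follows. The names BatchUpload, stage and usedFloats are hypothetical, and it assumes the count of floats written so far is tracked the way currentSize is in the draw method.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class BatchUpload {
    // Hypothetical helper: allocates a direct, native-order buffer for floatCount floats.
    static FloatBuffer allocate(int floatCount) {
        return ByteBuffer.allocateDirect(floatCount * 4)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    // Hypothetical helper: copies only the first usedFloats values of the
    // staging array into the buffer and leaves it ready for reading.
    static FloatBuffer stage(FloatBuffer target, float[] source, int usedFloats) {
        target.clear();                    // position = 0, limit = capacity
        target.put(source, 0, usedFloats); // bulk copy of the filled prefix only
        target.flip();                     // limit = usedFloats, position = 0
        return target;
    }
}
```

With this, the byte size passed to glBufferSubData would be target.remaining() * 4 rather than the full buffer size. Whether it helps depends on how often a batch is flushed before it is full.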


The real heavy hitter in the code above is the put call, which takes up 0.010 seconds or so of my precious time!
So finally, is there a better way to handle things on the CPU side? Any help would be greatly appreciated.
Let me know if I need to add more info on anything.
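
In case it helps anyone reproduce the measurement, a timing of the copy on its own can be sketched like this. PutTimer and its parameters are hypothetical names, and averaging over several runs is just one way to smooth out noise; it is not exactly how the figure above was taken.

```java
import java.nio.FloatBuffer;

final class PutTimer {
    // Hypothetical helper: returns the average time in nanoseconds of one bulk
    // put of usedFloats values, measured over the given number of runs.
    static long averagePutNanos(FloatBuffer target, float[] source, int usedFloats, int runs) {
        long total = 0;
        for (int i = 0; i < runs; i++) {
            target.clear(); // reset position so every run copies into the same range
            long start = System.nanoTime();
            target.put(source, 0, usedFloats);
            total += System.nanoTime() - start;
        }
        return total / runs;
    }
}
```

Timing only the put, separately from glBufferSubData and the draw call, makes it easier to see which step a change actually affects.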

Thanks!

By the way, I’m using OpenGL ES 2.0, which does not have any glMapBuffer calls :(