That’s basically why I’m using glBufferData instead of the Sub version. It’s marginally but measurably faster.
What you should be doing is:
Set up OpenGL state for particle rendering
Update each particle sequentially
Map your VBO
Write each particle to the VBO
Unmap VBO (it has to be unmapped before you can draw from it)
glDrawArrays, then “orphan” the VBO
Swapbuffers
You should find that glDrawArrays and swapbuffers return immediately, allowing you to carry straight on with updating the particles for the next frame. Even the call to map the VBO shouldn’t block, as the driver should be intelligent enough to give you a completely new buffer if it hasn’t finished drawing the previous one. It should only eventually block at the second call to swapbuffers if it hasn’t yet actually finished rendering from the first one - that is, you are flat out.
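The ordering in that list can be sketched roughly as below. This is not real OpenGL code: `Gl` is a hypothetical stand-in interface and `RecordingGl` only logs the calls, so the sequence can be checked without a GL context. The sketch issues the unmap before the draw call, because GL rejects draws that source from a buffer that is still mapped.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the GL calls named in the list; not a real binding. */
interface Gl {
    void setUpParticleState();
    ByteBuffer mapVbo(int bytes);
    void unmapVbo();
    void drawArrays(int pointCount);
    void swapBuffers();
}

/** Logs each call instead of talking to a driver, so the order can be inspected. */
final class RecordingGl implements Gl {
    final List<String> calls = new ArrayList<>();

    public void setUpParticleState() { calls.add("state"); }
    public ByteBuffer mapVbo(int bytes) { calls.add("map"); return ByteBuffer.allocate(bytes); }
    public void unmapVbo() { calls.add("unmap"); }
    public void drawArrays(int pointCount) { calls.add("draw"); }
    public void swapBuffers() { calls.add("swap"); }
}

final class FrameLoopSketch {
    /** One frame: update on the CPU, write into the mapped VBO, unmap, draw, swap. */
    static void renderFrame(Gl gl, float[] xy, float[] velocity, float dt) {
        gl.setUpParticleState();

        // Update each particle sequentially (plain Euler step, chosen for illustration).
        for (int i = 0; i < xy.length; i++) {
            xy[i] += velocity[i] * dt;
        }

        // Map, write every particle, then unmap before drawing.
        ByteBuffer mapped = gl.mapVbo(xy.length * 4); // 4 bytes per float
        for (float v : xy) {
            mapped.putFloat(v);
        }
        gl.unmapVbo();

        gl.drawArrays(xy.length / 2); // two floats (x, y) per point
        gl.swapBuffers();
    }

    public static void main(String[] args) {
        RecordingGl gl = new RecordingGl();
        renderFrame(gl, new float[] {0f, 0f}, new float[] {1f, 2f}, 0.5f);
        System.out.println(gl.calls); // [state, map, unmap, draw, swap]
    }
}
```

With a real binding, `mapVbo` and `unmapVbo` would correspond to glMapBuffer and glUnmapBuffer on the bound array buffer.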
Cas 
In this case the ‘only’ gain of my MappedObject library is that the mandatory copy from app-data to mapped-buffer is blazing fast.
I benchmarked ByteBuffer.put(ByteBuffer) and sun.misc.Unsafe.copyMemory(p1,p2,len) and they were equally fast. On the other hand, pulling data from objects and pushing it into a buffer is significantly slower (2-3x as slow, in my microbenchmark).
It’s probably not where your bottleneck is, unless maybe in particle engines and sprite engines (whatever the difference is).
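To make the two copy paths concrete, here is a minimal sketch. The `Particle` class and the method names are made up for illustration, and it says nothing about how MappedObject works internally. It only shows that both paths produce the same bytes, and that the bulk path is a single call once the data already lives in a buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class CopyPaths {
    /** Hypothetical particle type; the thread never shows the real one. */
    static final class Particle {
        final float x, y;
        Particle(float x, float y) { this.x = x; this.y = y; }
    }

    /** Slow path: read fields out of each object and push them one at a time. */
    static void copyFromObjects(Particle[] particles, ByteBuffer dst) {
        for (Particle p : particles) {
            dst.putFloat(p.x).putFloat(p.y);
        }
    }

    /** Fast path: the data is already packed in a buffer, so one bulk put moves it. */
    static void copyFromBuffer(ByteBuffer src, ByteBuffer dst) {
        dst.put(src);  // advances src.position() up to src.limit()
        src.rewind();  // reset the position so src can be read again next frame
    }

    static ByteBuffer newBuffer(int bytes) {
        return ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
    }

    public static void main(String[] args) {
        Particle[] particles = { new Particle(1f, 2f), new Particle(3f, 4f) };
        ByteBuffer viaObjects = newBuffer(16);
        copyFromObjects(particles, viaObjects);
        viaObjects.flip();

        ByteBuffer packed = newBuffer(16);
        packed.putFloat(1f).putFloat(2f).putFloat(3f).putFloat(4f);
        packed.flip();
        ByteBuffer viaBulk = newBuffer(16);
        copyFromBuffer(packed, viaBulk);
        viaBulk.flip();

        System.out.println(viaObjects.equals(viaBulk)); // true
    }
}
```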
I’ll try that tonight. Got school in ten minutes… FUUUUUUUUUU-
EDIT: I made it in time, somehow. I’m awesome!
So I tried to implement it, submitting my data with glMapBuffer instead of glBufferData, and got a small but noticeable increase in performance. Approximately 61 --> 63 FPS. This is what I’m doing:
for (int i = 0; i < threads; i++) {
    particleSubtasks[i].bindBuffer();

    ByteBuffer mappedBuffer = GL15.glMapBuffer(GL15.GL_ARRAY_BUFFER, GL15.GL_WRITE_ONLY, particlesPerThread * particleByteSize, particleSubtasks[i].getOldMappedBuffer());
    ByteBuffer particleData = particleSubtasks[i].getBuffer();
    mappedBuffer.put(particleData);
    GL15.glUnmapBuffer(GL15.GL_ARRAY_BUFFER);
    mappedBuffer.flip();
    particleData.flip(); //So it doesn't crash during the next update xD

    GL11.glVertexPointer(2, GL11.GL_FLOAT, particleByteSize, 0);
    GL11.glColorPointer(4, GL11.GL_UNSIGNED_BYTE, particleByteSize, 8);
    GL11.glDrawArrays(GL11.GL_POINTS, 0, particlesPerThread);
}
Anything else I can optimize? =D
EDIT: Now I also reuse the mapped buffer instead of passing null to glMapBuffer… Code updated. No measurable difference though.
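For anyone following along, the pointer calls in that loop imply an interleaved layout: two floats of position at byte offset 0 and four unsigned bytes of colour at byte offset 8. Below is a rough sketch of packing data that way with plain java.nio. It assumes particleByteSize is 12, the smallest value those offsets allow; the thread never states the real value, and the class and method names are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ParticleLayout {
    static final int POSITION_OFFSET = 0;     // matches glVertexPointer(2, GL_FLOAT, stride, 0)
    static final int COLOR_OFFSET = 8;        // matches glColorPointer(4, GL_UNSIGNED_BYTE, stride, 8)
    static final int PARTICLE_BYTE_SIZE = 12; // ASSUMPTION: 2 floats + 4 bytes, no extra fields

    static ByteBuffer allocate(int particleCount) {
        return ByteBuffer.allocateDirect(particleCount * PARTICLE_BYTE_SIZE)
                         .order(ByteOrder.nativeOrder());
    }

    /** Writes one particle with absolute puts, so the buffer position is left untouched. */
    static void write(ByteBuffer buf, int index,
                      float x, float y, byte r, byte g, byte b, byte a) {
        int base = index * PARTICLE_BYTE_SIZE;
        buf.putFloat(base + POSITION_OFFSET, x);
        buf.putFloat(base + POSITION_OFFSET + 4, y);
        buf.put(base + COLOR_OFFSET, r);
        buf.put(base + COLOR_OFFSET + 1, g);
        buf.put(base + COLOR_OFFSET + 2, b);
        buf.put(base + COLOR_OFFSET + 3, a);
    }

    public static void main(String[] args) {
        ByteBuffer buf = allocate(2);
        write(buf, 0, 10f, 20f, (byte) 0xFF, (byte) 0, (byte) 0, (byte) 0xFF);
        write(buf, 1, 30f, 40f, (byte) 0, (byte) 0xFF, (byte) 0, (byte) 0xFF);

        // The second particle starts one stride (12 bytes) into the buffer.
        System.out.println(buf.getFloat(PARTICLE_BYTE_SIZE));     // 30.0
        System.out.println(buf.getFloat(PARTICLE_BYTE_SIZE + 4)); // 40.0
    }
}
```

Because the writes are absolute, the buffer keeps position 0 and its full limit, so it can be handed straight to a bulk put into the mapped buffer without a flip.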
Where are you setting the reference you return in getOldMappedBuffer()?
Ah, I forgot to do that. -_- I even made a function for setting it in my ParticleSubtask, but forgot to call it. The program works fine, but I can’t test the performance. My CPU has entered permanent power saving running at 800MHz, so I’m getting around 22 FPS instead of 64 FPS. I’ve contacted ASUS, but they’ll probably just tell me to send in my computer. Considering my computer is pretty much worthless now, I don’t really have a choice. If I have to send it in, I’ll be sure to say farewell to everyone on JGO, because I probably won’t be able to survive here in Japan for 1-2 months without a computer. Now I’m off to flash my BIOS.
EDIT: ThrottleStop, marry me and have my babies.