Access violation with glDrawArrays

Don’t create arrays for each vertex. Also, it looks like you’re drawing fullscreen quads. 100 fullscreen passes = 10-20 FPS.
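A minimal sketch of the "don't create arrays for each vertex" advice in plain Java. It assumes 2D positions only and GL_TRIANGLES. `QuadBatch` and its methods are hypothetical names, and the actual upload and glDrawArrays call (e.g. through LWJGL) are left out because they need a live GL context.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Sketch: one preallocated buffer reused every frame, instead of a new array per vertex.
public class QuadBatch {
    static final int FLOATS_PER_VERTEX = 2;  // x, y only (assumed layout)
    static final int VERTICES_PER_QUAD = 6;  // two triangles, drawn with GL_TRIANGLES

    private final FloatBuffer vertices;
    private int quadCount;

    public QuadBatch(int maxQuads) {
        // Direct, native-order buffer (what GL bindings expect); allocated once.
        vertices = ByteBuffer
                .allocateDirect(maxQuads * VERTICES_PER_QUAD * FLOATS_PER_VERTEX * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Resets the buffer at the start of a frame; no new allocation. */
    public void begin() {
        vertices.clear();
        quadCount = 0;
    }

    public void addQuad(float x, float y, float w, float h) {
        // First triangle
        vertices.put(x).put(y);
        vertices.put(x + w).put(y);
        vertices.put(x + w).put(y + h);
        // Second triangle
        vertices.put(x).put(y);
        vertices.put(x + w).put(y + h);
        vertices.put(x).put(y + h);
        quadCount++;
    }

    /** Flips the buffer so it is ready to upload: position 0, limit = data written. */
    public FloatBuffer end() {
        vertices.flip();
        return vertices;
    }

    /** The count you would pass to glDrawArrays(GL_TRIANGLES, 0, count). */
    public int vertexCount() {
        return quadCount * VERTICES_PER_QUAD;
    }
}
```

The point is that the per-frame cost is a few `put` calls into memory that already exists, not thousands of small allocations for the garbage collector to clean up.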

Is OpenGL supposed to be this slow, or is there an alternative way to do what I want? I mentioned this in an earlier thread that didn’t get much attention, so I might as well link it here: http://www.java-gaming.org/topics/streaming-to-vbos-or/35174/msg/332551/view.html

EDIT: Tried to be smart, but it didn’t change anything, so I don’t think it’s really about buffer writing… http://pastebin.com/vKWKAvZM

EDIT2: The fullscreen quad thing seems to be a major FPS killer, why is this the case?

EDIT3: It almost feels like immediate mode would be faster than doing this :I

Cas, didn’t you say the power of mapping GL buffers comes from using threads?

http://en.wikipedia.org/wiki/Fillrate Read it and weep :slight_smile:

Mike

There is negligible overhead for mapping unsynchronized buffers. They will never cause a GPU stall; they can, however, cause a driver stall. See: http://www.java-gaming.org/index.php?topic=32169.0

[quote] Similarly, mapping a buffer forces the game thread to wait for the server thread to finish any pending operations, stalling the game thread until the buffer can be mapped. What we’re seeing is not glMapBufferRange() becoming more expensive; we’re seeing a driver stall! Using unsynchronized VBOs eliminates the synchronization with the GPU (to ensure the data is no longer in use), but the internal driver thread synchronization cannot be avoided this way.
[/quote]
GL_MAP_UNSYNCHRONIZED_BIT has effectively been superseded by GL_MAP_PERSISTENT_BIT, which causes neither GPU nor driver stalls.

Yes, you can fill your mapped buffers from worker threads, which is especially effective on AMD processors, as they have crappy memory controllers (compared to Intel i7), causing individual cores to have severely limited bandwidth.
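A minimal sketch of filling one mapped buffer from worker threads. A direct `FloatBuffer` stands in for the memory a map call would return (with LWJGL, glMapBufferRange hands back a `ByteBuffer` you can view as floats); `computeValue` is a hypothetical placeholder for the real per-vertex work. Each worker writes a disjoint index range through its own `duplicate()` view, so no two threads touch the same bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: split one (mapped) buffer into disjoint ranges and fill them in parallel.
public class ParallelFill {

    /** Stand-in for the ByteBuffer a map call would return, viewed as floats. */
    public static FloatBuffer allocate(int floats) {
        return ByteBuffer.allocateDirect(floats * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    /** Hypothetical per-element work, e.g. transforming one sprite vertex. */
    static float computeValue(int i) {
        return i * 0.5f;
    }

    public static void fill(FloatBuffer target, int count, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> jobs = new ArrayList<>();
            int chunk = (count + threads - 1) / threads;
            for (int start = 0; start < count; start += chunk) {
                final int from = start;
                final int to = Math.min(count, start + chunk);
                jobs.add(pool.submit(() -> {
                    // duplicate(): own position/limit, same memory; this worker
                    // only writes [from, to), so writes never overlap.
                    FloatBuffer view = target.duplicate();
                    for (int i = from; i < to; i++) {
                        view.put(i, computeValue(i));
                    }
                }));
            }
            // Wait for every range before the buffer is handed to a draw call.
            for (Future<?> job : jobs) {
                job.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Waiting on every `Future` before drawing matters: it guarantees all writes are finished (and visible) before the GL thread uses the buffer.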

I just tried drawing by filling the mapped buffer only once, and the speed didn’t increase that dramatically, which leaves me confused as to what’s slowing me down. When I can play Counter-Strike: Global Offensive at a solid 120 FPS, why can’t I draw 6000 vertices?

Decrease the size of the triangles a lot and see what happens :slight_smile:

Mike

It runs faster ;D

So the goal in the end is reducing overdraw as much as possible?

I’m assuming theagentd is referring to ARB_buffer_storage when talking about persistently mapped memory.

If so, he’s right that it’s the fastest way now. The only downside is that it seems to require pretty new hardware (OpenGL 4.3+), making it pretty much unusable for applications targeting the masses (unless you want an application with multiple rendering paths). Once such hardware becomes mainstream, it’s definitely the way to go.

The vertices are fast to process, but the GPU chokes on filling all your pixels.

Some numbers for you:

  • 1920x1080 = 2 073 600 ≈ 2 million pixels
  • 100 fullscreen passes * 2 073 600 pixels ≈ 200 million pixels filled per frame
  • Theoretical fillrate of a GTX 770: 33.5 gigapixels/sec, or 33 500 000 000 pixels per second.
  • Theoretical FPS of a GTX 770: 33.5 / 0.2 = 167.5 FPS
  • Actual FPS of a GTX 770 using this code: 166-168, seemingly fluctuating.

That was surprisingly accurate. o___o
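The fill-bound estimate above as a small Java helper (the class name is just for illustration). It uses the exact pixel count, 1920 x 1080 x 100 = 207 360 000, instead of the rounded 0.2 billion, so it comes out at about 161.6 FPS rather than 167.5; the 33.5 gigapixels/sec figure is the theoretical GTX 770 fillrate quoted above.

```java
// Sketch: upper bound on FPS when every frame is limited purely by fillrate.
public class FillrateEstimate {

    static double maxFps(double fillratePixelsPerSec, int width, int height, int passes) {
        double pixelsPerFrame = (double) width * height * passes; // pixels written per frame
        return fillratePixelsPerSec / pixelsPerFrame;
    }

    public static void main(String[] args) {
        double gtx770 = 33.5e9; // theoretical fillrate, pixels per second
        System.out.printf("%.1f FPS%n", maxFps(gtx770, 1920, 1080, 100)); // prints 161.6 FPS
    }
}
```

Since the pixel count per frame is the denominator, halving the quad side length quarters the pixels filled, which is why shrinking the triangles made it run faster.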

It is however easy to make a flexible interface which can be implemented using glBufferData(), unsynchronized mapped buffers and persistent mapped buffers and that can dynamically switch between them. See this again: http://www.java-gaming.org/topics/maximizing-vbo-upload-performance/32169/view.html
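A minimal sketch of what such a switchable interface could look like. All names here are hypothetical, the capability flags are plain booleans standing in for whatever the GL context reports, and the back end is a CPU staging buffer: the comments mark where a real implementation would call glBufferData, map with GL_MAP_UNSYNCHRONIZED_BIT, or keep a GL_MAP_PERSISTENT_BIT mapping.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of one upload interface with interchangeable back ends (no GL calls).
public class UploadStrategies {

    enum Strategy { BUFFER_DATA, UNSYNCHRONIZED_MAP, PERSISTENT_MAP }

    /** Picks the best path the context supports (flags are hypothetical inputs). */
    static Strategy choose(boolean hasBufferStorage, boolean hasMapBufferRange) {
        if (hasBufferStorage) return Strategy.PERSISTENT_MAP;       // GL_MAP_PERSISTENT_BIT
        if (hasMapBufferRange) return Strategy.UNSYNCHRONIZED_MAP;  // GL_MAP_UNSYNCHRONIZED_BIT
        return Strategy.BUFFER_DATA;                                // glBufferData each frame
    }

    /** What the renderer talks to; it never knows which path is active. */
    interface VertexUploader {
        ByteBuffer map(int bytes);  // get somewhere to write this frame's data
        void unmap();               // hand the data over to the VBO
        Strategy strategy();
    }

    /** Placeholder back end: a CPU staging buffer for every strategy. */
    static final class StagingUploader implements VertexUploader {
        private final Strategy strategy;
        private ByteBuffer staging = ByteBuffer.allocateDirect(0);

        StagingUploader(Strategy strategy) { this.strategy = strategy; }

        public ByteBuffer map(int bytes) {
            if (staging.capacity() < bytes) {
                staging = ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
            }
            staging.clear();
            staging.limit(bytes);
            return staging;
        }

        public void unmap() {
            staging.flip();
            // A real back end would upload or unmap here, depending on 'strategy'.
        }

        public Strategy strategy() { return strategy; }
    }

    /** Delegates to the current back end; can be switched between frames. */
    static final class SwitchableUploader implements VertexUploader {
        private VertexUploader current;

        SwitchableUploader(VertexUploader initial) { current = initial; }

        void switchTo(VertexUploader next) { current = next; }

        public ByteBuffer map(int bytes) { return current.map(bytes); }
        public void unmap() { current.unmap(); }
        public Strategy strategy() { return current.strategy(); }
    }
}
```

Because the renderer only ever sees `map`/`unmap`, the fallback chain (persistent, then unsynchronized, then glBufferData) can be chosen at startup or changed at runtime without touching drawing code.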

Here’s the performance of the Insomnia test program with a shitload of shadow maps being rendered and buffers being mapped:

Unsynchronized: 36 FPS, ~26.5 ms per frame spent on doing OpenGL calls.
Persistent: 58 FPS, ~9.8 ms per frame spent on doing OpenGL calls as driver multithreading kicks in once buffer mapping isn’t causing synchronization anymore.

Would anyone be interested in an extensive graphics performance optimization article about identifying and solving bottlenecks?

Is that a question?

With examples, please

Also, I guess the lesson I learned today is that GPU performance depends on fillrate more than anything else, and on shaders too.

Yes, it is.

Yes, that would be very useful.

God damn it, kappa, your GIF logo is messing with my GPU benchmarks. >___<

Yo, I just have a general question about this. Do you guys use VBOs, or other methods? If VBOs, how do you use them? Do you group them by similar object types, where you can predict the movement, simplifying and reducing buffer modification? Or do you have one very large VBO for the entire screen? Or maybe some convoluted offscreen FBO that generates a texture which you then write to the main FBO? *Lcass runs before he becomes any more giddy about graphics :0

My sprite engine uses two large VBOs: vertex data and index data. The sprite engine can also render arbitrary bits of geometry interleaved with the sprites, which use the same VBOs as the rest of the sprites but not necessarily the same vertex layout. Occasionally I render other bits and bobs like special effects, and these use their own little VBOs.

Also each screen I have has its own separate and distinct sprite engine. And because of a quirk in my UI code, currently viewports (“scrollpanes”) have their own sprite engine too for the contents.

Cas :slight_smile:
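A minimal sketch of the "two large VBOs" idea for plain sprites only (it does not attempt the mixed vertex layouts mentioned above). The x, y, u, v layout and the `SpriteBatch` name are assumptions; the key detail is that each sprite's indices are offset by the number of vertices already in the shared vertex buffer, so one glDrawElements call can draw the whole batch.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.nio.IntBuffer;

// Sketch: all sprites share one vertex buffer and one index buffer.
public class SpriteBatch {
    static final int FLOATS_PER_VERTEX = 4;  // x, y, u, v (assumed layout)
    static final int VERTICES_PER_SPRITE = 4;
    static final int INDICES_PER_SPRITE = 6;

    private final FloatBuffer vertices;
    private final IntBuffer indices;
    private int vertexCount;

    public SpriteBatch(int maxSprites) {
        vertices = ByteBuffer
                .allocateDirect(maxSprites * VERTICES_PER_SPRITE * FLOATS_PER_VERTEX * Float.BYTES)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
        indices = ByteBuffer
                .allocateDirect(maxSprites * INDICES_PER_SPRITE * Integer.BYTES)
                .order(ByteOrder.nativeOrder()).asIntBuffer();
    }

    public void clear() {
        vertices.clear();
        indices.clear();
        vertexCount = 0;
    }

    public void addSprite(float x, float y, float w, float h) {
        int base = vertexCount; // indices refer into the shared vertex buffer
        vertices.put(x).put(y).put(0f).put(0f);
        vertices.put(x + w).put(y).put(1f).put(0f);
        vertices.put(x + w).put(y + h).put(1f).put(1f);
        vertices.put(x).put(y + h).put(0f).put(1f);
        vertexCount += VERTICES_PER_SPRITE;
        // Two triangles sharing the diagonal base -> base + 2
        indices.put(base).put(base + 1).put(base + 2);
        indices.put(base).put(base + 2).put(base + 3);
    }

    public int vertexCount() { return vertexCount; }

    /** The count you would pass to glDrawElements for the whole batch. */
    public int indexCount() { return indices.position(); }

    public IntBuffer indices() { return indices; }
}
```

Using indices means 4 vertices per sprite instead of 6, and the index buffer's contents are the same every frame for the same sprite count.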

I basically do the same as Cas. I try to batch up as much data as possible into as few VBOs as possible. I currently batch up all my model vertex data into a single VBO and all indices in a second VBO. For skinned models, I also batch up all the skeleton data into one MASSIVE VBO which is accessed as a texture buffer, which I can strongly recommend if you have data that is reused between vertices like skinning data.
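A minimal sketch of packing skeleton data into one big buffer meant to be read as a texture buffer. It assumes 4x4 matrices and a GL_RGBA32F buffer texture, where one texel holds 4 floats, so one matrix spans 4 texels; `BonePalette` and its methods are hypothetical. Vertices only carry bone indices and weights, while the matrices they share are stored once per skeleton.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

// Sketch: every skeleton's bone matrices packed into one buffer, to be read
// in the vertex shader through a buffer texture (samplerBuffer).
public class BonePalette {
    static final int FLOATS_PER_MATRIX = 16; // 4x4 matrix
    static final int TEXELS_PER_MATRIX = 4;  // one GL_RGBA32F texel = 4 floats

    private final FloatBuffer data;
    private int bonesWritten;

    public BonePalette(int maxBones) {
        data = ByteBuffer.allocateDirect(maxBones * FLOATS_PER_MATRIX * Float.BYTES)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
    }

    /** Appends one skeleton and returns its base bone index for the draw call. */
    public int addSkeleton(float[][] boneMatrices) {
        int base = bonesWritten;
        for (float[] m : boneMatrices) {
            if (m.length != FLOATS_PER_MATRIX) {
                throw new IllegalArgumentException("expected 16 floats per bone");
            }
            data.put(m);
        }
        bonesWritten += boneMatrices.length;
        return base;
    }

    /** First texel of a bone's matrix; the shader would fetch it and the next three. */
    public static int texelIndex(int baseBone, int bone) {
        return (baseBone + bone) * TEXELS_PER_MATRIX;
    }

    public int bonesWritten() { return bonesWritten; }
}
```

On the shader side, the matching lookup would be four texelFetch calls on a samplerBuffer starting at that texel index, with the base bone index passed in per draw or per instance.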

You prefer TBOs over SSBOs?

Could you explain what data you’re referring to when you say skeleton data?

In Hardland I make one big buffer where I pack all data for all draw calls per frame. Each instanced draw call then just has to know the start index and the data size per instance. That way, instancing works for skinned meshes and any special meshes too.
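A minimal sketch of that per-frame layout: every draw call appends its instances to one shared buffer and keeps only a start index and a per-instance size. The names are hypothetical, and how the shader reads the data (for example, computing the offset from gl_InstanceID plus a start value passed as a uniform) is an assumption, not necessarily how Hardland does it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;
import java.util.ArrayList;
import java.util.List;

// Sketch: one buffer per frame holding the per-instance data of every draw call.
public class FrameInstanceData {

    /** All an instanced draw call needs to locate its data. */
    public static final class DrawRange {
        public final int startFloat;         // first float belonging to this call
        public final int floatsPerInstance;  // data size of one instance
        public final int instanceCount;

        DrawRange(int startFloat, int floatsPerInstance, int instanceCount) {
            this.startFloat = startFloat;
            this.floatsPerInstance = floatsPerInstance;
            this.instanceCount = instanceCount;
        }

        /** Where instance i's data starts (what a shader would compute per instance). */
        public int offsetOf(int instance) {
            return startFloat + instance * floatsPerInstance;
        }
    }

    private final FloatBuffer buffer;
    private final List<DrawRange> draws = new ArrayList<>();

    public FrameInstanceData(int capacityFloats) {
        buffer = ByteBuffer.allocateDirect(capacityFloats * Float.BYTES)
                .order(ByteOrder.nativeOrder()).asFloatBuffer();
    }

    public void beginFrame() {
        buffer.clear();
        draws.clear();
    }

    /** Appends the instances of one draw call; each inner array is one instance. */
    public DrawRange addDraw(int floatsPerInstance, float[][] instances) {
        int start = buffer.position();
        for (float[] instance : instances) {
            if (instance.length != floatsPerInstance) {
                throw new IllegalArgumentException("instance size mismatch");
            }
            buffer.put(instance);
        }
        DrawRange range = new DrawRange(start, floatsPerInstance, instances.length);
        draws.add(range);
        return range;
    }

    public List<DrawRange> draws() { return draws; }
}
```

Because each draw carries its own per-instance size, a skinned mesh (with, say, an extra bone offset per instance) and a plain mesh can share the same frame buffer without a common layout.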