GPU: how much VBO data can be allocated?

blizzard · October 18, 2011, 8:13am

Hello

I have started working with VBOs and my first tests look good.
For further planning, the following question has come up:
How much GPU memory can be allocated for VBOs?
I just want a rough feeling for the sizes that can be used.

I have an NVIDIA Quadro FX 770M GPU, and a utility program shows that there are 512MB available on the card.
Are all 512MB available for use, or how do I find out what is available?

Thanks

theagentd · October 18, 2011, 10:31am

Your GPU has 512MB of VRAM, and that’s where your VBO data is stored. Remember that this memory is shared with everything else that is running, just like system RAM, so don’t expect all of those 512MB to be unused. For example, Windows Aero constantly uses about 100-125MB of VRAM. If you disable it, usage drops to ~25MB if I recall correctly. I recommend you try out GPU-Z, a program that can monitor memory usage, at least on NVidia cards.
512MB is however not the limit of how much VBO data you can have. If the graphics card runs out of memory, it will start swapping VRAM to system memory, similar to how your computer swaps system memory to the hard drive. This is obviously not very good for performance, but it is actually not that bad either. The memory manager is intelligent enough to swap out unused things, and the PCI-E bus is fast enough to handle it pretty well. What happens when you overwhelm your VRAM depends on what you overwhelm it with. If you have 1GB of unused (cached or preloaded or something) VBOs lying around you won’t see much of a performance drop (I estimate it at less than 10%). If you overwhelm it with textures, things get much worse, as the whole texture is needed in memory for a much longer time (a VBO is only read once per draw and can then be swapped out again). Source of info: My own game. I load chunks of the game world into VBOs and only draw the ones that pass the frustum culling test. I used about 2GB of VRAM, and my card only has 1.5GB. xD

Fun fact: If you overwhelm your VRAM with a framebuffer (render target) you will freeze your computer. Yeah, the mouse and everything. Swapping memory that is so commonly used simply freezes the game. What? You wonder why I know that? Who wouldn’t want to play a game with 32SxAA (= 2x2 driver supersampling + 8xMSAA) on a 2x2 supersampled RGBA_32F render target? I mean, come on! That’s just 16x ordered grid supersampling with 8xMSAA and my 1.5GB VRAM ran out?!
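To put rough numbers on the render target described above, here is a back-of-the-envelope sketch in Java. The 1920x1080 resolution is an assumption (the post does not state one), the class and method names are made up for illustration, and the estimate only counts color samples; depth/stencil buffers, resolve targets and driver overhead are ignored.

```java
// Rough estimate of the color storage for a supersampled, multisampled render target.
// Ignores depth/stencil, resolve buffers and any driver-side overhead.
class RenderTargetBudget {

    /** Bytes needed for the color samples of one render target. */
    static long colorBytes(int width, int height, int supersampleFactor,
                           int msaaSamples, int bytesPerSample) {
        long pixels = (long) width * height * supersampleFactor;
        return pixels * msaaSamples * bytesPerSample;
    }

    public static void main(String[] args) {
        // Hypothetical window size; not given in the thread.
        int width = 1920, height = 1080;
        // 2x2 supersampling in the game times 2x2 in the driver = 16x.
        int supersampling = (2 * 2) * (2 * 2);
        int msaa = 8;
        // RGBA_32F: four channels of 4 bytes each.
        int bytesPerSample = 4 * 4;

        long bytes = colorBytes(width, height, supersampling, msaa, bytesPerSample);
        System.out.printf("%.2f GiB%n", bytes / (1024.0 * 1024.0 * 1024.0));
    }
}
```

Under these assumptions the color data alone comes to roughly 4 GiB, well beyond a 1.5GB card.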

delt0r · October 18, 2011, 11:58am

Even if you have not run out of VRAM, the driver is free to not store your VBO on the card, for example in streaming mode.

theagentd · October 18, 2011, 12:22pm

Ah, yeah, that makes sense too.
If you fire up the NVidia Control Panel and choose Help and then System Information on the menu bar, you can find out exactly how much memory you’re allowed to use. Mine says 1.5GB dedicated video memory, 3316MB total available memory.

princec · October 18, 2011, 12:28pm

Last I read, using about 4-8MB for a streaming VBO was recommended. And of course for static geometry, whatever size it needs to be to hold the data. Create as many of them as you need.

Cas

theagentd · October 18, 2011, 12:37pm

I’ve noticed that using STREAM_DRAW doesn’t affect FPS at all. I’m sending 12MB of data every frame for 1 million particles at 60 FPS. It’s CPU-limited though, so it might just be that.
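For a sense of scale, the upload rate implied by those figures can be worked out directly. This is only arithmetic on the numbers in the post: 12MB per frame for 1 million particles means 12 bytes per particle, and the class and method names are hypothetical.

```java
// Upload rate implied by streaming a fixed amount of vertex data every frame.
class StreamingBandwidth {

    /** Bytes sent to the driver per second for a per-frame upload. */
    static long bytesPerSecond(long particleCount, int bytesPerParticle, int framesPerSecond) {
        return particleCount * bytesPerParticle * framesPerSecond;
    }

    public static void main(String[] args) {
        long particles = 1_000_000L;
        int bytesPerParticle = 12; // 12MB per frame / 1 million particles
        int fps = 60;

        long rate = bytesPerSecond(particles, bytesPerParticle, fps);
        System.out.println(rate / 1_000_000 + " MB/s");
    }
}
```

That works out to 720 MB/s, which is well under the theoretical peak of about 8 GB/s per direction for a PCI-E 2.0 x16 link, so it is plausible that the transfer itself is not the bottleneck here.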

gouessej · October 18, 2011, 12:40pm

Hi

Please be careful about one thing. As far as I know, ATI and NVIDIA graphics cards do not behave in exactly the same way when they fail to store VBO data on the GPU: some (but not all) ATI graphics cards simply return an error code, whereas most NVIDIA graphics cards try to store the data somewhere else and return an error code only when even this fails.

Actually, I think that some of these usage flags have never been properly supported. I only see a difference between dynamic and static, except on very early implementations of VBO (for example the ARB implementation in OpenGL 1.3).

princec · October 18, 2011, 12:42pm

You’ll never really know until you profile the call to glDrawRangeElements. With VBOs you will tend to find that glDrawRangeElements returns immediately. You will get blocked on glMapBuffer instead if your buffer is still in use, which it won’t be, because you’ll have called glUnmapBufferARB after you render with it.

Cas

theagentd · October 18, 2011, 12:53pm

Ah, I use glBufferData. glBufferData is however a big performance hog in my particle test. Calling glBufferData 5 times per frame drops FPS to the low tens.

princec · October 18, 2011, 1:38pm

glMapBuffer’s the best way to do it. Failing that, you should be calling glBufferSubData instead of glBufferData, as glBufferData causes the driver to discard and recreate the buffer.

Cas

princec · October 18, 2011, 1:42pm

FYI the reason that glMapBuffer is the best way to do it is that the buffer you receive from the call is mapped directly - with any luck! - to fast-writing DMA memory which doesn’t pollute your CPU data cache. What you’re probably doing at the moment is filling up a big ByteBuffer in system RAM, trashing your caches completely along the way, then copying that entire buffer to the DMA staging area with glBufferData. Far better to simply ask OpenGL to give you a direct byte buffer straight into that RAM and not trash any caches or do any copying.

Cas
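The write pattern described above can be sketched without a GL context. In this example a direct ByteBuffer stands in for the buffer that glMapBuffer would return, since the exact mapping call depends on the binding in use (LWJGL, JOGL). The 12-byte particle layout (two floats plus a packed RGBA int) and all names are assumptions for illustration only. The point is that the data is written sequentially, straight into the target buffer, and never read back from it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of write-only, sequential filling of a (stand-in) mapped buffer.
class MappedWriteSketch {

    // Assumed layout: x and y as 32-bit floats, then a packed RGBA int.
    static final int BYTES_PER_PARTICLE = 12;

    /** Writes particles front to back into target; never reads from target. */
    static void writeParticles(ByteBuffer target, float[] xs, float[] ys, int[] rgba) {
        for (int i = 0; i < xs.length; i++) {
            target.putFloat(xs[i]).putFloat(ys[i]).putInt(rgba[i]);
        }
    }

    public static void main(String[] args) {
        float[] xs = {0f, 1f};
        float[] ys = {2f, 3f};
        int[] rgba = {0xFFFFFFFF, 0xFF0000FF};

        // Stand-in only: with a real GL context this buffer would come from
        // mapping the VBO for writing, and it would be unmapped before drawing.
        ByteBuffer mapped = ByteBuffer
                .allocateDirect(xs.length * BYTES_PER_PARTICLE)
                .order(ByteOrder.nativeOrder());

        writeParticles(mapped, xs, ys, rgba);
        System.out.println(mapped.position() + " bytes written");
    }
}
```

Keeping the simulation state in ordinary arrays (or another CPU-side structure) and only copying the renderable values into the mapped buffer avoids reading from memory that may be slow or uncached for reads.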

blizzard · October 18, 2011, 1:44pm

Thank you for the information …

I have one further question: is it recommended to use glMapBuffer to update the data in my VBOs? For my first test I use glBufferData. It works, but is it a fast way to do this?

My English is not that good, so I’m not sure if I have understood the performance hint from theagentd correctly.

blizzard · October 18, 2011, 1:50pm

That was the answer, princec.

thanks

theagentd · October 18, 2011, 2:15pm

I’m using Riven’s MappedObject library for this. I’m sure glMapBuffer will work with it, but how do I do that? =S

blizzard · October 18, 2011, 2:22pm

I tried to use gl2.glMapBuffer …

It crashes … does anybody have a short example?

I will try again tomorrow; I have to go now.

bye

princec · October 18, 2011, 3:06pm

Riven’s library should work great with mapped VBOs… provided you don’t attempt to read any data, which you probably will do, by accident. I’d advise not using the mapping library for now and just concentrating on making sure you’re writing the correct data in there.

“It crashes” btw is no help to anyone whatsoever trying to help you.

Cas

theagentd · October 18, 2011, 3:56pm

So I can’t read the data back? Then it’s useless for most programs using MappedObject. The reason I’m using it in the first place is to be able to permanently store the position and color in a ByteBuffer so I don’t have to copy everything into it every frame.

princec · October 18, 2011, 4:03pm

It’s still got its advantages and its place, but if you try mapping your objects directly into a mapped VBO you’re fundamentally doing it wrong. The VBO is purely for rendering data; you don’t want to have, say, Sprites in there or anything. Having said that, mapped objects may potentially save you some space and should also, if you’re processing them in nice friendly efficient linear ways, be cache-friendly.

Cas

theagentd · October 18, 2011, 4:07pm

I have one million particle positions and colors stored in a mapped ByteBuffer, so I don’t have to update it in a Particle object and then put the updated position in a ByteBuffer for each particle each frame. I’m saving both performance and memory by doing that. How am I doing things wrong? >_>

lhkbob · October 18, 2011, 5:02pm

If you’re storing the data in a GPU mapped buffer (and not one of Riven’s mapped buffers), you are making the graphics card go through a lot more effort to manage the data within the VBO. If you’re making the mapped data readable, the graphics card has to synchronize every time you try to render with that data, which is inefficient.

It is best to separate the graphics card and CPU as much as possible because they work best in an asynchronous fashion. If you can push requests/data to the card and then let the CPU run, the CPU works best and you’re not forcing the GPU to do something it considers inefficient. It also lets the GPU process all of the requests in the queue in an uninterrupted fashion, which is also good for performance.