Why batch?

I’ve spent some time browsing the various homebrew engines/frameworks on this forum to pick up ideas on design etc., and one thing I’ve noticed is that quite a lot of developers implement batching in their engines/frameworks.

I’ve never really understood why they do it; wouldn’t it be more efficient to upload data only when it changes, instead of every frame?

Could someone explain why some would choose this design?

Thanks :slight_smile:

I couldn’t completely see that the article is answering my question. :stuck_out_tongue:

GPUs have many processing cores, and they love one big fat piece of work and hate many small ones.

Because a GPU has many cores, a big job like drawing pixels can be split up across them, but for a small job you can spend more time handing tasks out to the cores and on preparation than on the work itself.

So you should give the GPU the biggest single piece of work you can.

  • You also lose processing time switching (transferring) tasks between the CPU and GPU.

Yeah, I understand that, but I want to know why batching is superior (if it is) to a big VBO with geometry that you just translate around with a matrix.

So to sum up:

A VBO that doesn’t get updated every frame and achieves transformation via a matrix, vs. a VBO that does get updated every frame and achieves transformation via the coordinates in the VBO.

That doesn’t really have anything to do with batching. Batching is about drawing multiple things in a single draw call to minimize the number of expensive draw calls you have to do. For example, let’s say you’re drawing 10 000 particles or tiles. These are perfect matches for batching. 10 000 calls to glDrawArrays() (or cringe using immediate mode) is going to be slow as hell. Note that the CPU is the bottleneck here; your GPU can easily render millions of particles at 60 FPS. The problem is that one glDrawArrays() call is a lot more expensive for the CPU than it is to render 4 vertices and a handful of pixels for the GPU. If you instead batch them together so you can render them using a single glDrawArrays() call, you’ll get a massive performance boost since the CPU is no longer holding you back.
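To make the idea concrete, here's a minimal CPU-side sketch of what a batcher does before the one draw call. The names are made up for illustration and not any particular engine's API; real batchers also interleave texture coordinates and colors, which are left out here.

```java
// Sketch of CPU-side sprite batching (hypothetical names, no real GL calls).
// Instead of issuing one glDrawArrays() per quad, we pack all quads into a
// single vertex array and would then submit it with ONE draw call.
public class BatchSketch {
    // Each quad = 2 triangles = 6 vertices, each vertex = (x, y).
    static float[] batchQuads(float[][] positions, float size) {
        float[] verts = new float[positions.length * 6 * 2];
        int i = 0;
        for (float[] p : positions) {
            float x = p[0], y = p[1];
            // Triangle 1
            i = put(verts, i, x, y);
            i = put(verts, i, x + size, y);
            i = put(verts, i, x + size, y + size);
            // Triangle 2
            i = put(verts, i, x + size, y + size);
            i = put(verts, i, x, y + size);
            i = put(verts, i, x, y);
        }
        // Upload once, then a single glDrawArrays(GL_TRIANGLES, 0, count * 6).
        return verts;
    }

    static int put(float[] v, int i, float x, float y) {
        v[i] = x;
        v[i + 1] = y;
        return i + 2;
    }
}
```

The CPU cost of filling this array is tiny compared to the per-call overhead of thousands of separate draw calls, which is exactly the trade batching makes.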

As you can see, this has nothing to do with where the vertex data you’re using resides or comes from. The same logic applies to static tiles stored in VRAM and particles streamed to a VBO each frame.

Thanks for clearing up! ;D

I’ve never really used batching from those engines, but don’t they require you to upload data to the GPU every frame?

You can do something like that with VBOs. Put all the static level tile data, which never changes, into one big fat VBO and never change it; just render it at an offset. Isn’t that better than putting data into a “batch” every frame?

A better approach for huge maps is to split the data into smaller static chunks (like 64x64) and only render the ones that are visible. Another approach would be to keep the vertex data static, use a dynamic index buffer, and only render visible tiles.
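Deciding which chunks to render is just a rectangle overlap test against the camera. A rough sketch, assuming 64x64-tile chunks and pixel-space coordinates (the names and helper are my own, not from any engine):

```java
// Sketch: which 64x64-tile chunks intersect the camera rectangle?
// tileSize is in pixels; the camera rect is in pixels. Hypothetical helper,
// not a LibGDX API.
public class ChunkCulling {
    static final int CHUNK_TILES = 64;

    // Returns { firstChunkX, firstChunkY, lastChunkX, lastChunkY }, inclusive.
    static int[] visibleChunks(float camX, float camY, float camW, float camH,
                               int tileSize) {
        int chunkPixels = CHUNK_TILES * tileSize;
        int x0 = (int) Math.floor(camX / chunkPixels);
        int y0 = (int) Math.floor(camY / chunkPixels);
        int x1 = (int) Math.floor((camX + camW) / chunkPixels);
        int y1 = (int) Math.floor((camY + camH) / chunkPixels);
        return new int[] { x0, y0, x1, y1 };
    }
}
```

Each chunk in that range is then drawn from its own static VBO; everything outside it is skipped entirely.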

It should be noted that these are usually premature optimizations – good to keep in mind, but probably not worth the hassle unless you actually find that your tile sprite batching is causing your game to be vertex bound.

It’s equally important to realize where the bottleneck actually is: if you see tile map rendering killing your FPS, static VBOs might not help if your bottleneck lies with fill rate.

In most cases a SpriteBatch like in LibGDX will perform just fine.

I think LibGDX’s SpriteBatch indeed only supports streaming data to the GPU, but that’s simply because sprites are dynamic objects that move around a lot. It would indeed be better to use chunks stored in static VBOs, as pitbuller said, to avoid having to set up and upload the tile data each frame, but the thing is that this also qualifies as batching. A chunk is a “batch of tiles”, so we don’t have to use one draw call per tile.

Which optimization are you referring to? Batching and frustum culling are both very important optimizations if your world is so large that some of the objects in the game are outside the screen. In 2D, you often have vast tile levels that are much larger than your screen. Sure, you may be able to attain 60 FPS without batching or culling, but low-end computers might struggle, and people are generally not very accepting of performance problems in 2D games. In 3D games, at least frustum culling is essential. Minecraft (or any game, for that matter) without culling = 5+ times more work due to rendering blocks behind you. Remove batching as well and you’ll be doing millions of draw calls each frame (this is Minecraft-specific though), which is 1000x more than you can handle at 60 FPS even on high-end hardware.

Regarding bottlenecks: Batching helps with CPU performance. Frustum culling helps with both CPU performance AND reduces the number of processed vertices. Vertex performance isn’t very important, at least not in 2D games, but the reduced CPU overhead of not rendering things that are outside the screen is often at least noticeable.

Yes, batching and basic culling are important. I was referring more to pitbuller’s optimizations; he was talking about using a static VBO for tiles, or creating a chunking system.

With LibGDX’s SpriteBatch, good use of sprite sheets, and some basic culling (not drawing tiles that are not on screen), most 2D games will perform just fine.
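That basic per-tile culling boils down to clamping the camera rectangle to a range of tile indices. A sketch under the assumption of a uniform grid in pixel space (hypothetical names, not LibGDX's API):

```java
// Sketch of basic tile culling: compute the inclusive range of tile indices
// overlapping the screen, clamped to the map bounds. Hypothetical helper.
public class TileCulling {
    // Returns { firstTileX, firstTileY, lastTileX, lastTileY }, inclusive.
    static int[] visibleTiles(float camX, float camY, float camW, float camH,
                              int tileSize, int mapW, int mapH) {
        int x0 = Math.max(0, (int) Math.floor(camX / tileSize));
        int y0 = Math.max(0, (int) Math.floor(camY / tileSize));
        int x1 = Math.min(mapW - 1, (int) Math.floor((camX + camW) / tileSize));
        int y1 = Math.min(mapH - 1, (int) Math.floor((camY + camH) / tileSize));
        return new int[] { x0, y0, x1, y1 };
    }
}
```

You then loop over just that index range when submitting tiles to the SpriteBatch, so off-screen tiles never cost you anything on the CPU.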

Yup. libgdx has a SpriteCache which essentially batches to a VBO so the geometry doesn’t have to be sent to the GPU each frame. It gives an ID to each “cache” in the VBO, which is used to draw or rewrite the cache. I haven’t actually found a need for SpriteCache, but it was fun to write.