Culling Scene Geometry for Rendering

kitfox · October 11, 2005, 7:28am

To speed up rendering of my scene, I’m trying to figure out ways to cull out objects that won’t be rendered in a particular frame. I’m in the research/planning stages right now, so I’ve not committed to a particular method yet. I’m hoping to see what the vetrans think is a good or bad idea. (Or perhaps even if culling is overkill).

I’m looking for a general purpose solution that would work for both outdoor and indoor scenes. In another thread, both ‘bounds base culling’ and BSP trees were mentioned.

What is bounds base culling? Does this just mean test an object’s bounding box against the view frutsum?
Portal culling seems like a good idea for indoor scenes, but not for the general case. Are there good portal based culling algorithms for the general case? Can portals be constructed on the fly, or does an artist need to design them?
How can BSP be used for culling geometry? Is this just a way to partition the scene into chunks so that an entire chunk of data can be tested against a view frutsum? If so, would using an octree to partition space be as good or better? Or perhaps a method where I assign for each intermediate node in my scene graph a bounding box equal to the bounding box of the union of all its children?

Mark McKay

Niwak · October 11, 2005, 12:56pm

As far as I know, there is no unique good solution for all purpose. I suggest reading http://people.csail.mit.edu/fredo/PUBLI/surveyTVCG.pdf which gives a very good survey of all major methods.

Bounds base culling consist of testing the bounds of an object (or a group of object) against the view frustrum. It is the most basic method that nearly all scenegraphs will offer. It is rather cheap and can be used with other culling algorithm for refinement (for example you perform portal culling which gives you a rough potentially visible set that you refine with bounds base culling).

BSP, quadtree and octree are very similar partitioning techniques ; they just differ in the type of spatial division. The efficiency of one of these methods against the other will depend on the rendered scene. I think that you can find research paper about this subject on the internet.

I think the easiest way o test this is to have a pluggable architecture that allows different culling method as plugins. I have not yet think how you can efficiently implement this. Ogre3D gives a solution (that I don’t find fully satisfying but it’s the better I’ve found).

By the way, I’m answering but I’m no veteran… I’m just in the process of writing my own engine and I’ve come across the same questions recently.

 Vincent

darkprophet · October 11, 2005, 9:42pm

Portals cannot be generated on the fly as the definition of a portal engine needs a convex “room” with 1 or several portals leading to that room, and finding those combinations is hard and its relatively easy to let the artists specify a few polygons. You can always use the room’s bound as the convex object and display it once one of the room’s portals is in view.

BSPs cannot be generated on the fly as to get maximum usage from them, they need to be balanced and finding that balanced tree takes long hours or even days if you want to push it.

Bounding culling is something of an interest to me. Its simple to create, but difficult to master. The basic idea is to encompass a geometry with a convex hull that can be a sphere, world axis aligned bounding box (AABB), object oriented bounding box (OOBB), or a k-dop. I have listed those in order of increating “tightness” around the object, however, the bad side is they are also listed in order of how slow they are to calculate (sphere being the fastest, k-dop being the slowest). The art of frustum culling is getting the right balance to achieve maximum FPS because you dont want to be spending too much time in the CPU when it would just be faster to render the dam thing.

The way I do frustum culling is rather simple. I initialise the OOBB to contain the vertices in an axis aligned manner, then with every update, rotate the OOBB to comform with the rotation of the geometry its encompassing. Then, before rendering, you traverse the tree, check every bound to the frustum and see which one is outside and which ones are inside and only draw the inside ones. Thats the simple jist of it, but its also the slowest method you can do

To make frustum culling fly, I use 3 optimisations, i’l list them in order of importance:

Temporal plane coherency
This idea comes from the simple fact that if the bound failed on a plane, then there is a high probability it will fail on the same plane next frame. So simply store the last plane the bound has and check against that first, if that fails, you can immediately say the bound (and hence the geometry) is out of view. If that fails, then you have to find out in which way does it fail? Does it fail completely? meaning are all 8 vertices of the OOBB inside the plane? Or are some of them inside and some are outside (Intersecting the plane)? This leads me on to the next optimisation

Plane bitmasking
If you have a tree, each node should have a bound that encompasses the bound of its children, then simply checking against that 1 bound will eliminate an entire heirarchy (alot of geometry in some cases for a simple calculation). If the node’s bound is intersecting a plane, you can safely say to its children only check against the plane that the node is intersecting. This again saves on the number of plane checking that is required and can save 40 dot products (8 vertices * 5 planes). But if the node is not intersecting, i.e. failing or passing, then this optimisation is useless, which leads me to the next optimisations, when a bound is inside.

Octant test
You divide the frustum into 8 parts, 4 near the camera, 4 far from the camera. This produces a centre point in the frustum which can be calculated using:


Vector3f centre = new Vector3f();
centre.x = camera.location.x + (camera.getLeftPlaneCoefficient() + camera.getRightPlaneCoefficient())/2;
centre.y = camera.location.y + (camera.getUpPlaneCoefficient() + getDownPlaneCoefficient())/2;
centre.z = camera.location.z + (camera.getNearPlaneCoefficient() + getFarPlaneCoefficient())/2;

Then check the bound’s location against the centre point to find out in which octant the bound is in, this will reduce the number of planes to check from 6 to 3, saving you 24 dot produces (8 vertices * 3 planes).

Those three work beautifully with any type of bound, you just have to pick one that does it for you. I chose OOBB’s because they are tight fitting around the object, yet are easy to calculate. But if you use AABB, there is another optimisation you can use, and thats the n-p vertex test:

n-p vertex test
If the box is aligned to the world axis, that means for each bound check you only need to test 2 vertices, the n vertex and the p vertex, because the n vertex is the closest to the plane, the p vertex is the furtherest from the plane, and from those 2 vertices, you can determine if the bound is inside, outside or intersecting. Just another optimisation in the hat…

There are alot more optimisations, some can even be hardware accelerated (such as occlusion queries, which you will need for portal culling), but i’l leave those details to you

DP

Orangy_Tang · October 11, 2005, 10:01pm

Actually, portals can be a great method for general case culling. You divide your world into sectors[1] linked by portals, then do standard recursive rendering starting from the sector the camera is in. This gives you all the usual funkyness - accurate occulsion, simple memory structure and handy for collision partitioning too. But now you’ve got a group of sectors to render, you can do whatever you want in them - an indoor sector might contain a BSP tree you use for rendering. An outdoor sector might have only basic frustum culling but lots of LODing. Or a sector could just be a brute force render, instead relying on suitably dense portals to cull invisible sectors. At the other end you could make your entire world just one sector if you know that it’s internal rendering method is good enough.

[1] Only traditional software renderer based portals require you to have convex sectors - you can have concave ones once you start using a z-buffer.

darkprophet · October 11, 2005, 10:13pm

but the actual division of the space to the sectors cannot be done in real time as its difficult to get the right heuristic to match the portals and the sectors up, not optimally anyway…Unless you know of a way?

Orangy_Tang · October 11, 2005, 10:35pm

Yes, you’ll have to have hand placed portals. But someone will be manually creating your levels anyway, so that shouldn’t be a problem.

kitfox · October 12, 2005, 12:25pm

Your technique sounds quite useful, Darkprophet. Would the best way to go about this be to transfrom my OOBBs into eye space before testing against planes (thus greatly simplifying the plane test and letting me use an arbitrary frutsum)? Would OGL let me accelerate this, or do I need to do all the math with javax.vecmath?

Mark McKay

kitfox · October 12, 2005, 12:55pm

Make that transform my OOBBs into clip coordinates. That is to say, the composition of the modelview and projection matricies will produce an affine matrix that maps visible space to the unit cube.

darkprophet · October 12, 2005, 2:35pm

If you are serious about making an engine, you would need vector math everywhere! There is virtually no place in the graphics pipeline that doesn’t require them, so yeah, you will need vecmath stuff.

As for the OGL stuff, you can construct the frustum from the perspective and modelview matrices intially, and on every update, just update the frustum from the modelview matrix and only deal with the perspective matrix if the frustum properties have changed (e.g. aspect ratio, fov, near/far plane ratio). However, you would incur quite a few JNI overhead calls, and althought it has become trivial in the latest releases, its still there. I would rather compute them numerically. One such method is eberly’s way in which the plane coeffecients are calculated based on the frustum parameters, then with each new rendered frame, the current frustum is transformed to be aligned with the modelview matrix. This saves the JNI call and the fiddling with buffers and whatnot:

www.geometrictools.com

My OOBB is made up of 2 vectors, an offset from the owner and the extents. The offset is calculated such that it is set to the mean of the vertices or to the centre of the model, depending on the situation and whatever works for you. The extents is how big the OOBB in each direction (x, y, z), its difficult to calculate the OOBB such that it is an absolute minimum in real time. Eberly has a good way of doing it, but its definetly not for real time. What I do is store the OOBBs for each frame in the model in our model format (vea and vem) and simply update the OOBB based on that data

The way Ive done it is to find out the maximum value of the vertices in the 3 axis, and simply set the extents to the distance from the centre (or offset) of the OOBB. Then on each update, you transform the extent by the owner’s rotation (can be a quaternion or matrix). If you are using a quaternion its a simple matter of:

v’ = qr * v * qr^-1

where v’ is the new vector, qr is the rotation quaternion, v is the current vector, and qr^-1 is the inverse of the quaternion.

To check if the OOBB is outside/inside/intersects a plane of the frustum, simply find the pseudodistance via a dot produce of the 8 vertices that are a combination of addition/subtractions of the extents from the offset to find a vertex, if the pseudodistance is > 0, then its inside, if its < 0 its ouside. Then with that data of the 8 vertices, you can find the state of the OOBB and determine if the OOBB’s owner is to be rendered or not.

Does that answer your questions? Or do you want me to go into further details?

DP

SeskaPeel · October 12, 2005, 7:53pm

A very simple way to cull objects against the frustum is to use a sphere culling approach. You need to precompute the bounding sphere of each of your models (google Eberly bounding sphere), and then test it against your view frustum. I don’t remeber exactly where, but there is an over optimized version that can reject a (transformed) sphere in less that 6 operations. Google it with “view frustum culling sphere”. It will work with portals too.

You can have all this implemented in 5 hours.

SeskaPeel

darkprophet · October 12, 2005, 11:31pm

6 operations? You can do it in 1! Do a pseudodistance from the centre of the of the sphere to the plane and check to see if that distance is < -radius of the sphere, if so, then its outside…

But like I said above, the sphere is generally less fitting and you will still draw some geometry even tho they are not visible if the sphere overestimates alot (e.g. long thing object, cylinder perhaps) until the bounding sphere is well away from the plane…

DP

SeskaPeel · October 13, 2005, 6:44pm

Maybe you are confusing lines of code and operations.

You’re definitely right, spheres fit not well long objects. And then, how many of these do you render ? And then, what about a near to free culling routine that will avoid 95% of unwanted draw calls and batch state changes, and that will not burden the CPU at all ? Maybe you completely avoid CPU bottlenecks, … if you do, I’d be pleased to know how you managed, I’d be willing to pay actually.

@kitfox : Sphere culling is the best approximation you can find. In most application, the camera moves a lot, and rendering models near frustum but outside it is not a problem. Sometimes it even allows a better texture caching (you uplaod them a few frame vefore they are displayed). Then depending on your application, maybe you don’t want this. Maybe your engine is designed to perform a specific indoor / outdoor / industrial / medical rendering, and then you won’t want the most generic culling scheme available so that it can perform correctly the most of the time.

SeskaPeel.

darkprophet · October 13, 2005, 10:24pm

Lamposts, walls, fences, buildings, dogs, cats, rats, trucks, cars, motorbikes, plants, cupboards, kitchen sinks, ovens, microwaves…do you want me to continue listing? I can go on forever

Ok, heres the code from my sphere test:


Plane[] checkPlanes = Frustum.getPlanes(sphereBound);
for (int i = 0; i < checkPlanes.length; i++) {
  if (checkPlanes[i].pseudodistance(sphereBound.location) < -radius) {
    return Plane.NEGATIVE;
  }  else if (checkPlanes[i].pseudodistance(sphereBound.location) > -radius) {
    return Plane.POSITIVE:
  }
}

and pesudo distance does a dot product on the location, you want the code for that too? The above code clearly shows that its 1 operation for 1 plane and thats 1 dot product, the if comparison doesn’t even count as an operation anymore because they are so inexpensive…

Thats quite funny actually, since every single paper ever written on this subject disagrees with you. You can check for this yourself if you want with an AABB box, go on, draw a cow, or a bunny, or some speakers, or a headphone, or a TV, or a clock tower, or a UFO on paper and see for yourself.

Spheres are the easy way, fastest on the CPU, but not on the GPU. They overcompensate the amount of geometry needed to draw and hence it puts more pressure on the GPU. And since math calculations (such as multiplicaiton and whatnot) are ridiculously cheap, there is no reason not to put less burden on the GPU for a very very small extra step in the CPU. And currently, drawing a triangle is far more expensive than doing a few math additions.

Edit:Fixed Weston’s loop bug in the code above, thanks weston

Annoyed DP

weston · October 14, 2005, 3:58am

SeskaPeel> Maybe you are confusing lines of code and operations.

I think he is basically right… or else you will have to redefine what you mean by operations. As far as the computer is concerned, the dot product itself is hardly a single operation, its 3 multiplications and 2 additions. Also, the if statement is not without cost and should not be ignored. The compare itself is equivalent to a subtraction, but there are reason why just having the if there can slow stuff down (branch penalties). Also, maybe I’m misunderstanding something in the code shown, but it looks like that will only loop through once ever since you are returning in the if and the in the else, so whats the point of the loop?

darkprophet · October 14, 2005, 10:11am

Hehe, your right about the code weston, it was just a simple test I did to see how well spheres coped against 1 plane. Didn’t realise there was a bug there

Operations usually mean to me logical operations and not computer instructions. E.g. dot product is 1 operation, if statement another…etc. But if he meant computer instructions (dont know why I call them that), then your right, 6 operations it is…

Slightly Happier and Less Angry DP

Niwak · October 14, 2005, 7:00pm

You can have a look to http://www.flipcode.com/articles/article_frustumculling-pf.shtml which presents a hybrid method that goes from the cheapest to the most expensive method for optimal refinement.

 Vincent

SeskaPeel · October 14, 2005, 8:03pm

Every single paper written on the subject means that you really read a lot. Anyway, by best approximation, I was meaning the ratio CPU Time / culled geometry. But, obviously, yes, an AABB fits better than a sphere most of the time.

[quote]And since math calculations (such as multiplicaiton and whatnot) are ridiculously cheap, there is no reason not to put less burden on the GPU for a very very small extra step in the CPU. And currently, drawing a triangle is far more expensive than doing a few math additions.
[/quote]
If I were you, I would not be that sure. As I mentionned earlier, it’s quite impossible to avoid a CPU bottleneck at some time in your frame rendering. Every 3D application is somewhat CPU bound. Launch any commercial one, look at the fps, then underclock your CPU, and do the test again, the fps will be lower. But then it depends on what your engine is designed for.

For your list of 3D models, I was talking to the tread owner. And for the op count, I think weston explained my thoughts clearly. Rendering 1K animated LOD models in the view frustum, I can assure you you want a very fast culling routine. One operation more is a lot (kind of +16.66% in the worst case actually).

Don’t be annoyed, we are just disagreeing on one point, nothing more.

SeskaPeel.

Vorax · October 15, 2005, 6:11am

I agree with SeskaPeel and the flipcode article posted above. Sphere checking is very fast, regardless of the geometry. If the sphere passes, then you move to the less efficient tests like bounding box. For scenes with large numbers of objects, this is a huge optimization. It doesn’t matter if it’s shaped like a TV or telephone pole. The worst that happens is you get a false positive from the sphere, then you go and check the box. But for each one the sphere test rules out, you save alot of calculations. The gains tend to far outweigh the losses.

darkprophet · October 15, 2005, 11:38am

So let me get this straight. Your going to check against a Sphere, if its positive, you check against AABB, if thats positive you check against OOBB? Ok…you go and do that

What im saying is that with spheres, you are going to overestimate the amount of geometry your going to draw resulting in the bottleneck being the GPU and not the CPU. I.e. your going to render geometry that shouldn’t be rendered. If you did read what i said above, and implemented it properly, your going to get a very good speed increase that is very close to being as fast as spheres on the CPU, but leaps faster on the GPU because you draw less geometry.

I think ive said all that needs to be said from my side…

DP

Vorax · October 15, 2005, 5:13pm

Yup, you got it.

No, you don’t draw anymore then with only bounds testing. You draw the same amount, but you determine what is occluded faster with sphere testing mixed with bounds testing over bounds testing alone.

Here’s a simpler explanation of why sphere testing mixed with bounds testing is the prefered method:

1000 objects - all shapes and sizes

100 of them actually in the frustum and should be rendered

36 operations required for bounds check

6 operations required for sphere check

WIthout sphere testing:

1000 * 36 operations = 36,000
3600 operations were acutally useful
- TOTAL operations = 36,000

With sphere testing (assuming a very bad case of 50% false positives):

1000 * 6 operations = 6000
500 * 36 = 18,000
- TOTAL operations = 24,000

With sphere testing (assuming a typical case of 25% false positives):

1000 * 6 operations = 6000
250 * 36 = 9,000
- TOTAL operations = 15,000

With sphere testing (assuming optimal case of 0% false positives - lots of roundish objects):

1000 * 6 operations = 6000
100 * 36 = 3,600
- TOTAL operations = 9,600

A bad case still gives you a net savings of 12,000 operations (33% less CPU), a typical case gives you a net savings of 21,000 operations (58% less CPU)

Of course, you wouldn’t need to run all 36 operations every time in any of the cases above (only for totally occluded objects, 900 in the example), you can fail out early. The point of failing out is to not waste CPU, the point of sphere testing is to fail out of the box testing in the first place.