Best data structure for triangle collision meshes

theagentd · January 21, 2016, 2:51am

I have a huge number of static triangles forming the collision mesh of the world. I’m currently using a quad tree here, but it’s very slow. There are two main problems I want to solve right now:

Using a quad tree means that it works the best for flat levels, but we’re planing on making levels with more vertical movement. Therefore, an actual 3D structure would be preferable.
The current quad tree is slow as hell. In some extreme cases the quad tree query takes up 80% of the CPU time of the game since Kai helped optimize JOML for the actual collision response for the triangles (thanks Kai!).

It seems like the biggest problem right now is that simply traversing the quad tree is very slow. Finding an alternative data structure sounds like a good idea.

Octree: Well, supports 3D but will be even slower to traverse.
2D grid: Still “flattens” the data structure, won’t work well with vertical levels. Also, the triangles are arbitrarily big and should preferably not have to be split up.
3D grid: Sounds like a great idea, but the triangles are arbitrarily big and should preferably not have to be split up.
BSP tree: Not entirely sure how this would work, but I get the feeling that the “linked list” of nodes in this case would be even higher than for a quad tree, reducing performance even more.
???

Fixing a 3D grid to work with arbitrarily large triangles sounds like the absolute best solution here, but I can’t figure out how to make it work. Duplicating triangles into all grid cells they overlap wouldn’t work unless there’s a way to avoid processing the same triangle once (randomly doubling the collision response of one triangle would break physics). Marking triangles as processed would be hard to multithread, and keeping track of which triangles that have been processed is costly to check against.

Any ideas? :clue:

BurntPizza · January 21, 2016, 3:03am

R-Tree? Should be faster than Quad/Oct-trees I believe, it’s at least highly tune-able. Look at the R* variant with it’s Sort-Tile-Recursive construction method since you can exploit the fact that your data is static. Those are stupid fast for spatial queries in my experience.

Archive · January 21, 2016, 4:12am

for terrain collision in shard master i get the X-Z (Y being up) coords of the player and find the height on the triangle it is currently on at point (X, Z) using


// p1,p2,p3 is vertex1,2,3 of triangle the player is on and x and z and the player's x and z coordinates. Sometimes this
// method returns a value that makes Double.isNan() true so make sure you check for that.
	public static float calcY(Vector p1, Vector p2, Vector p3, float x, float z) {
		float det = (p2.z - p3.z) * (p1.x - p3.x) + (p3.x - p2.x) * (p1.z - p3.z);

		float l1 = ((p2.z - p3.z) * (x - p3.x) + (p3.x - p2.x) * (z - p3.z)) / det;
		float l2 = ((p3.z - p1.z) * (x - p3.x) + (p1.x - p3.x) * (z - p3.z)) / det;
		float l3 = 1.0f - l1 - l2;

		return l1 * p1.y + l2 * p2.y + l3 * p3.y;
	}

KaiHH · January 21, 2016, 8:25am

Can you probably make use of temporal coherence (the player does probably not move from one side of the level to the other side in one physics frame/tick) and associate/cache/store some deeper quadtree node with the player, so that you don’t have to check all triangles in your level every time? Or are huge triangles duplicated into many nodes the real problem?

princec · January 21, 2016, 9:34am

One big issue you might have is that a quadtree implemented in Java is basically scattered all over memory, which instantly removes much of its performance. If you were to move to a packed memory format in a ByteBuffer (Riven’s structs for example or that thing Spasi was working on) you’d get much better performance.

Another thing you might think about is reducing the granularity of your quadtree… no point in getting to the point where you’re storing just a few triangles in each node: it causes an explosion in the size of the quadtree. While that might only mean another couple of nodes in depth to traverse, it also means jumping all over memory even more, and I suspect memory is the slow part.

Cas

EgonOlsen · January 21, 2016, 10:13am

It might be interresting to know how huge ‘huge’ is?

Spasi · January 21, 2016, 11:15am

[quote=“theagentd,post:1,topic:56314”]
Before trying something entirely new, it might be worth optimizing the current structure. My favorite information on the subject is Pitfalls of Object Oriented Programming. Not entirely possible in Java (until we get something like “packed arrays” in Java 10+), but the idea still applies. If you allocate all node data per tree level in one batch, most likely they will end up contiguous in memory. The two Java-specific problems here are a) GC moving the objects around (though that will be done in “batches” too) and b) the object header overhead. Both can be solved with off-heap data, via NIO buffers or a struct framework, but that sacrifices a bit of computation performance (many optimizations are coming in Java 9 that might change this).

princec · January 21, 2016, 11:42am

It’s not inconceivable that you could actually do the quadtree in native code and get the best of both worlds - assuming the impedance mismatch between Java and C isn’t too onerous.

Cas

theagentd · January 21, 2016, 4:11pm

R-trees sound really interesting! I did some Googling, but I couldn’t find any code specific for R*-trees. Do you have any good resources for them?

It’s not a heightmap terrain. The triangles can be at arbitrary positions, so this won’t work.

Triangles aren’t duplicated for each node. I found that placing triangles in the smallest node that fit them placed a crapload of triangles in the root node if they simply overlapped the center of the level. I ended up forcing triangles into the node that they’re centered on and computing the 3D bounds of each quad-tree node. Bounds for each node are used when searching through it. This worked much better for me. From what I’ve read about R-trees this seems a tiny bit similar to what’s done there.

Yeah, it’s usually very temporally coherent, but I’m not sure how to implement it in a sane way. Feels like the caching overhead would be more than the wins.

I also think that the memory accesses are the slow part. I’ve tried to modify the maximum depth and the split threshold to the empirically best values I’ve found.

A big level would be ~600 000 to 1 000 000 triangles.

[quote=“Spasi,post:7,topic:56314”]

Interesting point and slide! I might actually try this out if I have the time! Gonna do some small improvements to try to improve memory performance first to see if it’ll be worth it first.

Hmm… My current quad tree relies on callbacks from the query function. I would really prefer avoiding native code. x__x

BurntPizza · January 21, 2016, 4:57pm

[quote]Do you have any good resources for them?
[/quote]
Mostly just the paper(s) cited on wikipedia: The R* Tree

I forgot that STR bulk loading works on regular R-trees, so I’d just start there; here’s that paper: STR
You know it’s good if NASA came up with it! :point:

Icecore · January 21, 2016, 6:20pm

Why ?)
Do you use any sort of Lod collision mesh?

any tree with dynamic sizes like R-tree
-manually separate big open world on sectors

*use native GPU PhysX XD

up:

You really use triangles in collision? ^^
no one doing so - they change render model to -> collision with primitive parts like sphere, tube, box, etc

Hitbox == collision model

theagentd · January 21, 2016, 7:21pm

Mostly just the paper(s) cited on wikipedia: The R* Tree

I forgot that STR bulk loading works on regular R-trees, so I’d just start there; here’s that paper: STR
You know it’s good if NASA came up with it! :point:
[/quote]
Alright, I’ll look into it, although it’ll probably take me a few days to “absorb” it all. If you know of a small minimal implementation of R* trees that’d probably help me a lot.

This is meant for the terrain. We will use a lower res collision mesh for the terrain, but it will still be very big.

Our objects are already treated as spheres. What I want triangles for is the terrain, which the Source engine most definitely handles the same way as me.

EDIT: Managed to cut a significant chunk of the query CPU time from my quadtree by optimizing memory layout. Each quadtree node had a Bounds3D object keeping track of the 3D bounds of the node. I instead made the node EXTEND Bounds3D meaning the bounds variables were stored together with the node instead of at a separate location. This gave me a nice 18% performance boost.

BurntPizza · January 21, 2016, 7:33pm

[quote]If you know of a small minimal implementation of R* trees that’d probably help me a lot.
[/quote]
The only ones I know of aren’t at all minimal (think enterprise-grade…): [1], [2] and are mostly intended for stuff like geospatial applications.

All you would need is two functions: bulk-load construction (since you have pre-known, static data, no insertion/deletion functionality is required) and intersection query, correct?

theagentd · January 21, 2016, 7:37pm

All you would need is two functions: bulk-load construction (since you have pre-known, static data, no insertion/deletion functionality is required) and intersection query, correct?
[/quote]
Hmm. Although 95%+ of the triangles would be static, there would be some meshes (buildings, some props) that would need to be modified during the game. Being able to insert/delete data would be useful for those, but it would be possible to handle dynamic objects in a separate way as well since so much of the data is static.

BurntPizza · January 21, 2016, 7:41pm

I would have a separate datastructure for dynamic objects if possible, at least when auditioning the R-Tree structure, as the insertion and deletion functions are where the bulk of the complexity is. The 2 query types could be potentially done in parallel as well, although they might be too fast to gain much.

theagentd · January 21, 2016, 7:52pm

Yeah, it sounds good to start with simply constructing the tree.

Let’s see if I’ve gotten this right…

Each node seems to have a fixed number of maximum children, and children can be either triangles (in my case) or other nodes.
Each node’s AABB is the box that can fit all children’s AABBs.
The goal of the splitting algorithm is to minimize overlap between nodes on the same level, avoid elongated volumes and maximize the number of children of each node.
R*-trees differ from R-trees by somehow optimizing the tree gradually by removing random triangles when a node is split to child nodes and reinserting them again.

The big question mark is how the splitting is supposed to work. I think I understand how the resulting structure is supposed to look, and like you said the actually complex part seems to be constructing the tree.

The joke’s on you! I’m already doing multiple queries in parallel on my quad tree. =3 The physics are entirely threaded actually and pretty much scaling linearly with cores, including body-body collision detection and response.

BurntPizza · January 21, 2016, 8:03pm

You pretty much have it right.

Yes.
(just to be explicit) Specifically the minimal AABB that fits the children’s (you knew this)
Minimize node area, perimeter and overlap while maximizing number of children per node
R-Tree: split an overflowing node in place; R*-Tree: when a node overflows, first attempt to remove some of it’s children and insert them elsewhere before having to do an actual split

The good news is that the STR construction handles splitting, indeed probably better than the R*-Tree topological-sort-like split method.
No other splitting needs to be delt with, since your tree would (hopefully) be static.

EDIT: found my old rtree/str post here: http://www.java-gaming.org/topics/what-i-did-today/33622/msg/335424/view.html#msg335424
_basil responds further down the page.

theagentd · January 22, 2016, 1:05am

I decided to do a 2D test implementation to get a feel for it. I’ve got leaf node generation working I think.

Evenly distributed points in a circle

Concentration of points in the center

For evenly distributed points, it works fairly well. It can’t handle concentrations too well, but I think it’ll work well, as an evenly covering terrain is fairly coherent. A 3D implementation would probably be a bit more complicated. From here on I’m a bit unsure about how to proceed. I’ve generated the leaf nodes but there’s little documentation on how to build the rest of the tree downwards from here.

For 2D, I calculate the following values:


int tilesNeeded = divideRoundUp(count, nodeCapacity); //count / nodeCapacity rounded up
int tilesDim = (int)Math.ceil(Math.sqrt(tilesNeeded)); //sqrt(tilesNeeded) rounded up

For 3D, I’d have to replace the sqrt() with cbrt(), but I recon that this will be unoptimal. There’s a big risk that our levels will be elongated in some direction, most likely either very flat, very tall or corridor-like. It seems like modifying the tiling to take this into account would be a good idea. Weighting the number of tiles along each axis by the dimensions of the level volume’s dimensions sounds like it could improve the shape of the leaf nodes of the tree.

Archive · January 22, 2016, 1:09am

sorry i cant give expert opinions (i never have used a tree like this before), however what if you define a few points that are equidistant from each other in the 3D world and have all the triangles that are closer to that point have a flag that says that they are part of Point A, for example, and do this for all other points in the map like Point B,C,etc… And then, all you have to do is test the distanceSquared from your player’s coordinates to all the Points and see which is closest and when you found which point is closest, test against all the triangles that are flagged with the closest Point’s flag.

Hard to explain, I hope you understand what im saying.

princec · January 22, 2016, 9:39am

I can’t see that your terrain triangles will be clustered. They should be perfectly evenly distributed, more or less; the quadtree is your best bet.
If you really want ultimate performance use a ByteBuffer and flyweight to manipulate it. You should be able to get close-to-C++ performance.

Cas