Awesome SAT! 7-8 ms for 60k objects.

matheus23 · November 19, 2012, 4:01pm

Forgot a quote:

Eh, this is collision detection… You need it in almost every game…

The advantage of this engine is (or better the algorithm: SAT):
You can use any rotated / scaled / translated / however formed polygons, instead of only circles or AABB’s (Axis aligned bounding boxes…)

And one more thing I forgot in the last post:

Eehh… It’s not really optimized… not sure, why it’s that fast ;D

And I already linked the source, it’s on github. If you talk about the test project, here is the source on pastebin:
http://www.java-gaming.org/?action=pastebin&id=314

This code class only uses LWJGL and the linked Utils class (see github link).

EDIT: Oh… sorry I just doubleposted…
Anyways: I just looked at your project link, theagentd
Forgive me, but it somehow looks like something similar to java.util.concurrent… There are executers in there too

But the stuff with id generation and these atomic integers look intresting again. Is it somehow supposed to at least somehow be similiar to the OpenCL pipeline?

matheus23 · November 19, 2012, 8:02pm

Eh… Sorry for triple-posting, but this is an update:

I now added Circle vs polygon Collision checking too.

The source is on pastebin.

You can download the working demo here

I’m having 5-6 ms time, but about every second a three times as long GC pause Have to profile this…

theagentd · November 19, 2012, 9:55pm

I cleaned out some old functionality (I was missing a Vec2 function in drawCircle() and the SpriteBatch did nothing?) and ran it with the server VM which gave a small performance boost.

matheus23:

Whaaaa! Multithreading!
Yeah. I’ve already worked with mutithreading (I had a game running simulation and rendering in two different thread one time… I even think it was my first game? Yes it was… intresting… never did it again ;D )

Yeah, even I do have more than 1 core… I’ve got 8 logical and 4 physical cores, and yeah I’ve got hyperthreading, and yeah It’d speed everything up… And finally yeah, it would be very nerdy… But intresting…

I don’t know how you’d go about doing this… I mean, what exactly would you perform twice or more at the same time?

Splitting up the polygons to check into 2 different threads:
Meeeh… this could give very strange behavior if you want to do something with those collisions, since reactions would be run on 2 different threads (or even more…)… Also, in a real game example each polygon has to calculate collisions with each other polygon, which would give much much harder thread-safety problems…

And I can’t think of another way…

How the SAT works:
first check circle vs. circle (not using sqrt().)…
and then: for each axis of the 2 to-check polygons: (axis = normal)
calculate dot product of each vertex and the normal,

These are all calculations done…
So yeah, trigonometry is calculated, but it’s only the dot product, so only + and *…
And finally:
About your threading library: I’ve already seen it once… I didn’t take a look back then, but how exactly does it work, theagentBigDecimal (kidding ;D )?

Anyways, now thank you all for your intrest, I really appreciate this

Multithreading is easy as long as you avoid all synchronization. For a real game it’s easy to do thread the biggest parts of the collision detection/response per object. Just divide the objects into chunks and process each chunk in different threads.

In a real game you might have a flow looking somewhat like this:

Clear quad tree and add all objects to it.
For each object:
1. Query area around object for other objects.
2. For each neighboring object:
  1. Check if they intersect.
  2. If they collide, react to the collision.

The first part, building the quad tree is very hard to thread, since we’d need to use a synchronized() block (or some other form of synchronization) every time we modify the same resource from multiple threads, which obviously is for each object. Since the quad tree can only be modified by one thread at a time, it doesn’t make much sense to thread it. Hopefully it will not be our main bottleneck, and we might be able to do other things like rendering terrain or updating particles at the same time.

In the first loop, for each object we will query the neighboring area from the quad tree and get a list over the neighbors (bad) or a callback for each neighbor (good). Such a method is inherently thread-safe since it only reads information from the quad tree (though it is possible to screw it up by using instance/static variables to store temporary variables). We then run the collision detection (SAT) against each neighbor. For each collision we need to calculate a collision response. I’m assuming that we’re making a 2D physics engine or something here, in which case we calculate a force to apply to both objects. Now, assuming we have eliminated duplicate collisions (X collides with Y, but Y also collides with X) we a concurrency problem. Since we’re calculating the force for a collision pair, we need to apply this force to both objects in opposite directions. To write data, we need synchronization since each object can potentially affect every other object, but even synchronization won’t help if you want 100% deterministic results. Even with synchronization, the force from multiple collisions with the same object will be applied to the object in a random order, and due to how floating point numbers work, (a + b) + c might not be exactly the same as (b + c) + a. So even if we completely kill parallelism for the collision response, we’re still having a certain level of randomness here.

The solution is pretty simple, and pretty stupid. Instead of relying on synchronization, we just do some duplicate work instead. We let each iteration of the loop only write force to the object we’re updating instead of both objects in a pair. That eliminates synchronization since we’re only writing data to one object per iteration. The addition order is also solved since the collision response forces are added in the same order as the neighbors are returned from the quad tree (which depends solely on the quad tree). The cost of course is that we’re doing the collision detection “twice”. X now collides with Y (force added to X) AND Y also collides with X (force added to Y). Note that everything isn’t actually done twice. The neighbor query returns the exact same thing, and only after that will we be able to remove duplicate collisions*. That means that to be able to skip duplicate collisions, we’re already reading the data of all neighboring objects, regardless of whether or not we’ll actually do a collision check with them. Remember that the program as it is is already memory limited, so using multiple threads will allow the available bandwidth to be better used.

for example by comparing an ID number between the objects. If X.id > Y.id then run collision detection. When Y detects the same collision, it’ll quickly skip that collision check since the Y.id > X.id is not true.

I implemented multithreading in your TestSAT program, basically I made X number of threads process every Xth object in the loop below:

for (int i = 0; i < polys.length; i++) {
     if (Collision.polyVsPoly(quad, polys[i])) {
          collisions[i] = true;
          colliding = true;
     }
}

I used my threading library for it, so you’ll need it in your class path to test it: http://www.java-gaming.org/?action=pastebin&id=316 (look at initMT() and tick() )

My threading library is a lot more powerful (for games) than the built-in Java executors. In my library you tell your executor to run a TaskTree, which contains your full game logic and rendering code split up into Task objects. By setting conditions like “X can only be run after Y and Z”, you can create “synchronization points”. The executor then runs all Tasks that are independent of each other in separate threads without any further setup. For the above collision detection threading example, you’d create a TaskTree with two tasks, a SimpleTask (a task run in one thread) for constructing the quad tree, and a SplitTask for checking collisions which requires the first task to be completed. If we were to add for example a third task for updating particles which is completely independent to out two collision-related tasks, the executor will run it in parallel to the other two without you having to setup anything other than that.

matheus23 · November 19, 2012, 10:10pm

:o :o :o

Appreciate him guys!!! Appreciate! He deserved it, seriously :o

Thank you so much… this just gave me so much information and motivation to continue. Again, thank you so much

Also, judging from this:

theagentd:

My threading library is a lot more powerful (for games) than the built-in Java executors. In my library you tell your executor to run a TaskTree, which contains your full game logic and rendering code split up into Task objects. By setting conditions like “X can only be run after Y and Z”, you can create “synchronization points”. The executor then runs all Tasks that are independent of each other in separate threads without any further setup. For the above collision detection threading example, you’d create a TaskTree with two tasks, a SimpleTask (a task run in one thread) for constructing the quad tree, and a SplitTask for checking collisions which requires the first task to be completed. If we were to add for example a third task for updating particles which is completely independent to out two collision-related tasks, the executor will run it in parallel to the other two without you having to setup anything other than that.

Your library sound pretty revolutionary… I mean… in the future we will have 64-core 5GHz cpu’s, so…
Also I’m pretty sure this makes Mutithreading and especially designing much easier

One more thing:

Yeah… the sprite batch wasn’t used later, because it wasn’t able to draw line loops…
And the Vec2 function (I guess you are talking about “rotate()”?) should be pushed to github now :persecutioncomplex:

theagentd · November 19, 2012, 11:03pm

Thanks, I’ve been trying to convince people to use that library for a while… >_>

It’s not really revolutionary. It’s just promoting a scalable way of multithreading games. You still have to code according to that way of thinking. It doesn’t just magically cause everything to run on different cores, nor does it ensure any synchronization whatsoever. The library just does what you tell it to do with the Tasks it gets. YOU have to ensure that the Tasks are thread safe and won’t crash.

EDIT:
Here’s a benchmark of your code vs my hacked in multithreading on my hyperthreaded quad-core. It’s not really optimal, some things like clearing the buffers and all is still single-threaded.

100 000 objects: Single: 12ms, 8 threads: 4ms.
500 000 objects: Single: 59-60ms, 8 threads: 18-19ms.

Around 3x performance for just around 20 lines of extra code to setup everything.

ClickerMonkey · November 20, 2012, 1:38am

Perhaps we should make a competition with this… given a set of scenarios… who can create the best spatial database?

Metrics

Speed
Memory Usage
Add/Remove/Update Entity Speed

Factors

Speed of moving objects
Percent of moving vs stationary objects
Functionality
search( Shape, Offset, entity types to test against, callback )
nearest( Shape, Offset, maximum entities, OR maximum distance, entity types to test against, callback )

Huh, huh, huh?

ra4king · November 20, 2012, 5:55am

theagentd, you’re just a god aren’t you.

EDIT: where’s the link to your threading library again?

EDIT 2: Found it! http://code.google.com/p/small-java-threading-library/

namrog84 · November 20, 2012, 7:30am

I’m gonna try and integrate this into the game I am currently working on!
good job

theagentd · November 20, 2012, 10:20am

Thanks! There’s a new version up on the project page now with slightly better performance and some new features. If you have any questions you know where to find me. =S I still haven’t got exactly how Google Code works; the SVN trunk isn’t updated and I couldn’t find out how to make it host the Javadoc online. There’s a .rar for the Javadoc though (and it’s available from Eclipse since the source code is in the .jar library file).

sproingie · November 20, 2012, 4:55pm

Between 13 and 20 for 175,000 objects. Cranked it up to a million, averaged about 110ms

i7 3770 3.5GHz (no OC for me here), GTX690

theagentd · November 20, 2012, 5:47pm

Words cannot describe my jealousy! I hope micro-stuttering ruins your day!

looks at my own GTX 295

SHIT!

sproingie · November 20, 2012, 6:45pm

;D

I’m told micro-stutter is really only noticeable for that card at higher resolutions, and I just have the one 1080 monitor. Besides, you can disable SLI on it now.

Funny thing is, I’ve just been playing Torchlight II lately, hardly a GPU-killer.

matheus23 · November 20, 2012, 6:49pm

Ohly bad, that GPU power won’t give you any performance for this demo
(Or at least not from what is print)

Only cpu-calculation time is measured :D, but the cpu is very good, so… (I should run the multi-threaded version from theagentd with turbo enabled :persecutioncomplex: 4-cored 8-logically-cored 3.7 GHz Xenon… :persecutioncomplex: )

theagentd · November 20, 2012, 7:21pm

It depends on the load on the GPU. If the game is CPU limited, which it will be for a few years, there won’t be any stuttering. However, as time passes games will get more graphically intense, so it’ll get worse and worse. Besides, microstuttering doesn’t mean anything when you get like 120+ FPS in every single game…