Multi-threaded game loop

I’m building an Entity Component System framework, and I want it to be able to take advantage of multiple cores on a computer. It’s not that any of the games I have in mind actually NEED the power of multiple cores. I just want to give my framework that option, since multi-core processing interests me.

My current update loop looks something like…

http://s8.postimg.org/ui6592dtx/single.png

Here, every function updates a part of the World, whether it’s an Entity or a Property of the World. Every function is called a System, and each has a priority integer. The World sorts the order in which the Systems are stored based on their priorities, to ensure that certain Systems are called at a particular point in the update loop. Systems that share a priority, however, are open to being processed concurrently, since the order in which they are invoked is arbitrary.
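
For concreteness, here is a minimal sketch of the kind of interface that description assumes. The GameSystem name is hypothetical, chosen only to avoid clashing with java.lang.System; the actual framework may look different.

// Hypothetical sketch of a System as described above
interface GameSystem
{
    // Lower values run earlier in the update loop; equal values may run concurrently
    int priority();

    // Updates this System's slice of the World
    void execute();
}

// The World can keep its Systems sorted so equal priorities end up adjacent, e.g.:
// Arrays.sort(systems, new Comparator<GameSystem>() {
//     public int compare(GameSystem a, GameSystem b) { return Integer.compare(a.priority(), b.priority()); }
// });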

I WANT my game loop to look something like…

http://s8.postimg.org/mqpfai9ol/multi.png

Here, Systems with the same priority will be executed concurrently. My problem is that I don’t know how to conserve Threads. I’ve heard that it’s inefficient to create threads on the fly only to have them be immediately discarded, so what I want is to be able to reuse threads. I’m not exactly thread savvy, you see. I would like to be able to disable threads that are not in use, but not have them be discarded.

I want to write something like…



// If there's more than one System stored...
if(size > 1)
{
    // Particular GameThread
    GameThread threadly;

    // Range of Systems with the same priority
    int lowerBound=0, upperBound;

    // For all Systems stored in a World... (one extra pass at i == size so the last priority group gets flushed too)
    for(int i=1; i<=size; i++)
    {
        // If found priority upper bound (or reached the end of the array)...
        if(i == size || systems[i].priority() != systems[i-1].priority())
        {
            // Track upper bound
            upperBound = i;

            // If upperBound - lowerBound == 1, then we've found a System with a distinct priority.  Just run the sucker! :)
            if(upperBound - lowerBound == 1)
            {
                systems[lowerBound].execute();
            }

            // Otherwise, we're dealing with multiple Systems with the same Priority.  Time to program concurrently!
            else
            {
                // Have GameThreads process Systems of same priorities concurrently
                for(int j=lowerBound; j<upperBound; j++)
                {
                    // Assign a System to a waiting GameThread, and have the GameThread execute
                    threadly = GameThreads.getThread();
                    threadly.setSystem(systems[j]);
                    threadly.execute();
                }

                // Wait for threads to finish updating somehow, then continue with the World's update loop

            }

            // Renew lower bound
            lowerBound = upperBound;
        } 
    }
}

// Otherwise, just execute the lonely sucker on our current thread :)
else if(size != 0)
    systems[0].execute();

What this should do is group Systems of the same priority together, and have them execute concurrently.

Course, this is just theoretical code. What the hell is a GameThread, huh? It’s some sort of Thread, that’s for sure, and it expects a single System object as an argument, similar to a Thread expecting a Runnable. I’m not too sure how I would implement something like this with methods like wait(), notify(), or notifyAll(). My only experience with threads is using them for things like loading resources and game loops.
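
For what it’s worth, here is a minimal sketch of what a reusable “GameThread” could look like, using a BlockingQueue instead of raw wait()/notify(). This is only a guess at the idea, not an existing class: the GameSystem type is the hypothetical interface sketched above, and unlike the pseudo-code there is no separate execute() call, since handing the worker a System is enough to wake it (after it has been start()ed once). Waiting for a whole priority group to finish still needs something extra, such as the pool or latch sketched further down.

import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class GameThread extends Thread
{
    // Work handed to this thread; take() parks the thread while the queue is empty
    private final BlockingQueue<GameSystem> inbox = new LinkedBlockingQueue<GameSystem>();

    GameThread()
    {
        setDaemon(true);
    }

    // Assign a System to this waiting worker; the thread itself is never discarded
    void setSystem(GameSystem system)
    {
        inbox.add(system);
    }

    @Override
    public void run()
    {
        try
        {
            while(true)
            {
                // Blocks without busy-waiting until work arrives
                GameSystem system = inbox.take();
                system.execute();
            }
        }
        catch(InterruptedException ie)
        {
            // An interrupt is the signal to shut this worker down
        }
    }
}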

I’m developing a threading library which does almost exactly what you want. You can create a number of tasks (= what you call a system), then you can set up dependencies between those tasks: “Task A can only be run after Task B and Task C have completed.” Then you can run a set of tasks on a multithreaded executor which will assign tasks to different threads if it finds tasks that can be run in parallel. Setting up the dependencies of your multi-core example tree would take around 20 lines of code (not including the code for the implementation of the tasks of course, only setting up dependencies). Interested? ^^
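
The library’s own API is not shown in this thread, but the “Task A can only be run after Task B and Task C have completed” idea can also be expressed with the standard CompletableFuture (Java 8+); a rough sketch with made-up tasks:

import java.util.concurrent.*;

ExecutorService pool = Executors.newFixedThreadPool(
        Runtime.getRuntime().availableProcessors());

Runnable taskB = () -> System.out.println("B done");
Runnable taskC = () -> System.out.println("C done");
Runnable taskA = () -> System.out.println("A done (after B and C)");

CompletableFuture<Void> b = CompletableFuture.runAsync(taskB, pool);
CompletableFuture<Void> c = CompletableFuture.runAsync(taskC, pool);

// Task A can only be run after Task B and Task C have completed
CompletableFuture<Void> a = CompletableFuture.allOf(b, c).thenRunAsync(taskA, pool);

a.join();        // block until the whole graph has finished
pool.shutdown();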

[quote]My problem is that I don’t know how to conserve Threads. I’ve heard that it’s inefficient to create threads on the fly only to have them be immediately discarded, so what I want is to be able to reuse threads. I’m not exactly thread savvy, you see. I would like to be able to disable threads that are not in use, but not have them be discarded.
[/quote]
Have you looked at ThreadPoolExecutor? Lots of good info about handling thread pools in general at this url:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html

I’m not very thread savvy either, but I think once a thread has completed its task, there is little more you can do to “disable” it other than discarding it.

I just learned that one can determine the number of processors at runtime:

int N_CPUS = Runtime.getRuntime().availableProcessors();

Am looking at my copy of “Java Concurrency in Practice” by Goetz, et al.

Seems like one could code some forks and joins into a processing loop, based on the number of processors available, as long as the items being updated don’t affect each other. But I haven’t experimented with it yet. I only have a dual core, though.
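
A minimal sketch of that suggestion, assuming the pool is created once and reused every frame. GameSystem and samePriorityGroup are placeholder names, not anything from an actual framework:

import java.util.*;
import java.util.concurrent.*;

// Created once, reused every frame: a pool sized to the number of cores
int nCpus = Runtime.getRuntime().availableProcessors();
ExecutorService pool = Executors.newFixedThreadPool(nCpus);

// Each frame, wrap one priority group in Callables and run the whole batch.
// invokeAll() blocks the calling thread until every task in the batch is done.
List<Callable<Void>> batch = new ArrayList<Callable<Void>>();
for(final GameSystem system : samePriorityGroup)
{
    batch.add(new Callable<Void>()
    {
        public Void call()
        {
            system.execute();
            return null;
        }
    });
}

try
{
    pool.invokeAll(batch);
}
catch(InterruptedException ie)
{
    Thread.currentThread().interrupt();
}

// Only when the game shuts down:
pool.shutdown();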

You know what, I’m pretty sure I am! Are you suggesting that you’ll distribute said library?

[quote]Have you looked at ThreadPoolExecutor? Lots of good info about handling thread pools in general at this url:
http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ThreadPoolExecutor.html
[/quote]
Good link, friend! I was thinking that join() or wait() might be the key. I don’t think that having a dual core will really change your concept of Threads, though. Concurrency existed in programming before the advent of multiple cores, but it required constant switching between threads. Still does if there are more threads than cores, I suppose. Runtime.getRuntime().availableProcessors() might be useful if I don’t want to make more threads than cores, for obvious reasons.

Instead of separate movement, collision and gravity systems, I would rather have one physics system.
How do you make sure that tasks, for example the player and an enemy, don’t get in each other’s way when operating on the game world?

Cool! Although the library is available online, I’ve added a few new features and (hopefully) fixed a few bugs so it’s a bit outdated now. If you’d like you can add me on Skype (same username as here) and I can send you the latest build, or if you don’t have Skype I can give you a download link. It’d be great if you’d be willing to help me test it a bit more!

I’m currently using it to accelerate frustum culling and skeleton animation in my 3D engine. In some (admittedly tailored) cases I can get almost 3x higher FPS when using all 4 cores. The average scaling is a bit lower (closer to 2x) because a big part of the engine isn’t threaded yet. I hope to improve that a bit, of course.

Breaking things down like you have will actually slow things down for most systems and make things more complicated.

Instead, use different threads for sound, the game loop, AI, and such.

You’re going to have to be specific about conflicts between a player and an enemy. I’ve worked on an ECS before, and I never had too many issues. It can be a hassle, though, which is why you need to know what you’re doing when building one. The intricacies brought on by an ECS are unfortunate, but necessary in the name of code reuse/flexibility.

I’m splitting up my implementation so that I can use it as an “example”, friend. This doesn’t actually represent my code. I’m developing an ECS which includes a lot of modularization. The design creates a negligible performance hit, and leaves my code extremely flexible. Not to mention that it’s great for utilizing multiple cores. And yes, it is complicated. It’s meant to make a complicated game.

Well, you seem to know more about making games than I do, seeing as you’re working on (2+1)D games. I’m not exactly, to quote myself, “savvy” at Skype, but I’d be happy to figure it out later this week. I’m studying for my Principles of Programming exam. Just point me in the right direction, and I’ll get back to you later. I’d love to test what you’ve built, if I’m capable at least. If not, I can observe what you’ve made and treat it as a puzzle :)

First of all, you did not have any concurrency issues.
I recommend working through a book like “Java Concurrency in Practice” before starting on this.

Ah, well game objects cannot be manipulated concurrently, of course. If a thread wants to manipulate an Entity while another thread already is, it will set it aside and process it later. Course, I could always block the thread until the entity is free, but I don’t want too many pauses. Assuming concurrency across game entities is not an issue, as long as the order of two types of processes is arbitrary, they can be run concurrently. If I needed the player’s code to always run after the enemy’s, I would set the priority of the player’s process to a higher integer than the enemy’s.
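
One hedged sketch of that “set it aside and process it later” idea, assuming each Entity exposes its own ReentrantLock; Entity, entities and process() are placeholders, not code from the thread:

import java.util.*;
import java.util.concurrent.locks.*;

// First pass: skip Entities that another thread is already working on
Deque<Entity> deferred = new ArrayDeque<Entity>();

for(Entity e : entities)
{
    if(e.lock.tryLock())            // returns false instead of blocking if the Entity is busy
    {
        try     { process(e); }
        finally { e.lock.unlock(); }
    }
    else
    {
        deferred.add(e);            // set it aside and process it later
    }
}

// Second pass over whatever was skipped, this time blocking until each Entity is free
for(Entity e : deferred)
{
    e.lock.lock();
    try     { process(e); }
    finally { e.lock.unlock(); }
}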

[quote]Ah, well game objects cannot be manipulated concurrently, of course.
[/quote]
Maybe. Maybe not.

I was fooling around with this stuff a while back and made a series of crude little programs based on what was started on this thread. Each thread has its own sleep time amount, no game loop involved.

There are lots of links to different Actor systems referenced on that thread that should still be worth a look, as well as input from theagentd on his framework.

The most interesting thing to me (on my program) was the collision detection system. Objects test for collision by looking to see if “a space is occupied” rather than any sort of querying of moving or space-occupying objects or subset of these objects as done in various bin or quad-tree systems. I did my best to minimize blocking, by keeping what synchronization was being done to a very local minimum. Can’t recall the details–should look at it again.
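
A rough guess at the shape of that “is this space occupied?” test, not the actual program: one atomic compare-and-set per grid cell, so any contention is limited to the single cell an object is trying to enter.

import java.util.concurrent.atomic.*;

class OccupancyGrid
{
    private final AtomicReferenceArray<Object> cells;
    private final int width;

    OccupancyGrid(int width, int height)
    {
        this.width = width;
        this.cells = new AtomicReferenceArray<Object>(width * height);
    }

    // Try to move into cell (x, y); fails if something is already there
    boolean tryEnter(int x, int y, Object occupant)
    {
        return cells.compareAndSet(y * width + x, null, occupant);
    }

    // Vacate cell (x, y), but only if we really are the occupant
    void leave(int x, int y, Object occupant)
    {
        cells.compareAndSet(y * width + x, occupant, null);
    }
}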

I recently ran across this framework; it runs on either Java or Clojure (“Quasar” or “Pulsar”, respectively) and can handle 10K spaceships shooting at each other, each on its own “fiber”.
http://blog.paralleluniverse.co/post/49445260575/quasar-pulsar
This post has a demo:
“No More Callbacks: 10,000 Actors, 10,000 Threads, 10,000 Spaceships”
http://blog.paralleluniverse.co

I haven’t read it closely yet, but I think the collision detection uses a database, which implies (to me) that they have also set something up where they look at the “space around the object” rather than at lists of objects. They don’t seem to be having much of an issue with blocking.

I’m dabbling with Clojure at the moment, using it as my LISP while viewing the SICP lectures. I’m hoping it will help give me better “functional programming” chops: the ability to think using FP building blocks. But for the most part, it is not clear what the practical benefits to game design/programming might be. All this might just be a recondite form of “procrastination”.

[quote]Maybe. Maybe not.
[/quote]
I would argue that the creation of all programming languages for the sole purpose of experimentation is a form of procrastination. And I’ll admit, I just want to learn concurrent programming for educational purposes. I don’t expect any HUGE boosts to come my framework’s way. Just trying to get on the functional and concurrent bandwagon while it’s new. But 10,000 threads? Huh, do we really need that many threads when there are only 4 cores max these days? Or did they simply do that to demonstrate the power of their framework? Maybe, just MAYBE, I should check out that link.

GPUs can have almost 3000 cores nowadays, and those cores need multiple threads each to hide the memory latency of GPU memory. ^^

Oh your god! I didn’t realize that. Well, I knew a GPU could do things concurrently, but I didn’t know it had that many cores! By the way, I managed to create a class that executes multiple processes concurrently. It also recycles the threads like I wanted. It took a lot of waits and interrupts, which I wasn’t exactly happy about, but I got it done. If my process is something expensive like…


/**
 * Message task
 */
class MessageTask implements Runnable
{
	// Name of message
	private String name;
	
	/**
	 * Instantiates MessageTask
	 */
	public MessageTask(String name)
	{
		this.name = name;
	}
	
	@Override
	public void run()
	{
		String cat = "";
		
		// Concatenates string.  StringBuilder is shaking its head...
		for(int i=0; i<=100; i++)
		{
			cat += i;
			if(i%10 == 0)
				cat += '\n';
		}
		
		// Outputs name
		if(name != null)
			System.out.println(name + " finished");
	}
}

I can see performance boosts of 3x or higher when running this process 4x at once. These results make sense since I have a quad core laptop.

I would write something like…


// Constructs multiprocessor
MultiProcessor processor = new MultiProcessor();

// Adds tasks to execute in parallel
processor.add(new MessageTask("Task 1"))
         .add(new MessageTask("Task 2"))
         .add(new MessageTask("Task 3"))
         .add(new MessageTask("Task 4"));

// Processes tasks.  This may be invoked as many times as the user pleases.  Current thread is halted until processing is finished.
// I may make the halting of the current thread optional.
processor.process();

// When I want to discard the threads that are being used, I invoke...
processor.discardThreads();

// HOWEVER!  Processor is not dead!  It can be recycled!
processor.process();

// And new threads are created once more...

FYI, the processor uses the invoking thread as well as any extra threads necessary for concurrency. If I add 4 tasks, 3 new threads are created; it always creates one fewer thread than there are tasks. I still haven’t written the code that handles priority, but I’ve done the hard part, yessir. I’ll probably leave MultiProcessor as a utility class and handle priorities elsewhere.
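
For comparison, one common way to get that “halt the invoking thread until processing is finished” behaviour without hand-rolled wait()/notify() is a CountDownLatch per batch on top of a reused pool. This is only a sketch of the idea, not the actual MultiProcessor, and unlike the class described above it does not run one of the tasks on the invoking thread:

import java.util.concurrent.*;

// Sketch only: a process() built on a reused pool plus a CountDownLatch
void process(ExecutorService pool, final Runnable[] tasks) throws InterruptedException
{
    final CountDownLatch done = new CountDownLatch(tasks.length);

    for(final Runnable task : tasks)
    {
        pool.execute(new Runnable()
        {
            public void run()
            {
                try     { task.run(); }
                finally { done.countDown(); }   // count down even if the task throws
            }
        });
    }

    // Halt the invoking thread until the whole batch is done
    done.await();
}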

Crap, it turns out I wasn’t doing my benchmark properly. My multithreaded implementation is actually SLOWER than the single core implementation. This is a bit of a pain… Dunno WHY it would run slower. Is there a penalty for invoking interrupt() on a thread?

Hard to say without knowing what you actually did.
Maybe your measurement includes thread creation times, or there is no warm-up phase, or the code gets optimized away, etc.

Code’s optimized. I write something like…


numProcessing = tSize;

// For all threads...
for(int i=0; i<tSize; i++)
{
    // Restart them.  NORMALLY I would start the threads like a normal person, but pretend like they are already active and are in a waiting state.
    threads[i].interrupt();
}

// If threads are still processing (and they probably are...)
if(numProcessing != 0)
{
    try
    {
        // Wait till all threads are done.  (This assumes the surrounding method is
        // synchronized on this object, since wait() must be called while holding the monitor.)
        wait();
    }
    catch (InterruptedException ie)
    {
        // Interrupted when all of the threads are done.  The last thread done will interrupt this thread.  Interrupts when numProcessing is 0.
    }
}

But this runs faster…


for(int i=0; i<rSize; i++)
{
    runnables[i].run();
}

I can promise you that I have warmed up the JVM before testing this. I actually run the test over and over, every 20 ms or so, in my program.
I’ve made the Runnable I supply very wasteful: lots of string concatenation, sqrt calls, whatever I can waste, I’ll use. I do this to make the overhead of multithreading negligible by comparison.

Signalling and waiting for threads has some overhead. Unless the work you split up between the threads is significantly larger than that overhead, you won’t see any scaling.

Snap. OK, how about I leave the threads that are not in use in a loop where they sleep for a very small period of time, say 1 ms? The threads will continue once they have been put back into a processing state (i.e., the processing flag is set to true). The bad part about this is that the threads won’t know exactly when to start processing again, and will only do so when they “get around to it”.
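
A sketch of that sleep-poll loop as each idle worker might run it; running, processing and system are assumed fields, with the flags declared volatile so changes made by the game loop are visible to the worker:

// Assumed worker fields (hypothetical names):
// private volatile boolean running = true;      // cleared to shut the worker down
// private volatile boolean processing = false;  // set by the game loop when work is ready
// private GameSystem system;                    // the work assigned to this worker

public void run()
{
    while(running)
    {
        if(!processing)
        {
            // Nothing to do yet; check again in roughly 1 ms
            try { Thread.sleep(1); } catch(InterruptedException ie) { return; }
            continue;
        }

        // Do the assigned work, then go back to idling until the next frame
        system.execute();
        processing = false;
    }
}

Note that Thread.sleep(1) only guarantees at least a millisecond, so the wake-up can arrive a little later than that; a blocking hand-off such as the BlockingQueue in the earlier GameThread sketch avoids the polling (and the latency) entirely.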