Java Continuations and GreenThreads

Version 0.8.10 has been released.

New features

  • New custom scheduler, capable of handling 2-3 million concurrent virtual threads on one native thread with half the scheduling overhead to about 900ns per context switch
  • Support for VirtualThreadLocal
  • Support for message passing mailboxes
  • Reduced memory consumption per virtual thread, through lazy initialization
  • Reduced memory consumption per coroutine, for more efficient packing/unpacking of data (credits to Matthias Mann)

Everybody is encouraged to update to this new version!

Usage

VirtualThreadLocal code sample:


VirtualThreadLocal<String> threadLocal = new VirtualThreadLocal<String>() {
	protected String initialValue() {
		return "undefined";
	}
};

String s = threadLocal.get();
threadLocal.set("initialized");
String t = threadLocal.get();

VirtualMessage code sample:


final VirtualThread threadA = new VirtualThread(new VirtualRunnable() {
	@Override
	public void run() throws SuspendExecution {
		VirtualMessage got = VirtualThread.pollMessage();
		// ... or ...
		VirtualMessage got = VirtualThread.awaitMessage();


		VirtualThread source = got.author();
		Object actualMessage = got.message();
	}
});

final VirtualThread threadB = new VirtualThread(new VirtualRunnable() {
	@Override
	public void run() throws SuspendExecution {
		threadA.passMessage("hello");
	}
});

Download files

For everybody wondering what this library can be used for apart from AI, I just wrote a simple (http) server that was load-tested with a utility called load_http. My server reads the request, sends the response and closes the connection. There is no logic or file I/O involved.

The code is based on NIO, but without Selectors. It will forcefully perform I/O, until the SocketChannel tells the http server that it cannot read/write anything, and cause the VirtualThread to sleep for a bit and try again. Imagine the process like this:


while(data.hasRemaining()) {
   if(socketChannel.write(data) == 0) { 
      VirtualThread.sleep(10);
   }
}

Using this approach, I managed to create a http server that handles literally (slightly) over 9000 http-requests & responses per second, over a 50 second period.

This simply cannot be achieved with NIO Selectors, which get stuck at about 550 requests & responses per second.

Everything runs on 1 native thread. I might multithread the code to see how it improves performance.

I am extremely impressed. Great job Riven O__O

What… you sure? o.O

[EDIT]: I must try this

I can imagine most of you guys will yawn, but on a quad-core, with 4 native threads each running a VirtualProcessor, I managed 12,000 16,000 requests & responses per second (each with its own tcp connection), within a VirtualBox VM running Debian 2.6.32 on a Win7 host.


@@./http_load -parallel 1000 -second 50 urls.txt

813412 fetches, 1000 max parallel, 2.4988e+09 bytes, in 50 seconds
3072 mean bytes/connection
@@16268.2 fetches/sec, 4.9976e+07 bytes/sec
msecs/connect: 5.78194 mean, 9000.63 max, 0.036 min
msecs/first-response: 5.23287 mean, 642.339 max, 0.623 min


http://pastebin.java-gaming.org/a8dea333443 (the core of the beast)

Edit: it turned out the VirtualBox VM was limited to 1 CPU core. After changing that I get 16K instead of 12K fetches/sec.

That IS cool as hell! Still, I find it a bit hard to justify its use in games though. I guess the win lies in how you can design things like AI more than performance. So, when will JGO be hosted with that? ^^

Why is it hard to justify in games? Especially in AI the possibilities are endless. AI with intelligent (!) and complex behaviour is very hard to achieve without flow-control. I agree that for simple AI, this library is indeed over the top.

But it’s not difficult to add the ability for an entity’s AI to “sleep”. Wait-notify is a bit trickier I guess.

Is the library deterministic enough to be used for network lock-stepping? Does it have some amazing feature I’m missing?

It’s not about the ability to sleep. It’s about having inherent state (in local variables, in loops, etc).

To show you an example of a lumberjack AI:



public class Lumberjack extends Human {

	final ResourceConversion makeLogs;

	public Lumberjack() {
		this.name = "lumberjack";

		// ....
	}

	@Override
	public void ai() throws SuspendExecution {

		GameItem lastTree = null;

		while (true) {
			this.survive();

			lastTree = this.gatherWood(lastTree);

			this.dumpWood();

			VirtualThread.sleep(1000);
		}
	}

// this acutally is a method in the class Human
	protected void survive() throws SuspendExecution {
		Vec2 goBackTo = new Vec2(this.position);

		boolean didSomethingElse = false;
		while (this.isThirsty()) {
			this.task = "looking for water";
			GameItem water = this.findNearestWater();
			if (water == null) {
				break;
			}

			didSomethingElse = true;

			this.task = "walking to water";
			moveTo(water.position);

			this.task = "drinking water";
			this.drinkWater(water);
		}

		if (didSomethingElse) {
			this.task = "going back...";
		}
		moveTo(goBackTo);
	}

	public int chop(GameItem tree) throws SuspendExecution {

		int transfer = tree.transfer(TREE, this, 1);
		if (tree.isEmpty()) {
			tree.kill();
		}

		return transfer;
	}

	private GameItem gatherWood(GameItem tree) throws SuspendExecution {
		while (!this.inventory.logs.isFull()) {
			this.survive();

			this.task = "looking for a tree";
			if (tree == null || !tree.isAlive()) {
				tree = this.findNearestTree();
				if (tree == null) {
					break;
				}
			}

			this.task = ((Math.random() < 0.1) ? 't' : 'w') + "alking to a tree";
			this.moveTo(tree.position);

			this.task = "chopping tree";
			while (this.chop(tree) > 0) {
				VirtualThread.sleep(Misc.random(500, 1500));
				this.survive();
			}

			this.task = "chopping tree into logs";
			do {
				VirtualThread.sleep(Misc.random(500, 1500));
				this.survive();
			} while (this.makeLogs.convert(this));
		}
		return tree;
	}

	private void dumpWood() throws SuspendExecution {
		while (!this.inventory.logs.isEmpty()) {
			this.task = "looking for storage";
			GameItem storage = this.findNearestStorageWithFreeSpace();
			if (storage == null) {
				break;
			}

			this.task = "dragging logs to " + storage;
			this.moveTo(storage.position);

			this.survive();

			this.task = "dumping logs at " + storage;
			/* int dumped = */this.transfer(LOG, storage);

			VirtualThread.sleep(1000);
		}
	}

	private GameItem findNearestTree() throws SuspendExecution {
		return Game.nearestProducer(TREE, false, null, this.position);
	}

	private GameItem findNearestStorageWithFreeSpace() throws SuspendExecution {
		GameItem item1 = Game.nearestHolderWithSpace(LOG, null, this.position);
		GameItem item2 = Game.nearestConsumer(LOG, false, null, this.position);

		if (item1 != null && item2 != null) {
			float dist1 = Vec2.distanceSquared(item1.position, this.position);
			float dist2 = Vec2.distanceSquared(item2.position, this.position);
			return dist1 < dist2 ? item1 : item2;
		}

		return (item1 != null) ? item1 : item2;
	}
}

This sample falls in the category of ‘dirt stupid AI’, so continuations are overkill, but at the very least it shows how ‘state’ is stored in (recursive) methods.

“continuations are overkill”

Nothing of the sort. Why would they be more overkill than, say, a loop? Okay, yes, I’ll grant that a continuations library with its kinky bytecode manipulation to get around a language not having native support might not be the first thing to reach for when a native approach would work equally well. But continuations aren’t mysterious exotic things – in fact their main problem is that they’re too low-level for most circumstances, a sort of “functional GOTO” you can build any other control flow construct you want from.

Let’s demystify continuations. This library is a great first step toward that. Interestingly it seems to be following the path of Stackless Python, which started with first-class continuations, then increasingly focused on microthreads until ultimately abandoning callcc-style mechanisms altogether. But I’m also a sucker for CSP primitives too, so this library is awesome any way you slice it.

Yes, VirtualThread scheduling is deterministic.

[quote=“sproingie,post:50,topic:40622”]
In my opinion coroutines and continuations are too lowlevel for just about all developers. The idea that you can store a capture of a method in memory and then execute it multiple times will scare the hell out of just about everybody dealing with C-like languages, although I think it comes reasonably close to an out-of-bounds POSIX fork.

I think this incredibly powerful and awesome functionality, provided by Matthias Mann’s library, is best hidden away from average Joe and put behind a simple abstraction layer that exposes it as a concept that the developer is already familiar with: threads.

The only thing people have got to get used to is co-operative scheduling, which (obviously) isn’t a requirement in a native thread.

I dunno. For someone that knows nothing about either (coop & pre-emptive) I’d think that learning coop is easier…but I suppose it’s all about the API/language integration. As an aside one of the uni’s very active in HotSpot (and other Sun/Oracle JVMs) has implemented coroutines in the JVM (along with a fair number of other nice features like real mixins) (http://ssw.jku.at/Research/Projects/JVM/).

I haven’t (yet) looked at Riven’s source. Serialization? Have you thought about a watchdog thread with time-out as an option?

I’m using this for an export dialog, simply to display progress without needing a state machine for my huge export method that has to run in the GL thread. It isn’t really a big deal to use the agent and put some processing in the build. Just now upgrading to the latest, if you break all my stuff Riven, I kill you! :wink: :stuck_out_tongue: Edit: seems to still work. :slight_smile:

[quote=“Roquen,post:52,topic:40622”]
I wrote a simple Java bytecode interpreter in Java gasp and implementing continuations in that was trivial. After all, you basically just save and change the instruction-pointer (and stack - which contains the execution frames) and restore the stack some time later by simply setting a pointer. The amount of trickely involved to support this with bytecode instrumentation is mind boggling.

[quote=“Roquen,post:52,topic:40622”]
Serialization is already supported. The downside of using a watchdog thread is that when you want to kill a VirtualThread you really have to kill the native thread, which… doesn’t exactly equate to pre-emptive scheduling as it will kill your entire single-threaded application. :slight_smile:

I was thinking more along the lines of (as an example): you have a single real thread running just AIs and one of the fibers goes into an infinite loop (or is simply taking too long)…watchdog kills, the system starts a new thread to continue running (after perhaps calling a “you’ve been a bad boy” method on the offender)…etc.

Again, the problem is that you can only kill a VirtualThread by killing the native thread, and thereby killing all VirtualThreads that run on the VirtualProcessor.

You cannot simply restart the game logic with spawning a new AI thread, as all stacks and execution frames of all VirtualThreads are lost.

It’s like… calling java.lang.Thread.stop() (as that’s actually what happens), you leave the application in undefined state. Finally blocks won’t get executed, referenced objects can be left in corrupt state (only N out of M fields are updated) - in short, you’re pretty likely to leave your application in a state that you don’t even want it to recover from.

I really like this, and i intend to use it too. I remember some years ago i was working on AI bots, and i had to write my own scheduler for them. It would have been much better if i could write code which doesn’t have to go trough the complete bot logic at each iteration.

What about using this for handling players too? Put a certain number of players onto a single native thread and make a VirtualThread for each of the players. I’m evaluating Netty.IO for networking right now, and seems to me that it handles incoming UDP differently than TCP. TCP connection can be delegated out to threads, so each native thread can handle certain number of connections. But UDP packets sent to a single listening port just get delivered to a single thread/channel/pipeline, no matter from where they came. So its the UDP handlers task to sort them out. What if players first got to connect to a specific UDP port, and after authentication they were redirected to resume their traffic with another port? So the server would listen on certain UDP ports, and the players would be distributed over those. At each UDP port there would be a native thread with a handler, and the packets would be delivered to VirtualThreads. I really want to avoid having a single thread for forwarding data to players, since that would mean that all the traffic would have to pass trough one specific queue. (I know this belongs to networking. I was just thinking about how this could be combined with that.)

This is very neat.
Lots of props to You, Sir.

Could it be compiled for Java 6? I’d just pull in the source but it’s in JARs mixed with class files…


javap -verbose VirtualThread.class | grep version
 minor version: 0
 major version: 51

I created a jar with the source for you:

[x] http://indiespot.net/files/projects/continuationslib/continuations-all-src-v0.8.10.jar

That’s the most I can do at this time of day :slight_smile: