Java's built in Scripting engine

nsigma · November 13, 2013, 11:13am

Thanks! ;D

Possibly, except for a (&%£ license incompatibility.

I’m not sure I could achieve quite what I want without the boilerplate that Janino provides anyway.

Ah, I see you like playing with fire!

The Janino ClassLoader already has the ability to provide a ProtectionDomain for classes compiled through it. That would appear to be the way to do this, but I’ve never tried.

Exactly. The timer issue is that you then need a second monitoring thread, which potentially brings in thread switching issues - in particular one thing I’m looking at doing at the moment is live coding audio DSP. Extra threads could be an issue there.

The bigger thing is what to do with a thread stuck in a loop. Call stop() on it? That would have to be very carefully thought out if anything’s shared. Would have thought that would be even worse for you - could have more serious effects on a server!

CommanderKeith · November 13, 2013, 12:42pm

Hmm, interesting problem about the thread stopping. I guess Thread.interrupt won’t work in an infinite loop.
I’ve never used it but how about byte code injection? That seems to be a way of adding arbitrary code over the top of someone else’s code. Maybe you could inject some code that checks a flag at the start of every loop to see if the thread should die??
Another thing, why is it an issue for you if your user breaks his own program by making an infinite loop?

Janino is slowly adopting more recent java language specs, see the change log: http://janino.net/changelog.html
and future directions: http://docs.codehaus.org/display/JANINO/Licensing

Great, I didn’t know that Janino had security features. I tried searching about it and found this:
http://dist.codehaus.org/janino/javadoc/org/codehaus/janino/SimpleCompiler.html#setParentClassLoader(java.lang.ClassLoader, java.lang.Class[])
There’s not much more information/documentation than that. I’ll have to dig into the janino source to figure out how to use it.

EDIT: Sorry, I missed the point about ProtectionDomains. Cool, I’ll look into that too: http://dist.codehaus.org/janino/javadoc/org/codehaus/commons/compiler/AbstractJavaSourceClassLoader.ProtectionDomainFactory.html#getProtectionDomain(java.lang.String)

Cheers,
Keith

Roquen · November 13, 2013, 1:32pm

On preventing run-away scripts: another option is to do code weaving and have the script itself bail if it is taking too long. If you’ve never goofed with compilers this might be too much of a time commitment. Luckily with janino (or javac framework) you can manipulate the AST instead of asm. The upside is that you could do other code inspection and weaving, say disallowing most calls of ‘new’ or whatever else.

Humm…I’m not familiar with the ProtectionDomain class and glancing at the source it looks like it would take longer for me to figure out than just writing a classloader. If this notion isn’t clear…shout out.

nsigma · November 13, 2013, 2:13pm

Simply that it isn’t very user friendly! Praxis LIVE is a graphical patcher / dataflow environment where you can add fragments of code (Java, GLSL, etc.) to the processing graph at runtime. The idea is that you can incrementally change and back out code as it runs. Entering an infinite loop would completely stall the media pipeline, forcing a restart and losing some changes. If someone writes a while(true) loop without thinking, they get what they deserve, but as you implied, some infinite loops are more subtle.

I may consider a protected environment, probably as Roquen suggests using AST manipulation rather than byte-code manipulation, if I can get it to run without losing too much performance. It’s not a high-priority though.

Slowly is unfortunately the word though.

In the current sources I was playing with yesterday this is marked deprecated, and is also a no-op. JavaDoc comment says “Auxiliary classes never really worked… don’t use them.”

Abuse · November 13, 2013, 5:07pm

I suppose a small plug for Starsector (formerly Starfarer) would not be entirely inappropriate here, given its use of Janino in its scripting & extensive modding support.

I don’t know if Alex (the game’s creator) frequents these forums, but he could undoubtedly cast some valuable light upon the usability & security of using Janino in this scenario.

Riven · November 13, 2013, 8:33pm

With serverside 3rd party code, you have to not only worry about cpu-cycle consuming runaway scripts. Memory consuming scripts are much more dangerous, because you basically can’t gracefully recover from code like this:


String s = "";
while(true) {
   s += " ";
}

as random, unrelated, critical code will start throwing OutOfMemoryErrors - it will very likely pull your service down when the GC panics, and maybe even the entire server, due to excessive swapping.

With ASM, you can rewrite bytecode to intercept every new and anewarray instruction, and manage the allocation count your script is allowed to reach. It’s not easy to make this watertight, because almost all the JRE classes expose methods that do allocations behind the scenes (like StringBuilder.append, as per the above example).

I’d propose the 3rd party code to run in a separate, bolted down JVM, which can be nuked from orbit when it misbehaves. Not very practical, but what ya gonna do. With a bit of trickery, you could use MappedByteBuffers to share a read-only view of your business objects. (if they are backed by buffers - where are structs when you need 'em)

CommanderKeith · November 14, 2013, 9:33am

Great tips guys, I hadn’t heard of AST or ASM. Both look very useful.

Great idea, thanks Roquen. I’ll try this approach first.

Hmm, I never thought of that problem, thanks Riven. I’ll put your code through AST and see if it’s possible to detect the implicit string allocation.

Lol, that’s too bad.

Great, I’ll drop him a note on his forum/mailing list. Cheers

Roquen · November 14, 2013, 11:14am

@Riven’s comment: This is related to disallowing almost all ‘new’ invocations in user scripts (the almost part is to allow things like boxing). Have all instances come from SDK calls. Disallow StringBuilder and StringBuffer in the script classloader. In fact disallow pretty much every class that isn’t from the SDK (the SDK’s classloader can handle arbitrary classes) and required for basic operation.
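A minimal sketch of the "disallow pretty much every class" idea, assuming the script's classes get resolved through a loader like this one. The name AllowListClassLoader and the contents of the allow-list are made up for illustration; a real loader would also have to define the compiled script classes themselves, and the list has to include basics such as java.lang.Object or nothing will link.

```java
import java.util.Set;

/**
 * Hypothetical script class loader: any class name that is not on the
 * allow-list is reported as missing instead of being delegated to the parent.
 */
class AllowListClassLoader extends ClassLoader {
    private final Set<String> allowed;

    AllowListClassLoader(ClassLoader parent, Set<String> allowed) {
        super(parent);
        this.allowed = allowed;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        // Refuse everything that was not explicitly permitted for scripts.
        if (!allowed.contains(name)) {
            throw new ClassNotFoundException("Not permitted for scripts: " + name);
        }
        // Permitted names go through the normal parent-first lookup.
        return super.loadClass(name, resolve);
    }
}
```

With an allow-list of java.lang.Object and java.lang.String, asking this loader for java.lang.StringBuilder fails with ClassNotFoundException, while java.lang.String resolves as usual.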

On code weaving: I was initially thinking of weaving in a method call (which checks for timeout and tosses an exception if needed) at the entry of all user defined methods (at least to start with…you can get clever later) and at the top of loops. But if you’re inspecting for user-defined loops…you could just disallow them and only allow iterating on SDK provided collections.
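To make the weaving idea concrete, here is a rough sketch of what the woven result could look like at source level. ScriptGuard, ScriptTimeoutException and WovenScriptExample are hypothetical names, not anything from Janino or javac; the weaver itself (the AST pass that inserts the calls) is not shown.

```java
/** Hypothetical guard object that woven script code would call. */
final class ScriptGuard {
    /** Thrown to unwind a script that has used up its time budget. */
    static final class ScriptTimeoutException extends RuntimeException {
        ScriptTimeoutException(String message) {
            super(message);
        }
    }

    private final long deadlineNanos;

    ScriptGuard(long budgetNanos) {
        this.deadlineNanos = System.nanoTime() + budgetNanos;
    }

    /** The call a weaver would insert at method entry and at the top of each loop. */
    void check() {
        // Compare via subtraction, as recommended for System.nanoTime() values.
        if (System.nanoTime() - deadlineNanos > 0) {
            throw new ScriptTimeoutException("script exceeded its time budget");
        }
    }
}

/** What a user's runaway loop might look like after weaving. */
final class WovenScriptExample {
    static void spin(ScriptGuard guard) {
        guard.check();          // woven in at method entry
        long iterations = 0;
        while (true) {
            guard.check();      // woven in at the top of the loop
            iterations++;       // the user's original loop body
        }
    }
}
```

Calling WovenScriptExample.spin(new ScriptGuard(20_000_000L)) ends with a ScriptTimeoutException on the script's own thread, so no monitoring thread is involved. One gap in this sketch: a script that catches RuntimeException or Throwable could swallow the exception, so the weaver would presumably have to reject or rewrite such catch blocks too.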

Riven · November 15, 2013, 6:57am

That still doesn’t quite cover: [icode]while(true) list.add(null);[/icode]

You could create a callback prior to every new, newarray, anewarray, multianewarray, invoke*, goto, goto_w, if*, if_*, jsr, ret and athrow though. That way you can be reasonably sure you can intercept a runaway script, assuming that it can only create a limited amount of instances as you limit the number of instructions that can be executed, without being at the mercy of the OS thread scheduler.
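As a rough illustration of the callback approach, the sketch below shows only the runtime side: a per-script counter that injected bytecode would call before each guarded instruction. InstructionBudget and BudgetDemo are hypothetical names, the ASM transformation that inserts the calls is not shown, and BudgetDemo writes by hand what the rewritten loop would effectively do.

```java
import java.util.List;

/** Hypothetical per-script budget; not thread-safe, so one instance per script thread. */
final class InstructionBudget {
    static final class BudgetExceededException extends RuntimeException {
        BudgetExceededException(String message) {
            super(message);
        }
    }

    private long remaining;

    InstructionBudget(long limit) {
        this.remaining = limit;
    }

    /** Injected before every allocation, invoke, branch and throw instruction. */
    void tick() {
        if (--remaining < 0) {
            throw new BudgetExceededException("instruction budget exhausted");
        }
    }
}

/** Hand-written equivalent of a rewritten "while(true) list.add(null);". */
final class BudgetDemo {
    static int runAway(InstructionBudget budget, List<Object> list) {
        try {
            while (true) {
                budget.tick();   // callback before the loop's branch
                budget.tick();   // callback before the invoke of list.add
                list.add(null);
            }
        } catch (InstructionBudget.BudgetExceededException e) {
            // The list stopped growing once the budget ran out.
            return list.size();
        }
    }
}
```

Because the limit counts executed instructions rather than wall-clock time, the cut-off does not depend on how the OS schedules the thread, and the number of list entries is bounded by the budget.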

Anyway, this is getting a bit offtopic. :-X

Roquen · November 15, 2013, 7:57am

Sure it does: while(true) is an error if you made this choice. and so is list.add(…).

(EDIT: and I’m also talking about looking at the AST level prior to the lowering to bytecodes)

Riven · November 15, 2013, 8:12am

Well, yeah, if you made the choice to disallow all loops and most method calls all together, then yes, you’re safe :persecutioncomplex:

Roquen · November 15, 2013, 9:10am

The goal of scripting is to allow building of behaviors and modification of game state data and not general computing. I think what I’m suggesting has a fair amount of merit and is more flexible than what you’ll see in most commercial games that provide an end-user scripting language. You’re letting the janino (or javac) folks do the heavy lifting of conversion of source to AST and lowering to bytecodes. The JVM folks are taking care of converting bytecodes to native. You only have to write a custom classloader which only allows permitted classes to the scripter and an AST visitor to handle any weaving and subsetting of Java. This should be fairly easy to keep up-to-date with any changes the janino/javac folks make. On loops it seems a reasonable thing to disallow esp. if the scripts are running server side and the SDK provides some iterators such as [icode]entitiesWithinRadius(…)[/icode].

If you have these basics locked-down and working rock solid you could get fancy and add in work-arounds for some of these limitations by providing SDK calls.

CommanderKeith · November 15, 2013, 5:10pm

I downloaded eclipse and the AST plugin,
followed the instructions to get AST working in my program independently of eclipse:
http://www.programcreek.com/2011/01/a-complete-standalone-example-of-astparser/
Managed to analyse some code using the org.eclipse.jdt.core.dom.ASTVisitor class. There are heaps of ‘visit…’ methods which can be used to detect things, the javadocs are here:

If I analyse this simple file:


package eclipseast;
public class TestFile {
	public TestFile(){
		// a constructor
	}
	public boolean someMethod(){
		TestFile testFile = new TestFile();
		while (true){
			Object obj = new Object();
			break;
		}
		String str = "hi";
		return str.startsWith("hi there");
	}
}

It produces this output:


Line 1: ASTNode of type CompilationUnit
Line 1: ASTNode of type PackageDeclaration
Line 1: ASTNode of type SimpleName named 'eclipseast'
Line 2: ASTNode of type TypeDeclaration
Line 2: ASTNode of type Modifier
Line 2: ASTNode of type SimpleName named 'TestFile'
Line 3: ASTNode of type MethodDeclaration
Line 3: ASTNode of type Modifier
Line 3: ASTNode of type SimpleName named 'TestFile'
Line 3: ASTNode of type Block
Line 6: ASTNode of type MethodDeclaration
Line 6: ASTNode of type Modifier
Line 6: ASTNode of type PrimitiveType named 'boolean'
Line 6: ASTNode of type SimpleName named 'someMethod'
Line 6: ASTNode of type Block
Line 7: ASTNode of type VariableDeclarationStatement
Line 7: ASTNode of type SimpleType named 'TestFile'
Line 7: ASTNode of type SimpleName named 'TestFile'
Line 7: ASTNode of type VariableDeclarationFragment named 'testFile'
Line 7: ASTNode of type SimpleName named 'testFile'
Line 7: ASTNode of type ClassInstanceCreation named 'TestFile'
Line 7: ASTNode of type SimpleType named 'TestFile'
Line 7: ASTNode of type SimpleName named 'TestFile'
Line 8: ASTNode of type WhileStatement
Line 8: ASTNode of type BooleanLiteral
Line 8: ASTNode of type Block
Line 9: ASTNode of type VariableDeclarationStatement
Line 9: ASTNode of type SimpleType named 'Object'
Line 9: ASTNode of type SimpleName named 'Object'
Line 9: ASTNode of type VariableDeclarationFragment named 'obj'
Line 9: ASTNode of type SimpleName named 'obj'
Line 9: ASTNode of type ClassInstanceCreation named 'Object'
Line 9: ASTNode of type SimpleType named 'Object'
Line 9: ASTNode of type SimpleName named 'Object'
Line 10: ASTNode of type BreakStatement
Line 12: ASTNode of type VariableDeclarationStatement
Line 12: ASTNode of type SimpleType named 'String'
Line 12: ASTNode of type SimpleName named 'String'
Line 12: ASTNode of type VariableDeclarationFragment named 'str'
Line 12: ASTNode of type SimpleName named 'str'
Line 12: ASTNode of type StringLiteral
Line 13: ASTNode of type ReturnStatement
Line 13: ASTNode of type MethodInvocation named 'startsWith'
Line 13: ASTNode of type SimpleName named 'str'
Line 13: ASTNode of type SimpleName named 'startsWith'
Line 13: ASTNode of type StringLiteral

The zip file with 2 simple java source files and the Eclipse JDT jar files that produce the output above is here if anyone wants to mess with it: https://dl.dropboxusercontent.com/u/50479250/EclipseAST.zip

I think it’s feasible for me to weave time-checking methods into the start of all loop iterations (WhileStatement, ForStatement, …), method calls (MethodInvocation) and others which allows me to terminate the 3rd party script in case it’s carrying on for too long.

I could restrict access to all methods and classes except some that will be allowed, for example anything to do with game state or the basic classes like String, ArrayList and others. This can be done by querying the MethodInvocation or StringLiteral or other relevant ASTNode about what class of object is being created/having its method called.

But about controlling the problem of memory over-allocation, I don’t think that I can easily restrict it by looking at the source code using this AST method. I mean, how can I know that String[] stringArray = new String[(int) Math.pow(100000, 10000)]; is going to make a humongous String array just by looking at the source? I’m wondering if you guys know another method of monitoring memory allocation, perhaps using ASM and bytecode?

Cheers,
Keith

Riven · November 15, 2013, 5:33pm

With ASM you can alter the bytecode to first DUP the value at the top of the stack, pass it to another method to verify the array length is ‘sane’ prior to executing the NEWARRAY/ANEWARRAY opcode. That way your sanity checks happen at runtime, as that’s the only time malicious input can be detected.
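A sketch of the method such injected bytecode could call, with a source-level picture of the result. AllocationChecks, verifyArrayLength and the limit of 65536 are assumptions made up for this example; the ASM code that emits the DUP and the static call is not shown.

```java
/** Hypothetical runtime checks that rewritten script bytecode would call. */
final class AllocationChecks {
    /** Assumed upper bound on array lengths a script may request. */
    static final int MAX_ARRAY_LENGTH = 1 << 16;

    /**
     * Target of the injected call: the rewriter would DUP the length on the
     * stack, invoke this method with the copy, then let NEWARRAY/ANEWARRAY
     * consume the original value.
     */
    static void verifyArrayLength(int length) {
        if (length > MAX_ARRAY_LENGTH) {
            throw new IllegalArgumentException("array length " + length + " exceeds script limit");
        }
    }

    /** Source-level equivalent of a rewritten "new String[length]". */
    static String[] guardedAllocation(int length) {
        verifyArrayLength(length);   // what DUP + INVOKESTATIC would do
        return new String[length];   // the original ANEWARRAY
    }
}
```

A request like new String[(int) Math.pow(100000, 10000)] evaluates its length to Integer.MAX_VALUE at runtime, so the check rejects it before any memory is reserved, which is something a source-level inspection cannot decide in general.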

As for Roquen’s Iterable strategy, that provides a loophole where you can nest enhanced-for loops into oblivion as a means to create a near-infinite loop, which can be used to stage the aforementioned attacks, while all by themselves creating tonnes of Iterator instances that are strongly reachable.

Bytecode manipulation through ASM is just as easy as messing about with AST, whilst being way more powerful and effective, and less restrictive to the scripter. Let them have their custom loops.

CommanderKeith · November 15, 2013, 5:42pm

Good idea, thanks! I’ll give it a go and report back.

Danny02 · November 15, 2013, 7:36pm

Am I the only one thinking about the halting problem here?
Or does it not apply, because we don’t want to prove anything and only set restrictions?

Riven · November 15, 2013, 8:24pm

That’s why you set a limit on (conditional) jumps through bytecode transformation.

Danny02 · November 15, 2013, 8:28pm

What about recursion, is that included (not only tailrec)?

Riven · November 15, 2013, 8:42pm

Please read the thread :point:


Roquen · November 15, 2013, 8:48pm

Seems to be a minor bit of mis-communication. By AST I meant whatever AST/Visitor functionality the chosen compiler provides and not any specific library…so janino has this in it, so does javac and (yes) eclipse.

I’m suggesting that the user not be able to allocate memory for themselves…ever. They can only get an instance of something from code you provide. That’s the only way you can ensure a bound on allocations. Related to that is why I’m suggesting you not give them access to any standard java classes, beyond those pretty much required: like primitive wrappers and String. Every class you allow them direct access to needs to be inspected for any potential holes and you need to redo that work any time you upgrade the JVM. Recall what I said above…you don’t need to provide a general computation framework…just the things they need to be able to define behaviors and modify accessible game states.

It doesn’t matter what the script does…if it takes too long it’s forced to stop. No code analysis is needed.

Good point. Back to weaving in checks at the top of loops. The reason I thought it would be good to avoid this is so the server could run fewer serializing instructions (reading the counter).

In the cases we’re currently talking about the choice doesn’t really matter too much.

Pfffff…