Assembly routines API

Is there an API out there for compiling and executing arbitrary assembly language routines?

Y’know, something like GLSL, but for ordinary assembly language (entirely architecture-dependent, of course).

I ask because I currently have the entirely shit task of trying to make ffmpeg compile (using all sorts of black arts), and I noticed there are a lot of performance-oriented assembly routines in there, for SIMD and so on, targeting all sorts of different and exotic processors.

Cas :slight_smile:

Why don’t you use JSR 166y to benefit from SIMD? Maybe my suggestion is silly :’(

I’m not clear…what’s the target arch?

All architectures, via plugin assemblers (only one assembler is of course of any use on a particular architecture). The idea is to be able to supply assembler as Strings to this API, which would then compile the code to native machine code, and then you’d be able to dynamically jump into this machine code with method calls. A sort of cross between JNA and the concepts behind GLSL.
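To make the "plugin assemblers" idea a bit more concrete, here is a minimal sketch of how the right assembler might be picked at runtime. Nothing like this exists as far as this thread knows: the `Assembler` interface and the `Assemblers` registry are hypothetical names invented for illustration. The only real API used is the standard `os.arch` system property.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical plugin interface: one implementation per target architecture.
interface Assembler {
    // Turns assembly source into machine code bytes for this plugin's architecture.
    byte[] assemble(String source);
}

// Hypothetical registry that hands out the plugin matching a given architecture.
final class Assemblers {
    private final Map<String, Assembler> byArch = new HashMap<>();

    // Registers a plugin under an architecture name such as "amd64" or "aarch64".
    void register(String arch, Assembler assembler) {
        byArch.put(arch, assembler);
    }

    // Looks up the plugin for an explicit architecture name, if one was registered.
    Optional<Assembler> forArch(String arch) {
        return Optional.ofNullable(byArch.get(arch));
    }

    // Looks up the plugin for the machine the JVM is running on.
    // "os.arch" is a standard system property; its exact values vary by platform.
    Optional<Assembler> forHost() {
        return forArch(System.getProperty("os.arch"));
    }
}
```

An empty result from `forHost()` would be the cue to fall back to a plain Java implementation of whatever the routine does.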


Routine r = new Routine(
	"name",
	// Cheated with quotes here - why doesn't Java have this as standard?
	"""
	mov #1,a
	jsr blahblah
	inline someroutine
	""");

r.compile(); // throws CompilerException; otherwise creates a ByteBuffer full of machine code for the native architecture
r.getBytes(); // returns the ByteBuffer with the machine code in it
r.invoke(args); // pushes the various args onto the stack and then switches the program counter to the start of the ByteBuffer

In that crappily thought out example there, I create a named Routine as a String which I can compile. The name of the routine means I can reference its address in other routines by name (“jsr blahblah”) or inline the machine code from other named Routines (“inline someroutine”).
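The "inline" half of that can be sketched without any real assembler, since it is just text substitution by name. The following is only an illustration under assumptions: `RoutineSources` is a hypothetical helper, the "inline" directive syntax is taken from the example above, and resolving "jsr blahblah" to an actual address is not attempted here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical store of named routine sources; only the textual "inline" step is shown.
final class RoutineSources {
    private final Map<String, String> sources = new LinkedHashMap<>();

    // Registers the assembly source of a routine under its name.
    void define(String name, String source) {
        sources.put(name, source);
    }

    // Returns the source of `name` with every "inline X" line replaced by X's expanded source.
    String expand(String name) {
        return expand(name, new ArrayDeque<>());
    }

    private String expand(String name, Deque<String> inProgress) {
        String source = sources.get(name);
        if (source == null) {
            throw new IllegalArgumentException("Unknown routine: " + name);
        }
        // A routine that inlines itself (directly or indirectly) would never finish expanding.
        if (inProgress.contains(name)) {
            throw new IllegalStateException("Recursive inline of " + name);
        }
        inProgress.push(name);
        StringBuilder out = new StringBuilder();
        for (String line : source.split("\n")) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            if (trimmed.startsWith("inline ")) {
                String target = trimmed.substring("inline ".length()).trim();
                out.append(expand(target, inProgress));
            } else {
                out.append(trimmed).append('\n');
            }
        }
        inProgress.pop();
        return out.toString();
    }
}
```

Other lines, including "jsr blahblah", pass through untouched; a real implementation would still need a linking step to turn those names into addresses.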

The invoke() method would use some wizardry to convert ByteBuffers etc. into native pointers and so on.

I just see this as being the ultimate and generic solution to writing high-performance code in Java such as codecs and DSP. You just need the appropriate assembly strings for each architecture you need to deploy on.

Obviously the overhead of calling a Routine for just a few machine code instructions will massively outweigh the benefits, unless the JIT compiler is aware of the technique and inlines or directly jumps to the routine’s code. So you’d probably stick to processing large chunks of data with this sort of approach.
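The "large chunks" point is really a break-even calculation: a fixed cost per call has to be paid back by the time saved per element. A rough sketch of that arithmetic follows. `BreakEven` is a hypothetical helper, and any numbers fed into it are assumptions to be replaced with measurements, not benchmarks of any real system.

```java
// Hypothetical helper for estimating when a native routine call starts to pay off.
final class BreakEven {
    // Returns the smallest element count n for which
    //   callOverheadNs + n * nativeNsPerElement < n * javaNsPerElement,
    // or -1 if the native routine is not faster per element and so never pays off.
    static long minElements(double callOverheadNs,
                            double javaNsPerElement,
                            double nativeNsPerElement) {
        double savedPerElement = javaNsPerElement - nativeNsPerElement;
        if (savedPerElement <= 0) {
            return -1;
        }
        // n must be strictly greater than overhead / saving.
        return (long) Math.floor(callOverheadNs / savedPerElement) + 1;
    }
}
```

With made-up figures of 100 ns overhead, 3 ns per element in Java and 1 ns per element natively, the routine only wins once a call covers at least 51 elements.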

The invoke() method on Routine would have to specify a per-architecture “contract” on the state of registers and such afterwards. Notice also that there’s no “rts” instruction at the end of the routine; the compiler would automatically add one for you, as appropriate to how it eventually ended up calling the routine.
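As a sketch of that last point, the step that adds the return instruction could be a small source-level fixup run before assembly. `ReturnFixup` is a hypothetical name, and the return mnemonic is passed in because it depends on the architecture and on how the routine ends up being called.

```java
// Hypothetical pre-assembly step that guarantees a routine ends with a return instruction.
final class ReturnFixup {
    // Appends `returnMnemonic` as the final line unless the routine already ends with it.
    // The result always ends with a single trailing newline.
    static String withReturn(String source, String returnMnemonic) {
        String body = source.stripTrailing();
        String lastLine = body.substring(body.lastIndexOf('\n') + 1).trim();
        if (lastLine.equals(returnMnemonic)) {
            return body + "\n"; // already terminated, leave as is
        }
        if (body.isEmpty()) {
            return returnMnemonic + "\n"; // empty routine: just return
        }
        return body + "\n" + returnMnemonic + "\n";
    }
}
```

Inlined routines would want the opposite treatment, with no return appended, which is one reason to leave this to the compiler rather than to whoever writes the assembly string.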

Cas :slight_smile:

… it just occurred to me you could even plug in a C compiler instead of assembly. The possibilities are endless :slight_smile:

Cas :slight_smile:

That makes my head hurt. That would be a huge amount of work (I don’t know of anything like this).

For decoding: why not use the GPU? :wink:

It is a fair amount of work, but it would be a worthy addition to Java, putting it more firmly on a footing with C++ for performance in some areas.

Cas :slight_smile:

If it’s mostly for SIMD, then a working OpenCL for Java would cover a lot of ground (assuming it supports both CPU and GPU code production).

[edit] Not that any OpenCL implementation is in a usable state yet.