So I started writing my own Java assembler for this year's competition. Sure, there are already Java assemblers out there, but where's the fun in that?
Here’s my first working hello world application:
CLASS Test java/lang/Object
METHOD main ([Ljava/lang/String;)V
getstatic FIELD java/lang/System out Ljava/io/PrintStream;
ldc "Hello"
invokevirtual METHOD java/io/PrintStream println (Ljava/lang/String;)V
bipush 10
invokestatic METHOD java/lang/System exit (I)V
return
END
When assembling that, the constant pool gets filled with 25(!) entries:
### Compiling with Chompiler ###
Constant pool contains 25 entries
1 Utf8Info -> "Test"
2 ClassInfo -> 1
3 Utf8Info -> "java/lang/Object"
4 ClassInfo -> 3
5 Utf8Info -> "java/lang/System"
6 ClassInfo -> 5
7 Utf8Info -> "out"
8 Utf8Info -> "Ljava/io/PrintStream;"
9 NameAndTypeInfo -> 7, 8
10 FieldrefInfo -> 6, 9
11 Utf8Info -> "Hello"
12 StringInfo -> 11
13 Utf8Info -> "java/io/PrintStream"
14 ClassInfo -> 13
15 Utf8Info -> "println"
16 Utf8Info -> "(Ljava/lang/String;)V"
17 NameAndTypeInfo -> 15, 16
18 MethodrefInfo -> 14, 17
19 Utf8Info -> "exit"
20 Utf8Info -> "(I)V"
21 NameAndTypeInfo -> 19, 20
22 MethodrefInfo -> 6, 21
23 Utf8Info -> "main"
24 Utf8Info -> "([Ljava/lang/String;)V"
25 Utf8Info -> "Code"
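The listing shows that every constant is interned. For example, java/lang/System is stored once (entries 5 and 6) even though both getstatic and invokestatic refer to it. The Python sketch below shows one way to build such a pool. The ConstantPool class and its method names are hypothetical and are not Chompiler's actual code. If you feed it the members used by the Hello World program in source order, it produces the same 25 entries in the same order as the listing above.

```python
# Hypothetical deduplicating constant pool builder (illustration only).
# Entries are tuples such as ("Utf8", "Test") or ("Class", 1); adding an
# equal tuple a second time returns the index of the existing entry.

class ConstantPool:
    def __init__(self):
        self.entries = []  # entries in insertion order
        self.index = {}    # entry tuple -> 1-based constant pool index

    def _add(self, entry):
        if entry not in self.index:
            self.entries.append(entry)
            self.index[entry] = len(self.entries)  # JVM pool indices start at 1
        return self.index[entry]

    def utf8(self, text):
        return self._add(("Utf8", text))

    def class_ref(self, internal_name):
        return self._add(("Class", self.utf8(internal_name)))

    def string(self, text):
        return self._add(("String", self.utf8(text)))

    def name_and_type(self, name, descriptor):
        n = self.utf8(name)
        d = self.utf8(descriptor)
        return self._add(("NameAndType", n, d))

    def fieldref(self, owner, name, descriptor):
        c = self.class_ref(owner)
        nt = self.name_and_type(name, descriptor)
        return self._add(("Fieldref", c, nt))

    def methodref(self, owner, name, descriptor):
        c = self.class_ref(owner)
        nt = self.name_and_type(name, descriptor)
        return self._add(("Methodref", c, nt))


pool = ConstantPool()
pool.class_ref("Test")              # this_class
pool.class_ref("java/lang/Object")  # super_class
pool.fieldref("java/lang/System", "out", "Ljava/io/PrintStream;")          # getstatic
pool.string("Hello")                                                        # ldc
pool.methodref("java/io/PrintStream", "println", "(Ljava/lang/String;)V")  # invokevirtual
pool.methodref("java/lang/System", "exit", "(I)V")  # invokestatic, reuses entry 6
pool.utf8("main")                    # method name
pool.utf8("([Ljava/lang/String;)V")  # method descriptor
pool.utf8("Code")                    # name of the Code attribute

for i, entry in enumerate(pool.entries, start=1):
    print(i, entry[0], *entry[1:])
print(len(pool.entries))  # 25
```

The dictionary lookup in `_add` is the only deduplication mechanism. Because the tag is part of each tuple, a Utf8 "Code" and a Class entry that points at it never collide.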
The resulting class file is 301 bytes, of which only 14 bytes are actual bytecode. This means that by far the largest part of the class file is constant pool data.
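The numbers add up if you follow the class file layout from the JVM specification. The sketch below rests on one assumption: the class has exactly one method and no fields, interfaces, or optional attributes such as SourceFile. That assumption is consistent with the 301-byte total. The bytecode uses the standard opcodes, and the constant pool indices come from the listing above.

```python
# Byte accounting for the Hello World class file (a sketch).
# All strings are plain ASCII, so each modified-UTF-8 length equals len(text).

UTF8_STRINGS = [
    "Test", "java/lang/Object", "java/lang/System", "out",
    "Ljava/io/PrintStream;", "Hello", "java/io/PrintStream", "println",
    "(Ljava/lang/String;)V", "exit", "(I)V", "main",
    "([Ljava/lang/String;)V", "Code",
]
THREE_BYTE_ENTRIES = 5  # 4 Class + 1 String: tag (1) + index (2)
FIVE_BYTE_ENTRIES = 6   # 3 NameAndType + 1 Fieldref + 2 Methodref: tag (1) + 2 indices (4)

# Utf8 entry: tag (1) + length (2) + the bytes themselves
pool_bytes = (sum(3 + len(s) for s in UTF8_STRINGS)
              + 3 * THREE_BYTE_ENTRIES
              + 5 * FIVE_BYTE_ENTRIES)

# Method body: opcode followed by operands, indices taken from the listing
code = bytes([
    0xB2, 0x00, 0x0A,  # getstatic #10      (System.out)
    0x12, 0x0C,        # ldc #12            ("Hello")
    0xB6, 0x00, 0x12,  # invokevirtual #18  (PrintStream.println)
    0x10, 0x0A,        # bipush 10
    0xB8, 0x00, 0x16,  # invokestatic #22   (System.exit)
    0xB1,              # return
])

header = 4 + 2 + 2 + 2  # magic, minor_version, major_version, constant_pool_count
class_info = 2 * 6      # access_flags, this_class, super_class,
                        # interfaces_count, fields_count, methods_count
code_attribute = (2 + 4        # attribute_name_index (-> "Code"), attribute_length
                  + 2 + 2 + 4  # max_stack, max_locals, code_length
                  + len(code)
                  + 2 + 2)     # exception_table_length, attributes_count
method_info = 2 + 2 + 2 + 2 + code_attribute  # access, name, descriptor, attributes_count
class_attributes_count = 2

total = header + pool_bytes + class_info + method_info + class_attributes_count
print(f"constant pool: {pool_bytes} bytes")        # 237
print(f"bytecode:      {len(code)} bytes")         # 14
print(f"class file:    {total} bytes")             # 301
print(f"pool share:    {pool_bytes / total:.0%}")  # 79%
```

Under these assumptions, the constant pool takes up about four fifths of the file. Everything else except the 14 bytes of code is fixed structural overhead.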
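The “Code” Utf8 entry is mandatory. Every method with bytecode stores that bytecode in a Code attribute of its method_info structure, and the attribute is identified by a constant pool Utf8 entry holding its name. This means that naming your class “Code” instead of “A” will actually take up LESS space, because the class name can reuse that entry instead of adding another one to the constant pool. It’s probably also a good idea to name one of your members (a method or a field) “Code” as well.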
Of course, since most 4k games have a main method, the “main” entry is already in the constant pool, so you could reuse that name as well.
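Here is a quick way to put a number on the trick. The sketch below uses made-up helper names and assumes a class with a single main method. It counts only the Utf8 entries involved in naming the class and its method, because everything else in the pool is the same in every case. Equal strings share one pool entry, so a class named “Code” or “main” costs 4 bytes less than one named “A”. Those 4 bytes are the 3-byte Utf8 entry header plus the single character of “A”.

```python
# Hypothetical helpers: Utf8 bytes needed for the names in a one-method class.
# Identical strings share a single constant pool entry, so duplicates are removed.

def utf8_bytes(*strings):
    # Each distinct Utf8 entry costs tag (1) + length (2) + the bytes themselves
    # (all names here are ASCII, so len() is the byte length).
    return sum(3 + len(s) for s in set(strings))

def name_cost(class_name, method_name="main"):
    return utf8_bytes(
        class_name,                # referenced by the this_class Class entry
        method_name,               # method name
        "([Ljava/lang/String;)V",  # method descriptor
        "Code",                    # name of the mandatory Code attribute
    )

for name in ["A", "Code", "main"]:
    print(name, name_cost(name))
```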
Another annoying thing is the duplication of “java/io/PrintStream”. I wish class names used the same format as field descriptors, as that would avoid duplicating the text in the constant pool.
This happens every time you call a method on an object stored in a field. First the field descriptor (Ljava/io/PrintStream;) is used to look up the field, and then the class name (java/io/PrintStream) is used to invoke the method. The only difference between the two strings is an L at the start and a ; at the end.
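The relationship between the two spellings is purely mechanical, which is what makes the duplication sting. In the sketch below, the helper names are hypothetical. Converting an object-type field descriptor into an internal class name simply strips the leading L and the trailing ;. The extra Utf8 entry holding the class name (entry 13 in the listing above) costs 22 bytes.

```python
# The same class name appears in the constant pool in two spellings:
#   field descriptor: Ljava/io/PrintStream;  (NameAndType used by getstatic)
#   internal name:    java/io/PrintStream    (Class entry used by invokevirtual)

def descriptor_to_internal(descriptor):
    # An object-type field descriptor is "L" + internal name + ";"
    if not (descriptor.startswith("L") and descriptor.endswith(";")):
        raise ValueError(f"not an object type descriptor: {descriptor}")
    return descriptor[1:-1]

def internal_to_descriptor(internal_name):
    return "L" + internal_name + ";"

def duplicate_cost(descriptor):
    # Size of the extra Utf8 entry: tag (1) + length (2) + ASCII bytes
    return 3 + len(descriptor_to_internal(descriptor))

desc = "Ljava/io/PrintStream;"
print(descriptor_to_internal(desc))  # java/io/PrintStream
print(duplicate_cost(desc))          # 22
```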