Faster Than Light JSON Parser

KaiHH · July 3, 2017, 6:30pm

Probably everyone has done this at some point :), but here is my take on an event/push-based SAX-like JSON parser which I needed for a library I am working on.

I needed a very fast parser working on a byte[] input that just provides me with events about structural JSON elements so that a consumer can implement a hierarchical model on that. The parser had to be as straight-forward and as fuzz-less as possible. No abstraction, no internal AST building, nothing.

First I tried a JavaCC-generated one, which ultimately became the bottleneck of the whole library (needless to say that the requirements also ruled out any other JSON parsing library since most of them also handle marshalling/POJO-mapping which I did not need).
So I opted for a minimal, complete handwritten solution.
I also post this in the hope that someone will probably provide an even more optimized solution. I could really use it!

EDIT:
All parsers developed so far can be found here: https://github.com/httpdigest/ftljson/tree/master/src/ftljson

Sickan · July 3, 2017, 9:02pm

That is some well thought out code, really shows what can be done when performance is the priority. Besides, JSON validation should be done on the dev machine, not on the user’s machine.

KaiHH · July 5, 2017, 7:51pm

Since the first post I gratefully got lots of help and improvements from @Spasi. He and I optimized the parser to get to an absolute top-peak performance. The code of the initial post now shows the latest and fastest version of the parser.

On a Ryzen 1800X it achieves almost 1.1 Gigabyte per second of parsing performance (measured with a JMH benchmark reading a ~9KB file as byte[] array on JDK9 b176). The parser is now also able to handle escape sequences in object keys and string literals correctly, and to handle numbers in scientific notation. Of course, any actual conversion to a numeric type representation still needs to be done by the client.

I am pretty certain that you will not find any faster Java/JVM-language-based parser than this.
It is even 2x as fast as noggit, which claims to be “the world’s fastest streaming JSON parser for Java”.

Abuse · July 5, 2017, 10:20pm

Really minor tweak;

byte c in isescaped(…) should be an int. (As it is being used as a counter, not intermediary storage of a value copied from input)

…and the method itself should be named isEscaped(…) :point:

I’d also decide whether the class is going to be stateful, or not.
At the moment you’re storing pos as state, but passing the byte[] & ParseListener around as parameters.

Go with one, or the other; doing both looks messy.
Given the aim is minimal weight & dependencies, I’d be inclined to make it stateless so everything can be static.
Either way, I doubt it’d have any noticeable impact upon performance.

princec · July 5, 2017, 10:42pm

Now make it do HJSON

Cas

Spasi · July 6, 2017, 8:32am

This mix of functional/stateless parameter passing and internal state is by design. Because the JVM is the leakiest abstraction in modern software development. Possibly only rivaled by SQL query optimizers.

The problem starts with bytecode size. More bytecode in a method == harder for the JVM to optimize.
Once you start dealing with bytecode size, you need to extract repeating code and corner cases to other methods.
It all ends when you need to return two values from a method. You can’t do that without state in Java.

There’s a major difference between input/listener and pos. The former is external state, the latter private/internal. The JVM sees that it has full control over pos and is able to perform the full range of optimizations. It’s also likely that the entire parser instance can be eliminated via escape analysis.

Sickan · July 6, 2017, 10:27am

As it operates on bytes, it doesn’t support non-ASCII characters, no?

KaiHH · July 6, 2017, 10:32am

It does support the UTF-8 character coding, since all recognized tokens (", :, 0-9, etc.) are also in the ASCII range (highest bit not set). Or in other words: There will not be any byte of the 1-4 bytes in a single UTF-8 character encoding that can be misinterpreted as a recognized token by the parser, since all such bytes use the highest order bits to indicate that there are more bytes to follow. Quite clever of them to design the character coding this way.
So you can use any UTF-8 encoded character you like in object key names and string value literals.

Abuse · July 6, 2017, 11:31am

Makes sense; so ParseInterface#stringLiteral(int off, int len) is the number of bytes that make up the literal, not necessarily the number of characters.
Should make sure the ParseInterface javadoc makes that clear.

Abuse · July 6, 2017, 11:50am

Going back to isEscaped; granted it’s a fringe case, but rather than counting the escaped characters, just toggle a boolean?

It would eliminate the limitation of 2 billion sequential escaped characters :persecutioncomplex:
It might be quicker. (not that it’d ever be noticeable outside of the degenerate case of every character in every literal being ‘’ )
It demonstrates the intent of the code a little more obviously


    private boolean isEscaped(byte[] input) {
        int p = pos - 1;
        boolean escaped = false;
        while (read(input, p--) == '\\')
            escaped=!escaped;
        return escaped;
    }

Though looking at the bytecode generated by javac 8, it appears that negating booleans is done with a branch now!? (I believe it used to be done with an xor)
Presumably that’s something optimized by the JIT.

KaiHH · July 6, 2017, 12:04pm

Thanks! I went with this:


int p = pos;
boolean c = false;
for (; read(input, --p) == '\\'; c ^= true);
return c;

XOR’ing a boolean with true is quite faster (benchmarked with lots of escapes) than the not operator, since the latter incurs a branch.

Abuse · July 6, 2017, 12:34pm

I’m almost certain that a single case switch…

         switch (c) {
         case '"':
            string(input, listener, inobject);
            continue;
         }

…will be slower than a simple if branch.
The reason being even a tableswitch involves a branch to check bounds (both upper & lower).

[quote=https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html#jvms-6.5.tableswitch]The index must be of type int and is popped from the operand stack. If index is less than low or index is greater than high, then a target address is calculated by adding default to the address of the opcode of this tableswitch instruction. Otherwise, the offset at position index - low of the jump table is extracted. The target address is calculated by adding that offset to the address of the opcode of this tableswitch instruction. Execution then continues at the target address.
[/quote]
Of course your benchmarking might have found otherwise?

#flibble dibble silly 90% quote content check#

KaiHH · July 6, 2017, 12:40pm

There is a GitHub project now with a benchmark suite /src/bench/B.java using various test data JSON files.
Simply ./mvnw compile exec:java if you want to give it a go:
https://github.com/httpdigest/ftljson
Running the benchmark class as Java application also works very nicely in IntelliJ if you would like to hack and try an even faster version.

Riven · July 6, 2017, 8:15pm

XOR may be faster, but there is no reason that NOT would incur a branch.

It’s just an unconditional bit-flip, unless HotSpot screws up. :clue:

KaiHH · July 6, 2017, 11:21pm

The quest for ultimate performance is not over, yet…
Two more tricks gave another ~8% performance increase:

Replace the switches for the “numeric” and “ignore” cases, which were just used to decide true/false, with a byte[256] table lookup (benefit: reduced bytecode size)
Implement that table via a ByteBuffer and do the lookup with sun.misc.Unsafe.getByte() to get rid of the default Java array access bounds checks.

Source of the parser: https://github.com/httpdigest/ftljson/blob/master/src/ftljson/JsonParserKS9.java
Companion Unsafe class: https://github.com/httpdigest/ftljson/blob/master/src/ftljson/Unsafe.java

Now, I’m even getting 1.12 Gigabytes per second on my i7-3820QM DDR3-1600 notebook, which was earlier only possible on a high-end desktop class PC.

Abuse · July 6, 2017, 11:45pm

[quote=“KaiHH,post:15,topic:58728”]
Wow, you really are trusting the integrity of your data :clue:

KaiHH · July 7, 2017, 12:01am

Well, not more than trusting that (aByte & 0xFF) can only return values in the range 0-255.

CoDi_R · July 7, 2017, 11:28am

Interesting topic. FYI if I read the bechmark results right, ks5/6 are faster than ks7/8/9 here on my MacBook Pro 2013, Java version “1.8.0_112”.

abcdef · July 7, 2017, 6:59pm

I had 5 and 6 fastest overall on Linux Mint

abcdef · July 7, 2017, 7:29pm

I created a new test based on 5 and changed these two methods


        private static boolean ignore(byte b) {
        return (b<=32 || b == 44 || b == 58);
	}
	private static boolean isNumberPart(byte c) {
        return ((c>=48 && c <=57) || c ==43 || c==45 || c==46 || c==69|| c==101);
	}

The times improved further still on 5.