Faster Than Light JSON Parser

The OS variant is not as interesting as the CPU instruction set (x86, amd64/x86_64, ARMv?), the CPU vendor (AMD, Intel, ARM) and the Java runtime version you are using. It would be nice if you could share those details. Thanks for testing!
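Most of these details can be read from standard Java system properties. Here is a minimal sketch; the class name `EnvInfo` is made up for illustration. `os.arch` reports the architecture the running JVM was built for (e.g. `amd64`), which can differ from the hardware when a 32-bit JVM runs on a 64-bit OS. The JDK has no standard property for the CPU vendor or model, so those still have to come from the operating system.

```java
public class EnvInfo {
	// Standard system properties that describe the JVM and the architecture it targets.
	private static final String[] KEYS = {
		"os.name", "os.arch", "java.version", "java.vendor", "java.vm.name", "java.vm.version"
	};

	static String describe() {
		StringBuilder sb = new StringBuilder();
		for (String key : KEYS) {
			sb.append(key).append(" = ").append(System.getProperty(key)).append('\n');
		}
		// Logical processors visible to the JVM (not the CPU model or vendor).
		sb.append("availableProcessors = ").append(Runtime.getRuntime().availableProcessors()).append('\n');
		return sb.toString();
	}

	public static void main(String[] args) {
		System.out.print(describe());
	}
}
```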

I hope I didn't mess anything up :)
It's 2 AM :)


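```java
public class JsonParserKS6 {
	// Callbacks fired while scanning. Strings and numbers are reported as
	// (offset, length) slices of the input array; nothing is decoded or copied.
	public interface ParseListener {
		void beginObject();
		void endObject();
		void booleanLiteral(boolean value);
		void numberLiteral(int off, int len);
		void stringLiteral(int off, int len);
		void nullLiteral();
		void beginObjectEntry(int off, int len);
		void beginList();
		void endList();
	}

	// 256-entry lookup tables, indexed by the unsigned value of a byte.
	// Whitespace, ',' and ':' carry no information for this parser and are skipped.
	private static final byte[] ignore = _to_B("\t\n\r ,:");
	private static final byte[] isNumeric = _to_B("-0123456789");
	// JSON allows both 'e' and 'E' as the exponent marker.
	private static final byte[] isNumberPart = _to_B("0123456789+-.eE");
	private static final byte _ByteIn = 1;

	private static byte[] _to_B(String text) {
		byte[] data = new byte[256];
		for (int i = 0; i < text.length(); i++) {
			char c = text.charAt(i);
			data[c] = _ByteIn;
		}
		return data;
	}

	private int pos = -1;
	private byte[] input;
	private ParseListener listener;

	// The quote at pos is escaped if an odd number of backslashes precede it.
	private boolean isEscaped() {
		int p = pos;
		boolean escaped = false;
		for (; input[--p] == '\\'; escaped ^= true) {}
		return escaped;
	}

	// inobject: strings are object keys, each followed by exactly one value.
	// cntn: keep reading values until the closing ']' or '}' returns.
	private void json(boolean inobject, boolean cntn) {
		do {
			byte pByte;
			do {
				pByte = input[++pos];
				// Java bytes are signed: mask to 0..255 so non-ASCII bytes index the table safely.
			} while (ignore[pByte & 0xFF] == _ByteIn);

			int start;
			switch (pByte) {
			case '"':
				start = pos + 1;
				while (input[++pos] != '"' || isEscaped()) {}
				if (inobject) {
					listener.beginObjectEntry(start, pos - start);
					json(false, false); // the value belonging to this key
				} else {
					listener.stringLiteral(start, pos - start);
				}
				break;
			case '[':
				listener.beginList();
				json(false, true);
				break;
			case ']':
				listener.endList();
				return;
			// Literals: move pos to their last character; the next ++pos reads past it.
			case 'f':
				listener.booleanLiteral(false);
				pos += 4;
				break;
			case 'n':
				listener.nullLiteral();
				pos += 3;
				break;
			case 't':
				listener.booleanLiteral(true);
				pos += 3;
				break;
			case '{':
				listener.beginObject();
				json(true, true);
				break;
			case '}':
				listener.endObject();
				return;
			default:
				if (isNumeric[pByte & 0xFF] == _ByteIn) {
					start = pos;
					// A number must be followed by at least one more byte, otherwise this reads past the array.
					while (isNumberPart[input[++pos] & 0xFF] == _ByteIn) {}
					// pos is one past the number; step back so the next ++pos rereads that byte.
					listener.numberLiteral(start, pos-- - start);
				} else {
					throw new RuntimeException("Unknown Byte=" + pByte + " '" + (char) pByte + "' at pos=" + pos);
				}
				break;
			}
		} while (cntn);
	}

	// Parses one JSON value and returns the index of its last byte.
	public int json(byte[] input, ParseListener listener) {
		pos = -1;
		this.input = input;
		this.listener = listener;
		json(false, false);
		return pos;
	}
}
```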
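The parser above leans on two scanning tricks: 256-entry lookup tables that classify a byte with a single array access, and deciding whether a quote ends a string by counting the backslashes in front of it (an odd count means the quote is escaped). Below is a standalone sketch of both, with hypothetical names (`ScanTricks`, `isIn`, `findStringEnd`) so they can be tried in isolation. Because Java's `byte` is signed, the sketch masks with `& 0xFF` before indexing a table.

```java
import java.nio.charset.StandardCharsets;

public class ScanTricks {
	// Build a 256-entry membership table: entry c is 1 when character c is in the set (ASCII only).
	static byte[] table(String chars) {
		byte[] t = new byte[256];
		for (int i = 0; i < chars.length(); i++) {
			t[chars.charAt(i)] = 1;
		}
		return t;
	}

	// Mask to 0..255 first, otherwise bytes >= 0x80 would be negative indices.
	static boolean isIn(byte[] table, byte b) {
		return table[b & 0xFF] == 1;
	}

	// A quote at quotePos is escaped iff an odd number of backslashes directly precede it.
	static boolean isEscaped(byte[] input, int quotePos) {
		boolean escaped = false;
		for (int p = quotePos - 1; p >= 0 && input[p] == '\\'; p--) {
			escaped = !escaped;
		}
		return escaped;
	}

	// Returns the index of the quote that closes the string opened at openQuote.
	static int findStringEnd(byte[] input, int openQuote) {
		int pos = openQuote;
		while (input[++pos] != '"' || isEscaped(input, pos)) {
			// keep scanning
		}
		return pos;
	}

	public static void main(String[] args) {
		byte[] ws = table("\t\n\r ,:");
		System.out.println(isIn(ws, (byte) ','));   // true
		System.out.println(isIn(ws, (byte) 0xC3));  // false (first byte of UTF-8 'é')
		// The JSON text "a\"b": the quote at index 3 is escaped, the one at index 5 closes the string.
		byte[] s = "\"a\\\"b\"".getBytes(StandardCharsets.US_ASCII);
		System.out.println(findStringEnd(s, 0));    // 5
	}
}
```

Counting backslashes (rather than only checking the previous byte) matters for input like `"a\\"`, where the final quote is preceded by an escaped backslash and does close the string.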

I’m afraid you did. :) The repo contains a unit test that you can use to verify that the parser works correctly.

I don’t see it :(
Update: I think I found it, and it looks like I fixed it (I may be wrong).
Update 3: Nope, that was the benchmark; I only found the test now. XD
https://github.com/httpdigest/ftljson/blob/master/test/test/ParserTest.java

  • Test passed on file “json2.json” ^^

P.S. Hmm, maybe the JVM gives better performance with small functions.

The CPU is:

Intel® Core™ i7-4810MQ CPU @ 2.80GHz

The Java version is:

```
java version "1.8.0_111"
Java(TM) SE Runtime Environment (build 1.8.0_111-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode)
```

I continued working on ftljson recently and have now added support for deserializing JSON into standard ArrayList/HashMap/boxed primitives.
The code has moved to https://github.com/ftljson/ftljson.git and https://github.com/ftljson/ftljson-bench.git
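The thread does not show how ftljson builds the collections, so the following is only a hypothetical sketch of one common approach: keep a stack of open containers, attach each new value to the container on top (using the pending key if it is a map), and box numbers as `Long` or `Double`. The `TreeBuilder` class and its methods are made up; they take already-decoded strings, whereas the `ParseListener` in the parser code earlier in the thread reports `(off, len)` slices of the input.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical event sink that builds HashMap/ArrayList/boxed values from parse events.
public class TreeBuilder {
	private final Deque<Object> open = new ArrayDeque<>(); // containers not yet closed
	private String pendingKey;                              // object key waiting for its value
	private Object root;

	public void beginObject() { push(new HashMap<String, Object>()); }
	public void beginList() { push(new ArrayList<Object>()); }
	public void end() { open.pop(); }                       // endObject or endList
	public void key(String k) { pendingKey = k; }
	public void string(String s) { add(s); }
	public void bool(boolean b) { add(b); }                 // autoboxed to Boolean
	public void nullValue() { add(null); }

	// Integers become Long; anything with a fraction or exponent becomes Double.
	public void number(String token) {
		boolean integral = token.indexOf('.') < 0 && token.indexOf('e') < 0 && token.indexOf('E') < 0;
		if (integral) {
			add(Long.valueOf(token));
		} else {
			add(Double.valueOf(token));
		}
	}

	public Object result() { return root; }

	// A new container is attached to its parent first, then becomes the current container.
	private void push(Object container) {
		add(container);
		open.push(container);
	}

	@SuppressWarnings("unchecked")
	private void add(Object value) {
		if (open.isEmpty()) {
			root = value;                                   // the top-level value
		} else if (open.peek() instanceof Map) {
			((Map<String, Object>) open.peek()).put(pendingKey, value);
			pendingKey = null;
		} else {
			((List<Object>) open.peek()).add(value);
		}
	}
}
```

An adapter around the listener would forward, for example, `beginObjectEntry(off, len)` as `key(new String(input, off, len, StandardCharsets.UTF_8))`. Decoding escape sequences such as `\n` inside strings is left out of this sketch.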

When is it useful? When you need really fast, low-latency JSON deserialization on cold code paths (those not called often enough to be compiled by the C1 or C2 JIT compiler).
For comparison with other parsers: so far, the fastest other parser I have found, which is also used by the vast majority of people and frameworks, is Jackson. Of course, ftljson offers none of Jackson's extensive configuration support; it is meant for very simple use cases.
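To see the cold/hot difference described here, the following rough sketch times the first call of a small method (while HotSpot still runs it in the bytecode interpreter) against a call made after many warm-up iterations, by which point it has most likely been JIT-compiled. The workload `countNumberStarts` is a hypothetical stand-in, not ftljson code, and the post does not say how the cold-code benchmarks were set up. A single `System.nanoTime()` sample is noisy; for real measurements, JMH's `Mode.SingleShotTime` with fresh forks is the more reliable tool.

```java
import java.nio.charset.StandardCharsets;

public class ColdVsHot {
	// Written after every call so the JIT cannot drop the work as dead code.
	static volatile int sink;

	// Hypothetical workload: count the bytes that could start a JSON number.
	static int countNumberStarts(byte[] input) {
		int n = 0;
		for (byte b : input) {
			if (b == '-' || (b >= '0' && b <= '9')) {
				n++;
			}
		}
		return n;
	}

	// Wall-clock duration of one call, in nanoseconds.
	static long timeOnce(byte[] input) {
		long t0 = System.nanoTime();
		sink = countNumberStarts(input);
		return System.nanoTime() - t0;
	}

	public static void main(String[] args) {
		byte[] input = "{\"a\":[1,2,3],\"b\":-4.5}".getBytes(StandardCharsets.US_ASCII);
		long cold = timeOnce(input);             // first call: interpreted
		for (int i = 0; i < 100_000; i++) {
			timeOnce(input);                     // gives C1/C2 a chance to compile the method
		}
		long hot = timeOnce(input);
		System.out.println("cold: " + cold + " ns, after warm-up: " + hot + " ns");
	}
}
```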

Comparison against Jackson (with cold code):
```
Benchmark                      (file)  Mode  Cnt        Score          Error  Units
DeserializeBenchmark.ftljson     card  avgt    5    29049,392 ±    22057,822  ns/op
DeserializeBenchmark.ftljson   widget  avgt    5     7052,177 ±     7291,225  ns/op
DeserializeBenchmark.ftljson    json1  avgt    5    15003,690 ±    11196,802  ns/op
DeserializeBenchmark.jackson     card  avgt    5  7374905,071 ± 63112563,477  ns/op
DeserializeBenchmark.jackson   widget  avgt    5  7265142,002 ± 62411898,220  ns/op
DeserializeBenchmark.jackson    json1  avgt    5  7679414,695 ± 65708783,481  ns/op
ParserBenchmark.ftljson          menu  avgt    5      521,959 ±     1380,733  ns/op
ParserBenchmark.ftljson       numbers  avgt    5    15682,282 ±    55010,040  ns/op
ParserBenchmark.ftljson         json2  avgt    5    31761,068 ±    56871,547  ns/op
ParserBenchmark.jackson          menu  avgt    5   162493,944 ±  1332179,760  ns/op
ParserBenchmark.jackson       numbers  avgt    5  5355007,846 ± 45521546,651  ns/op
ParserBenchmark.jackson         json2  avgt    5  5388351,376 ± 45272262,987  ns/op
```

ftljson is of course still preferable on hot code paths, once they have been optimized by C2.
Comparison against Jackson (with hot code):
```
Benchmark                      (file)  Mode  Cnt      Score      Error  Units
DeserializeBenchmark.ftljson     card  avgt    5   5734,811 ±  117,428  ns/op
DeserializeBenchmark.ftljson   widget  avgt    5   1689,969 ±   19,116  ns/op
DeserializeBenchmark.ftljson    json1  avgt    5   3150,746 ±   70,403  ns/op
DeserializeBenchmark.jackson     card  avgt    5   8339,463 ±   89,051  ns/op
DeserializeBenchmark.jackson   widget  avgt    5   2529,571 ±   67,576  ns/op
DeserializeBenchmark.jackson    json1  avgt    5   4621,365 ± 1900,784  ns/op
ParserBenchmark.ftljson          menu  avgt    5    346,498 ±    5,205  ns/op
ParserBenchmark.ftljson       numbers  avgt    5   6625,824 ±   58,905  ns/op
ParserBenchmark.ftljson         json2  avgt    5   9861,452 ±  206,383  ns/op
ParserBenchmark.jackson          menu  avgt    5   1351,376 ±   11,298  ns/op
ParserBenchmark.jackson       numbers  avgt    5  14040,713 ±  167,340  ns/op
ParserBenchmark.jackson         json2  avgt    5  24857,699 ±  637,563  ns/op
```

EDIT:
Current performance figures comparing ftljson, Jackson, jsoniter and Gson, all deserializing into the same ArrayList/Map structures:
```
Benchmark                       (file)  Mode  Cnt      Score      Error  Units
DeserializeBenchmark.ftljson     json1  avgt    5   3395,772 ±  391,756  ns/op
DeserializeBenchmark.ftljson   numbers  avgt    5  12982,493 ±  274,072  ns/op
DeserializeBenchmark.ftljson    widget  avgt    5   1755,446 ±  139,775  ns/op
DeserializeBenchmark.ftljson     json2  avgt    5  26088,057 ±  584,068  ns/op
DeserializeBenchmark.ftljson   escaped  avgt    5   1985,217 ±  195,366  ns/op
DeserializeBenchmark.gson        json1  avgt    5   7075,436 ±  680,854  ns/op
DeserializeBenchmark.gson      numbers  avgt    5  21269,264 ±  641,374  ns/op
DeserializeBenchmark.gson       widget  avgt    5   3250,237 ±  447,465  ns/op
DeserializeBenchmark.gson        json2  avgt    5  58476,449 ±  831,019  ns/op
DeserializeBenchmark.gson      escaped  avgt    5   3946,888 ±  108,950  ns/op
DeserializeBenchmark.jackson     json1  avgt    5   5453,855 ±   81,828  ns/op
DeserializeBenchmark.jackson   numbers  avgt    5  22333,027 ±  221,605  ns/op
DeserializeBenchmark.jackson    widget  avgt    5   2844,529 ±   77,761  ns/op
DeserializeBenchmark.jackson     json2  avgt    5  41457,379 ± 1126,078  ns/op
DeserializeBenchmark.jackson   escaped  avgt    5   3019,624 ±   46,316  ns/op
DeserializeBenchmark.jsoniter    json1  avgt    5  10297,648 ± 1116,762  ns/op
DeserializeBenchmark.jsoniter  numbers  avgt    5  59582,209 ± 8466,662  ns/op
DeserializeBenchmark.jsoniter   widget  avgt    5   3073,891 ±  391,261  ns/op
DeserializeBenchmark.jsoniter    json2  avgt    5  69771,919 ± 8477,439  ns/op
DeserializeBenchmark.jsoniter  escaped  avgt    5   2351,572 ±  262,206  ns/op
```
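To read the last set of figures at a glance, here is a small sketch that divides each library's score by ftljson's score on the same file. The values are copied from those DeserializeBenchmark results (decimal commas written as points); the class name `Ratios` is just for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class Ratios {
	static final String[] FILES = {"json1", "numbers", "widget", "json2", "escaped"};
	// Average time in ns/op (lower is better) for each file, in the order of FILES.
	static final double[] FTLJSON = {3395.772, 12982.493, 1755.446, 26088.057, 1985.217};
	static final Map<String, double[]> OTHERS = new LinkedHashMap<>();
	static {
		OTHERS.put("gson", new double[] {7075.436, 21269.264, 3250.237, 58476.449, 3946.888});
		OTHERS.put("jackson", new double[] {5453.855, 22333.027, 2844.529, 41457.379, 3019.624});
		OTHERS.put("jsoniter", new double[] {10297.648, 59582.209, 3073.891, 69771.919, 2351.572});
	}

	// Time of `lib` divided by ftljson's time on the same file; above 1.0 means ftljson was faster.
	static double ratio(String lib, int file) {
		return OTHERS.get(lib)[file] / FTLJSON[file];
	}

	public static void main(String[] args) {
		for (String lib : OTHERS.keySet()) {
			StringBuilder line = new StringBuilder(String.format(Locale.ROOT, "%-8s", lib));
			for (int i = 0; i < FILES.length; i++) {
				line.append(String.format(Locale.ROOT, "  %s %.2fx", FILES[i], ratio(lib, i)));
			}
			System.out.println(line);
		}
	}
}
```

For example, Gson takes about 2.08x as long as ftljson on json1, and the closest result is jsoniter on escaped at about 1.18x. These are ratios of the mean scores only; the ± error columns are ignored.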