XHON: eXtensible Human Object Notation

Hello everyone!

After roughly 5 weeks of writing and lots of thinking and tinkering, I have finally finished the specification for a file format I invented: XHON.
The name is an amalgamation of ‘XML’ and ‘JSON’ and stands for ‘eXtensible Human Object Notation’.

I created XHON because I wanted a format like JSON, but with comments, graphs, s-expressions and a few more things. Extending JSON itself is not possible because it is standardized. All the other formats (like YAML, Hjson, EDN and XML) are either way too damn complicated, use significant whitespace (which I absolutely hate) or have weird limitations, and JSON itself doesn’t even support comments (Why JSON, why?). So I decided to invent my own format for my projects that doesn’t have these problems; the result is XHON.
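To make the JSON limitations mentioned above concrete, here is a minimal Python sketch using only the standard json module. The helper name is_valid_json is made up for illustration and has nothing to do with XHON itself; it only shows what a strict JSON parser rejects.

```python
import json


def is_valid_json(text: str) -> bool:
    """Return True if the standard-library parser accepts the text as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Plain JSON is accepted.
print(is_valid_json('{"key": 1}'))

# A trailing comment is rejected: JSON has no comment syntax.
print(is_valid_json('{"key": 1} // comment'))

# Quoteless keys are rejected as well.
print(is_valid_json('{key: 1}'))

# NaN is not part of the JSON grammar. Python's encoder emits it by default,
# but refuses to do so when asked to stay standards-compliant.
try:
    json.dumps(float("nan"), allow_nan=False)
    nan_rejected = False
except ValueError:
    nan_rejected = True
print(nan_rejected)
```

Comments, quoteless keys and NaN/Infinity are three of the items on the feature list below.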

XHON is a superset of JSON and is thus fully backwards compatible with it, while extending it with quite a bunch of features.

To give you a basic idea of what’s in store, here is what XHON has on top of JSON:

  • Quoteless keys.
  • Comments (which can be stacked).
  • A more complete numerical type (that supports NaN and Infinity).
  • Strings that can go over multiple lines.
  • A byte array type for embedding binary data inside a plain text format.
  • An actual datetime type loosely based on the Joda-Time library.
  • A special ‘unit’ type for numbers to allow things like 200m².
  • A key/value type that allows duplication of keys and empty keys.
  • A type for writing paths made of elements instead of just strings.
  • An s-expression type à la Lisp, making it possible to store expressions without strings.
  • A graph type for storing undirected/directed graphs in a format similar to TGF.
  • Parser function callbacks to allow easy extension and extension by macros.

…and all of this while still keeping the whole thing easily readable by both humans and computers.

Right now there is no parser implemented for XHON, though SHC is being awesome and is working on one!

Here is the link to the 19-page PDF of the specification (hosted on Google Drive):

The specification is feature frozen, so the only thing left to do is to fix grammatical mistakes and (possibly) logical errors in the definitions.
So, if you happen to read the specification and stumble upon a grammar mistake or a logical error, please tell me!

Thank you and have a nice day!

  • Lars Longor K

PS: I’m pretty darn happy that I managed to finish this specification. Yay!

The reason .json is so bad to edit by hand is that it was not designed for that purpose. It was meant for passing data between applications. The data in .json is supposed to be program-generated, not handwritten.

Are you going to have an example implementation with some code examples? It would be nice to have code examples of situations where certain aspects of the language are useful.

SHC is working on a prototype implementation currently (he’s learning to write parsers).

Sadly, I am very (very) bad at thinking up examples, so it might take quite a while until I have some good ones.

EDIT: I will start implementing an actual parser as a reference implementation tomorrow.

Go for registration with IANA! :wink:
Would be an awesome experience if your media type got accepted. Who knows.

Just a few little things that caught my attention:

  1. Please back all of your syntactic elements with a grammar definition

Currently, only your “number” type comes with an EBNF grammar. It would be great if you had a grammar for every syntactic element in the language and then, ideally as an appendix, the full EBNF grammar. Then people could implement parsers for your language in a matter of ten minutes. A more formal definition of the syntax would also rule out any misunderstandings and ambiguities that might arise from reading only the prose text and the examples.
Another advantage of having a grammar is that your claim of “easily readable by both humans and computers” can actually be demonstrated (namely by implementing a parser that accepts exactly the language the grammar generates).
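To illustrate how directly a grammar maps onto parser code, here is a small sketch in Python. Note the assumptions: the grammar in the comments is the number grammar of plain JSON (as in RFC 8259), not the one from the XHON specification, and the function name is_number is made up for this example. Each grammar rule becomes a few lines of hand-written recursive-descent code.

```python
# EBNF (JSON number, used here only as a stand-in):
#   number = [ "-" ] int [ frac ] [ exp ] ;
#   int    = "0" | nonzero { digit } ;
#   frac   = "." digit { digit } ;
#   exp    = ( "e" | "E" ) [ "+" | "-" ] digit { digit } ;

DIGITS = "0123456789"


def is_number(text: str) -> bool:
    """Recognize a complete JSON number, following the grammar rule by rule."""
    pos = 0

    def peek() -> str:
        # Current character, or "" at end of input.
        return text[pos] if pos < len(text) else ""

    def digits() -> bool:
        # digit { digit }: consume at least one ASCII digit.
        nonlocal pos
        start = pos
        while peek() and peek() in DIGITS:
            pos += 1
        return pos > start

    # [ "-" ]
    if peek() == "-":
        pos += 1

    # int: a lone "0", or a run of digits starting with a nonzero digit
    if peek() == "0":
        pos += 1
    elif not digits():
        return False

    # [ frac ]
    if peek() == ".":
        pos += 1
        if not digits():
            return False

    # [ exp ]
    if peek() in ("e", "E"):
        pos += 1
        if peek() in ("+", "-"):
            pos += 1
        if not digits():
            return False

    # The whole input must have been consumed.
    return pos == len(text)


print(is_number("-12.5e+3"))  # accepted
print(is_number("012"))       # rejected: leading zero
```

With a full grammar in an appendix, the same rule-by-rule translation would work for every element of the format, whatever language the parser is written in.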

  2. Is a Java “type structure” really necessary?

You show some Java “type structure” examples introducing the Java classes “Element”, “NullElement”, “BooleanElement”, and so on. I believe you impose subtyping semantics on your language that are not strictly necessary. It would suffice for your language to have only a syntactic specification in the form of an EBNF grammar, without semantics borrowed from and expressed in Java.
In my opinion, the Java examples would just confuse people who don’t know Java and want to implement parsers for your language in C, for example.
A type system and semantic analysis are only necessary when you can express sentences in your language that are allowed syntactically but not semantically. That does not seem to be the case with your data format, so a specification of the syntax is sufficient.

I’m not even going to think about submitting a registration request. Just take a look at that form! Not gonna do that.
Yes, I will add a grammar definition for any and all elements (tomorrow), and remove the Java examples while I do so.

It’s a good thing I learned EBNF a few weeks ago!

Edit: I stayed up until 3:30 in the morning and added grammar definitions to pretty much all elements. The PDF is updated; the link is the same.