Thinking about data formats - would appreciate opinions.

I’m getting ready to start creating some of the basic data for my game - enemies, items the player can use/equip etc. and I’m trying to decide on the best format for this data.

I know that JSON is really popular these days and it has clear advantages:

  1. Fast processing
  2. Decent out-of-the-box support in libGDX

but there are also things I don’t like about it. Despite what everyone on the internet seems to think, I don’t find it as easy to read as XML - or, at least, well-designed XML. XML also seems to have much better support for metadata than JSON does. I know XML is verbose, but I genuinely don’t care (I’ve never really understood the strong aversion to verbosity in XML or in Java) - unless someone knows of a really good technical reason to go with a less verbose format.

XML also has XPath. The ability to perform a query on the data is important to me and I know that XPath is good for this and, I believe, can be done reasonably quickly via a cheeky SAX parse. There is, of course, JSONPath - is that any good?

The other option is to use SQLite, which has all of the above advantages (clean structure, metadata support and queryable), but with the downside of a performance and memory overhead. It’s a nice technology, but I suspect that either XML or JSON would perform better.

Anyway, I’m interested in the thoughts of the internet - so please weigh in. I’m expecting a lot of support for JSON, which is fine - it may well be the best solution, although please don’t just recommend it because it’s the most popular. I’ve found a number of discussions online about this where that seems to be the main argument for JSON - just that it’s the latest and greatest thing.

Obviously, if you’re aware of yet another solution I’d be really interested to hear about it.

I’m quite fond of my THJSON parser :slight_smile: Most of the advantages of XML, most of the advantages of JSON.

Cas :slight_smile:

Thanks - I’ll take a look!

Edit: Glad to see someone else who finds JSON “hurts the eyes”

If you like XML, then use XML.

How often is your game going to be reading these files? How large are they? How often do you, as a human, need to read or edit them?

My guess is they aren’t going to be huge, and you aren’t going to be reading them (either via code or as a human) very often, so you aren’t going to notice a huge difference in “performance” no matter what you choose.

So go with whatever you like the best.

They won’t be all that large and they should only be read at the beginning of each level during initialisation.

Not that I am saying it is finished yet, but if you like markup languages, have a look at my TreeML: it is more compact than JSON and has a nifty little schema language and a very basic path language (currently only really used in my unit tests).

Well… It all depends on your intention. Tell me more about what you want to do with it.

How often do you want to get data from your filesystem? For what? Do you want to do high freq queries on your storage? Do you want to modify your storage data at runtime?

For simple configuration files I prefer JSON or YAML (simplicity).
For sending data over the wire in a human-friendly format I prefer JSON (of course, not if we want good performance :)).
For configuration/data shared with external software, XML may well be better (it has an advantage for transforms and for sticking to a schema).
For querying, I would use a relational database.
NoSQL would also be a good idea, but it’s tricky (it depends on what you need).
Maybe CSV?

About JSON vs XML -> I’ve found working with JSON much faster and simpler than XML, but sometimes you just can’t use it, or XML is the better fit. JSON is a popular serialization format for sending messages over the wire and I think that we know why :smiley:

My view is pretty simplistic. I don’t have to load any new libraries in order to use XML. Also, my programs are not pushing any envelopes (at most, just trying to ensure sufficient efficiencies for good audio processing).

I would think JSON would make the most sense in scenarios where one is either interacting with the JavaScript world or sending/receiving information from non-local URLs. If it is well integrated into LibGDX, that could also be a good reason to go for it, since overhead for handling the code is there anyway.

I agree it is nice to have the data that is saved/loaded in an easily readable form.

It really depends on the data, regarding:
-who is editing / generating the data
-how often is the data read (loading in bulk to memory, or regularly querying it)
-is the data used to serialize state/objects and send it to another system over the network
-how important is it to be able to manually edit the data (favoring thus any text-based format)
-how much data is there (a small config file vs a JGO Database Dump)
-how many content providers have to agree on a format (making validation / schema necessary)
-how important is it to be able to manually read the data for debugging purposes (using a more verbose format early in development)

One example, that I use for processing numeric data, such as balancing data (Serverside):

I use Excel/OpenOffice and edit the data in a spreadsheet.
Then I save it to an ods file (the native OpenOffice format).
This is basically just a zip file containing all the spreadsheet data, plus meta and format data (unimportant for my purpose).
Then I just directly load the spreadsheet in Java, unpack it, get the “content.xml”, use the default DocumentBuilder to parse it, and extract my data as needed.

The advantage is that I can directly edit the data in Excel or OpenOffice, and just hit save (confusingly being a floppy disk button).
Voila, the data can be loaded directly.
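
A minimal sketch of the ods route described above, using only the JDK: find “content.xml” inside the zip, parse it with the default DocumentBuilder and collect the cell text row by row. The class and method names (OdsReader, readRows) are made up for illustration. It also assumes plain cells only: repeated or merged cells, which OpenDocument stores in compressed form, are not expanded here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper, not part of any library.
class OdsReader {
    // Namespace used by OpenDocument for table elements.
    static final String TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0";

    /** Reads every table row of an .ods stream as a list of cell strings. */
    static List<List<String>> readRows(InputStream ods) throws Exception {
        try (ZipInputStream zip = new ZipInputStream(ods)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    return parseContent(readAll(zip));
                }
            }
        }
        throw new IOException("content.xml not found in archive");
    }

    // Copies the current zip entry into memory so the parser cannot close the zip stream.
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    private static List<List<String>> parseContent(byte[] xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the getElementsByTagNameNS calls below
        Document doc = factory.newDocumentBuilder().parse(new ByteArrayInputStream(xml));

        List<List<String>> rows = new ArrayList<>();
        NodeList rowNodes = doc.getElementsByTagNameNS(TABLE_NS, "table-row");
        for (int r = 0; r < rowNodes.getLength(); r++) {
            Element row = (Element) rowNodes.item(r);
            NodeList cellNodes = row.getElementsByTagNameNS(TABLE_NS, "table-cell");
            List<String> cells = new ArrayList<>();
            for (int c = 0; c < cellNodes.getLength(); c++) {
                cells.add(cellNodes.item(c).getTextContent());
            }
            rows.add(cells);
        }
        return rows;
    }
}
```

Something like `OdsReader.readRows(new FileInputStream("balancing.ods"))` (file name hypothetical) would then give the raw strings, ready to be converted to numbers.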

(The easy implementation here is to just parse a CSV, but I have often forgotten to save to the original spreadsheet afterwards, so it’s less fail-proof.)

Editing XML or even JSON can get tiresome when using a lot of numeric fields. Here a spreadsheet is a way better presentation.
Plus, I can apply logic (formulas) and evaluation directly on the data, while editing it.

When creating level data (topography, e.g. a matrix of a lot of regular data, in a few layers) I would use a simple CSV, or compressed binary data when it gets too big.
When creating level data that contains a list of many objects, each having a deeper hierarchy of sub-objects and their attributes, I would use XML (JSON would be an option, but has some drawbacks here).
When creating state data that has to be sent over the network, I would use something like JSON at first, but then move to a custom serialization later to reduce the total size.
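
For the first case (a regular matrix of tile data), the CSV route needs very little code. A rough sketch, assuming one text row per map row and integer tile ids; the class name CsvLayer and the meaning of the ids are invented for the example:

```java
// Hypothetical helper: parses one map layer stored as comma-separated integers.
class CsvLayer {
    /** Returns grid[y][x], one inner array per line of the CSV text. */
    static int[][] parse(String csv) {
        String[] lines = csv.trim().split("\\r?\\n");
        int[][] grid = new int[lines.length][];
        for (int y = 0; y < lines.length; y++) {
            String[] cells = lines[y].split(",");
            grid[y] = new int[cells.length];
            for (int x = 0; x < cells.length; x++) {
                grid[y][x] = Integer.parseInt(cells[x].trim());
            }
        }
        return grid;
    }
}
```

It does no validation (ragged rows and non-numeric cells are not handled gracefully), which is fine for data you generate yourself but not for anything user-supplied.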

SQL Databases:
I would not use a SQL database unless:
-there is a LOT of data, that should not be kept in memory
-there is a need to QUERY the data, and not just read all data in a big chunk
-serialization would be too complicated to do it with a trivial data-dump to a binary file

OK, to give a little more context.

The data is going to be used at the beginning of every level and only at the beginning. The game is based on a proc gen dungeon and I want to have some sort of data file(s) to list all the possible mobs and possible items.

It absolutely does need to be queryable and that’s actually causing the sticking point. In XML, you can fairly easily run an XPath query on a DOM, but that requires having a DOM which not only uses memory but is slow(ish) to create. Alternatively, I can use a SAX parse with a subset of XPath (adequate) and gain performance, but having to use a 3rd party library - defeating one of the (admittedly minor) benefits of XML. There is JSONPath, but, again, it requires 3rd party tools.
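
For what it’s worth, if the queries stay as simple as a comparison on one attribute, a hand-written SAX handler can do the filtering with nothing but the JDK. To be clear, this is not XPath and not a subset of it, just a plain filter during the parse. The element and attribute names (mob, name, minDepth) and the class name are assumptions about what the file might look like:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class; the XML layout is assumed, not taken from a real file.
class MobSaxFilter {
    /** Streams through the XML once and keeps the names of mobs allowed at this depth. */
    static List<String> mobsAllowedAt(String xml, int depth) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (!"mob".equals(qName)) {
                    return; // ignore the root and any other elements
                }
                int minDepth = Integer.parseInt(attrs.getValue("minDepth"));
                if (minDepth <= depth) {
                    names.add(attrs.getValue("name"));
                }
            }
        };
        // No DOM is built: each element is inspected and then discarded.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }
}
```

The trade-off is that every new kind of query means more handler code, which is where a real path language starts to pay off.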

For example, each mob has a minimum depth attribute. If a mob has a minDepth of 10 then it mustn’t appear in levels 1 - 9. So, at the beginning of each level x, I’m going to do a query along the lines of “get me all the mobs with a minDepth <= x”
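
As a concrete sketch of the DOM plus XPath version of that query, using only javax.xml from the JDK: it selects every mob whose minDepth is at or below the current depth, so a mob with a minDepth of 10 is left out on levels 1 - 9. The XML layout, the attribute names and the class name are all assumptions made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example class; the data format is assumed.
class MobXPathQuery {
    static final String MOBS_XML =
          "<mobs>"
        + "<mob name='rat' minDepth='1'/>"
        + "<mob name='orc' minDepth='5'/>"
        + "<mob name='dragon' minDepth='10'/>"
        + "</mobs>";

    /** Returns the names of all mobs allowed to appear at the given depth. */
    static List<String> mobsAllowedAt(String xml, int depth) throws Exception {
        // Build the DOM (this is the memory and start-up cost mentioned above).
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // XPath 1.0 converts the attribute value to a number for the comparison.
        XPath xpath = XPathFactory.newInstance().newXPath();
        String query = "/mobs/mob[@minDepth <= " + depth + "]";
        NodeList hits = (NodeList) xpath.evaluate(query, doc, XPathConstants.NODESET);

        List<String> names = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            names.add(((Element) hits.item(i)).getAttribute("name"));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(mobsAllowedAt(MOBS_XML, 5));
    }
}
```

With the sample data and a depth of 5, the rat and the orc are selected and the dragon is not. In a real game the Document would be parsed once and reused across levels rather than rebuilt per query.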

Hope that adds some clarity!

Reckon your best bet is to a) use the data simply as persistence, not as a database - load it into the heap in a useful queryable POJO structure and b) use thjson which is precisely designed for this exact task you had in mind :stuck_out_tongue:
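
To illustrate point a), here is a rough sketch of what a queryable POJO structure could look like once the file has been loaded, whatever format it was loaded from. It says nothing about the THJSON API itself; Mob, MobCatalog and the field names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical in-memory catalogue, filled once at startup from the data file.
class MobCatalog {
    /** Plain immutable data holder for one mob definition. */
    static final class Mob {
        final String name;
        final int minDepth;

        Mob(String name, int minDepth) {
            this.name = name;
            this.minDepth = minDepth;
        }
    }

    private final List<Mob> mobs;

    MobCatalog(List<Mob> mobs) {
        this.mobs = new ArrayList<>(mobs); // defensive copy
    }

    /** The "query": all mobs whose minimum depth has been reached. */
    List<Mob> allowedAt(int depth) {
        return mobs.stream()
                .filter(m -> m.minDepth <= depth)
                .collect(Collectors.toList());
    }
}
```

With a few dozen or even a few thousand definitions, a linear scan like this once per level is unlikely to matter, and new queries are just new filter methods.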

Cas :slight_smile:

I’ll have a play around with THJSON and see if I can figure out how to use the classes (no docs ;-p) - by the time I’ve written a couple of simple test classes, it should be obvious if it’s the solution for me.