XML files are GIGANTIC, especially if your player files are going to have a lot of data in them. They can get up to ~40 KB once everything is written out.
I personally use Markus Persson’s NBT (Named Binary Tag) system, and if I don’t feel like implementing that, I just run the data through a simple GZIP stream with the Java API. NBT also fits nicely within the 4 KB allocation unit NTFS uses (on Windows, a file occupies at least one 4 KB cluster on disk regardless of its actual size), and the data inside it is well compressed and small.
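The GZIP route really is just stream wrapping with the standard library. A minimal sketch (the class and helper names here are my own, not from any particular game):

```java
import java.io.*;
import java.nio.file.*;
import java.util.zip.*;

public class GzipSave {
    // Write raw save data through a GZIP stream to disk.
    static void save(Path file, byte[] data) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(file))) {
            out.write(data);
        }
    }

    // Read the compressed file back into memory.
    static byte[] load(Path file) throws IOException {
        try (InputStream in = new GZIPInputStream(Files.newInputStream(file))) {
            return in.readAllBytes();
        }
    }
}
```

Whatever format you actually write (XML, NBT, your own binary) goes through the same pair of streams, so compression is essentially free to bolt on.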
Uh, they can get up to 40 gigs if you want, or bigger yet. There’s no property of XML that makes any particular size intrinsic to it. The average D&D character sheet will be perhaps a couple of kilobytes in any reasonable schema I can think of.
You can be penny wise and pound foolish and squeeze every last byte out of a file format you’re reading maybe a half dozen of once on startup, or you can use something that’s appropriate to the purpose. There’s no single right answer, and a lot of it depends on what you’re putting into the files, how fast you need to read and write them, and the constraints of the device you’re running it on. I wouldn’t store minecraft chunks in XML any more than I would store a character sheet in ASN.1 PER.
Obviously they can be as large as they need to be; I thought it’d be obvious that I was talking in the context of the OP’s situation.
I used to use XStream, which lets you serialize Java objects into XML files, making player saving easy. The only problem was that the files got huge.
Personally I have been experimenting with serialization A LOT lately. I like that I can instantiate an object, fill its fields, tweak its values, then save it to a file or send it over any stream (including a socket).
I wrote a couple classes last week to facilitate loading and saving of serialized objects. If you would like I can post the source.
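Something like this, roughly (a trimmed-down sketch of the idea, not the exact source):

```java
import java.io.*;
import java.nio.file.*;

public class ObjectIO {
    // Serialize any Serializable object to a file.
    static void save(Path file, Serializable obj) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(obj);
        }
    }

    // Read the object graph back; the caller casts to the expected type.
    static Object load(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newObjectInputStream != null ? Files.newInputStream(file) : Files.newInputStream(file))) {
            return in.readObject();
        }
    }
}
```

The same `ObjectOutputStream` works over a socket’s `OutputStream` too, which is what makes it handy for networking.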
I’d actually love to see this since we’re going to be using serialization for our level editor. Would be really cool to see another approach, and we’ll post ours when it’s done as well.
Serialization is fast, but it’s awesomely brittle. Any change to the class, and you can no longer deserialize it without littering your class with weird schema evolution code that you can’t put anywhere else. It’s good for short-term caching and shoving objects over the wire, but it’s highly inappropriate as a long term persistence format.
If you use XML and reflection for a level storage mechanism, and then decide to change the constructor arguments of the class you’re trying to instantiate, isn’t this just as brittle?
I could argue the exact opposite: if your object is at its final revision, then there is no chance of breaking. And if you did have to do an update, you could keep the old version of the class around, load the object into it, construct an object of the new class from that, then re-serialize it and write it back to the same file.
That is far from the only way you serialize to and from XML or JSON, and in fact it isn’t even a typical method for doing it. I’ve never used constructor-based initialization for my serialized objects, and not reflection either.
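For what it’s worth, here is a minimal sketch of what non-constructor binding can look like: the class exposes explicit read/write methods against a key/value view of the parsed document (the `Map` here is a stand-in for your real XML/JSON layer; the class is hypothetical):

```java
import java.util.*;

// Explicit binding, no reflection, no constructor arguments:
// the class itself knows how to write and read its fields.
class PlayerData {
    private String name = "unnamed";
    private int level = 1;

    void writeTo(Map<String, String> out) {
        out.put("name", name);
        out.put("level", Integer.toString(level));
    }

    void readFrom(Map<String, String> in) {
        // Missing keys just keep their defaults, so adding a field
        // later doesn't break files written by the old version.
        if (in.containsKey("name")) name = in.get("name");
        if (in.containsKey("level")) level = Integer.parseInt(in.get("level"));
    }

    String name() { return name; }
    int level() { return level; }
}
```

Annotation-driven libraries automate the same field-by-name mapping, but the contract is identical: a no-arg construction path plus per-field population, which is why a missing or extra field degrades gracefully instead of failing outright.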
Keep the old version of the class in the same program? Or maintain a migration app for every prior version? Neither is anything I’d want to maintain. Whatever though, I’m not really inclined to go on a big advocacy tear, since none of this is my code.
I’m pretty sure this is just me being ignorant; I need to do more research on JSON and such things. Do you have an example or a description of how it actually works and how it avoids the maintenance problem of Java serialization?
I agree with you completely that serialization looks like it was 100% designed for storing and reconstructing objects at run time, where there is no chance of the structure of your program changing. I’m just extremely new to the technology and can’t fathom how a file with some object data, XML or otherwise, could somehow be okay if you make any changes to the class internally. Have you ever used JSON personally, and is the API comparable to Java serialization if all I’m trying to do is export and import some data?
Your intuition is mostly correct. There is no silver bullet that will automatically handle serialization when you make changes to your classes.
This problem overlaps with API design. If you only add to your API then that is the easiest case to handle and you should be able to read your old files with little or no modification to your code.
Once you start changing or removing stuff, it becomes trickier. Either spend a lot of work writing code that can upgrade your serialized data to the current version, or decide that it’s not worth the effort and break backwards compatibility.
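A common middle ground is to tag the stream with a format version and branch on read. A hypothetical sketch with plain `DataOutputStream`/`DataInputStream` (the class and fields are invented for illustration — version 1 stored only a name, version 2 added a score):

```java
import java.io.*;

class SaveFile {
    static final int CURRENT_VERSION = 2;
    String name;
    int score;

    void write(DataOutputStream out) throws IOException {
        out.writeInt(CURRENT_VERSION); // always stamp the format version first
        out.writeUTF(name);
        out.writeInt(score);
    }

    static SaveFile read(DataInputStream in) throws IOException {
        int version = in.readInt();
        SaveFile s = new SaveFile();
        s.name = in.readUTF();
        // Version-1 files have no score field; fall back to a default.
        s.score = (version >= 2) ? in.readInt() : 0;
        return s;
    }
}
```

The upgrade logic stays in one place and old files keep loading; the cost is that you can never reuse or reorder old field slots without bumping the version again.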
Is this why Java deprecates old functions in its API instead of outright removing them? Seems like a case of not wanting to break backwards compatibility.
The problem with breaking backwards compatibility for game development is that it’s inconceivable that every single class we’re ever going to need to serialize with our level editor is going to be at its final revision when we go into it. Just seems impossible.
Serialization only attempts to preserve data in serializable fields, correct? Is there a list somewhere of what changes made to a class will “Break” old binary versions of objects?
My say is: either use http://blog.acfoltzer.net/tag/minecraft/ or save in a compressed stream, e.g. GZIP. NBT is nice because of its organization; the only problem is that you have to develop a large API just for handling it. I’m thinking about releasing that API, though, so it can be more easily used.
Not necessarily 100% at runtime – a cache might persist things beyond a program’s lifetime for example – but pretty close to it.
If you simply add a new field, then the xml deserializer simply won’t populate it from an old schema, and you’ll need to come up with a sensible default. If you completely rearrange your class, then obviously you’re going to have to redo your serializer. If you use an annotation-based one then chances are it’s already been rearranged for you. The gist of it is, your versioning logic can be done outside of readObject/writeObject methods, and often outside of java itself (such as xslt for xml serializations, if you’re that masochistic). You can get some of this in serialization too by maintaining the serialVersionUID yourself, but god help you if you do make an incompatible change and forget to update it.
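On the built-in serialization side, the stdlib does give you a hook for those defaults: a custom `readObject` using `ObjectInputStream.GetField`, plus a `serialVersionUID` you maintain by hand. A hedged sketch (hypothetical class, not from the thread):

```java
import java.io.*;

class Profile implements Serializable {
    // Keep this stable across compatible revisions; bump it only on
    // a deliberately incompatible change.
    private static final long serialVersionUID = 1L;

    String name;
    int score; // imagine this field was added in a later revision

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        ObjectInputStream.GetField f = in.readFields();
        name = (String) f.get("name", "unknown");
        score = f.get("score", 0); // streams written before the field existed get 0
    }
}
```

With fields present in the stream, `GetField` just hands back the stored values, so current files round-trip normally while old ones pick up the defaults.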
The brittleness of Serializable can be an advantage: if serialVersionUIDs don’t match, it fails early and fast, so if you forgot to evolve the deserializer for the new version, you’re never left with half-baked objects. It’s also fast and reasonably compact. The reason (read: wild speculation, I’m only pretending to read Notch’s mind) that Minecraft doesn’t use it for storing chunks has less to do with the brittleness than the fact that it’s basically all-or-nothing. If chunks kept direct references to their neighbors, deserializing one would mean reading in the entire known world! Such references would have to be marked transient, but then it would take custom code to serialize something else, like a chunk ID, in their place. Plus, the NBT format allows for skipping parts of the stream as needed, which is peculiar to that format and not something you normally get from a serialization API.
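To make the all-or-nothing point concrete, here is a toy version of that transient-plus-ID pattern (hypothetical classes, not Minecraft’s actual code):

```java
import java.io.*;

class Chunk implements Serializable {
    private static final long serialVersionUID = 1L;

    final int id;
    transient Chunk east;  // NOT serialized: following it would drag in the whole world
    int eastId = -1;       // the lightweight stand-in we persist instead

    Chunk(int id) { this.id = id; }

    private void writeObject(ObjectOutputStream out) throws IOException {
        eastId = (east != null) ? east.id : -1;
        out.defaultWriteObject(); // writes id and eastId, skips the transient east
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        // east stays null; the world re-links neighbors by looking up eastId.
    }
}
```

Without the `transient` marker, serializing one chunk would recursively serialize every reachable neighbor; with it, each chunk stays an independent unit at the price of a manual re-linking pass on load.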