SQL or Hashmaps for data storage?

Hi,
This question is about whether it’s worth learning SQL or if I should roll-my-own data storage using HashMaps and serialisation so that I can leverage my knowledge of Java to manipulate data.

I have a (sort-of-a-game-but-not-really) hobby project web app programmed in java and hosted using Tomcat here: www.fightfinance.com.
For storage I’ve been using the SQL database H2(http://www.h2database.com/) which has been excellent. On Riven’s advice, I use only prepared statements and access the database through a ‘user access object’ which is just a class that holds all of the SQL code to do things. This has worked well so far for my simple queries.
The problem is that I now have some data and I want to do more interesting things with it, like delete (spamming) players and every bit of data associated with them (questions, answers, questions edits, ip address logs). Or find the hardest questions which have the highest proportions of wrong answers.
But my knowledge of SQL is so poor that I don’t know how to structure these queries. I could easily manipulate the data if it was a HashMap or List in java, but making SQL statements is a pain in the butt. Usually I find myself trawling stackoverflow for example code which I try to adapt but I don’t really understand, or I pull in whole tables out of the SQL database and dump it into ArrayLists in my java code and then sort thru it there which must be quite inefficient.
I’ve read about things like jooq(http://www.jooq.org/) and Empire DB(http://empire-db.apache.org/) which sound easier to use than pure SQL.
So far as I can see these are the advantages (+) of SQL/H2/jooq/empireDB:

  • Stable and used by everyone.
  • SQL databases have everything built in such as serialisation/persistence.
  • Access on a remote server thru a web interface which is handy.
  • Learning SQL might be useful for other projects or even getting a job.

Disadvantages:

  • SQL is a pain, seems clumsy compared to java syntax which I already know.
  • Black box and performance-wise I have no idea what’s going on in there.
  • No compile-time safety at least with a pure SQL database (but there is for JOOQ and EmpireDB).

Advantages of a roll-my-own data storage setup with HashMaps and serialisation:

  • High performance using HashMaps.
  • All-java, no external config files or any other garbage to worry about.
  • Complete control.

Disadvantages:

  • Have to be careful about persistence thru serialisation when the data structure changes.
  • No web-interface, I’ll have to make my own.
  • database might grow so big that it can not all be stored in RAM as HashMaps on my small 256MB RAM server.

What are your thoughts? As far as I know, most people do not use databases for storing data in their games. But many of you have experience in databases from your corporate work so I thought you’d be a great bunch to ask about this.
Cheers,
Keith

I am split because I want to say two opposing things without confusing anyone.

First, SQL is easy to learn. It is good for certain things, eg. deleting items associated with an ip range, finding items in a range, finding posts in a thread, returning the top N players, etc. It probably is better than trying to design your own system with an indexing system. HashTables and Lists won’t suit you well because they won’t be structure the same way a table can be. Just don’t trust StackOverflow.

On the other hand, SQL doesn’t seem to fit games well. You don’t want to read and write to it during gameplay. Normally you want to structure your program to perform in a RAM cache conscious way. Uncached RAM access is slow, SQL is even slower because it involves IO. You definitely don’t want it in a single player game. (JSON is my recommendation; not because it’s my favorite format, but because it is a good choice when there is no obvious requirement that demands XML or something else, works well in most scenarios, and looks like Java/C++/C#.)

You probably don’t want it in a real time game even if it is multiplayer. A lot of people jump to using databases because they think its What You’re Supposed to Do ™. They’re clearly ignorant and just go to it automatically. When they try to fake knowledge, you can tell that they’re just throwing jargon around. If you’re not a database “expert”, then stray on the side of not using them in critical routines. If you’re neither a database expert nor an “expert” on hardware performance concerns, then definitely do not use a database. Try to handle everything in RAM. Avoid hard drive IO (and indirect IO through a database) until it’s absolutely required.

A lot of big servers (in the general computing world) work by storing their entire database in RAM. If you think that defeats the purpose of a database because it’s not persistent, then you shouldn’t use a database. It’s not the purpose and there are work arounds. I do not think most online games work on a single machine. There are several game servers that have to communicate just to maintain consistency. If there is a place for SQL that is probably it. However the SQL server probably runs on a dedicated machine. When the game is running, most of the data stays on the server the player is logged in to and user caps are used to prevent overload.

Got one computer? Don’t know what the benefits of SQL? You don’t need to learn SQL unless you want to. And, as with all performance related advice, performance isn’t a concern until your computer runs out of time where its sleeping or until you hear grinding noise coming from non-solid-state hardware if you have any. Both of which probably won’t happen unless you’re very ambitious. If using your own method is easier in the medium time range, then it’s a good idea.

Thanks, interesting comments.
That’s why I started using a database, because in web app programming it’s ‘what you’re supposed to do’. I was also hoping that databases would be easy and that it would be obvious how to do things since it’s a well-worn road.

Turns out that SQL is not easy or obvious, the syntax is completely foreign. I’m at the point where I will have to invest a week to learn it properly or give up and just do it using HashMaps which will be much more straight-forward and flexible.

Couple of thoughts:

  1. SQL isn’t particularly difficult, in fact it’s mighty simple for 95% of cases IMHO. For the use-cases that I imagine you need from the description of your game, I wouldn’t have expected you need more than the most straight-forward SQL statements to implement your game. If you’re not particularly au-fait with SQL or haven’t got the time/patience to get to grips with it I’m sure there would be people on here or other sites that could help you out.

  2. An alternative is to use Hibernate as a wrapper layer for your database - it abstracts away most (or all for many cases) of the nasty SQL syntax and provides a criteria-based API that is simple to use for most use-cases but powerful enough to handle the complex ones as well. Might be worth looking into, although of course it’s another new technology with a learning curve, time investment, etc.

  3. Another alternative might be to use an object based database such as Mongo or db4o (there are lots of others but these are the ones I’m familiar with off top of my head). These are non-SQL databases that are very easy to use and essentially the same as your hashmaps-and-serialization approach but without having to roll-your-own (and therefore less bugs / maintenance hopefully). Again it will need some time and effort, but considerably less than the 2 options above I reckon.

  4. Why hashmaps in particular? If you’re happy using Java serialization as a persistence mechanism why not save/load your game model classes directly? What do HashMaps have to do with it? (just curious)

  • stride

I think it’s because Maps provide functionality to write and read objects to fields, in a single structure.

Thanks for your points.

  1. Ok well I’ll persevere in SQL since you and SHC think it’s easy and worth learning.
  2. Hibernate sounds interesting but it looks like it’s for people who are already strong with SQL, and also it seems to be best set up using config files rather than code which I dislike.
  3. NoSQL databases sound good, but after trying the mongo db browser shell tutorial it has its own silly syntax of doing stuff like:
db.scores.find({a: {'$gte': 2, '$lte': 4}}); 

Which means loop over all entries and get any that are >= 2 and <= 4. I think this kind of pseudo-javascript syntax is worse than SQL.
4. HashMaps are quick for gets and deletes, compared with lists. That’s the only reason I think they’re good to hold data in. Most of the data in my game are just giant collections (maps or lists) of objects.

Thanks for the pointers everyone.
Cheers,
Keith

SQL is not a simple language, and basically requires learning it, having experience with it, just like any other programming language. It probably takes a few minutes or hours to grasp simple queries, but it will take years to master it.

In previous jobs I’ve worked with many enterprise developers, who had 5-10 years of solid fulltime SQL programming under their belt, but when debugging their code it turned out they had really strong misconceptions about the underlying mechanics, that they somehow didn’t notice over the years. A dialect like MySQL certainly doesn’t help, because it accepts blatantly wrong SELECT statements in case there is a GROUP BY clause, yielding by definition undefined results in the resultset, most likely based on the order of rows on disk, which can change at any time.

One can probably say the same thing about any language, but it’s especially the case in SQL, as you can’t easily verify intermediate results of a query. (I think it’s comparable to an OpenGL shader, where you look at the output on screen, and assume you did a good job if the output looks like you expected - which is a very dangerous ‘validation technique’ in the database world)

As for what CommanderKeith should do in his specific case, in my opinion; use SQL for the trivial stuff. For complex things (in simple projects) it’s perfectly acceptable to read a whole table, perform calculations on it, and write back the results. Just ensure that concurrent queries/operations don’t corrupt things, but that’s more of a general concurrency issue, than a database related one.

If you like to learn SQL for future jobs, then go for it, but maybe go for it on a separate project. Learning and using SQL properly often involves designing an application around a database, instead of the other way around. For experienced developers this often happens at a subconscious level: like how Java developers are prone to squeeze every application design in an object oriented mold, SQL developers are likely to design their tables in a data oriented mold, while not realizing that newbies/juniors/mediors in this area will probably make (the most trivial) mistakes which will lead to too loose or too tight constrains, redundancy issues or no (proper) referential integrity (foreign keys alone won’t save the day).

Sorry for saying SQL is easy to learn. What I meant was that it has a shallow learning curve for the basic features, sort of like JavaScript. All the disadvantages you listed are absolutely true, especially the clumsy syntax. (Exactly like JavaScript.) Select, insert, set, and delete get you very far, but not necessarily in an optimal way. A lot of open source web applications seem(ed) to primarily use those constructs. The exception was for projects like MediaWiki where complex queries were crafted for each specific database, presumably to get the best performance on each one. (Portability isn’t good in SQL and performance requires knowledge of database specific voodoo.)

Different SQL databases have huge feature sets and it would take a whole career to know just one database inside in out. Some databases even let you define your own “functions”, but that is something you can ignore until you get a teacher or mentor. Advanced features vary in function and syntax. Plus the syntax is really bad because some features were developed ad hoc. Select, set, and delete are conceptually simple if you understand the concept of an SQL query (the table specification, column specification, and where clause.) Insert doesn’t involve many parameters, if any, besides the data you want to add.

The ability of SQL databases to index columns/keys could greatly improve performance over custom naive indexed data structures even if the user is a SQL novice (for certain applications). (Though SQL does make it easy to make security mistakes which is a good reason for novices to avoid it.) The difference would sort of be like switching from CPU rendering to (non-intermediate-mode) GPU rendering. Just utilizing the basic feature set may make it easier to process tons of data by making more efficient techniques more accessible. This is the reason I do not recommend it for game programmers, though. People think it’s absolutely necessary for things like save files, where there would be less overhead to use a text format like JSON, CSV, or an ini-style properties file. Games just don’t tend to involve record oriented data, so adding SQL doesn’t have the same automatic benefit you might have for certain web applications.

Great tips, thanks Riven. I’m glad that I’m not the only one to struggle with the more complex queries. That’s a relief that it’s common practise to dump out whole tables to sift the data in java.

And interesting to hear how DB vendors do some things completely differently under the hood Several KB. I guess that SQL is such an old language that it’s to be expected.

Still i do wonder why simple databases of HashMaps with serialisation are not more prevalent. I am quite naive about databases, but it seems like HashMaps would be quite an efficient and a pretty simple solution which most people can easily understand. The only things that might be added to a HashMap to make it on par with a bare-bones database is to make it so the HashMap could be serialised incrementally. And make it so when edits are made to the HashMap in RAM, those changes are written in another thread to the hard disk to back up the data in case the computer turns off or the program crashes.

The kind of object & key database you are thinking about is quite common. For example, Google App Engine Datastore stores your objects as objects, and stores indexes to the objects based on the queries you want to make. It’s not like an SQL database where you can do creative things with queries, in Datastore you have to design all your queries upfront, but in return the database has very fast read behavior.

That’s interesting, I’ll look into that. Thanks ags1.