My proposal for a possible rating system is somewhat complicated. I spent a lot of time on it, so please don’t dismiss it too lightly.
At the core of my proposal is a category-based rating system. Instead of having a single large point scale and leaving it up to the judges to divvy up the points, let’s separate the points into specific, meaningful categories and ask the judges to rate against those categories. The rating scale for each category should be minimal, to help keep the results meaningful. My belief is that a scale of 1-5 for each category should be more than enough.
Game designers want feedback from judges on how well they achieved each of the categories that make up a game. Feedback also helps “justify” the scores that judges give. By specifying these categories, the results become more self-documenting and require less writing by the judges. This also opens the door for “best in category” awards, which I believe would be meaningful.
We need to try to reduce the impact on total score from judges that “just don’t like this genre” or “just don’t get it”. Granted, “just don’t get it” responses should be minimized by having appropriate documentation on your game’s download page and/or forum page, but there’s not much you can do if a judge dislikes the genre that your game is in. Even so, judges can form unbiased opinions on a game’s graphics, controls, polish, technical achievement, etc., thus providing valuable feedback and still giving games a chance to win “best in category” awards.
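To show how a “best in category” award could fall out of the per-category ratings, here is a minimal Python sketch. The function name and the data layout are made up for illustration, and summing the raw (unweighted) 1-5 ratings across judges is my assumption; the award could just as well use an average.

```python
def best_in_category(ratings_by_judge, category):
    """Return the game(s) with the highest summed rating in one category.

    ratings_by_judge is a hypothetical layout:
        {judge_name: {game_name: {category_name: rating}}}
    A list is returned so that ties for the award stay visible.
    """
    totals = {}
    for games in ratings_by_judge.values():
        for game, ratings in games.items():
            # Assumption: raw ratings are summed, judge weights are ignored.
            totals[game] = totals.get(game, 0) + ratings[category]
    best = max(totals.values())
    return sorted(game for game, total in totals.items() if total == best)
```

A judge who dislikes a game’s genre can still hand it a 5 for graphics, and that rating counts in full toward the graphics award.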
Judges should be discouraged from giving a lower score because they “might see something better later on”. My proposal would encourage the judges to rate each game on its own merits. After a judge has finished rating all of the games, the system will compute the judge’s top 4 games based on total weighted score. If there are any ties within those top 4 games, the judge will be given a chance to rank the tied games. This tie-break ranking will be completely subjective and the judge will not have to justify his decision. This may seem like a silly feature to include, but I believe that judges have implicitly stated their desire for such a feature by giving games a ‘99’ so that they have the option of giving a ‘100’ later on if they find what they believe to be the “best” game.
This manual tie-breaking would happen on the single-judge level only. For games that are tied based on total combined score from all the judges, the system would try to determine if a game appears to be “more liked” by the judges. This would be based on both the complete point-sorted list for each judge and by any tie-break rankings given by each judge. If the system could not determine if one game was “more liked” than another, then those games would be considered tied for their position.
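The two tie-break steps described above could look roughly like this in Python. Everything here is a sketch under assumptions: the function names are hypothetical, the alphabetical ordering at the top-4 cutoff is only there to make the output deterministic, and the “more liked” rule (a simple count of judges who placed one game ahead of the other) is one possible reading, not a settled part of the proposal.

```python
from itertools import groupby


def top_games_with_ties(totals, top_n=4):
    """Return a judge's top_n games and the groups the judge must rank by hand.

    totals maps game name -> that judge's total weighted score.
    Assumption: equal scores are ordered by name, so a tie that straddles
    the cutoff is decided alphabetically.
    """
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    top = ranked[:top_n]
    tie_groups = []
    # Games with the same score are adjacent after sorting, so groupby finds them.
    for _, group in groupby(top, key=lambda item: item[1]):
        names = [name for name, _ in group]
        if len(names) > 1:
            tie_groups.append(names)
    return [name for name, _ in top], tie_groups


def more_liked(game_a, game_b, judge_positions):
    """Pick the game more judges placed ahead, or None if it cannot be decided.

    judge_positions is a list with one dict per judge, mapping
    game name -> position (1 is best). Games a judge left tied share a position.
    Assumption: a plain majority of judges decides.
    """
    a_ahead = sum(1 for pos in judge_positions if pos[game_a] < pos[game_b])
    b_ahead = sum(1 for pos in judge_positions if pos[game_b] < pos[game_a])
    if a_ahead == b_ahead:
        return None  # still tied for their position
    return game_a if a_ahead > b_ahead else game_b
```

The positions passed to `more_liked` would come from each judge’s point-sorted list, with any manual tie-break rankings already applied.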
I believe that we all agree that some categories are “more important” than others and should carry a larger weight in the totals. The thing that we’re not going to all agree on is which categories are the most important. My proposal is to let each judge decide which categories are most important to them. Before they can start judging the games, each judge decides which categories should carry higher weights. These weights will be made public during and after the judging period.
Here is my category list:
Originality
This is an indication of whether or not the game “brings anything new”. For games in well-established genres, this can be taken to mean “Does the game bring any new gameplay elements to the genre”. For games in smaller genres, just having an entry may be enough to earn a good score in this category. By definition, full clones are going to score lower in this category.
Gameplay
Also called “playability” or “replay value”. This is an indication of how fun or addictive the game is.
Technical Achievement
This is a perceived value indicating whether or not you feel the game achieved something that is difficult or even impossible, given the limitations of the contest. For instance, a game may feel like it has more graphics or levels than you should be able to fit into 4K. You can also give a game a low “Technical Achievement” rating, indicating that you feel that the game could (or should) contain more than it does. It should be clear that 4K Java games are no longer considered technical achievements in and of themselves, so please do not give middle ratings for technical achievement just because someone submitted a 4K Java game.
Graphics
This is an indication of the quality of graphics used in the game. This is not an indication of whether or not the game “packed a lot of graphics”. That belongs under “Technical Achievement”.
Controls
This is an indication of the intuitiveness and ease-of-use of the controls for the game.
Responsiveness
This is an indication of how well the game interacts with the player.
Polish
This is an indication of how “complete” the game feels.
Progression Of Difficulty
This is an indication of how well the game balances the increase in difficulty as the levels progress. This includes the difficulty of the initial (first) level. Does the game start out too hard or too easy? Do the levels progress in a consistent, intuitive manner that builds off of knowledge gained from the previous levels?
Incentive To Keep Playing
This is an indication of how well the game “rewards” you for your efforts and keeps you playing in an effort to improve your results.
Sound
This allows you to give bonus points to a game that uses sound. Please do not give middle scores just because the game has sound. A game with bad sound could almost be considered worse than a game with no sound at all.
0-5 Bonus Points
These are bonus points that can be given for any reason, but you must give a clear explanation of what they’re for. Typical reasons might be multi-player support, excellent physics or sweet A.I.
0-5 Penalty Points
These are penalty points that can be deducted for any reason, but you must give a clear explanation of what they’re for.
I compiled this category list by analyzing all of the judges’ feedback on every game for the 2006 results.
My proposal breaks down the weights for categories as follows:
- 2 categories would have a weight of 3
- 4 categories would have a weight of 2
- 6 categories would have a weight of 1
With every category at its top value of 5, this will give you a maximum combined score of (2×3 + 4×2 + 6×1) × 5 = 100 8)
NOTE: Both the “Bonus Points” category and the “Penalty Points” category can only have a weight of 1.
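Here is a small Python sketch of how a judge’s weights could be checked and a game’s total computed. The function and constant names are invented for illustration. Subtracting the weighted penalty points is my reading of “deducted”; under that reading a game rated 5 everywhere with no penalty totals 95, while the 100 ceiling counts all twelve slots at their top value.

```python
from collections import Counter

CATEGORIES = [
    "Originality", "Gameplay", "Technical Achievement", "Graphics",
    "Controls", "Responsiveness", "Polish", "Progression Of Difficulty",
    "Incentive To Keep Playing", "Sound", "Bonus Points", "Penalty Points",
]
# Bonus and penalty points can only have a weight of 1.
FIXED_WEIGHT_ONE = {"Bonus Points", "Penalty Points"}
# weight -> number of categories that must carry it (2 x 3, 4 x 2, 6 x 1)
REQUIRED_WEIGHT_COUNTS = {3: 2, 2: 4, 1: 6}


def weights_are_valid(weights):
    """Check one judge's category weights against the 2/4/6 split."""
    if set(weights) != set(CATEGORIES):
        return False
    if any(weights[name] != 1 for name in FIXED_WEIGHT_ONE):
        return False
    return dict(Counter(weights.values())) == REQUIRED_WEIGHT_COUNTS


def weighted_total(ratings, weights):
    """Sum rating * weight over all categories for one judge and one game.

    Assumption: penalty points are subtracted rather than added.
    """
    total = 0
    for name in CATEGORIES:
        points = ratings[name] * weights[name]
        total += -points if name == "Penalty Points" else points
    return total
```

Because the weights always add up to 20, `sum(weights.values()) * 5` comes out to 100 for any valid set of weights, whichever categories a judge chooses to emphasize.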
I have thought about several more aspects to this, but I think this will be enough to get some conversation going.
All feedback is welcome. Thank you for reading my post.
-Dave