Judging panel results.

It’s up to individual judges how they determine their score. The scores from the judges are normalized before they are averaged together.

For me, I first classify the game as superb, good, fair, mediocre, poor, and so on. I do this in my mind, and once I’ve decided that a game is superb, I can quickly say it deserves at least 90%; I then compare it to other games with a 90% score, decide it is better than this one but not better than that one, and arrive at a score of maybe 93%. There are many factors I consider, as ra4king mentioned: graphics, ease of play, difficulty, technical achievement, audio, and of course gameplay. However, relying solely on these factors to judge a game is not good science either, because a game could have nice audio and graphics and be technically impressive, but fall flat on fun and gameplay. So there are aesthetic and subjective factors as well.

But it isn’t an exact science. It is easier to do this for the top games, because there are fewer of them and they are easier to compare. The lower you go in the ratings, say 70-80%, the harder it becomes, because more games fall into that bracket and comparing them is more difficult.

There are always ways to improve the judging process, but year after year the results seem to be all right. I did suggest in another thread using buckets like “Superb”, “Great”, “Good”, “Fair”, “Mediocre” and “Poor”: decide which games fall into which bucket, then order the games within each. The judges would have to coordinate and agree, and that is really the biggest practical issue with it.

then the final grade is (judge’s score 1 + judge’s score 2 + judge’s score 3)/3 ?

(normalized judge 1 score + normalized judge 2 score + normalized judge 3 score) / 3
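The per-judge top-20 lists posted below each peak at 100.0%, which suggests the normalization scales every judge’s scores so that their highest-rated game becomes 100%. The exact formula isn’t documented in this thread, so here is a minimal sketch under that assumption, in Java:

```java
import java.util.Arrays;

// Sketch of per-judge score normalization and averaging.
// Assumption: each judge's raw scores are scaled so their top game
// becomes 100% (the real contest formula is not stated in the thread).
public class ScoreAverager {

    // Scale one judge's raw scores so that the highest becomes 100.0.
    static double[] normalize(double[] raw) {
        double max = Arrays.stream(raw).max().orElse(1.0);
        return Arrays.stream(raw).map(s -> s / max * 100.0).toArray();
    }

    // Final grade per game: mean of the normalized scores across judges,
    // i.e. (norm judge 1 + norm judge 2 + norm judge 3) / 3.
    static double[] finalGrades(double[][] judgeScores) {
        double[][] norm = Arrays.stream(judgeScores)
                                .map(ScoreAverager::normalize)
                                .toArray(double[][]::new);
        int games = norm[0].length;
        double[] result = new double[games];
        for (int g = 0; g < games; g++) {
            double sum = 0;
            for (double[] judge : norm) sum += judge[g];
            result[g] = sum / norm.length;
        }
        return result;
    }

    public static void main(String[] args) {
        // Three judges, two games; judge 2 grades on a harsher scale,
        // which the normalization compensates for.
        double[][] scores = { {90, 60}, {45, 30}, {80, 80} };
        System.out.println(Arrays.toString(finalGrades(scores)));
    }
}
```

Note how the harsh judge (max 45%) and the lenient judge (max 90%) end up contributing equally after scaling, which is the point of normalizing before averaging.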

@apo:
To be honest, it’s because all of them are puzzle games. Puzzle games are fun to play, but each has its own environment and rules and takes time to learn if you aren’t familiar with it at first. Action or fast-paced games like Fuego or Rainbow Road are easier to understand. That’s not bad; I enjoy puzzle games (I created a puzzle game myself, and all of my installed Android games are puzzles except Nun Attack), but when they come in a group of more than 10, my mind is blown a bit ;D

The lowest-scored one, ApoNurikabe, is too confusing. I even googled how to play it, without success. I asked others like ra4king, and he threw a chair at me with a foaming mouth ;D

About scoring: in my formula, having no bugs or crashes gives you a 30% starting score. I know how it feels to fit everything into 4K, so gameplay and graphics each have the same chance to boost it to 99%.

Quick aside: I initially decided not to give anyone 100%, because no game could truly deserve it… then I saw Rainbow Road.

If you’re interested in a real chess-Sudoku, check out http://www.toothycat.net/wiki/wiki.pl?PeterTaylor/ChessSudoku

On a separate note, for the past few years I’ve calculated the correlations between the judges and the community; interestingly, although this year the correlation between the judges is the lowest since I started doing this, the (Spearman’s) correlation between the overall judges score and the community score is almost the same as last year, at 0.725.
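For readers unfamiliar with it, Spearman’s correlation is just a rank-based correlation: it compares the two orderings rather than the raw scores. With no tied ranks it reduces to a simple closed form, sketched here in Java (the actual script used for the judge/community comparison isn’t shown in the thread):

```java
// Sketch of Spearman's rank correlation, as used to compare the judges'
// ordering of games against the community's ordering.
// Assumes no tied ranks, so the closed form 1 - 6*sum(d^2)/(n(n^2-1)) applies.
public class Spearman {

    // ranksA[i] and ranksB[i] are game i's position in each ordering
    // (1 = first place).
    static double rho(int[] ranksA, int[] ranksB) {
        int n = ranksA.length;
        double sumD2 = 0;
        for (int i = 0; i < n; i++) {
            double d = ranksA[i] - ranksB[i];
            sumD2 += d * d;
        }
        return 1.0 - 6.0 * sumD2 / (n * (double) (n * n - 1));
    }

    public static void main(String[] args) {
        // Identical orderings give rho = 1.0; a full reversal gives -1.0.
        System.out.println(rho(new int[]{1, 2, 3, 4}, new int[]{1, 2, 3, 4}));
        System.out.println(rho(new int[]{1, 2, 3, 4}, new int[]{4, 3, 2, 1}));
    }
}
```

A rho of 0.725 between the judges’ overall ranking and the community ranking therefore means the two orderings agree fairly strongly, even if individual judges disagree with each other.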

Just back from a long weekend. Congrats to Morre.

Thanks to the Judges for reviewing the games.

Thinking about next year now :slight_smile:

Oh wow! Congratulations to Flywrench, my personal favorite, for getting 1st on the judges’ list. It was well deserved.

Apo, it was amazing that you were able to create so many high-quality games in one sitting. Puzzle games are one of the hardest genres to pull off, and you managed to create over 10 unique puzzle games, each with decent difficulty, quality, and challenge. To be honest, that is a great achievement in itself and you should be very proud, because your record will probably remain unbroken for a very long time.

Oh, and I would suggest not putting a limit on the number of games. A limit might help the quality, but it would drastically reduce the number of games that can be submitted. Since real life can be a drag (and unpredictable), quantity is a good thing to strive for. People love numbers more than anything else. If they see that Java4K was able to crank out more than 1000 unique gaming entries, it would be a very good way of showing people that Java gaming is a force to be reckoned with.

The contest was a great learning experience for me, and I am quite impressed that my entry, into which I only put 3 weeks of effort, actually did decently for a first try. The game was a demake of a game I had made before in The Games Factory for the VCade. My friend did not want to download Vitalize! for some stupid reason, so I remade the game in Java so he could see it for himself. It is just a coincidence that it managed to fit in Java4K, so I tinkered with it and tried to make it as small as I could :P.

Congratulations to everyone who has produced games for this competition. See you guys in the next competition for 2014.

Here is the Top 20 according to each judge (percentages normalized):

Arni Arent (appel)

1,Flywrench4k,100.0%
2,Parasite Escape,95.0%
3,Farmer John and the Birds 4k,93.3%
4,Inf4ktion !,93.3%
5,Wizzy 4K,93.3%
6,ApoBrain4k,91.7%
7,Dord,91.7%
8,B4llBasher,90.0%
9,ApoBeam4k,86.7%
10,ApoClock4k,86.7%
11,ApoStress4k,86.7%
12,F4R KRY,86.7%
13,Sorcerer4K,86.7%
14,Space Devastation,86.7%
15,ApoBlockLock4k,83.3%
16,ApoMonoMirror4k,83.3%
17,tiny_world,83.3%
18,Toras Blocks 4k,83.3%
19,Frog Solitaire,81.7%
20,Rainbow Road,81.7%

Drabiter

1,Rainbow Road,100.0%
2,Fuego!,98.4%
3,M4nkala,98.4%
4,Die Z,96.8%
5,Dord,96.8%
6,Skyrim 4K,95.2%
7,Meltdown,93.5%
8,Plants 4K Zombies,91.9%
9,Farmer John and the Birds 4k,90.3%
10,Wizzy 4K,90.3%
11,4096 B.C.,88.7%
12,Flywrench4k,88.7%
13,Inf4ktion !,88.7%
14,Sorcerer4K,88.7%
15,Space Devastation,85.5%
16,ApoClock4k,83.9%
17,Galactic Conquest 4K - II,83.9%
18,Parasite Escape,82.3%
19,tiny_world,80.6%
20,Joe 4K,77.4%

Roi (ra4king)

1,Rainbow Road,100.0%
2,Galactic Conquest 4K - II,98.6%
3,tiny_world,97.1%
4,Flywrench4k,95.7%
5,Skyrim 4K,95.7%
6,Behind the wall of sleep,94.3%
7,CodeGolf4k,94.3%
8,fear4k,94.3%
9,Plants 4K Zombies,94.3%
10,4King & Country!,92.9%
11,Demolition Derby,91.4%
12,Farmer John and the Birds 4k,91.4%
13,Space Devastation,91.4%
14,B4llBasher,90.0%
15,Rogue 4k,90.0%
16,4096 B.C.,88.6%
17,Die Z,87.1%
18,Wizzy 4K,87.1%
19,ApoStress4k,85.7%
20,Fuego!,85.7%

@Appel

What were you thinking?! 2 judges put Rainbow Road in first place, the community put it in first place, and you put it in 20th place!

In fact, take a look at ApoBlockLock4k, an implementation of Rush Hour, Nob Yoshigahara’s sliding-block puzzle from the 1970s. Rush Hour puts the player in the role of a parking-lot attendant who needs to figure out how to get a car out of a congested lot. You put ApoBlockLock4k 5 places above Rainbow Road. Meaning, you’d rather re-park cars than race them?! From a gameplay and technical-achievement point of view, I can’t imagine how you came up with this ordering.

In your review of Rainbow Road, you wrote, “Oh my… this must be what it feels like driving on a rainbow. I was expecting a relaxed driving experience, but I got a thrilling and… a competitive race. One of the more interesting racing car 4k game made.”

You never played the N64 game that this 4K game was based on? You never played any version of Mario Kart? Why are you even a judge?

Not everyone is a Nintendo fanboy. There are enough of them that Nintendo remakes get a nostalgia bonus, but it’s a good thing that the judges have different backgrounds: if they were all clones, there would be no point in having more than one. And it’s a good thing that the competitors have different backgrounds, because some great games from other platforms got a second breath of life from someone who used to play them.

Appel commented that when he was judging Apo’s games he made a conscious effort to judge each game on its own merits rather than against Apo’s other games. Similarly, judges should make a conscious effort to judge games on their own merits rather than on the merits of the games which influenced them.

Of course I knew it was based on Mario Kart.

The biggest drawback with Rainbow Road was that it was more of a thrill ride, with no obvious goals or levels. After 5 minutes there was no end in sight, and I became quite lost and thought “What now?”.

I might agree with your normalized top 20 that it should be higher than some of the other games there, or that some of the other games should be lower. In any case, I like games that provide well-rounded gameplay that keeps you occupied and interested throughout, not just for 2-3 minutes.

While Rainbow Road would definitely win a technical award, if we had one, we have to weigh in other factors as well.

But a game like FlyWrench4k has much more to offer:

  • It is original with clean graphics.
  • It has a small learning-curve, but is intuitive and comfortably challenging at the same time.
  • It is a puzzle game that offers new twists every new level.
  • It keeps you interested throughout, willing to retry to beat the new challenges.
  • It has simple gameplay, flapping the flywrench through a maze of seemingly impossible obstacles.

So it’s very well rounded, while Rainbow Road just leaves you hanging after that initial thrill ride.

And to be honest, I think my fellow judges went a little overboard with their Rainbow Road scoring, although I admit I probably didn’t do it justice. At the end of the day, that’s why we have many judges and not just one: to balance out each other’s mistakes.

[quote=“Rooster_Miami,post:29,topic:41325”]
I must say I’m somewhat surprised to hear you, as your fifth post on this forum, ask “Why are you even a judge?” to Appel, who organizes the Java4K contest. I really can’t see the point of your little “analysis”.

And does having played some historical games qualify one as a good judge? This is not a “Java4K remake” contest ;D I’d argue that games should get bonus points for originality: it’s much harder to make something really original and innovative than to do a remake. Rainbow Road is a great entry, it’s graphically really awesome for a 4K game, and it’s even somewhat fun to play.

Dissenting opinions on games is a Good Thing. When we all think alike, no one thinks very much.

Hmm, I don’t like where this is going: questioning a judge’s taste about something that people think is “obvious”.

Postings like these make me sad. I mean… WTF?!?!
I… err…
I’ll give it up… Anything i could answer, would be an insult as well. >:(

So - back to the judges and the results:
Thank you all for the vast amount of time you must have sacrificed to the judging-panel.

And thanks for the 18th place of Fuego! I’m really happy ;D

Based on a little extrapolation from my ranking last year (49th of 51) to 2013 (18th of 68), I should easily win Java4K 2014… Haha! :wink:

Big thanks to the judges for their work; it’s really great to have such feedback. I always got my worst score from Appel, and that is cool by me.
The best part is the score metrics we get. You can make your game more popular by naming it FamousIP4k and borrowing niche gameplay mechanics from another game.

So Braid4k would be the BOMB!!! (seriously considering making it :persecutioncomplex: )

Haha something similar to Braid4k has already been made! Check out “Behind the wall of sleep” :slight_smile:

@appel, did you not see the glittering bar on Rainbow Road showing the finish line? The game was an exact replica of the original, with laps and everything. :slight_smile:

Either way, I agree with the others that dissenting opinions are the point of having multiple judges, and they produce the most well-rounded results.

How about 4.01K? 401K has a nice ring to it. :wink:

Regarding judging, I thought giving the community the opportunity to provide feedback along with the points was great. I hate leaving a comment on the game’s page itself, since it’s visible to everyone forever. It’s nice to have a way to give the developer feedback that’s associated more closely with the round of judging.

I’m looking forward to next year’s contest. I had 3 or 4 games in the works that were playable but needed work; however time is my scarcest resource these days. A 1-year-old and a 3-year-old take up an awful lot of time and energy! I guess I have a starting point for next year’s contest.

Before the judges submit their final scores, they should be able to view a sorted list of their individual results. Or, why not let the judges simply order the games, as opposed to assigning a number to them? The ordering itself could establish a score: with 50 games, for instance, the lowest in the sequence would be assigned 0% and the highest 100%, with each game in between roughly 2% (100/49) higher than its predecessor. I think that would have prevented Frog Solitaire and Rainbow Road from sharing the coveted 20th place.
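The rank-to-score mapping proposed above can be sketched in a few lines of Java. This is only an illustration of the suggestion, not anything the contest actually uses; note that with 50 games the exact step is 100/49 ≈ 2.04%, slightly more than a flat 2%:

```java
// Sketch of the proposed rank-based scoring: a judge only orders the
// games, and each game's score follows from its position (lowest = 0%,
// highest = 100%, evenly spaced in between).
public class RankScores {

    // n games total; rank 0 is the worst, rank n-1 the best.
    static double scoreForRank(int rank, int n) {
        return 100.0 * rank / (n - 1);
    }

    public static void main(String[] args) {
        int n = 50;
        // With 50 games, each step is 100/49, i.e. about 2.04%.
        System.out.println(scoreForRank(0, n));      // worst game: 0.0
        System.out.println(scoreForRank(n - 1, n));  // best game: 100.0
        System.out.println(scoreForRank(1, n) - scoreForRank(0, n)); // step size
    }
}
```

Because every position maps to a distinct value, two games can never end up sharing the same score under one judge, which is the point of the proposal.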

I’m not sure what “20th place” you’re talking about.

Rainbow Road got 2nd place from the judges, not 20th, and with only 0.9% less than the game in 1st place. If a game got 90%+, it’s a superb game; I don’t see how that is unjust. (Perhaps this is the reason we don’t show normalized scores from individual judges: one judge can be more wrong than a handful of them, which is why some judicial systems have multiple judges, so that “right” judges can cancel out “wrong” judges.)

You have to realize the grades are subjective and the games are reviewed over a period of a couple of weeks, but above all, there were almost 70 games. Keeping knowledge of all the games at the tip of your mind so you can quickly order them, as you suggest, is pretty hard. You might have a handful of favorite games, but we had to weigh nearly 70.

I have already suggested a new judging method based on classification buckets, which I may implement next year. So there’s no need to create more complex grading rules that, at the end of the day, may not change the final results.

Also, (not speaking of Rainbow Road here)

I’ve re-played many of the games that some thought were mistreated by the judges, and I can’t say there were many mistakes that would affect the top 10; maybe positions 10-20, more likely 20-40. It’s easier to say which game is best and which is worst than to arrange the games in the middle in absolute order of how good they are.

Of course mistakes do happen, but with 4K games the first impression is often the right one. You also have to consider that we have to view these games as a casual player would: if you have to read a manual, if it doesn’t play intuitively from the start, then casual players won’t bother with it.

Judges are also under a time constraint, allocating only limited time to each game. The only reason for a judge to spend more than 20-30 minutes on a game is that it’s so fun and addictive. If he’s struggling to understand it and spending a lot of time on it because of that, then there’s something wrong with the game, not the judge. That’s the reality: even if you understand your own game perfectly, the same is not true for others, whether judges or casual players. (And in my case with Rainbow Road, it was easy to start playing, but after a while it became quite pointless, with no encouragement or challenge to continue.)

I say, if you have to read the instructions to play, you can improve the presentation and how intuitive your game is. If you have to keep reading the instructions while playing, you’ve done something wrong.

Apo’s games were probably the most “mistreated”, at least a few of them, but other than that not much else would affect the final result.

I’ll be sure to invite you on the judging panel for next year :slight_smile:

Shame on you, appel, for not being a sheep.