Judging Results

Regarding “remove lowest score for everyone”:

I’m fine with this. An even better way might be to remove the lowest and highest score for each game and just take the average of the middle three. I believe this would be a very fair way to do it (a truncated mean, as suggested by oNyx).

In either case, I think the scores should be finalized very shortly (I’m fine with whatever method the organizers decide to use), and then a plan set up for next year. I propose the truncated average (remove the lowest and highest score).
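For illustration, here’s a quick sketch of what I mean by truncated average (the class and method names are made up for this example, not part of any contest tooling):

```java
import java.util.Arrays;

public class TruncatedMean {

    /** Drops the single lowest and highest score, then averages the rest. */
    static double truncatedMean(int[] scores) {
        int[] sorted = scores.clone();
        Arrays.sort(sorted);
        double sum = 0;
        for (int i = 1; i < sorted.length - 1; i++) {
            sum += sorted[i];
        }
        return sum / (sorted.length - 2);
    }

    public static void main(String[] args) {
        // With five judges, an outlying 0 and 95 are dropped: (60 + 70 + 80) / 3 = 70.
        System.out.println(truncatedMean(new int[] {0, 60, 95, 70, 80})); // 70.0
    }
}
```

The nice property is that a single judge’s outlier, whether a 0 for a game that wouldn’t run or an enthusiastic 95, can’t move the result on its own.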

EDIT: On a side note, I believe the NPE discussed might be the one mentioned in a comment by darkfrog on one of my games. Whether it’s due to faulty code on my part, problems with the obfuscation, or differences in the JVM is hard to tell. I might see if I can get some other Vista 64 user to run it and see if I can pinpoint it.

Excellent idea!

Although all these ideas are worth chatting about, I do not think applying them to the current results will help the competition. Tinkering further with the results will only do harm.

We will instead try to apply these ideas to next year’s judging. Which reminds me, we should probably discuss the overall judging process instead of just how to calculate the percentages. That’s probably a topic for another thread.

Noooo! Now 4x4K is 30th! This can’t be right! 4x4K is much better than Jetp4k (now 26th)!
No offence to the judges but - WE NEED A BETTER METHOD!

EDIT: I’ll tone this down later hic but s’true! hic urp

IDEA ALERT

I’ve been thinking about the judging process, about this problem of “this game is better than that game”, and about using percentages to solve it. That doesn’t seem like the best way.

IMO we should just have, let’s say, five bins, and let the judges rate the games by placing them into the bins. These could be:

Bin 1: SUPERB game
Bin 2: Excellent game
Bin 3: Good fun
Bin 4: Alright
Bin 5: Moving on

And within each bin, the judges could sort the games in this-game-is-better-than-that-one order.

This could deliver much more meaningful results, as the games are sorted better and the daunting percentages, which actually hold no real meaning, are removed.

This also makes the judging process easier, as it’s much easier to put many games into just a handful of categories and then deal with the games within each category.
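To make that concrete, here’s a rough sketch of how one judge’s bin assignments plus within-bin ordering could become a single overall ranking (the class, names, and games are hypothetical, purely for illustration):

```java
import java.util.*;
import java.util.stream.Collectors;

public class BinRanking {

    static class Entry {
        final String game;
        final int bin;        // 1 = SUPERB ... 5 = Moving on
        final int rankInBin;  // 1 = best within its bin

        Entry(String game, int bin, int rankInBin) {
            this.game = game;
            this.bin = bin;
            this.rankInBin = rankInBin;
        }
    }

    /** Orders entries by bin first, then by the judge's within-bin ordering. */
    static List<String> overallOrder(List<Entry> entries) {
        return entries.stream()
                .sorted(Comparator.<Entry>comparingInt(e -> e.bin)
                        .thenComparingInt(e -> e.rankInBin))
                .map(e -> e.game)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> judged = Arrays.asList(
                new Entry("GameA", 2, 2),
                new Entry("GameB", 1, 1),
                new Entry("GameC", 2, 1));
        System.out.println(overallOrder(judged)); // [GameB, GameC, GameA]
    }
}
```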

Virtual On 4K sucks T_______T
DeamonPants, you seem like you’d like Blue Fiend’s new controls. You should re-check it out.

I like it! And it would be cool if users could also use this method to judge (but keep their scores separate from the real judges’ scores).

With the new method proposed, I feel the option to mark a game n/a (“Couldn’t run” or the like) is important. Apart from that… yeah, I’m fine with the idea of having more of a bins sort of rating (1-5 or 1-10 per game).

I’m not sure there’s much point sorting the entries within each bin if we’re talking community vote - that should sort itself out with more than, say, 10 or 15 votes. As for judges… that’s trickier. :slight_smile:

A method used rather successfully in the Ludum Dare contest is to present each participant with a judging page that lists the games in random order. People will generally start from the top, so if everybody rates, say, a third of the games, each game will still get a decent number of votes. There are problems with this method as well, but I’d say it has worked reasonably well for them. To be successful, however, it needs to be coupled with strong encouragement for each participant to judge a few entries.
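Something like this per-participant shuffle would do it (my own sketch; seeding with a user ID is just an assumption, not necessarily how Ludum Dare implements it):

```java
import java.util.*;

public class JudgingOrder {

    /**
     * Returns the game list in an order that is random but stable per
     * participant, so each user always sees the same shuffled judging page.
     */
    static List<String> orderFor(long userId, List<String> games) {
        List<String> shuffled = new ArrayList<>(games);
        Collections.shuffle(shuffled, new Random(userId)); // user ID as seed
        return shuffled;
    }

    public static void main(String[] args) {
        List<String> games = Arrays.asList("GameA", "GameB", "GameC", "GameD");
        System.out.println(orderFor(42L, games));
        System.out.println(orderFor(42L, games)); // same order for the same user
    }
}
```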

EDIT: Also, I feel like an idiot for having brought this whole 0-points issue up. Sorry about the mess! I’ve come across as all negative here, when all I really wanted was to point out the issue. I would’ve been entirely fine with the results table staying the same and new scoring mechanisms being used next year. With the new scoring table, two of my four games have an unfair advantage because darkfrog generally gave lower scores… :S This sort of problem is why we should remove the lowest and highest scores for all games, or normalize the judges’ results.
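By “normalize” I mean something like rescaling each judge’s scores to z-scores, so a habitually low-scoring judge doesn’t drag down the particular games they reviewed. A hypothetical sketch of that idea (made-up names, nothing the contest actually uses):

```java
public class JudgeNormalizer {

    /**
     * Rescales one judge's scores to zero mean and unit variance, so a
     * habitually low- or high-scoring judge doesn't skew the totals.
     */
    static double[] zScores(double[] scores) {
        double mean = 0;
        for (double s : scores) mean += s;
        mean /= scores.length;

        double variance = 0;
        for (double s : scores) variance += (s - mean) * (s - mean);
        double std = Math.sqrt(variance / scores.length);

        double[] z = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            z[i] = (std == 0) ? 0 : (scores[i] - mean) / std;
        }
        return z;
    }

    public static void main(String[] args) {
        // A low-scoring judge: 40 still normalizes to well above their mean.
        double[] z = zScores(new double[] {10, 20, 30, 40});
        System.out.println(java.util.Arrays.toString(z)); // ~[-1.34, -0.45, 0.45, 1.34]
    }
}
```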

I’m feeling really guilty right now.

It was well after my bedtime and I wasn’t expressing myself as clearly as I would have liked. My point was that, leaving aside the issue of which exception is thrown, we’re targeting not one API/framework but several dozen, and we can’t test on all of them. Lurking in the back of my mind was also the fact that, in an effort to save a few bytes, some people are straying into areas which the spec doesn’t cover clearly. For example, AFAIK the spec for Applet.getGraphics() says nothing about it returning null at some stages of the applet’s lifecycle, but some people found that that was the case with a small number of VMs.

[quote]Since the results have now been updated to ignore zero anyway, whats to worry about?
[/quote]
Next year, obviously.

darkfrog’s standard deviation for presentation was 10.3, so with less precision nearly everyone would have been lumped together into three buckets under that scheme. In general, most people scarcely use the bottom half of a 1-10 scale. I was thinking about something similar to appel’s buckets suggestion, although that needs work to get something quantifiable which can be averaged.
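The naive quantification would be to map the buckets to points and average those across judges, something like this sketch (the 5-to-1 point mapping is just my assumption, not a proposal anyone has agreed on):

```java
public class BucketAverage {

    /**
     * Maps bucket 1 (best) ... bucket 5 (worst) to 5 ... 1 points and
     * averages across judges. Games can then be ranked by this average.
     */
    static double averageScore(int[] bucketPerJudge) {
        double sum = 0;
        for (int bucket : bucketPerJudge) {
            sum += 6 - bucket; // bucket 1 -> 5 points, bucket 5 -> 1 point
        }
        return sum / bucketPerJudge.length;
    }

    public static void main(String[] args) {
        // Three judges placed the game in buckets 1, 2 and 2: (5 + 4 + 4) / 3.
        System.out.println(averageScore(new int[] {1, 2, 2})); // ~4.33
    }
}
```

Of course, averaging like this assumes the buckets are evenly spaced, which is exactly the part that still needs work.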

In general there seems to be a moderate “nostalgia bonus” for entries based on a game the judges played in their youth.

Someone else would have brought it up. Relax.

Fair enough.

Seems like that’s already being thought about, so that’s great too.

I won’t be getting involved in the contest again. This sort of stuff afterwards just leaves what was a fun activity with a nasty taste. The only good option is community voting and reviewing.

Kev

Come on, that’s a belated April Fools’ joke, isn’t it? It’s only natural to discuss the judging (and possibly problematic scores), and I don’t think this makes the excellent games this year any worse.

Hmm, not sure about this. It should first be tried in parallel with the judges’ votes, to see if there are enough community votes/reviews for all games.

Yeah, I agree. This quote always reminds me why I quit hosting:

Thanks appel! The torch is yours!!

Congrats to all game devs who wrote excellent games and kept the 4K fun. For those who bicker about pointless stats, please take that shit to the Flash 4K contest or something. It’s not that big of a deal.

I think having (from next year on) the judges simply list the games from best to… uhm… least best is actually a very good idea. It removes all subjective scoring from the equation and enforces a uniform point system. Having bins as well has the added benefit of distinguishing games into discrete groups, so there could be two AWESOME, a hundred VERY GOOD, ten OK, and two NOT OK.

I don’t mind these discussions about scoring, but I’m very happy appel said there would be no more fiddling.

And don’t quit, kev… don’t be like that.

[edit:]
It seems like mojang.com is down… I can’t find out why until I get home tonight. Did we get slashdotted or something?

[edit edit:]
Nevermind…

I don’t remember which machine I was running yours on; it was either Java 1.6u7 or Java 1.6u12, but I’m not sure. However, since 1.6u7 is the standard on Mac and 1.6u12 is the latest version, I think that, though it sucks for you, it’s fair to expect the game to run without issue. Remember that we are judging a game that was coded, and part of the coding is compatibility. If you haven’t tested your game to run properly on 1.6 or greater, then you have to lose points. It sucks that it drops you to zero, but like Chris said, what other score can I give if I can’t play it?

What game was yours? Was yours the one I didn’t put any comment at all in? :o

It does run properly on the versions of 1.6 which were released when I finished it (i.e. up to 1.6u11). u12 came out in the last week of February, and u13 sometime in March, it seems, although I didn’t know it existed until today. The breakage in u12 looks like a bug in Webstart, and u13 seems to have a different bug in Webstart based on the stack trace a friend sent me today.

P.S. Surely the latest version is 1.7?

Interesting, was that what you used for NiGHTS and had problems with as well? Because I tested with 1.6 (and it’s working on 1.6.0_07 32-bit WinXP here), so it sounds like there’s something else going on. ???

I did in general give lower scores, but that was because I kept holding out for a game that stood out as exceptional to give high marks to. Last year I marked everything relatively high, then near the end found a game that completely changed the standard, and I had to go back and move everything else down to differentiate it. Please don’t take that to mean the games were of bad quality; on the contrary, most of them were of generally high quality, which is exactly why my score differentiation wasn’t very great and why I didn’t give 100s to anyone. However, the fact that I scored everything relatively lower shouldn’t have any impact on the order of the results, just the final scores.

The latest stable version of Java is 1.6u12, and though it sucks to have this be an issue, you have to be able to support the current stable version of Java. I would be curious to know what it is in your code that causes a breakage in u12 though; that’s very odd.

I did about half my testing on a Mac with 1.6u7 and the other half on Windows Vista 64-bit with 1.6u12.

If I gave you a zero score because the application failed to run, please feel free to PM me if you’d like help resolving the issue. I would also just like to be able to play your games. :slight_smile: