Four-hour downtime as MySQL tumbled

How do you randomize the connection string and not have a connection error? Don’t you always need to use the same URL when connecting to the SQL box?

jdbc:…/…?rndm=[nanos]
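Here is a minimal Java sketch of the trick. It assumes the only goal is to make each connection string textually unique. Host, port and schema stay the same, so you still connect to the same SQL box. Only a throwaway query parameter changes. The base URL and host below are hypothetical, because the real one is elided above. The sketch also assumes the JDBC driver ignores a parameter it doesn't recognise, so check that in your driver's documentation.

```java
// Sketch: make every JDBC URL textually unique by appending a throwaway
// "rndm" parameter (the name is taken from the post). The base URL used in
// main() is hypothetical.
final class NonceUrl {
    private NonceUrl() {}

    /** Appends rndm=<nanos>, using '?' or '&' depending on whether a query string already exists. */
    static String withNonce(String baseUrl, long nanos) {
        char sep = baseUrl.indexOf('?') >= 0 ? '&' : '?';
        return baseUrl + sep + "rndm=" + nanos;
    }

    /** Convenience overload that uses System.nanoTime() as the nonce. */
    static String withNonce(String baseUrl) {
        return withNonce(baseUrl, System.nanoTime());
    }

    public static void main(String[] args) {
        // Hypothetical host and schema, for illustration only.
        String base = "jdbc:mysql://db.example.invalid:3306/forum";
        System.out.println(withNonce(base, 123456789L));
        // prints jdbc:mysql://db.example.invalid:3306/forum?rndm=123456789
    }
}
```

In practice successive System.nanoTime() values differ, so each call yields a distinct string while still pointing at the same server.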

It’s not so much a TODO in the source code. I didn’t know about it at the time, and couldn’t be bothered to fix and redeploy ancient code. Having said that, I’ll have to find the code first :clue:

As for my team-building day (I know you want to know!), it was basically half a day in a boat where everybody had to stfu while a guide was talking. We spent the other half driving around in jeeps, screaming ‘left after 375m! 200m! 50m! LEFT!’ at the colleague behind the wheel, whilst banging our heads against the roof.

Is this considered team building? I’m getting too old for this.

Okay, it just crashed again, and this time I got the following from:
[icode]tail -n 1000 /var/log/messages | grep mysql[/icode]


May 29 21:39:44 (none) kernel: mysqld invoked oom-killer: gfp_mask=0x10200da, order=0, oom_score_adj=0
May 29 21:39:44 (none) kernel: mysqld cpuset=/ mems_allowed=0
May 29 21:39:44 (none) kernel: CPU: 0 PID: 22025 Comm: mysqld Not tainted 3.18.5-x86_64-linode52 #1
May 29 21:39:44 (none) kernel: [14391]     0 14391     1048        5       7       33             0 mysqld_safe
May 29 21:39:44 (none) kernel: [21934]   105 21934   151610    13523      96     4306             0 mysqld
May 29 21:39:44 (none) kernel: mysqld: page allocation failure: order:2, mode:0x2000d0
May 29 21:39:44 (none) kernel: CPU: 0 PID: 21934 Comm: mysqld Not tainted 3.18.5-x86_64-linode52 #1
May 29 21:39:47 (none) kernel: [14391]     0 14391     1048       13       7       24             0 mysqld_safe
May 29 21:39:50 (none) kernel: [14391]     0 14391     1048       23       7       14             0 mysqld_safe

So mysqld runs entirely out of memory, which invokes the kernel’s ‘oom-killer’.
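If you’d rather not eyeball the grep output, here is a hedged Java sketch. It does roughly what the tail/grep above does and pulls out the process named in each ‘invoked oom-killer’ line. The class name and the regex are my own assumptions, based only on the log format shown above. The process that invokes the OOM killer is the one whose allocation failed. That isn’t necessarily the process that ends up being killed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: find "<proc> invoked oom-killer:" kernel lines in a syslog file.
// The regex assumes the format of the log excerpt above.
final class OomScan {
    private static final Pattern OOM = Pattern.compile("kernel: (\\S+) invoked oom-killer:");

    /** Returns the process name from every line that reports an oom-killer invocation. */
    static List<String> oomInvokers(List<String> lines) {
        List<String> found = new ArrayList<>();
        for (String line : lines) {
            Matcher m = OOM.matcher(line);
            if (m.find()) {
                found.add(m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        Path log = Paths.get(args.length > 0 ? args[0] : "/var/log/messages");
        // ISO-8859-1 never rejects a byte, so stray binary junk in the log can't abort the read.
        List<String> all = Files.readAllLines(log, StandardCharsets.ISO_8859_1);
        // Roughly what `tail -n 1000` does: keep only the last 1000 lines.
        List<String> tail = all.subList(Math.max(0, all.size() - 1000), all.size());
        for (String proc : oomInvokers(tail)) {
            System.out.println(proc + " invoked oom-killer");
        }
    }
}
```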

At least that’s a place to start looking for solutions. I’ll try to fine-tune my.cnf tomorrow, when I’m less braindead. Failing that, I could upgrade the Linode node. We’re running on the cheapest one at the moment, and it might be time to upgrade anyway. Obviously, whatever the root cause is may swamp a bigger memory pool just the same. We’ll see.
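For the my.cnf tuning, a common back-of-the-envelope check is to take the global buffers, add max_connections times the per-connection buffers, and compare the result against the VM’s RAM. The Java sketch below applies that rule of thumb. The variable list is only an approximation. Per-connection buffers are mostly allocated only when a query needs them, and things like temporary tables aren’t counted. The values in main are hypothetical and do not come from JGO’s actual config.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the usual "worst case" MySQL memory rule of thumb:
//   global buffers + max_connections * (per-connection buffers)
// This is an approximation, not an exact accounting of mysqld's memory use.
final class MySqlMemoryEstimate {
    private static final String[] GLOBAL = {
        "key_buffer_size", "innodb_buffer_pool_size", "innodb_log_buffer_size", "query_cache_size"
    };
    private static final String[] PER_CONNECTION = {
        "sort_buffer_size", "read_buffer_size", "read_rnd_buffer_size",
        "join_buffer_size", "thread_stack", "binlog_cache_size"
    };

    /** Missing variables count as 0 bytes. */
    static long worstCaseBytes(Map<String, Long> vars) {
        long global = 0;
        for (String k : GLOBAL) global += vars.getOrDefault(k, 0L);
        long perConnection = 0;
        for (String k : PER_CONNECTION) perConnection += vars.getOrDefault(k, 0L);
        long connections = vars.getOrDefault("max_connections", 0L);
        return global + connections * perConnection;
    }

    public static void main(String[] args) {
        long kib = 1024L;
        long mib = 1024L * kib;
        // Hypothetical settings, for illustration only.
        Map<String, Long> v = new HashMap<>();
        v.put("innodb_buffer_pool_size", 128 * mib);
        v.put("key_buffer_size", 16 * mib);
        v.put("query_cache_size", 16 * mib);
        v.put("innodb_log_buffer_size", 8 * mib);
        v.put("max_connections", 151L);
        v.put("sort_buffer_size", 2 * mib);
        v.put("read_buffer_size", 128 * kib);
        v.put("read_rnd_buffer_size", 256 * kib);
        v.put("join_buffer_size", 256 * kib);
        v.put("thread_stack", 256 * kib);
        v.put("binlog_cache_size", 32 * kib);
        System.out.printf("worst case ~%d MiB%n", worstCaseBytes(v) / mib);
    }
}
```

If that number is close to or above the VM’s RAM, lowering max_connections or the per-connection buffers is usually the first thing to try.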

There. 13min later we’re running on a brand new VM :slight_smile:

The trusty old VM was hovering around 20MB of free RAM just after launching MySQL. Now we have ~1GB of headroom. Gotta get some sleep now!

Jesus Christ, one round of applause for Riven.

Everyone: “EVERYTHING’S BROKEN”
5 min later
Riven: “Okay, migrated server, patched the broken SMF code, added some new features, repelled spam bots and hacking attempts, etc etc etc”
Everyone: “=O”

He’s like some sort of wizard or something…

:emo:

The MySQL database was corrupted. Records were lost, and table column definitions had lost their ‘default’ values. I actually had to reinstall MySQL from scratch, because critical metadata about the tables (in the databases ‘information_schema’ and ‘mysql’) was corrupted or had vanished completely. When I started the mysql service, syslog exploded with errors.

I tried to uninstall MySQL and reinstall it, but that kept the corrupted metadata and SMF table definitions. As a last resort, I uninstalled MySQL, dropped the datadir and turned to backups to rebuild the JGO database from there. :persecutioncomplex:

The last post is from ~10 hours ago, so quite a bit of today’s content has been lost, but at least we have write-access to the database again.

The root cause of all this instability is still a bit of a mystery, but ‘luckily’ it’s the weekend, so I’ve got a ton of time to dig deeper.

Update: as JGO now runs on a larger Linode VPS, I have more disk space to hold backups. For the kind souls mirroring http://java-gaming.org/recovery/, it’s important to know that the backup frequency has been increased from once per day to every 4 hours. That increases the daily backup volume by a factor of 6.
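The factor of 6 follows from the schedule. With a 4-hour interval you get 24 / 4 = 6 snapshots per day instead of 1. Keeping the same retention window therefore takes roughly 6× the space, assuming each snapshot is about the same size. Here is a tiny Java sketch of that arithmetic. The class name and the 7-day retention are hypothetical.

```java
// Sketch: storage needed for periodic snapshots, assuming every snapshot is
// roughly the same size. The retention period used in main() is hypothetical.
final class BackupMath {
    private BackupMath() {}

    /** Snapshots per day for an interval that divides 24 hours evenly. */
    static int snapshotsPerDay(int intervalHours) {
        if (intervalHours <= 0 || 24 % intervalHours != 0) {
            throw new IllegalArgumentException("interval must divide 24 hours: " + intervalHours);
        }
        return 24 / intervalHours;
    }

    /** Bytes needed to keep retentionDays worth of snapshots of snapshotBytes each. */
    static long retainedBytes(int intervalHours, int retentionDays, long snapshotBytes) {
        return (long) snapshotsPerDay(intervalHours) * retentionDays * snapshotBytes;
    }

    public static void main(String[] args) {
        long daily = retainedBytes(24, 7, 1L);     // once per day
        long fourHourly = retainedBytes(4, 7, 1L); // every 4 hours
        System.out.println("growth factor: " + (fourHourly / daily)); // prints "growth factor: 6"
    }
}
```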

As for why JGO was in limbo for such an eternity: the corrupted state of the database did not affect the homepage, so Pingdom never alerted me to any problems. It was a busy day at work, and once again theagentd alerted me over Skype. It took me about 30 minutes to get home, and a horrifying full hour to get everything back up.
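One way to keep a monitor from staying quiet in this situation is to point it at a page whose HTTP status reflects the database as well, not just whether the homepage renders. Since write access was what broke here, a probe that actually writes would catch it. Below is a minimal Java sketch. It assumes the monitor treats a 5xx response as down. HealthCheck, jdbcWriteProbe and the monitor_heartbeat table are all hypothetical names, and wiring statusCode() into a real HTTP endpoint is left out.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.BooleanSupplier;

// Sketch: aggregate named probes into a single HTTP-style status code.
final class HealthCheck {
    private final Map<String, BooleanSupplier> probes = new LinkedHashMap<>();

    HealthCheck add(String name, BooleanSupplier probe) {
        probes.put(name, probe);
        return this;
    }

    /** Name of the first probe that fails; a probe that throws counts as failed. */
    Optional<String> firstFailure() {
        for (Map.Entry<String, BooleanSupplier> e : probes.entrySet()) {
            boolean ok;
            try {
                ok = e.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                ok = false;
            }
            if (!ok) return Optional.of(e.getKey());
        }
        return Optional.empty();
    }

    /** 200 if every probe passes, 503 otherwise. */
    int statusCode() {
        return firstFailure().isPresent() ? 503 : 200;
    }

    /** Hypothetical probe: succeeds only if a one-row heartbeat table can be updated. */
    static BooleanSupplier jdbcWriteProbe(String url, String user, String password) {
        return () -> {
            try (Connection c = DriverManager.getConnection(url, user, password);
                 Statement s = c.createStatement()) {
                return s.executeUpdate("UPDATE monitor_heartbeat SET ts = NOW()") == 1;
            } catch (SQLException e) {
                return false;
            }
        };
    }

    public static void main(String[] args) {
        HealthCheck check = new HealthCheck()
            .add("homepage", () -> true)   // the front page still rendered fine
            .add("db-write", () -> false); // stand-in for jdbcWriteProbe(...) against a broken DB
        System.out.println(check.statusCode()); // prints 503
    }
}
```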

Now stop medal-slapping; I’m merely doing what I’m supposed to do :point:

[quote]Sorry, you can’t repeat a karma action without waiting 1 hours.
[/quote]

._.

Presumably you can’t sidegrade it to a proper database like Postgres or somesuch?

Cas :slight_smile:

I’m not too keen on rewriting hundreds of SQL queries. SMF queries really take the cake in being… awful.

Like GROUP BY queries whose SELECT clause includes columns that are neither grouped nor aggregated, so each group can hold several different values for them. Whoever @ MySQL decided to support this madness, [icode]SELECT abc, def FROM tab GROUP BY abc[/icode], should be tickled to a hilarious death. Every proper db rejects this nonsense.
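For the curious, here is a small Java sketch of what that query means. Take a group like abc='a' with def values 3 and 1. MySQL returns one of those def values without saying which, unless the ONLY_FULL_GROUP_BY SQL mode is on (it is the default from MySQL 5.7.5). A standard-compliant database makes you pick one explicitly, e.g. [icode]SELECT abc, MIN(def) FROM tab GROUP BY abc[/icode]. The class and row data below are hypothetical. minDefPerAbc mirrors that MIN() rewrite, and ambiguousGroups lists the groups where the lenient query’s answer would be arbitrary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Sketch: in-memory model of GROUP BY abc over rows (abc, def).
final class GroupByDemo {
    static final class Row {
        final String abc;
        final int def;

        Row(String abc, int def) {
            this.abc = abc;
            this.def = def;
        }
    }

    /** Equivalent of SELECT abc, MIN(def) FROM tab GROUP BY abc (sorted by abc). */
    static Map<String, Integer> minDefPerAbc(List<Row> rows) {
        Map<String, Integer> out = new TreeMap<>();
        for (Row r : rows) out.merge(r.abc, r.def, Integer::min);
        return out;
    }

    /** Groups with more than one distinct def: exactly where the lenient query is indeterminate. */
    static Map<String, Set<Integer>> ambiguousGroups(List<Row> rows) {
        Map<String, Set<Integer>> byAbc = new TreeMap<>();
        for (Row r : rows) byAbc.computeIfAbsent(r.abc, k -> new TreeSet<>()).add(r.def);
        byAbc.values().removeIf(defs -> defs.size() < 2);
        return byAbc;
    }

    public static void main(String[] args) {
        List<Row> tab = Arrays.asList(new Row("a", 3), new Row("a", 1), new Row("b", 5));
        System.out.println(minDefPerAbc(tab));    // prints {a=1, b=5}
        System.out.println(ambiguousGroups(tab)); // prints {a=[1, 3]}
    }
}
```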

SMF is saturated with this query style, which means porting it to a proper SQL database will take eons. I heard something about a Postgres version with a MySQL compatibility mode, but given it was announced on some April 1st, I’m not holding my breath :slight_smile: