Inventive Dingo forums

Mayhem Intergalactic => General Mayhem => Topic started by: Chris on February 21, 2009, 05:10:47 am

Title: Downtime log
Post by: Chris on February 21, 2009, 05:10:47 am
In the spirit of being as open as possible, here is a log of all the server downtime we've had that I'm aware of. Most people don't publish this stuff (in the interests of not making themselves look bad), so you're often left wondering why you couldn't access the server at X time. I prefer not to be left in the dark, myself, and perhaps some of you feel the same way. Hence, this official downtime thread.

Also, hopefully it's entertaining for you guys to read about my battles with tech gremlins. Embrace the schadenfreude! ;D

Original post:

If anyone was getting a 500 Internal Server Error when trying to access the website or the Internet games list during the past hour or so, it's fixed now. Dynamic (run-time) linking spontaneously decided it didn't want to play ball, causing complete failure of most of the programs on the server. A server reboot magically cured everything.

We (ato, who is a genius at Linux stuff, and I) have no idea why this happened. :-[  Our best theory is some kind of memory corruption bug in Xen, which is normally really stable. Let's hope there's no repeat performance.

That's Xen as in the server virtualisation software, not Xen as in the alien planet from Half-Life. The downtime was not caused by a resonance cascade. As far as we know.

Title: Re: Brief downtime
Post by: Chris on March 06, 2009, 12:22:11 am
Server was partially down for the past 15 minutes or so, after it spontaneously decided to remount its file system read-only. We're looking into it. At first glance it appears to be a Xen bug with a known workaround, so this should be preventable in future.

Title: Re: Brief downtime
Post by: Chris on March 07, 2009, 12:26:10 am
And again. This is getting tedious! I've put in a support ticket and expect to have the issue resolved soon.

Update: Resolved, for good. Linode customer support rocks.

Title: Re: Brief downtime
Post by: Chris on March 07, 2009, 04:59:28 am
About an hour ago, the server was down for 47 minutes of maintenance to fix the read-only-mount problem. Hopefully that issue is fixed now.

Title: Re: Brief downtime
Post by: Chris on July 01, 2009, 02:53:53 am
The forums were broken recently thanks to an SMF bug. Fixed now, obviously.

The bug is that the forum settings file (which contains important information like database passwords) can sometimes be spontaneously wiped blank if a database error occurs. This has been a known bug in SMF since at least 2007. The SMF team calls it a PHP bug and claims that there's no foolproof way to work around it. While this is broadly true, in my opinion they really shouldn't be writing database error codes to Settings.php, which is the usual cause of this problem. (The other possible cause is two people changing admin settings simultaneously.)

I've restored the settings file from a backup, and set it read-only to prevent this from happening again.
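
For anyone curious, here is a minimal Python sketch of the "check it isn't blank, then lock it" idea. It is an illustration only, not the actual commands used on the server: the function names are made up for this example, and it assumes a Unix-style host where the web server process is not root and does not reset the permissions itself.

```python
import os
import stat


def is_blank(path):
    """Return True if the file is missing or contains only whitespace."""
    try:
        with open(path, "r", encoding="utf-8", errors="replace") as f:
            return f.read().strip() == ""
    except FileNotFoundError:
        return True


def make_read_only(path):
    """Clear the owner, group and other write bits; return the new mode."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    return stat.S_IMODE(os.stat(path).st_mode)
```

The intended order is to restore the file from backup first, confirm `is_blank("Settings.php")` is false, and only then call `make_read_only("Settings.php")`. The downside is that any legitimate settings change then needs the write bit turned back on by hand.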

Title: Re: Brief downtime
Post by: Chris on July 02, 2009, 02:33:09 am
OK, so I guess I kind of asked for that - about 12 hours after my last post, the server spontaneously went totally haywire, for reasons I have yet to discover. :-\  As a result, it wasn't accepting connections for some time. Weird.

Title: Re: Brief downtime
Post by: Chris on July 14, 2009, 07:34:02 am
Forums and news and a few other things were broken as of a few hours ago, thanks to the server running out of disk space. Resolved for now.
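
A check along these lines could flag a nearly full disk before things break. This is a hedged sketch, not something the server is known to run: the function names and the 90% threshold are assumptions picked for the example, and it relies only on Python's standard `shutil.disk_usage`.

```python
import shutil


def used_fraction(total_bytes, used_bytes):
    """Fraction of the disk in use, as a number between 0 and 1."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes


def is_nearly_full(path=".", threshold=0.90):
    """True if the file system holding `path` is at or above the threshold."""
    usage = shutil.disk_usage(path)
    return used_fraction(usage.total, usage.used) >= threshold
```

Run from a scheduled job, `is_nearly_full("/")` returning True would be the cue to clean up logs or old backups before the forums fall over.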

Title: Re: Brief downtime
Post by: Chris on September 01, 2009, 04:54:47 am
Site was down for about 60 hours just now. Apparently some random internet loser decided to prove their supreme lack of machismo, skill, ethics, brains, etc. by launching an unprovoked DDoS (Distributed Denial of Service) attack against my server. Hope it was good for you too, man.

If it happens again, I'm going to track him down to his (parents') house and then let the dingo loose. ;D

Sorry for any inconvenience, all!

Title: Re: Brief downtime
Post by: Kumlekar on September 01, 2009, 08:23:02 pm
I hope you don't have to get inventive with him...

Title: Re: Brief downtime
Post by: Chris on September 02, 2009, 03:57:41 am

Mine is an inventive laugh.

Title: Re: Downtime log
Post by: Chris on October 04, 2009, 11:18:42 am
In case anyone noticed the series of short outages we've been having - we had a few more incidents of the server getting hacked and hijacked by nefarious malware, as the server was behind on its security updates (mea culpa!). Probably unrelated to the earlier DDoS, just random botnet activity. The first outbreak actually managed to suck up over 200 GB of bandwidth before I noticed. Yikes. Not quite sure what it was doing, but my main theories are participating in an outbound DDoS (retrospective karma, anyone?) or trying to break into more boxes.

ato has kindly rebuilt the server from scratch, copied over all the data from the old one, and instituted some new security measures to prevent a recurrence. Everything seems to be working and we'll be keeping our eyes peeled... actually, eww. Let me rephrase that: we'll be keeping our eyes open for breakages and more infestations. Do let me know if you spot anything broken and I'll jump on it ASAP.

Some of the backups we used to rebuild the server were slightly out of date, but I don't think we lost much. Except that eztrezet will have to re-register on the forums. Sorry eztrezet! Nothing personal! :-[

Title: Re: Downtime log
Post by: Chris on November 29, 2009, 05:06:39 am
Found our first broken something: somewhere in the ongoing process of hardening the server against attack, the program that sends email got blocked from running. Oops! This affected both the forums and the update downloading functionality. I've adjusted some security policies and it appears to be working now.

Title: Re: Downtime log
Post by: Chris on January 07, 2010, 09:58:51 am
Just upgraded the database server, causing a few minutes of downtime. Seems like it all went smoothly apart from that.

Oh, and happy new year everyone. ;D