Baby, work today was a disaster…

Today we had a campus-wide “tabletop emergency response exercise” as part of the campus disaster planning thingie–I have no idea what they’re calling it–emergency ops? Whatev. Here are some pointers:

  • have a designated emergency operations center. This is where, if there’s a disaster, we would go, assuming it wasn’t wiped off the face of the earth. And, if it is wiped off the face of the earth, then the “grab bag” (see below) will allow you to recreate the ops center on the fly…
  • create what I’m calling a “grab bag”: this is the stuff you grab if there’s an emergency: laptop

Another day, another outage

Today a car ran into something or other and took out the network for a large chunk of the state. I’m pleased to say that we handled this outage better than our last one: the emergency website went up, though not as quickly as I would have liked, and then we took it down as soon as we could. Campus communications went pretty well too. But there are always lessons learned. Here are ours:

  • remember time zones! What may seem like an early-morning outage to you was a peak-hours outage for others. You need to get your

IT disaster recovery / IT contingency planning

After last week’s outage, we’re revisiting our DR plan. It’s actually pretty good, and the scheme to activate our externally hosted alternate website for the school worked. However, there are two things to change. I noted them in my comments to Friday’s post, but I’ll make them a little more public here:

1. DNS TTL: our emergency site had a TTL of 30 minutes. This is too long. I think 5 minutes is right for the initial value, assuming we can change it on the fly if we expect the emergency site to be up for longer. We would like the real site