
Stories of a CISSP: Mean Time Between Failure


You can learn more about MTBF in Chapter 16: Managing Security Operations page 678 in your Sybex 7th Edition, or Chapter 7: Security Operations page 971 in Shon Harris AIO 7th Edition.

Mean Time Between Failure

MTBF is the average amount of time a device is expected to run before it fails. It was amazing to me that a hardware vendor can actually calculate how much time will pass, on average, before their device is expected to fail.

For example, a company that makes servers can run 1000 servers continuously to see how long it takes before they fail. They won't all fail at the same time, so the vendor provides an average: a mean time between failures.
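To make that arithmetic concrete, here is a minimal sketch of one common way to estimate MTBF: divide the total operating hours across all units by the number of failures observed. The function name and the test numbers (1000 servers, 1000 hours each, 5 failures) are made up for illustration and are not real vendor data.

```python
def mean_time_between_failures(total_operating_hours: float, failure_count: int) -> float:
    """Estimate MTBF as total operating hours divided by observed failures."""
    if failure_count <= 0:
        raise ValueError("need at least one observed failure to estimate MTBF")
    return total_operating_hours / failure_count


# Hypothetical test run (illustrative numbers only):
# 1000 servers each run for 1000 hours, and 5 failures are observed.
unit_count = 1000
hours_per_unit = 1000
failures = 5

mtbf_hours = mean_time_between_failures(unit_count * hours_per_unit, failures)
print(f"Estimated MTBF: {mtbf_hours:,.0f} hours")
```

Keep in mind this is an average across the whole population of devices. It does not promise that any single server will last that long.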

Well, here is a crazy thing that happened a few days ago.

213 Days

It was 2:30am on a Wednesday when a Cisco ASA acting as a VPN concentrator for a healthcare network suddenly rebooted itself and got stuck at boot. If you're a network security engineer and this happened to a firewall in a live production environment, you know the situation was dire. Don't even ask why this firewall wasn't high-availability; it was a standalone.

I was on-call and was woken up at 3:00am to assist with the situation. I couldn't SSH to the device, but thankfully we always implement an out-of-band solution, so we were able to log in that way, upholding availability.

The image on the firewall was corrupted, and the device was stuck in something called ROMMON (ROM monitor), the ASA's minimal boot environment. We had to call someone who lived an hour away, at 4:00am, to drive to the data center and load an uncorrupted image over TFTP.

We sent a core dump file to Cisco for further investigation and to figure out the root cause of the incident. Needless to say, the customer was pissed and wanted answers.

I thought it was strange when Cisco asked us how long the firewall had been up before it suddenly rebooted.

We said exactly 213 days.

Apparently, some versions of Cisco ASA are expected to stop working after exactly 213 days!

NOW THAT IS A TRUE MEAN TIME BETWEEN FAILURE!

So accurate, so precise.

Thanks for reading.


CISSP Take-Away Concepts


Domain 1: Risk Management


Domain 4: Network Security

  • This is just about as technical as it gets. Firewalls, images, reboots, version upgrades, VPN concentrators - all in a day in the life of a network security engineer working SOC operations.
