Real Life Example of MTBF
You can learn more about MTBF in Chapter 16: Managing Security Operations (page 678) of the Sybex 7th Edition, or in Chapter 7: Security Operations (page 971) of the Shon Harris AIO 7th Edition.
Mean Time Between Failure
The average amount of time a device is expected to run before it fails. This amazed me: a hardware vendor can actually calculate how much time will pass, on average, before their device fails.
For example, a company that makes servers can run 1,000 servers continuously and record how long each one runs before it fails. They won't all fail at the same time, so the vendor reports the average: the mean time between failures.
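To make the arithmetic concrete, here is a minimal sketch of the two common ways to get an MTBF figure. The first averages the run time of units that all failed, as in the example above. The second, a simplified point estimate, divides total unit-hours by the number of failures, which is what you use when a test is stopped before every unit has failed. The function names and all numbers are hypothetical, for illustration only.

```python
def mtbf_from_failure_times(hours_until_failure: list[float]) -> float:
    """Average operating hours per failure when every tested unit has failed."""
    if not hours_until_failure:
        raise ValueError("need at least one observed failure")
    return sum(hours_until_failure) / len(hours_until_failure)


def mtbf_from_fleet(units: int, hours_per_unit: float, failures: int) -> float:
    """Total unit-hours divided by failures, for a test stopped before all units fail."""
    if failures <= 0:
        raise ValueError("MTBF cannot be estimated without at least one failure")
    return units * hours_per_unit / failures


# Hypothetical data: hours each of five units ran before failing.
observed = [9_500.0, 11_200.0, 10_050.0, 8_750.0, 10_500.0]
print(mtbf_from_failure_times(observed))  # 10000.0

# Hypothetical fleet test: 1,000 units run for 1,000 hours each, 4 failures.
print(mtbf_from_fleet(1000, 1000.0, 4))   # 250000.0
```

Note that an MTBF of 250,000 hours does not mean any single unit will last that long; it is an average across the whole population.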
Well, here is a crazy thing that happened a few days ago.
It was 2:30am on a Wednesday when a Cisco ASA acting as a VPN concentrator for a healthcare network suddenly rebooted itself and got stuck at boot. If you're a network security engineer, you know how dire it is when this happens to a firewall in a live production environment. Don't even ask why this firewall wasn't part of a high-availability pair; it was a standalone unit.
I was on call and was woken up at 3:00am to assist. I couldn't SSH to the device, but thankfully we always implement an out-of-band management solution, so we were able to log in that way, upholding availability.
The image on the firewall was corrupted, so the device was stuck in ROMMON (ROM Monitor), the ASA's low-level boot mode. At 4:00am we had to call someone who lived an hour away to drive to the data center and load an uncorrupted image over TFTP.
We sent a core dump file to Cisco for further investigation and to figure out the root cause of the incident. Needless to say, the customer was pissed and wanted answers.
I thought it was strange when Cisco asked us how long the firewall had been up before it suddenly rebooted.
We told them: exactly 213 days.
Then they sent us this link: https://www.cisco.com/c/en/us/support/docs/field-notices/642/fn64291.html
Apparently, some versions of Cisco ASA are expected to stop working after exactly 213 days!
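If you run devices affected by a fixed-uptime bug like this, it helps to know when each one will hit the mark. Below is a minimal sketch, assuming you already know each device's last boot time (how you collect it is up to you). The 213-day figure comes from the field notice linked above; the function names, the 14-day warning window, and the boot date are hypothetical choices for illustration.

```python
from datetime import datetime, timedelta

# Uptime threshold from the Cisco field notice referenced in this post.
UPTIME_LIMIT = timedelta(days=213)


def limit_reached_at(last_boot: datetime) -> datetime:
    """Return when a device booted at `last_boot` reaches 213 days of uptime."""
    return last_boot + UPTIME_LIMIT


def approaching_limit(last_boot: datetime, now: datetime,
                      warning: timedelta = timedelta(days=14)) -> bool:
    """True if `now` is within `warning` of the 213-day mark, or already past it."""
    return now >= limit_reached_at(last_boot) - warning


# Hypothetical boot time, for illustration only.
boot = datetime(2019, 1, 1, 2, 30)
print(limit_reached_at(boot))  # 2019-08-02 02:30:00
```

Flagging a device a couple of weeks early gives the team time to deal with it during a maintenance window instead of at 2:30am.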
NOW THAT IS A TRUE MEAN TIME BETWEEN FAILURE!
So accurate, so precise.
Thanks for reading.