It was a chance to be immersed in multiple real-world CISSP topics, so I volunteered immediately. My mentality is one in which I largely think only about the future while existing in the present - the past does not hold excitement for me. I am not interested in what could have been or what I could have done. At some level, I am not even capable of grasping these issues.
But it always seemed astonishing that whatever eagerness I brought to work would run into some escalating mishap. It had reached a point where each troubleshooting call brought some irksome problem that took up more life force than expected. It makes me wonder - with worried enthusiasm - whether there will be a day when I look back at this period in my career with longing, wishing for these simpler times.
A Priority Level 4 ticket was opened by a customer's Service Delivery Manager (SDM), and it stated the following:
"We require a maintenance window from 2 a.m. EST to 6 a.m. EST on Sunday morning to perform pen testing of our servers in order to fulfill our PCI DSS Compliance, Section 11.3 Requirements. During this time we will need the IP addresses of the pen testing team to be allowed through the firewall and then removed after the maintenance window. Please have two senior SOC firewall engineers attend this conference call in order to assist the pen testing team."
In short, they wanted to pen test some servers because it is part of their PCI DSS compliance requirement. These were topics from at least two CISSP domains, network security (Domain 4) and risk management (Domain 1).
Penetration testing, firewalls, PCI DSS compliance, network traffic monitoring, access control lists, documentation...it was too tempting to pass up, even if it meant working the night shift on a weekend. Sacrificing a good night's sleep would be worth the years of knowledge gained from this experience. From my point of view, not only was I getting paid for my job, but I was also being handed deep security learning opportunities that others would give anything to have in their careers. It's tough to get a security job, but once you're in, the knowledge gained is priceless. You literally can't put a price on it. Over the course of a few years, you can then leverage your security knowledge for a higher paying job or opportunities you don't even know exist yet. Trust me, it's worth it to get your CISSP.
Anyway, I asked my other senior engineer colleague if she was down to attend it with me, as both of us make a pretty deadly combination when working together. We do our jobs true to the meaning of "security professional" with deadly precision and timing. Our manager never has to ask us for an update on a project or task, and it is always resolved to completion.
She volunteered right away, because she knew it would be a great learning opportunity as well. As firewall engineers, we barely get to see how pen testing goes down, as it is not part of our job scope. So even if all we had to do was open up traffic to some IPs and later remove them from the firewall rulebase, the chance to see traffic traverse in and out of the network, and to see how the pen testers went about their operation, was a giftwrapped information security learning opportunity with a bow on top. We let our manager know that we wanted to volunteer for the night shift maintenance window.
Our manager responded, "Great! Now I don't have to 'force' you two to volunteer."
Details of the Maintenance Window
1:45 a.m. All parties to convene to discuss a high-level view of who does what and when during the pen testing window
2:00 a.m. SOC to place ACL on firewall rulebase permitting pen testing team's public IP addresses
5:00 a.m. Pen testing team to conclude their activities
6:00 a.m. Post-maintenance window issues to be addressed and completed
Simple enough. At 2:00 a.m., we put a rule on the firewall that looked like this:
Source: <5 public IP addresses of the pen testing team>
Destination: Any
Service: Any
Action: Accept
The ACL says to allow 5 specific IP addresses to any server using any service through the firewall.
Now we just had to let the pen testing team do their thing while we observed the network traffic to get a clearer picture of what was going on.
Scope of Penetration Test
Just to note: our customer was not testing their firewall, just their servers. They didn't give permission to the pen testers to compromise the firewall, just to get to the servers. The information that relates to PCI DSS resides on the servers, not the firewall. For this reason, the scope agreed beforehand between the pen testers and the customer's organization was to test only the data servers, on a weekend, late at night - three factors that create the perfect time to perform maintenance with the least impact to users.
First, the testing team made several HTTPS (443) connections, followed by RDP (3389), SSH (22), and Telnet (23). Second, they tried multiple ways to authenticate into the servers via admin credentials. Third, once access was gained, they checked to see if the private data was viewable or if it had any encryption controls in place (data at rest). We were able to determine whether they reached the servers or not by performing a tcpdump and looking to see if the TCP handshake was fully completed. If it was, they had gained access, or at least been faced with the login screen. If it wasn't, we saw a reset flag.
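The handshake check we did by eye can be sketched in Python. This is only an illustration, not the tooling we used that night: it assumes the TCP flags for one connection attempt have already been pulled out of a capture as (direction, flags) pairs, and the function name `classify_attempt` and its return labels are hypothetical.

```python
from typing import Iterable, Set, Tuple

# One observed packet: direction is "c2s" (tester to server) or
# "s2c" (server to tester); flags is the set of TCP flags on it.
Packet = Tuple[str, Set[str]]


def classify_attempt(packets: Iterable[Packet]) -> str:
    """Classify one connection attempt as established, reset, or incomplete."""
    state = "start"
    for direction, flags in packets:
        # A reset before the handshake finishes means the attempt was refused.
        if "RST" in flags:
            return "reset"
        if state == "start" and direction == "c2s" and flags == {"SYN"}:
            state = "syn_sent"
        elif state == "syn_sent" and direction == "s2c" and flags == {"SYN", "ACK"}:
            state = "syn_ack_seen"
        elif state == "syn_ack_seen" and direction == "c2s" and "ACK" in flags:
            # SYN, SYN-ACK, ACK: the three-way handshake completed.
            return "established"
    return "incomplete"


if __name__ == "__main__":
    handshake = [("c2s", {"SYN"}), ("s2c", {"SYN", "ACK"}), ("c2s", {"ACK"})]
    refused = [("c2s", {"SYN"}), ("s2c", {"RST", "ACK"})]
    print(classify_attempt(handshake))
    print(classify_attempt(refused))
```

Keep in mind that "established" here only means the tester reached the listening port. Whether they got past the login screen is a separate question that the handshake alone cannot answer.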
Watching what was going on in the network, I saw it as something else, different from our daily network traffic and behavior. It was as if there were ripples of something I could not even pretend to understand without any prior pen testing experience. The testers moved with shark-like stealth beneath the surface of the packet captures displayed on our monitors. The network was an amorphous soup of protocols, ports, and bandwidth usage. Simply amazing.
At around 4:45 a.m. we received a notification that the testing had concluded and that we were to delete the access rule. My colleague proceeded to delete the rule and push policy.
We both took off our headsets and briefly glanced at each other as if to silently say "Great work!".
Revelation 1 - Any/Any Rule Still Allowing Test Traffic
Then came another phone call.
"Hey SOC, did you guys remove the firewall rule?" the British accent said on the other side of the phone.
"Yeah we did, the policy just pushed through a few minutes ago too" I said, trying not to sound like this issue particularly concerned us anymore. But I was still on edge, as always.
There was a pause. "Well mate, we just tested to make sure and we're able to reach the servers still."
I put the phone on mute and leaned over to my colleague and asked her "Policy to delete the rule went through, right?"
She verified that policy pushed on the firewall GUI. "Yeah! Let me also check the CLI..."
She typed in "fw stat" on the Check Point firewall and confirmed the policy was pushed from the policy manager at 5:10 a.m.
We exchanged two quick professional nods and I unmuted the phone, "Yeah, we can confirm that policy has been pushed and the rule has been deleted from the firewall. Can you test again?"
"We just restarted our machines just in case and tried an HTTPS connection to one of the servers again, we still seem to be able to connect. Can you see anything in the traffic logs? Maybe the traffic is bypassing the firewall altogether?"
I digested this knowledge for a few moments before emitting it out as a small, yet troublesome, cud of certainty. Something was definitely wrong. From 4 years of network security experience, I just knew things were shaping up to be a complex and royal mess.
"Acknowledged. We're going to check the traffic logs and see what they say, please test again and we'll monitor."
This time, to monitor traffic, we wanted to see not only the connection hitting the firewall, but also which rule number was allowing the connection - so we couldn't just do a direct packet capture from the command line using tcpdump or fw monitor. We had to go to Check Point's log viewing software called "SmartView Tracker" where we got a lot more details about the connection.
Sure enough, the IP addresses were being allowed, and it was being allowed by a rule on the firewall too! And then, what I really hoped would not be the case of course materialized into reality: there was another rule on the firewall with "Source: Any" and "Destination: Any".
My good feelings about the work we did so far turned into a life-size origami toy, like an enfolded maze that could be easily crushed. I already was predicting the future of this issue: an investigation, root cause analysis report, escalation to managers, and the best part, figuring out who to blame! If I still smoked and could take a drag of a cigarette, I would have.
In case anyone needed to know, an any/any rule on a firewall is NOT good. This kind of rule allows everything on the Internet to reach the servers behind the firewall, and allows anything behind the firewall outbound to the Internet. It's as if there is no firewall in place at all. Firewalls are supposed to come out of the box with a straight up implicit deny-everything rule, with explicit allow rules then added only as needed (need to know/least privilege).
Now we had to make the tense call to the customer's Service Delivery Manager and tell him that the firewall had an any/any rule that was still allowing the pen testing team access (and who knows what else!). The SDM contacted the customer who was, rightfully, outraged. Of course, the first question was "Who put an any/any rule on our firewall?!". Answering that question alone would require at least a few hours of looking back at change request logs and firewall rule comments. But right now, we needed management to make a decision: should we disable the any/any rule or leave it?
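To make the point concrete, here is a minimal Python sketch of first-match rule evaluation with an implicit deny at the bottom. It is a toy model and not how Check Point matches traffic internally. The `Rule` class, the `evaluate` function, the rule names, and the IP addresses (taken from documentation ranges) are all made up for illustration.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import List, Tuple, Union

ANY = "any"


@dataclass(frozen=True)
class Rule:
    name: str
    source: str             # CIDR string, or ANY
    destination: str        # CIDR string, or ANY
    port: Union[int, str]   # port number, or ANY
    action: str             # "accept" or "drop"


def _ip_matches(field: str, address: str) -> bool:
    return field == ANY or ip_address(address) in ip_network(field)


def evaluate(rules: List[Rule], src: str, dst: str, port: int) -> Tuple[str, str]:
    """Return (action, matching rule name) using top-down, first-match logic."""
    for rule in rules:
        if (_ip_matches(rule.source, src)
                and _ip_matches(rule.destination, dst)
                and rule.port in (ANY, port)):
            return rule.action, rule.name
    # Nothing matched: everything not explicitly allowed is dropped.
    return "drop", "implicit deny"


if __name__ == "__main__":
    rulebase = [
        Rule("pen-test window", "203.0.113.10/32", ANY, ANY, "accept"),
        Rule("leftover any/any", ANY, ANY, ANY, "accept"),
    ]
    tester = ("203.0.113.10", "10.0.0.5", 443)
    print(evaluate(rulebase, *tester))      # ('accept', 'pen-test window')
    print(evaluate(rulebase[1:], *tester))  # ('accept', 'leftover any/any')
    print(evaluate([], *tester))            # ('drop', 'implicit deny')
```

The second call is our situation in miniature: deleting the temporary rule changes nothing, because the same traffic simply falls through to the any/any rule below it. Only when that rule is gone does the implicit deny take effect.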
The SDM called us back and said the customer wants the any/any rule disabled since they consider it a critical security vulnerability and will be asking for a root cause analysis from us later on.
The technical aspect of disabling the rule was in my scope of responsibility, and writing or sending the RCA is my manager's deal - so I was just going to do my part. I was going to disable the any/any rule. But, it's always a tense time when disabling firewall rules. You don't know what kind of other types of access it is going to stop.
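In hindsight, one way to reduce that uncertainty is to check what the logs say a rule has been matching before touching it. The sketch below assumes the log entries have already been exported into a list of dictionaries. The field names (`src`, `dst`, `service`, `rule`) and the function `summarize_rule_usage` are hypothetical and do not reflect SmartView Tracker's actual export format.

```python
from collections import Counter
from typing import Dict, Iterable


def summarize_rule_usage(log_entries: Iterable[Dict], rule_number: int) -> Dict:
    """Summarize the accepted traffic that was matched by one rule number."""
    hits = [entry for entry in log_entries if entry["rule"] == rule_number]
    return {
        "hits": len(hits),
        # Distinct sources give a rough idea of how many users depend on the rule.
        "distinct_sources": len({entry["src"] for entry in hits}),
        "services": Counter(entry["service"] for entry in hits),
    }


if __name__ == "__main__":
    sample_logs = [
        {"src": "10.1.1.1", "dst": "10.2.0.5", "service": "https", "rule": 7},
        {"src": "10.1.1.2", "dst": "10.2.0.5", "service": "ssh", "rule": 7},
        {"src": "10.1.1.1", "dst": "10.2.0.6", "service": "https", "rule": 7},
        {"src": "10.1.1.3", "dst": "10.2.0.5", "service": "https", "rule": 3},
    ]
    print(summarize_rule_usage(sample_logs, 7))
```

A rule with many distinct sources and everyday services in its summary is a strong hint that disabling it will hurt legitimate users, not just the testers.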
We disabled the any/any rule and pushed the firewall policy.
Revelation 2 - Any/Any Rule Blocking Legitimate Traffic
The good news was that the pen testing team confirmed that they could not access the servers anymore.
However, six minutes later all hell broke loose. Phones were ringing. Skype chats started pouring in from service desk representatives with multiple users yelling about how they were unable to connect to servers over HTTPS or SSH. Priority 1 tickets came flying into the ticket queue.
We knew instantly what was happening: the any/any rule was allowing legitimate traffic as well. About 60% of the company's users could no longer access resources behind the firewall because we disabled that any/any rule.
Now we were really looking for a major decision: re-enable the any/any rule to allow for legitimate access along with any potential malicious access? Or keep the rule disabled and just deny all traffic until we figure something out?
Thinking like a manager, the answer in this situation was of course to re-enable the any/any rule again so users can actually work. We found out that is exactly what management decided after we received the call from a frantic SDM compelling us to enable the rule again.
We enabled the rule and the user complaints subsided. Right now, it was 8 a.m. on Sunday. My colleague and I knew what was coming and called our manager to make him aware. This is how he answered the phone:
I feigned a chuckle, "Hey uhhh...so the maintenance went well, that's not an issue (trying to add some good news), but we found an any/any rule on the firewall. It was allowing access for the pen testing team even after we disabled our own rule, but then we saw their traffic still coming in and hitting that any/any rule. Then when we disabled that, their users started complaining they can't access their servers anymore. So the customer had us enable it back."
Our manager was a professional, so he knew what was coming. "So basically their firewall has a rule on it that is allowing any type of connection from anywhere?"
After a quick sigh, "Well, they're going to want a conference call, I'll join it. They're going to want a root cause analysis on why there is an any/any rule on the firewall. Can you two start doing your investigation and find out if the customer requested that rule at some point or if we are the ones that messed up somehow? I need documented evidence to show them regardless of whose fault it is. Remember, make sure everything is documented with timestamps. If it isn't written down, it officially didn't happen."
"Yeah, no problem, we'll email you what we find."
Revelation 3 - Our Fault
Now it was my turn to sigh. This was turning into a 15-hour shift from night to day. But that's okay, that's why I signed up for network security, that's why I chose to be a security professional. When things like this go down, we have to finish it to an acceptable level, and right now that meant gathering the evidence right as the incident was still fresh.
There is no going home at times like this. There is no saying "Oh, my shift is over, this isn't my problem anymore." There is no handing it over to another team member. We stay until there is a sense of things being completed and the customer is satisfied, then we can go home.
Some security professionals may opt to leave the situation as their shift ends. Some may say that there is nothing else they can do right now. Some may say that they've worked enough and are not getting paid to stay this long. These are the ones we remember later on. We remember them as the ones who got going when the going got tough.
Going through tough situations with your fellow security professionals, even if it just means staying present instead of running away from the situation, builds a bond that lasts forever.
You remember those who went through hardships with you.
After 6 hours of investigation, we found out it was our fault. Totally our fault.
It was discovered that the customer had upgraded to a newer firewall platform two months ago. They went from a Check Point 2200 hardware appliance to a vSEC (virtual) firewall. The interfaces, routes, and firewall rulebase had to be manually copied from the old firewall to the new firewall, meaning engineers had to manually enter or copy and paste configuration items into the new firewall.
During this time, some testing needed to be done, and the junior engineer on duty decided to add an any/any rule to make troubleshooting easier. But the engineer forgot to disable or delete the rule afterward. A critical mistake. And as you know from our CISSP studies, internal user mistakes are one of the BIGGEST threats to an organization; this blog post illustrates the point.
For the customer's environment, when new servers and workstations were added to the network, the users stated they could access the servers right away, never thinking that they also didn't request a firewall rule. And why would they? As long as they can connect, users don't care how it happens.
Luckily, we don't think there were any compromises to their network or data, although looking back at the logs, there was an unusually high number of scanning and probing attempts (port scans, pings, Telnet connections).
The junior engineer's mistake reminded me of the times at the Security Operations Center when I and some of the other senior engineers stressed following security best practices at all times, even during troubleshooting tests. But alas, the junior engineers had the weapon of youth on their side, and sometimes that harbored a know-it-all feeling.
I couldn't blame them; I was the same way - full of bravado at the age of 22. Telling them to adhere to our view of security was like telling school children to pay attention in class while they dreamily looked out the window, preferring their own intellectually attractive ideas to the dull curriculum we insisted on dispensing.
Damage Control for Customer
As a result of all this, we had to provide a root cause analysis as well as a plan on how this will never happen again. Our manager provided the customer the following steps to improve our services:
Nobody is allowed to ever put an any/any rule on the firewall without explicit permission from the customer
Nobody is ever allowed to put an any/any rule on the firewall for testing purposes ever again. Best practice is to use specific IP addresses to limit exposure
Firewall upgrades will require an engineer and a senior engineer. This is a form of making sure one engineer is doing something correctly. It is also a form of separation of duties, as the regular engineer will complete configuring a new firewall, and the senior engineer will look it over before calling it official
The SOC will also look over customer firewall rules every 3 months with a fine-tooth comb and look for any discrepancies (like any/any rules)
For this specific customer, a senior engineer will have to go back each time a junior engineer completes a change and verify their work. It is an added bit of work to an already complex process, but that is the price we have to pay for messing up the customer's rulebase in the first place
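The periodic rulebase review lends itself to a simple automated first pass. The sketch below is a hedged illustration only: it assumes the rulebase has been exported into a list of dictionaries, and the field names and the function `find_any_any_rules` are hypothetical rather than any vendor's export format. It flags accept rules whose source and destination are both "Any", whether or not the rule is currently enabled, since a disabled any/any rule can be switched back on.

```python
from typing import Dict, List, Tuple


def _is_any(value: str) -> bool:
    return value.strip().lower() == "any"


def find_any_any_rules(rulebase: List[Dict]) -> List[Tuple[int, str]]:
    """Return (rule number, comment) for every any/any accept rule."""
    return [
        (number, rule.get("comment", ""))
        for number, rule in enumerate(rulebase, start=1)
        if rule["action"].strip().lower() == "accept"
        and _is_any(rule["source"])
        and _is_any(rule["destination"])
    ]


if __name__ == "__main__":
    exported_rulebase = [
        {"source": "10.1.0.0/16", "destination": "10.2.0.5",
         "action": "accept", "comment": "app servers"},
        {"source": "Any", "destination": "Any",
         "action": "Accept", "comment": "temp troubleshooting"},
        {"source": "Any", "destination": "Any",
         "action": "drop", "comment": "cleanup rule"},
    ]
    for number, comment in find_any_any_rules(exported_rulebase):
        print(f"Rule {number} is any/any accept: {comment}")
```

A script like this does not replace the senior engineer's review. It just makes sure the most dangerous pattern is never missed because someone was tired at the end of a long shift.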
My brain hurt. I was exhausted. This security thing, it's not like we were just sitting in our chair for 15 hours doing nothing or monitoring a ticket queue. We were actively using our brains to try to resolve complex issues, engaging with the other teams, and making sure we made ourselves and our company look good in the process. There were a lot of elements that had to be balanced to make sure things did not spiral out of control.
My colleague and I went home at 1 P.M. Sunday, about 15 hours after we started our shift.
I dread calculating the health cost of missing all the sleep from the sheer number of years which had been consumed by long work shifts as a CISSP.
CISSP Take-Away Concepts
Domain 1: Security and Risk Management
Maintenance window for the penetration test was conducted at a time with the least risk, in the middle of the night on a weekend
Management made the decision to disable the any/any rule to block out the pen testers
Management decided to then enable the any/any rule back to allow unexpected legitimate users to continue to work (even on a weekend)
Management asked for a root cause analysis from our team and steps on how it will not happen again, or at least how the risk can be reduced to an acceptable level
PCI DSS is a standard that applies to any company that processes, stores, or transmits credit card data
Domain 2: Asset Security
The security of the assets (servers with data that need to comply with PCI DSS)
Domain 4: Network Security
A temporary firewall rule had to be created and then deleted, and the any/any rule had to be disabled and then enabled again
tcpdump packet capture utility was used to observe TCP three-way handshake
After this incident, the entire firewall rulebase had to be reviewed stringently
Domain 6: Security Assessment and Testing
The penetration testing team had their scope, their permission to conduct the test, and their access all provisioned before they began their operation. They were testing the vulnerabilities of servers as part of PCI DSS Requirement 11.3 (which calls for penetration tests of an organization's environment)
Testing was done in a manageable and controlled environment that strove to reduce impact
Domain 7: Security Operations
Change management was used for adding IP addresses and scheduling a maintenance window for the testing
Unfortunately, a change management process was not used when the engineer put an any/any rule on the firewall. Had there been proper change management, perhaps the engineer approving the change request for an any/any rule would have denied it, or reminded the junior engineer to remove it after it was done.