Working at a SOC (security operations center) has times that are excruciatingly slow. I sit ready for any sort of troubleshooting ticket to come in, like the charcoal of a grill yearning for more lighter fluid. Then other times it is just furiously busy, with customer connectivity issues coming in to be resolved as if they were iron being pulled toward the SOC by a magnet.
But other times, tickets come in that are just unusual or unexplained. There is actually an entire section on this in Shon Harris 7th Edition, under Domain 7: Security Operations, Operational Responsibilities, called "Unusual or Unexplained Occurrences."
This one particular ticket dealt with a customer's site-to-site VPN setup. The VPN carried traffic between two sites which were managed by the same Security Management Server. Meaning, there are two firewalls, one on the West Coast and one on the East Coast, but they are both "managed" by the same server. None of this is important for this article; it's just some background pertaining to Check Point firewalls.
Anyway, the customer was stating that their VPN traffic was not working. They could not see any encrypted packets leave the network, nor could they see any decrypted packets being received by the peer network. Not only that, the VPN tunnel itself was not coming up; there was no Phase 1 or Phase 2 (initial components of a site-to-site VPN).
First things first, we confirmed all VPN settings to make sure Phase 1 and Phase 2 were configured precisely per requirements. If you are studying cryptography in your CISSP books, learning about VPN setups is an excellent example of real-world application. For VPNs, we have to pick an encryption algorithm and a hashing algorithm. Together these provide confidentiality (encryption) and integrity (hashing). In this setup, the strongest algorithms were chosen for Phase 1 (AES256/SHA1), and a lower-level encryption algorithm with the same strength hashing algorithm was chosen for Phase 2 (AES128/SHA1).
Why pick a lower-level encryption value for Phase 2 and a stronger value for Phase 1? Because the customer wanted the first phase, which requires the most security for data in motion, to have the strongest algorithm. Then, once a strong and secure VPN tunnel is established, the actual data being encrypted can use a lower-level encryption value. Remember, encryption is heavy on firewall processing, so using a combination of stronger and weaker algorithms helps to use the CPU wisely. Or maybe this customer didn't care about processing and just picked the settings they thought fit. We always suggest using the strongest encryption value for both phases, but we have to go with what the customer wants.
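The "confirm the settings against the requirements" step can be sketched in a few lines of Python. This is only an illustration of the idea, not how the management server stores or checks anything: the Proposal class, the find_mismatches function, and the REQUIRED and CONFIGURED dictionaries are all made-up names, and the only values taken from this ticket are the AES256/SHA1 and AES128/SHA1 pairs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """One phase's algorithm pair (hypothetical structure for illustration)."""
    encryption: str
    integrity: str


def find_mismatches(required, configured):
    """Return a list of phases whose configured pair differs from the requirement."""
    problems = []
    for phase in ("phase1", "phase2"):
        if required[phase] != configured[phase]:
            problems.append(
                f"{phase}: required {required[phase]}, configured {configured[phase]}"
            )
    return problems


# Values from the ticket: stronger encryption for Phase 1, lighter for Phase 2.
REQUIRED = {
    "phase1": Proposal("AES-256", "SHA1"),
    "phase2": Proposal("AES-128", "SHA1"),
}
# Assumed to be what was read back from the VPN configuration.
CONFIGURED = {
    "phase1": Proposal("AES-256", "SHA1"),
    "phase2": Proposal("AES-128", "SHA1"),
}

if __name__ == "__main__":
    # An empty list means both phases match the requirements.
    print(find_mismatches(REQUIRED, CONFIGURED))
```

An empty result is what we saw on this ticket: the phase settings were not the problem.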
Then we made sure the correct gateways were being used in the VPN Community. What this means is that the networks behind each firewall gateway in the community are the only ones which will have their traffic encrypted.
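As a rough sketch of that idea, the snippet below decides whether a packet should be encrypted based on which gateway's networks the source and destination sit behind. The gateway names and the 10.1.0.0/16 and 10.2.0.0/16 ranges are invented for the example and are not the customer's real networks; the logic is a simplification, not the firewall's actual decision process.

```python
import ipaddress

# Hypothetical community: each gateway and the networks behind it.
COMMUNITY = {
    "west-gw": [ipaddress.ip_network("10.1.0.0/16")],
    "east-gw": [ipaddress.ip_network("10.2.0.0/16")],
}


def owning_gateway(ip):
    """Return the gateway whose networks contain this address, or None."""
    addr = ipaddress.ip_address(ip)
    for gateway, networks in COMMUNITY.items():
        if any(addr in net for net in networks):
            return gateway
    return None


def should_encrypt(src, dst):
    """Encrypt only when source and destination sit behind different community gateways."""
    src_gw = owning_gateway(src)
    dst_gw = owning_gateway(dst)
    return src_gw is not None and dst_gw is not None and src_gw != dst_gw


if __name__ == "__main__":
    print(should_encrypt("10.1.5.10", "10.2.7.20"))  # site to site
    print(should_encrypt("10.1.5.10", "8.8.8.8"))    # site to Internet
```

Traffic from one site to the Internet falls outside the community, so in this simplified model it would never be a candidate for the tunnel.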
Then after checking the VPN parameters, we made sure there was actually a rule on the firewall that allowed the traffic! You'd be amazed at how many troubleshooting issues I encounter just need a simple rule. The customer states something is not working, but in actuality there was never a rule in the first place to make it work! Those are the easy ones.
Rule #3 was the VPN rule, which also means it sat near the top of the firewall rulebase. In all, everything looked fine from our point of view.
We asked the customer to test their traffic once again so we could filter, capture, and observe any VPN traffic.
Nothing. No VPN traffic. No Phase 1 VPN coming up either. Something was weird. Now, time is money and money is customer issues being resolved quickly. I start getting anxious when a resolution is not being reached quickly, or an issue is at least identified. At this point, I felt a sense of urgency and pressure closing in around me like a sack being drawn shut with a rope.
Had an idea, a very simple one, nothing clever. Instead of just filtering for VPN traffic, I decided to also filter for any regular plaintext traffic - traffic that was not going through a VPN.
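The widened filter can be pictured as a small classifier over captured packets. This is a sketch in plain Python, not the capture tool we actually used: the packet dictionaries and their field names are assumptions, and the addresses are invented. What is standard is that ESP is IP protocol 50, IKE runs on UDP port 500, and NAT traversal uses UDP port 4500.

```python
from collections import Counter

ESP = 50   # IP protocol number for ESP (encrypted VPN payload)
UDP = 17   # IP protocol number for UDP
IKE_PORT = 500     # IKE negotiation (Phase 1 / Phase 2)
NAT_T_PORT = 4500  # IKE or UDP-encapsulated ESP when NAT traversal is in use


def classify(pkt):
    """Label a packet as 'esp', 'ike', 'nat-t', or 'plaintext' (anything not IPsec)."""
    if pkt["proto"] == ESP:
        return "esp"
    if pkt["proto"] == UDP:
        ports = {pkt.get("sport"), pkt.get("dport")}
        if IKE_PORT in ports:
            return "ike"
        if NAT_T_PORT in ports:
            return "nat-t"
    # "plaintext" here only means "not inside the VPN tunnel".
    return "plaintext"


def summarize(packets):
    """Count packets per label so missing VPN traffic is obvious at a glance."""
    return Counter(classify(p) for p in packets)


# Hypothetical capture: the customer's test traffic, none of it tunneled.
SAMPLE_CAPTURE = [
    {"src": "10.1.5.10", "dst": "10.2.7.20", "proto": 6, "sport": 40001, "dport": 443},
    {"src": "10.1.5.10", "dst": "10.2.7.20", "proto": 6, "sport": 40002, "dport": 443},
    {"src": "10.1.5.10", "dst": "10.2.7.20", "proto": 6, "sport": 40003, "dport": 22},
]

if __name__ == "__main__":
    print(summarize(SAMPLE_CAPTURE))
```

Filtering only for ESP and IKE would have shown an empty capture; counting everything from the source hosts shows where the traffic really went.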
Wouldn't you know it, we saw the traffic leaving the firewall in plaintext. The traffic was being allowed to leave the firewall per a plaintext rule already on the firewall, way down at the bottom of the rulebase at Rule #14.
But the thing is, the VPN rule was higher in the firewall rulebase, which means traffic from this group of source IPs should have hit the VPN rule first, and not the plaintext traffic rule at the bottom. A firewall reads rules from top to bottom as far as order of operations, and the first rule that matches the traffic is the one it uses. That’s a core firewall concept.
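That top-to-bottom, first-match behavior can be sketched as below. This is the textbook model only, with hypothetical rule contents: the rule numbers 3 and 14 come from this ticket, but the names, the networks, and the Rule class are made up, and a real rulebase matches on more than source and destination.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Rule:
    """A simplified rule: number, label, source and destination networks."""
    number: int
    name: str
    source: str
    destination: str
    enabled: bool = True


def first_match(rulebase, src, dst):
    """Walk the rulebase top to bottom and return the first enabled rule that matches."""
    src_addr = ipaddress.ip_address(src)
    dst_addr = ipaddress.ip_address(dst)
    for rule in rulebase:
        if not rule.enabled:
            continue
        if (src_addr in ipaddress.ip_network(rule.source)
                and dst_addr in ipaddress.ip_network(rule.destination)):
            return rule
    return None  # nothing matched; a real firewall would fall to its cleanup rule


# Hypothetical rulebase, listed in top-to-bottom order.
RULEBASE = [
    Rule(3, "site-to-site VPN", "10.1.0.0/16", "10.2.0.0/16"),
    Rule(14, "plaintext outbound", "10.1.0.0/16", "0.0.0.0/0"),
]

if __name__ == "__main__":
    hit = first_match(RULEBASE, "10.1.5.10", "10.2.7.20")
    print(hit.number, hit.name)
```

In this model the site-to-site packet lands on Rule #3 every time, which is exactly why what we saw on the live firewall was so strange.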
But at the same time, why did the customer have a plaintext traffic rule when they also wanted a VPN rule? Unfortunately, these are questions we don’t ask the customer, because they know best. If they want a plaintext rule, we first advise them of the security factors, and then do what they say.
But luckily this customer was pretty chill.
So I had an idea.
We asked the customer if we could just temporarily disable the plaintext rule, and see if the firewall starts to then register the VPN rule, even though the VPN rule is above the plaintext rule.
The customer agreed.
Sure enough, the VPN traffic started working.
VPN Rule #3 started registering hits, traffic was being encrypted, and leaving the firewall after Phase 1 and Phase 2 completion.
No idea how that happened. But you know what? Not going to question it. The customer was actually like “keep the plaintext rule disabled and keep the VPN rule”.
At this point you could ask the customer why they had a plaintext rule in the first place?! But that’s not professional. We don’t want to question what the customer wants, they are paying us, we are not paying them. We don’t want to embarrass them; we have to make sure they save face in every situation. We can advise, but we can’t order them to do anything. They are the customer. Any aggression toward a customer that dared to take flight would only crash back down - the gravity of customer satisfaction and corporate profits is too strong.
I follow the same principle with Internet comments. Somebody says a negative comment toward me, I just ignore them, I don’t say anything. Because what’s the point? And who am I really arguing with? An invisible aggressor? Somebody I don’t know? Achieving technical understanding and knowledge awakening requires a force greater than scrolling with a mouse wheel. Books are what we use to obtain the intellectual weapons necessary for disagreement or acquiescence.
As an added note, there is great admiration for Shon Harris (RIP) for putting this small subsection into her book about "Unusual or Unexplained Occurrences" - it proves she has actual technical experience in the field. I am sure with a core dump and analysis or a deeper level investigation we would find out why the plaintext rule was being triggered before the VPN rule, but tickets were filling up in the engineer's queue like efforts of a multitude of CPUs working in parallel - their combined output resulting in my job security and steady salary.
Thanks for reading.
CISSP Take-Away Concepts
Domain 1: Security and Risk Management
Engaging the customer directly over the phone
To contain the risk of a dissatisfied customer, we called the customer directly to resolve the issue. This saves time over updating a ticket or sending email. It also shortens the exposure to any other risk, such as the plaintext traffic rule existing on the firewall rulebase.
Domain 4: Network Security
Beyond the firewall policy itself, simply having a firewall gives us one of the strongest security devices we can have in a network - whether a simple network with a few hosts or a giant corporation with multiple networks.
East Coast and West Coast Firewall
Although controlled by the same management server, the two firewalls were located on opposite coasts of America. This is a form of high availability in case a disaster hits one of the sites. The odds of the same or different types of disasters hitting both data centers at the same time in different parts of the country were a risk management was willing to accept.
Domain 7: Security Operations
All changes should follow a change management process. Disabling the plaintext rule on a live conference call, without going through a change process, went against proper change management for the firewall. However, given a few years of experience and mutual trust with the customer (who owns the firewall), it was a risk this engineer was willing to take. If something had gone wrong, though, it would not have been excusable, and management would have asked why change management was not followed.