When cybersecurity becomes terrifying
Some cybersecurity horror stories are not your typical horror stories: there’s no danger from a chainsaw-wielding maniac hiding behind a server rack, the Candyman won’t appear if you say his name three times while staring at your 4K monitor, and it’s not like a vampire or werewolf can bite into a firewall.
Instead, the cybersecurity horror stories recounted here are tales that result in… (dramatic pause) …bad customer experiences.
The names of the actors have been removed to protect the innocent, but the horror… yes, the horror was very real. Fortunately, these tales serve as a learning experience for the rest of us.
A ghost in the machine?
There was once a company that built beautiful dashboards: giant, sprawling displays that tracked latency, packet jitter, and a whole slew of low-level network resources and processes. The security and reliability team loved their dashboards and all the data they provided.
One day, however, as the security and reliability team gazed at their wall of dashboards showing everything was normal, a support engineer walked in and informed them that the platform was down.
It turned out that despite all their dashboarding efforts, the company had failed to monitor whether users could log in to the service. As a result, the reliability team didn’t know that the platform was down until support told them, and support only knew because a customer had called to say so.
As the reality of the situation began to dawn on the reliability team, one team member gestured quizzically to the wall of screens displaying systemic harmony and, stunned, asked, “Is there something wrong with the packet jitter?”
Monitor with intent
To solve this issue, the security and reliability teams realized that they could detect platform access issues by tracking user logins. They took this solution a step further and created synthetic user monitoring: an application automatically attempted to log in every five minutes from several geographically distinct locations. This also provided insight into site reachability and allowed the teams to be proactive, detecting issues before customers did.
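As a rough illustration of the synthetic monitoring idea, here is a minimal Python sketch. Everything in it is an assumption made for illustration rather than the company’s actual tooling: the names run_probe and platform_reachable, the location labels, and the 50% success threshold are all hypothetical. The login attempt is passed in as a function, so the sketch can be tried without a real service.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ProbeResult:
    location: str      # hypothetical label for where the probe ran
    ok: bool           # True if the synthetic login succeeded
    latency_s: float   # how long the attempt took, in seconds


def run_probe(location: str, attempt_login: Callable[[], bool]) -> ProbeResult:
    """Run one synthetic login attempt and record the outcome.

    Any exception (timeout, refused connection, ...) counts as a failed login.
    """
    start = time.monotonic()
    try:
        ok = bool(attempt_login())
    except Exception:
        ok = False
    return ProbeResult(location, ok, time.monotonic() - start)


def platform_reachable(results: List[ProbeResult],
                       min_success_ratio: float = 0.5) -> bool:
    """Treat the platform as reachable if enough probes could log in.

    The 0.5 default is an arbitrary assumption; tune it to your own service.
    """
    if not results:
        return False
    successes = sum(1 for r in results if r.ok)
    return successes / len(results) >= min_success_ratio
```

In practice a scheduler would call run_probe from each location every five minutes and alert when platform_reachable returns False. Storing each ProbeResult with a timestamp turns the probes into a time series of their own.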
This company used time series data (login metrics) to see how the number of login connections related, over time, to the number of back-end data connections and the CPU resources they consumed. The combined data from user logins and synthetic monitoring revealed that the problem was… congestion. Too many users were trying to log in first thing in the morning, which overwhelmed the system.
Fortunately, they were able to increase the number of database connections to accommodate the increased service demands. They also learned a valuable lesson about monitoring: Don’t try to monitor everything. Instead, understand what you’re monitoring and what the purpose of monitoring that process is. The goal should be to surface the necessary insights to the people who need that information to take appropriate action.
Abnormal IP addresses
Having the ability to detect abnormal behavior can be a lifesaver. One prominent SaaS development platform found this out the hard way when hacked accounts went undetected for months and code from compromised repositories was leaked. The hack was traced to two IP addresses on the other side of the world that connected to thousands of accounts on the SaaS platform. No doubt this was a nightmare for everyone involved.
Tracking behavior
Luckily, you can help prevent the same thing from happening to you by using behavioral modeling. Behavioral modeling is a time series problem because it involves tracking events over time. By modeling the behavior of a user, you can determine when they log in, on what device, and from where in the world.
Tracking this data over time reveals normal usage patterns for users and organizations. You can also use this data to construct mathematical models of normal usage and then look for outliers.
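One simple way to look for outliers, sketched here in Python, is a modified z-score based on the median and the median absolute deviation (MAD). This is only one of many possible models and is not taken from the story above. The function name find_outliers, the sample counts, and the 3.5 cutoff (a common rule of thumb) are assumptions chosen for illustration.

```python
from statistics import median
from typing import List, Sequence


def find_outliers(values: Sequence[float], threshold: float = 3.5) -> List[int]:
    """Return the indexes of values that sit far from the typical value.

    Uses the modified z-score: 0.6745 * |x - median| / MAD.
    The median and MAD are used instead of mean and standard deviation
    because a single huge spike would otherwise distort the baseline.
    """
    if not values:
        return []
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the values are identical; anything different stands out.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical daily clone counts for one account; the last day is a spike.
daily_clones = [4, 5, 6, 5, 4, 5, 6, 250]
suspicious_days = find_outliers(daily_clones)
```

With this sample data, only the final day (index 7) is flagged. A real model would likely account for weekly cycles, account size, and other context.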
In the example above, the SaaS company could have monitored each account and organization, as well as how often clone operations occurred and where those operations originated.
Every SaaS product is defined by a set of characteristics. Any abnormal behavior in relation to that characteristic set becomes an instance. The monitoring team can then decide what the instance threshold is. For example, if three different types of instances occur in a short window, then there’s a high probability that something is wrong.
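The threshold idea can be sketched as a sliding time window that counts distinct instance types. The Python class below is a hypothetical illustration: the name InstanceWindow, the five-minute window, and the instance type labels are all assumptions, and it expects timestamps to arrive in non-decreasing order.

```python
from collections import deque
from typing import Deque, Tuple


class InstanceWindow:
    """Track abnormal-behavior instances for one account or organization."""

    def __init__(self, window_s: float = 300, min_types: int = 3) -> None:
        self.window_s = window_s      # length of the "short window", in seconds
        self.min_types = min_types    # distinct instance types that trigger an alert
        self._events: Deque[Tuple[float, str]] = deque()

    def record(self, timestamp: float, instance_type: str) -> bool:
        """Add an instance and return True if the threshold is reached.

        Assumes timestamps are passed in non-decreasing order.
        """
        self._events.append((timestamp, instance_type))
        # Drop instances that have aged out of the window.
        while self._events and self._events[0][0] <= timestamp - self.window_s:
            self._events.popleft()
        distinct_types = {kind for _, kind in self._events}
        return len(distinct_types) >= self.min_types
```

For example, a login from a new IP address, a new device, and a burst of clone operations within five minutes would trip the alert, while three repeats of the same instance type would not.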
As a bonus, understanding customer behavior can help companies provide better service to their customers, for instance, by notifying a customer that they’re about to hit some sort of usage limit.
The point is that, from a cybersecurity perspective, time series data can uncover a wide range of problems, issues, and phenomena; you just need to look for them. One such phenomenon: thousands of logins and code scrapes, across multiple accounts and organizations, coming from two IP addresses.
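A pattern like that one, a few IP addresses touching many accounts, is straightforward to look for once login events are collected. The following Python sketch is a hypothetical illustration: the function name, the (ip, account) event shape, and the limit of 50 accounts are assumptions, and the addresses come from ranges reserved for documentation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def ips_touching_many_accounts(
    events: Iterable[Tuple[str, str]],
    max_accounts: int = 50,
) -> Dict[str, int]:
    """Map each suspicious IP to the number of distinct accounts it accessed.

    `events` is an iterable of (ip_address, account_id) pairs taken from
    some time window, such as the last 24 hours of login records.
    """
    accounts_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, account in events:
        accounts_by_ip[ip].add(account)
    return {
        ip: len(accounts)
        for ip, accounts in accounts_by_ip.items()
        if len(accounts) > max_accounts
    }


# Hypothetical data: one address logs in to 1,000 accounts, another to just one.
sample_events = [("203.0.113.7", f"acct-{i}") for i in range(1000)]
sample_events += [("198.51.100.2", "acct-1"), ("198.51.100.2", "acct-1")]
flagged = ips_touching_many_accounts(sample_events)
```

Here only 203.0.113.7 is flagged. A sensible limit depends on the service, since shared office networks and VPNs can legitimately put many accounts behind one address.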
It can be challenging for security and reliability teams to stay ahead of nefarious actors, anticipate the limitations of their own infrastructure, or predict what their customers are going to do. Failure to think about and plan for these things can end in disaster. But viewing these same challenges through the lens of time series data reveals a range of solutions.