In late 2019 RiskRecon partnered with Cyentia Institute to examine the effect of cyber incidents on companies downstream of the initial attack. In part two of this blog series with the data scientists behind the report, Wade Baker and David Severski, we discuss the analysis that came from the Advisen data and what next steps could come from this study.

Read part one of the blog here.

RiskRecon: It took quite a bit of statistical analysis to come up with a workable model based on that data. Can you discuss that process and some of the challenges you faced?

David Severski: We know from personal experience, anecdotes, and being risk professionals and security professionals ourselves, that after a breach occurs, there is often a lag before an organization detects it itself, let alone before it goes through the court system, through investigation and so on. Starting with that as a hypothesis, we worked with Advisen and confirmed that there's also a delay between the time that an event actually occurs and getting enough information that it can be entered into an event feed.

From a modeling perspective, we said, okay, we want to be able to say whether we see a trend going up, but we know that there is a delay in play. We looked at the date when each event actually occurred versus the date that the event was first recorded into the feed.

We saw that there could be some substantial variations there. As systems become more accessible, that gap gets a little bit smaller over time, broadly speaking. But there is still a substantial gap out there. So we wanted to ask: looking at that last 18 to 24 months, is there an actual trend there that we're just not seeing because of that lag?
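To make the lag comparison concrete, here is a minimal Python sketch of how the gap between an event's occurrence date and its first-recorded date could be summarized by year. The dates are invented for illustration and are not Advisen data; the names `events`, `lag_days`, and `median_lag_by_year` are hypothetical and do not reflect how the report's authors actually performed the analysis.

```python
from collections import defaultdict
from datetime import date
from statistics import median

# Hypothetical (occurred, first_recorded) date pairs. Synthetic, NOT Advisen data.
events = [
    (date(2015, 3, 1), date(2016, 9, 1)),
    (date(2015, 7, 15), date(2016, 1, 15)),
    (date(2017, 2, 1), date(2017, 11, 1)),
    (date(2017, 6, 1), date(2018, 2, 1)),
    (date(2018, 5, 1), date(2018, 9, 1)),
]


def lag_days(occurred, recorded):
    """Days between when an event happened and when it entered the feed."""
    return (recorded - occurred).days


def median_lag_by_year(pairs):
    """Group lags by the year the event occurred and take the median of each group."""
    lags = defaultdict(list)
    for occurred, recorded in pairs:
        lags[occurred.year].append(lag_days(occurred, recorded))
    return {year: median(values) for year, values in sorted(lags.items())}


lag_summary = median_lag_by_year(events)
for year, lag in lag_summary.items():
    print(year, lag)
```

A shrinking median lag across years would match the narrowing gap described above, while a large lag for recent years suggests that the most recent counts are still incomplete.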

We looked across multiple decades of data, though we focus on the past 10 years for this report. We then created a prediction model of the number of events that we have seen over the past 20 years. We looked at the 'goodness of fit' for that model to see how accurately it predicts what we're seeing. And we got a highly accurate prediction model of what we expected to see in each year, and what we eventually saw out of the Advisen feed. We feel very confident that those last two years that we show, 2018 and 2019, are directionally accurate. Now we're not saying we're going to get exactly this number of events. But it is directionally accurate to say we expect to see substantially more events coming out in 2018 and 2019.
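The interview does not say which kind of model was used, so the following Python sketch is only one possible illustration of "fit a trend, check goodness of fit, then project the lagging years." It assumes a log-linear trend fitted by least squares and uses R-squared as the fit measure. The yearly counts are synthetic, and `fit_log_linear`, `r_squared`, and `predict` are hypothetical helper names.

```python
import math

# Hypothetical yearly event counts for "mature" years. Synthetic, NOT Advisen data.
counts = {2010: 100, 2011: 120, 2012: 150, 2013: 175,
          2014: 215, 2015: 260, 2016: 310, 2017: 370}


def fit_log_linear(xs, ys):
    """Least-squares fit of log(y) = a + b * x; returns (a, b)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_log = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (l - mean_log) for x, l in zip(xs, logs))
    b = sxy / sxx
    a = mean_log - b * mean_x
    return a, b


def r_squared(xs, ys, a, b):
    """Goodness of fit on the log scale: 1 - SS_residual / SS_total."""
    logs = [math.log(y) for y in ys]
    mean_log = sum(logs) / len(logs)
    ss_res = sum((l - (a + b * x)) ** 2 for x, l in zip(xs, logs))
    ss_tot = sum((l - mean_log) ** 2 for l in logs)
    return 1 - ss_res / ss_tot


def predict(year, a, b):
    """Projected event count for a year under the fitted trend."""
    return math.exp(a + b * year)


years = list(counts.keys())
values = list(counts.values())
a, b = fit_log_linear(years, values)
fit_quality = r_squared(years, values, a, b)

print("R^2 on log scale:", round(fit_quality, 3))
for year in (2018, 2019):
    print(year, "projected events:", round(predict(year, a, b)))
```

A projection like this supports a directional statement (more events expected in the lagging years) rather than an exact count, which is the distinction drawn above.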

RiskRecon: Can you talk a little bit about what ISN'T included in the Advisen data?

David Severski: There have been calls over the years for some sort of National Transportation Safety Board for cyber breaches. And there just isn't that kind of a central clearinghouse for breach information. So it really comes down to firms like Advisen, and a few others, to try to mine the publicly accessible information to see what they can find. And there are limits to that.

Now, Advisen is the most comprehensive data source I've seen in my years working in the industry. But I am sure there is stuff that is not included in it. So if there is an event that disclosed information but never makes it into the press, if it is handled quietly under the covers and there are no leaks about it, then it isn't going to be seen in a public breach data set such as Advisen. And then, in turn, it will not be represented in this report.

Wade Baker: I think another weakness or limitation is how exhaustively the data captures the size of the ripples. So, for example, we highlighted the Magecart example at the beginning of the report. In the Advisen data, there are 130-some-odd ripples, but I'm seeing stories that say 800 or more companies were impacted in that event. So, we don't know exactly how far the ripples extend. But if I look at this event's data, it's clear that lots of other industries are impacted, and retail seems to be the biggest, et cetera. We can kind of make broad assumptions based on the data, but it's not perfect.

RiskRecon: What does this mean for measuring cyber ripples—what's still going unmeasured?

David Severski: There are definitely some things that are probably underrepresented. I think we are going to see strong correlations where there are strong regulatory incentives to track them. These are cases where there are legal investigations, or regulatory actions, that explicitly name other parties involved. The connections are probably stronger in areas covered by HIPAA, for instance, which has strong requirements around knowing who your partners are and who those partners are engaged with.

When you get to different industries that don't have those connections, say perhaps manufacturing, those connections may be a bit more tenuous and there's less data there.

RiskRecon: As a data scientist what excites you about the work you're progressing toward with the ripples report? What's next as you try to find ways to look beyond the Advisen data set analysis?

David Severski: We really want to have a better understanding of what was the nature of these relationships between organizations, and what was the security posture of those organizations? We don't have good telemetry right now on what that looks like in terms of if an organization is doing business with another organization, what's the nature of that relationship? Is this a software as a service provider? Is it a service provider? Is it a data processor, et cetera?

We don't have information on those relationships right now, and we're really interested in marrying together something like the risk surface reports with this information to say, okay, how can we actually get to some better sense of trends that are happening out there that we didn't expect.

We're also curious to get more insight into the types of losses that are occurring. We didn't talk at all about the number of records involved in the ripple report, for instance, mostly because reliable data is really hard to get. So that's not there right now. The same goes for the types of losses that are involved, whether this is legal versus response costs versus lawsuits versus a loss of productivity. There's a little bit of that information in the Advisen dataset. We didn't feel that we could talk about that with confidence in this report, but it is definitely something we are interested in.

To read more about Cyber Ripples, check out the full report here.