Improving System Security with Big Data Techniques

April 11, 2013

Tudor Dumitras


Time: 11:00 am
Location: Meeting room 302 (Mountain View), level 3

Because computer systems operate in an ecosystem of users, attackers and interdependent software, their security depends on factors that are often specific to the deployment environment. Many security technologies rest on long-held assumptions about these factors. For example, we assume that software systems inevitably contain security vulnerabilities, which cyber attackers may exploit, and that we can patch the vulnerable hosts before these attacks occur. However, we do not know which vulnerabilities are ultimately exploited in the field, or how long end hosts remain susceptible to these exploits. To improve the security of systems in active use, we must understand the factors that drive the failures and vulnerabilities of software in the field.

In this talk, I will present two empirical studies that provide fresh insights into these problems and suggest new opportunities for improving system security. The first study shows that zero-day attacks, which exploit vulnerabilities before their public disclosure, remain undetected for 312 days (approximately 10 months) on average. The duration of zero-day attacks had remained an open question for more than a decade, because data is generally not collected until after an attack is discovered and because zero-day attacks are rare events that are unlikely to be observed in honeypots or in lab experiments. I also show that zero-day attacks are more common than previously thought: 60% of the vulnerabilities that the study identified as exploited in zero-day attacks were not previously known to have been used in such attacks. The second study shows that the fraction of vulnerabilities actually exploited in the field has been steadily decreasing over the past ten years. Moreover, alternatives to patching vulnerabilities, such as attack-surface reduction, have limited effectiveness, because attack surfaces vary from one host to another and cannot be reduced indefinitely without rendering the hosts inoperable.

These results derive from field data collected on 11 million hosts over a period of 3 years. I will also describe the Worldwide Intelligence Network Environment (WINE), the data analytics platform that enabled these studies. By sampling and aggregating up to 19 billion telemetry reports per day, WINE provides representative data for analyzing past and present cyber-threat landscapes. WINE also allows security researchers to conduct experiments at scale, and it archives the raw data used in each experiment to support reproducibility.

The empirical results from WINE suggest that we must rethink our current security models, which guide public policy and the design of security technologies. For example, we should focus on accelerating the deployment of security patches (not just their creation) through efficient mechanisms for online software upgrades. These results also illustrate the opportunities for improving the security of actively used systems by creating Internet-wide models of the failures and vulnerabilities of software in the field, derived empirically and updated frequently.