The science behind the comic
The Exploit
Chapter 10
An increasing number of objects are connected to the Internet; soon, it will be considered perfectly normal for not only printers, computers, and telephones to be online, but also cars, refrigerators, and many other devices. Nine times out of ten, the software running on these connected devices has security gaps.
“On average, there are one or two safety-critical vulnerabilities per 20,000 lines of code in well-maintained software,” says Thorsten Holz, Principal Investigator of the ERC project BASTION. Made up of 40 to 50 million lines, the Windows operating system therefore probably contains thousands of security gaps. Even a printer can have several hundred thousand lines of code.
However, it sometimes takes a while until security gaps are noticed and fixed by the manufacturers. This is where the methods developed by Thorsten Holz and his team are expected to help. They protect users from attacks even if security gaps have not yet been officially closed, regardless of whether the object in question is an Internet browser, a phone, or a refrigerator.
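The estimate above can be reproduced with a few lines of arithmetic. The following is a rough sketch only: the rate and the Windows line counts come from the text, while the printer size of 300,000 lines is an assumed stand-in for "several hundred thousand", and the function name is made up for illustration.

```python
# Rate quoted in the text: one or two vulnerabilities per 20,000 lines of code.
LINES_PER_BLOCK = 20_000


def estimate_vulnerabilities(lines_of_code: int) -> tuple[int, int]:
    """Return a (low, high) estimate of the number of vulnerabilities."""
    blocks = lines_of_code // LINES_PER_BLOCK
    return blocks * 1, blocks * 2


# Windows: 40 to 50 million lines (figures from the text).
windows_low, _ = estimate_vulnerabilities(40_000_000)
_, windows_high = estimate_vulnerabilities(50_000_000)
print(f"Windows: roughly {windows_low:,} to {windows_high:,} vulnerabilities")

# Printer: 300,000 lines is an assumption, not a figure from the text.
printer_low, printer_high = estimate_vulnerabilities(300_000)
print(f"Printer: roughly {printer_low} to {printer_high} vulnerabilities")
```

Under these assumptions the Windows estimate lands in the low thousands, which matches the "thousands of security gaps" mentioned above.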
Chapter 9
Today many companies gather data based on our online footprint: what we write on social media, what we buy online, how much time we spend online, what pages we visit, where we get our news, and what kind of entertainment we prefer (music and movie platforms, etc.). Still, there is no “global” company like Globex that can gather all the data in one avatar. On a smaller scale, however, Facebook has a huge amount of data on us - whose profiles we engage with, how much time we spend on the platform, and everything we have said in a message. It uses this data to build up a picture of who we are, which it can then use to customise the News Feed and target adverts. Just one dataset - what we have ‘liked’, for example - can reveal a lot about a person on its own. The University of Cambridge's Psychometrics Centre has built a tool that uses ONLY likes, and no other data, to build up a picture of your personality. If you share your digital footprint - just your likes on Facebook - with the Cambridge research team, they will return a psychological profile. The tool is called Magic Sauce and, if you dare, you can find it here: www.applymagicsauce.com
Chapter 8
What is a port?
To get free wifi, Lane first has to find a network port. A port can be a piece of hardware or a software construct, but in both cases it is an endpoint of communication. Wireless connections are terminated at ports of hardware devices, like the one that Lane finds. At the software level, a network port is a number that identifies one side of a connection between two computers. Computers use port numbers to determine to which process or application a message should be delivered. If network addresses are like street addresses, port numbers are like suite or room numbers.
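The software side of this can be tried out on a single machine. The following is a minimal sketch, assuming the loopback address 127.0.0.1 is available; the message content is arbitrary. Both ends of the connection live on the same "street address", and only the port numbers tell them apart.

```python
import socket

# Listen on the loopback address; port 0 asks the operating system
# to pick any free port number for us.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
address, server_port = server.getsockname()

# The client connects to (address, port) and is given its own port number.
client = socket.create_connection((address, server_port))
connection, (peer_address, client_port) = server.accept()

# The port number decides which socket (and thus which program) gets the data.
client.sendall(b"hello")
message = connection.recv(1024)
print(f"{message!r} travelled from port {client_port} to port {server_port}")

for sock in (connection, client, server):
    sock.close()
```

The exact port numbers differ on every run, because the operating system assigns them.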
Chapter 7
What about the U+202E trick?
As explained in this chapter, the Unicode character U+202E is a genuine hacker trick. As Lane says, this invisible character causes all subsequent text to be displayed right-to-left. In short, this character can be used not only to disguise malicious files, but also to cause confusing behaviour in almost any environment.
And this is how it works!
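A small sketch of the trick, using a made-up file name: the real name ends in ".exe", but a viewer that honours the override may display it as "invoice_exe.pdf". The helper `has_bidi_control` is a hypothetical name chosen for this example; it relies on Python's standard `unicodedata` module to spot direction-changing characters.

```python
import unicodedata

RLO = "\u202e"  # U+202E RIGHT-TO-LEFT OVERRIDE, invisible when displayed

# Hypothetical file name: everything after the override is shown reversed,
# so "fdp.exe" may be rendered as "exe.pdf".
file_name = "invoice_" + RLO + "fdp.exe"

# Bidirectional classes of the Unicode characters that change text direction.
BIDI_CONTROLS = {"LRE", "RLE", "LRO", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def has_bidi_control(text: str) -> bool:
    """Return True if the text contains a direction-changing control character."""
    return any(unicodedata.bidirectional(ch) in BIDI_CONTROLS for ch in text)


print(repr(file_name))              # repr() makes the hidden character visible
print(file_name.endswith(".exe"))   # the real extension is still .exe
print(has_bidi_control(file_name))  # a simple check that flags the trick
```

Checking file names for such control characters before displaying them is one simple way to avoid being fooled.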
Chapter 6
An attack against an embedded system (or any other type of computer) typically consists of several phases. First, an adversary closely analyzes the system to understand its inner workings; the goal is to understand in detail the different components of the system, both hardware and software. In the second step, the attacker searches for weaknesses, typically by “thinking outside of the box”. The goal is to violate some assumption about the system’s design in a way that enables a compromise of the system, for example by identifying a weakness that allows unauthorized access to data. In our research, we typically focus on the software side and aim to understand in detail how the software is implemented. Our focus is thus on phase 3, where an attacker tries to understand both the code and the data that represent the software. We develop methods and tools that enable us to analyze this software on the so-called binary level, which is basically the representation of 0s and 1s that a processor understands. For us humans, this machine language does not make much sense, so we use a so-called disassembler that translates machine language into assembly language, which we can read. Our research focusses on ways to enhance this process, and we typically work on this low-level representation because it enables us to understand the code and data actually executed by a system.
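To give an idea of what a disassembler does, here is a deliberately tiny sketch. It is not one of the project's tools: it assumes 32-bit x86 code, understands only four kinds of instructions, and expects well-formed input. Real disassemblers handle hundreds of instructions and many special cases.

```python
import struct

# Register names in the order used by the x86 instruction encoding.
REGISTERS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def disassemble(code: bytes) -> list[str]:
    """Translate a few 32-bit x86 opcodes into assembly text (toy example)."""
    lines = []
    offset = 0
    while offset < len(code):
        opcode = code[offset]
        if opcode == 0x90:
            lines.append("nop")
            offset += 1
        elif opcode == 0xC3:
            lines.append("ret")
            offset += 1
        elif 0x50 <= opcode <= 0x57:
            # push <register>: the register is encoded in the opcode itself.
            lines.append(f"push {REGISTERS[opcode - 0x50]}")
            offset += 1
        elif 0xB8 <= opcode <= 0xBF:
            # mov <register>, <32-bit constant stored in little-endian order>
            (value,) = struct.unpack_from("<I", code, offset + 1)
            lines.append(f"mov {REGISTERS[opcode - 0xB8]}, {value:#x}")
            offset += 5
        else:
            # Unknown byte: show it as raw data and move on.
            lines.append(f"db {opcode:#04x}")
            offset += 1
    return lines


# Seven bytes of machine code, written in hexadecimal.
machine_code = bytes.fromhex("55b82a000000c3")
listing = disassemble(machine_code)
print("\n".join(listing))
```

For these seven bytes the sketch produces three readable instructions: `push ebp`, `mov eax, 0x2a`, and `ret`.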
Chapter 5
Within the BASTION project, we analyze binary code for common processors like the Intel x86 architecture. An important aspect is that we go a step further by also including other instruction set architectures (ISAs) like ARM or MIPS in our analysis. Furthermore, we also include common ISAs used in small embedded systems to cover a broad range of systems. To this end, we first lift the concrete ISA to a so-called intermediate language (IL). You can think of an IL as a mechanism to abstract away from a concrete platform by using an abstract language that models general mechanisms of a processor, such as mathematical operations, jumps from one code location to another, or memory operations. The intermediate language enables us to develop analysis algorithms in a generic way so that we can use them for many types of (embedded) systems. This approach is motivated by the observation that devices in the Internet of Things are handicapped by the fact that neither the data formats nor the communication links are uniform. By developing new methods to analyze a given software system on the IL level, we abstract away from the obstacles that arise when performing such an analysis on the actual assembly level. This enables us to re-use analysis techniques by adapting them to ILs, which significantly accelerates analysis tasks. Note that we do not necessarily aim for a fully automated approach (which may be infeasible in the general case anyway), but rather develop methods and tools that support a human analyst and guide the analysis process as well as possible. More specifically, the goal is to push binary analysis techniques to a new level and to support common ISAs used in embedded devices.
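The idea of lifting can be illustrated with a toy example. The sketch below is not the IL used in the project: the tuple format, the function names, and the handful of supported instructions are all assumptions made for illustration. It shows how x86-style and ARM-style instructions can be brought into one common form, so that a single analysis works for both.

```python
# Hypothetical mini-IL: every instruction becomes
# (operation, destination, (source_1, source_2)).


def lift_x86(instruction: str):
    """Lift a two-operand x86-style instruction such as 'add eax, ebx'."""
    mnemonic, operands = instruction.split(maxsplit=1)
    dst, src = [op.strip() for op in operands.split(",")]
    # On x86 the destination is also the first source operand.
    return (mnemonic.lower(), dst, (dst, src))


def lift_arm(instruction: str):
    """Lift a three-operand ARM-style instruction such as 'ADD r0, r0, r1'."""
    mnemonic, operands = instruction.split(maxsplit=1)
    dst, lhs, rhs = [op.strip() for op in operands.split(",")]
    return (mnemonic.lower(), dst, (lhs, rhs))


def written_registers(il_program) -> set:
    """A platform-independent analysis: which registers get overwritten?"""
    return {dst for _, dst, _ in il_program}


x86_il = [lift_x86(i) for i in ["add eax, ebx", "sub ecx, eax"]]
arm_il = [lift_arm(i) for i in ["ADD r0, r0, r1", "SUB r2, r2, r0"]]

# The same analysis function runs unchanged on both programs.
print(sorted(written_registers(x86_il)))
print(sorted(written_registers(arm_il)))
```

Only the two small lifting functions know anything about a specific processor; the analysis itself is written once against the common form.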
Chapter 4
A major challenge we need to tackle in the project is the complexity of modern computing systems. The firmware image and the complete software stack can be several megabytes in size, which translates to millions of instructions that closely interact with each other. We need to carefully analyze the interaction of code and data so that we can detect vulnerabilities in a precise and sound way. Our goal is to avoid false warnings whenever possible, because a human analyst typically needs to review all alerts generated by an automated analysis step. In an iterative approach, we develop analysis systems and perform evaluations on real-world code to determine how accurate our analysis is. Of course, we sometimes make mistakes; then we need to refine our approach and come up with better analysis techniques. The goal is to refine our system to perfection, but theoretical results show that this is not possible with static analysis, so we will always make some mistakes.
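One common way to put a number on "how accurate" is precision: the share of alerts that point at real vulnerabilities. The sketch below is an illustration only, not the project's evaluation method; the function names in the two sets are invented, and it assumes the true vulnerabilities of the test program are already known.

```python
def precision(alerts: set, true_vulnerabilities: set) -> float:
    """Share of alerts that correspond to real vulnerabilities."""
    if not alerts:
        return 1.0  # no alerts means no false warnings
    return len(alerts & true_vulnerabilities) / len(alerts)


# Hypothetical evaluation data: places flagged by the analysis versus
# places where a vulnerability is known to exist.
alerts = {"parse_header", "copy_name", "read_config", "log_event"}
known = {"parse_header", "copy_name", "update_crc"}

false_warnings = alerts - known  # an analyst reviews these for nothing
missed = known - alerts          # real problems the analysis did not find

print(f"precision: {precision(alerts, known):.2f}")
print(f"false warnings: {sorted(false_warnings)}")
print(f"missed: {sorted(missed)}")
```

In this made-up case half of the alerts are false warnings, and one real vulnerability is missed, which are exactly the two kinds of mistakes a refined analysis tries to reduce.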
Chapter 3
Within the BASTION project, we also perform several case studies to demonstrate how the developed methods help us to understand the code and data actually executed by an embedded system. One of the first case studies focussed on so-called defeat devices used by automotive manufacturers. A defeat device is any type of device (e.g., a piece of software or hardware) that reduces the effectiveness of emissions controls under real-world driving conditions, with the goal of bypassing emission control tests during a certification process. We analyzed real-world defeat devices deployed within the engine control units (ECUs) of several automotive manufacturers to understand how these software-based defeat devices are actually implemented in practice. Our work enables us to understand the inner logic of the defeat devices and study them in detail. A high-level summary of our findings can be found at https://blog.acolyer.org/2017/06/20/how-they-did-it-an-analysis-of-emissions-defeat-devices-in-modern-automobiles
Chapter 1
The Exploit of BASTION, an ERC project led by Thorsten Holz
With the rise of the information society, we are surrounded by many kinds of embedded systems that collect and process data. Often these devices do not look like a typical computer, but they operate in a similar way: hardware components such as a processor, memory, or sensors are tightly coupled with software components that control the device and implement the desired functionality. Unfortunately, such devices can be easily manipulated, and the manipulations can be innocent or malicious - it often depends on the intent. As a result, computer security has become an increasingly important challenge for all of us: we need mechanisms that enable us to understand the vulnerabilities of such systems and methods to protect them against attacks. Within the project BASTION, we focus on the security challenges of embedded systems so that we can secure them against manipulation and fraud.