Archive for July, 2009

Virtualization and Network-Security

Thursday, July 30th, 2009

Virtualization of physical network controllers creates another level of flexibility. Without virtualization a packet has to leave the physical host whenever one host wants to communicate with another host. Physical components, like switches, connect hosts with each other. Within a virtualized environment no physical component is needed for two communicating hosts if they are located upon the same VMM. Packets don’t have to leave the physical host. A virtual component located within the VMM connects the virtual machines and their virtual network interface controllers. The “Virtual Switch” behaves like a physical network component, but it sends packets to virtual machines directly instead of pushing them into the physical network. Such a component can act as a switch or as a hub. Physical LANs today are built primarily with switches. With a virtual network component acting as a hub an attacker can sniff network traffic much more easily, because a hub repeats every packet to every host connected to it. Chris Wolf shows that sniffing the traffic of VMs from another VM is often easy, because some VMMs implement such broadcasting virtual network components.
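The difference between a hub and a switch can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the port names, the MAC strings and the `LearningSwitch` class are hypothetical and do not describe any particular VMM.

```python
# Toy model of a virtual network component; all names are illustrative.

def hub_deliver(ports, src_port, dst_mac):
    """A hub repeats every frame to all ports except the one it came from."""
    return [p for p in ports if p != src_port]


class LearningSwitch:
    """A switch learns which MAC address lives behind which port."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC address -> port

    def deliver(self, src_port, src_mac, dst_mac):
        self.mac_table[src_mac] = src_port      # learn the sender's location
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]    # forward to one port only
        # Unknown destination: flood, as a hub would
        return [p for p in self.ports if p != src_port]


ports = ["vm_a", "vm_b", "vm_evil"]

# Hub: the frame from vm_a to vm_b also reaches vm_evil.
print(hub_deliver(ports, "vm_a", "mac_b"))

switch = LearningSwitch(ports)
switch.deliver("vm_b", "mac_b", "mac_a")         # vm_b has sent traffic before
print(switch.deliver("vm_a", "mac_a", "mac_b"))  # only vm_b receives the frame
```

With the hub, a sniffer on vm_evil sees the frame without doing anything special. With the switch it only sees frames whose destination is still unknown.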

A company’s network is typically subdivided into physically isolated segments, each requiring a different level of security. The virtual network component of a VMM typically provides routing functionality as well - and it is able to create multiple network segments consisting of virtual machines upon a single physical host. Therefore, a company that consolidates its servers using virtualization potentially changes its physical network topology dramatically. Hosts that were located within different segments can be relocated onto the same physical host. Physical isolation is decreased or lost, because the same hardware is used for different hosts. Packets of different segments flow through the same hardware and share a single physical network. An attacker escaping from a virtual machine is no longer physically isolated. Segments or DMZs with high security needs aren’t protected on the same level any more.

The virtual network component doesn’t have to be part of the VMM core, because it doesn’t have to communicate with physical hardware directly. The component can communicate with the physical hardware through the VMM. Therefore, the complexity of a VMM’s core doesn’t have to increase very much, but the complexity of the virtual network component can be very high, depending on its functionality, because it potentially provides routing and switching. In addition, the component has to be part of every VM’s Trusted Computing Base, whose complexity will therefore increase. All VMs use one common network stack, which conflicts with the common security principle of “least privilege”.

Traffic within a company’s network is usually analyzed and filtered by physical network components for security purposes. In a virtual environment packets potentially never leave the physical host. Therefore, physical traffic analyzers and packet filters never see them. Traffic between virtual machines on the same VMM becomes invisible. The VMM is a “Network Blind Spot” and the virtual infrastructure doesn’t integrate into the physical infrastructure transparently. A malicious virtual machine is able to attack other virtual machines located upon the same VMM without intervention of physical security components; this is called an Inter-VM-Attack. Therefore, virtualization makes it easier to compromise a virtual machine once another virtual machine on the same VMM is already compromised.
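Which flows fall into the blind spot depends only on where the virtual machines are placed. The following sketch assumes a hypothetical placement of three VMs on two hosts and a VMM that delivers traffic between co-located VMs internally. The VM and host names are made up for illustration.

```python
# Hypothetical placement of virtual machines on physical hosts.
placement = {"web": "host1", "db": "host1", "mail": "host2"}


def leaves_host(src, dst, placement):
    """True if traffic between two VMs must cross the physical network."""
    return placement[src] != placement[dst]


flows = [("web", "db"), ("web", "mail"), ("db", "mail")]

# Flows a physical firewall or IDS has a chance to inspect
visible = [f for f in flows if leaves_host(*f, placement)]
# Flows that stay inside one VMM and are never seen on the wire
blind = [f for f in flows if not leaves_host(*f, placement)]

print("visible:", visible)
print("blind:", blind)
```

In this placement the traffic between web and db, often the most sensitive path, is exactly the one that never reaches a physical filter.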

Invisible traffic also renders useless the common tools that let an admin analyze traffic within a router or firewall. Tools for virtual routers have to be developed first. Therefore, troubleshooting within a virtual network is much harder and can affect availability. Invisible traffic is contrary to expectations, because virtual machines are expected to integrate into a physical environment just like a physical machine - but they don’t.

To introduce network security components into a virtualized environment it is necessary to build them once more as virtual components, located within the VMM or between the VMM and its VMs. But virtual network security components increase the VMM’s complexity once more, too (TCB of VMs). These components introduce mechanisms that are not specific to virtualization: VMMs now contain packet filters and other security-related mechanisms to analyze traffic. Physical network security components are long-established mechanisms, whereas their virtual counterparts are still in their early days. Therefore, the risk of security incidents increases. In addition, the administrator has to maintain such a network security component within every VMM running in the network. The component’s configuration has to match that of its physical counterpart exactly. Therefore, the administrator’s effort to configure all these components increases - any change has to be applied multiple times.

Instead of locating these components in the virtual environment, a VMM could pass the responsibility for network security to existing components by routing packets into the physical network. The VMM then reroutes traffic into the physical network instead of delivering it to VMs directly, and accepts a degradation in performance and an increase in complexity: according to Steven J. Vaughan-Nichols in “Virtualization Sparks Security Concerns”, it will “be difficult to manage with large numbers of virtual machines”. In addition, the VMM is still capable of consolidating different network segments, so physical isolation is potentially still lost.

Security and IO-Virtualization

Friday, July 17th, 2009

x86-IO is very hard to virtualize, because x86 is a so-called “open architecture”. Many different vendors can develop very different IO-components for x86. Therefore, an overwhelming number of components exists. When IO-components are developed for mainframes, virtualization is often taken into account (see “I/O for virtual machine monitors: Security and performance issues” by Karger et al.). Components for x86 are usually developed to be driven by a single operating system. Virtualizing x86-IO-components while guaranteeing isolation suffers from the resulting limitations of these components.

A plethora of IO-components results in a plethora of drivers for all these different components, but drivers for x86 are usually bad: badly designed, unreliable and responsible for a large share of errors, flaws and vulnerabilities (see “Reconstructing I/O” by Keir Fraser et al.). Drivers are highly privileged software components and usually are not isolated from other privileged components.

Virtual machines can’t have direct hardware access on x86; the VMM has to multiplex the hardware accesses of multiple suspicious guests. The VMM has to offer an interface such as virtual hardware components. An access to virtual IO leads to an access to real IO through the VMM, but to access the real IO the VMM has to integrate multiple drivers into its TCB and has to trust in the proper behaviour of these drivers. Nobody wants bad driver code located within the VMM’s TCB, because the VMM is the most privileged component in a virtualized environment. Therefore the VMM wants to move driver code out of its TCB. In an unvirtualized environment every OS drives IO on its own, with dedicated drivers. With a pure-Isolation-VMM every guest gets its own set of IO-components and every guest has its own set of drivers for these dedicated components: no sharing of IO-components exists. A pure-Isolation-VMM pushes drivers out of its TCB into the untrusted guests. Guests don’t use the same software stack for IO. The VMM’s TCB stays small and verifiable.

On x86 the VMM can’t do that, because IO-components have Direct Memory Access (DMA). The IO-components communicate directly with memory locations, without any address translation. Drivers program IO-components to DMA (write) directly into specific parts of memory. They can overwrite memory locations dedicated to other guests or to the VMM. Therefore, if drivers are located within guests, an untrusted driver within an untrusted and potentially compromised guest would be able to compromise the VMM. A VMM for x86 therefore can’t locate drivers for IO-components within guests, because it would have to trust its guests. The VMM has to integrate untrustworthy drivers into its TCB. The VMM has to trust these drivers, though they are not really trustworthy. A flaw within these drivers will compromise system reliability and all guests will be affected.

Well, the VMM now has the drivers, but it has to offer an opportunity to access IO-components by its guests. Different virtualization technologies offer different interfaces.

Full virtualization usually emulates typical IO-components. A guest accesses emulated IO through the usual drivers that are available for these typical IO-components by default. An access to emulated IO is mapped to real IO by the VMM. To be able to use existing driver code within guests, the emulated IO has to behave exactly like its real counterpart. The VMM only has to offer a small subset of the available x86-IO-components, like a specific Network Interface Controller, but writing such emulated IO is actually a hard and complex task. Therefore a guest has to include a large amount of very complex emulation code in its TCB. A flaw within this code is very likely, and it is not only the security of guests that suffers from it.

A paravirtual VMM usually exports just a small interface for different types of IO, like block-devices or Network Interface Controllers. It uses a Split-Driver-Model. Every guest implements very simple untrusted drivers that communicate with the interface. A paravirtual VMM does not have to use complex emulated IO that behaves like real IO and we don’t need emulation-code. The VMM remains small and simple. Flaws within its interface are less likely.

The VMM has to access real IO directly. Therefore, every VMM has to include drivers for each IO-component available for x86. A VMM is an operating system, and drivers have to be rewritten for every different OS. Therefore, all drivers for x86-IO usually have to be rewritten for each VMM. To avoid this huge implementation effort, a VMM can sit on top of a usual OS. Such a VMM gains access to every driver existing for the particular OS. This is VMware’s “Hosted Architecture” (or Type-II-VMM according to Popek et al.), for example. The Hosted Architecture of course has an important effect on security. The VMM’s TCB has to include a whole OS, including all its vulnerabilities. Madnick et al. argued in their paper “Application and analysis of the virtual machine approach to information system security and isolation” “that a combined virtual machine monitor/operating system (VMM/OS) approach to information system isolation provides substantially better software security than a conventional multiprogramming operating system approach”, because a VMM is a very small program and an OS is much more complex than a VMM. With a Hosted Architecture this statement becomes false. The TCB of such a VMM is not smaller than the TCB of any OS; it is of course bigger.

Instead of a VMM running a Hosted Architecture, a VMM can run on a bare machine, that is: on top of its hardware directly. Such a VMM is called Bare-Metal-VMM - or Type-I-VMM.

To exclude drivers from a VMM’s TCB, a VMM can export drivers into n guests, called Driver-Domains. Driver-Domains share IO with other domains (guests) through interfaces of the VMM. Such a VMM tries to increase security isolation by removing untrustworthy drivers from its TCB and from the whole system. Driver-Domains have special privileges that other guests don’t have, because they have to access real IO directly. Every guest with direct hardware access has to be considered trusted, because it can DMA into all memory locations. Real security isolation is therefore not achievable with Driver-Domains on x86. In addition, a Driver-Domain is in contact with sensitive data of guests. A TCB does not get smaller just by moving things around within it. The VMM’s TCB will include all components of each Driver-Domain. Each Driver-Domain usually runs a complete OS of its own. Therefore, the VMM’s TCB will include n Driver-Domains and n operating systems. In addition, the flow of information gets more complicated because of the indirection through Driver-Domains. The VMM has to route each IO access to a specific Driver-Domain.

The Security of a VMM, running a Hosted-Architecture or running a Domain-Architecture won’t exceed security limitations of the Operating Systems the VMM runs on.

To address the issue of badly designed drivers, a VMM can restrict itself to a particular subset of the IO-components available on x86, because we would be able to rewrite the driver code for this subset and consider it trusted. We gain a smaller TCB, but we lose flexibility, because such a VMM only runs on specific hardware.

The only way to solve the security issues of IO-Virtualization is to use an I/O Memory Management Unit (IOMMU). An IOMMU is able to restrict the DMA of IO-components, because it adds an additional layer of address translation. An IOMMU maps DMA addresses to physical addresses, and the VMM programs the IOMMU with the mapping to use. Therefore, an IOMMU offers the same flexibility for DMA address translation that an MMU offers for virtual memory. The IOMMU creates dedicated address spaces, and an IO-component only has access within its own space. Therefore, an IOMMU isolates the accesses of guests to IO-components completely. With an IOMMU we are able to push drivers into Driver-Domains without an increase in complexity, because we can now consider these guests untrusted and use the IOMMU to guarantee isolation. The VMM’s TCB stays small and formally verifiable; an error within drivers can’t compromise system security.
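The effect of the additional translation layer can be illustrated with a toy model. This is a sketch under my own assumptions and not a description of real IOMMU hardware: the class `ToyIommu`, the device name `nic0` and the page numbers are all hypothetical.

```python
class DmaFault(Exception):
    """Raised when a device tries to access memory outside its mapping."""


class ToyIommu:
    def __init__(self):
        # device -> {dma_page: physical_page}; programmed by the VMM only
        self.tables = {}

    def map(self, device, dma_page, phys_page):
        self.tables.setdefault(device, {})[dma_page] = phys_page

    def translate(self, device, dma_page):
        try:
            return self.tables[device][dma_page]
        except KeyError:
            raise DmaFault(f"{device}: DMA page {dma_page} is not mapped")


memory = {7: "VMM data"}  # physical page -> contents

iommu = ToyIommu()
iommu.map("nic0", dma_page=0, phys_page=42)  # page 42 belongs to the driver domain


def dma_write(device, dma_page, data):
    """Every DMA access goes through the IOMMU before it touches memory."""
    memory[iommu.translate(device, dma_page)] = data


dma_write("nic0", 0, "packet")      # lands in physical page 42
try:
    dma_write("nic0", 7, "attack")  # page 7 was never mapped for nic0
except DmaFault as err:
    print("blocked:", err)
```

Without the `translate` step the second write would go straight to physical page 7 and overwrite the VMM's data, which is the situation on x86 without an IOMMU.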

IOMMUs are not yet included with every VMM, but they are likely to be in the near future, according to Leendert van Doorn et al. in “The Price of Safety: Evaluating IOMMU Performance”.

Security and CPU-Virtualization

Thursday, July 16th, 2009

The CPU is just another resource when dealing with virtualization, and virtualization technologies differ very much in the way they try to virtualize this resource. It’s easy to imagine that this results in different security aspects as well. In fact, the virtualization of the CPU is a security issue in general, because a CPU offers some instructions that manipulate hardware resources. Anyone who can call these instructions directly would own the hardware, and no security limits would apply when dealing with sensitive information. A CPU has a mechanism to enforce security, namely Protection Rings, 2 at a minimum (Ring 0 and Ring 1). Hierarchical protection rings distinguish user-mode in an outer ring from kernel-mode in the innermost ring. An inner ring offers more privileges than an outer ring, which only offers a subset of the instruction set. The innermost ring offers all privileges: software running at that privilege level can use the whole instruction set of the CPU. Trying to call a highly privileged instruction from an outer ring fails.

Gerald J. Popek et al. defined the formal requirements of a virtualizable architecture in their paper “Formal requirements for virtualizable third generation architectures”. Privileged instructions are instructions that trap when they are called from user-mode. No trap is generated when they are called from kernel-mode. A trap passes control to a predefined, trustworthy routine, a trap handler, and the processor mode changes. In a usual operating system trap handlers are located deep in the kernel. A sensitive instruction is an instruction that reads from or writes to sensitive registers or memory locations. For an architecture to be virtualizable, the set of sensitive instructions has to be a subset of the privileged instructions: all sensitive instructions have to trap. On such an architecture control can be passed to the VMM, which maintains control over sensitive instructions and the hardware. A guest is unable to access hardware directly - it can only access hardware through the VMM, which enforces the security policy. In addition, the VMM is able to emulate the behaviour a guest expects. A guest trying to access sensitive data would typically read sensitive data of the real machine, but because the instruction traps and control is passed to the VMM, the VMM can return data of the virtual machine instead by keeping Shadow Structures for each virtual machine. This method is called Trap & Emulate.
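The control flow of Trap & Emulate can be sketched as follows. This is a toy model, not how a real VMM is written: the instruction names `read_ctrl` and `write_ctrl`, the single control register and the class `ToyVmm` are hypothetical and only illustrate the idea of per-VM shadow structures.

```python
class Trap(Exception):
    """Signals that a privileged instruction was issued in user mode."""

    def __init__(self, vm, instr, operand=None):
        super().__init__(instr)
        self.vm, self.instr, self.operand = vm, instr, operand


PRIVILEGED = {"read_ctrl", "write_ctrl"}  # hypothetical instruction names


def guest_execute(vm, instr, operand=None):
    """Guests run in user mode, so every privileged instruction traps."""
    if instr in PRIVILEGED:
        raise Trap(vm, instr, operand)
    return f"{instr} executed directly"


class ToyVmm:
    def __init__(self):
        self.real_ctrl = "host-value"  # state of the real machine
        self.shadow_ctrl = {}          # per-VM shadow copy of that state

    def run(self, vm, instr, operand=None):
        try:
            return guest_execute(vm, instr, operand)
        except Trap as trap:
            return self.emulate(trap)  # the trap handler lives in the VMM

    def emulate(self, trap):
        if trap.instr == "write_ctrl":
            self.shadow_ctrl[trap.vm] = trap.operand
            return None
        if trap.instr == "read_ctrl":
            return self.shadow_ctrl.get(trap.vm, 0)
        raise ValueError(f"no handler for {trap.instr}")


vmm = ToyVmm()
vmm.run("vm1", "write_ctrl", 5)
print(vmm.run("vm1", "read_ctrl"))  # vm1 sees its own value
print(vmm.run("vm2", "read_ctrl"))  # vm2 is unaffected
print(vmm.real_ctrl)                # the real machine state was never touched
```

The guest never reads or writes `real_ctrl`. It only ever sees the shadow value the VMM keeps for it.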

We want secure systems to sit on top of small, isolated components. Trap & Emulate is very easy to implement: we only have to implement the trap handlers, so we gain rather low complexity through it. Traps are existing mechanisms, deeply studied and in use for many years. These existing mechanisms can be used for virtualization, resulting in a high degree of logical isolation. The VMM remains small, simple and formally verifiable. The Trusted Computing Base (TCB) remains small, promising a high degree of security, because it reduces the usual risks of flaws contained in complex code - and complexity is the point where x86-virtualization comes in.

Trap & Emulate is only possible on “virtualizable” architectures and unfortunately x86 is not virtualizable in that manner. Today many other ways exist to virtualize an architecture. So the formal requirements of Popek & Goldberg don’t tell if an architecture generally is virtualizable, they tell whether it is virtualizable using Trap & Emulate or not. x86 contains 17 sensitive, non-privileged instructions (they fail silently) disabling Trap & Emulate, because control won’t be passed to the VMM when these sensitive instructions are called. Therefore the VMM can’t emulate the expected behaviour and for example the guest can discover that it is running in a virtual environment, because the behaviour differs from the behaviour the guest usually expects.
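The requirement boils down to a subset test. The sketch below uses a handful of x86 instruction names as examples. As far as I know the non-privileged ones appear in Robin and Irvine's analysis, but the two sets here are deliberately incomplete and only serve as an illustration, not as the full list of 17.

```python
# Illustrative, incomplete instruction sets (not the full list of 17).
privileged = {"LGDT", "LIDT", "LMSW", "HLT"}
sensitive = {"LGDT", "LIDT", "LMSW", "SGDT", "SIDT", "SMSW", "POPF"}


def trap_and_emulate_possible(sensitive, privileged):
    """Popek & Goldberg: every sensitive instruction has to be privileged."""
    return sensitive <= privileged


# Sensitive instructions that do not trap when executed in user mode
problematic = sensitive - privileged

print(trap_and_emulate_possible(sensitive, privileged))
print(sorted(problematic))
```

Every entry in `problematic` is an instruction the VMM never gets to see, which is exactly why Trap & Emulate alone is not enough on x86.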

Binary Translation tries to overcome the limits of x86 by translating the sensitive, unprivileged instructions on the fly. Binary Translation can insert traps into the binary code so that these instructions will trap afterwards. Other instructions can be executed without any changes. The sensitive, unprivileged instructions can occur anywhere within the binary code. Therefore the Binary Translator has to scan through the whole code, because only scanned code is safe to execute. A special problem a Binary Translator has to deal with is self-modifying code. Self-modifying code is able to insert problematic instructions after the code has been scanned by the Binary Translator. Trap & Emulate is resistant to self-modifying code, because a trap is generated exactly at the moment of execution. Binary Translation has to scan for this kind of code explicitly. A Binary Translator is therefore a so-called Sanitizer, because the code gets cleaned explicitly to make it trustworthy. A Sanitizer is in conflict with the principles of Security by Design and fail-safe defaults, because every flaw within a Sanitizer puts system integrity at risk. The sensitive, unprivileged instructions get filtered out explicitly instead of safe instructions being allowed explicitly. The whole code is scanned, filtered and translated, so the VMM has to use special techniques to reach satisfactory performance. The Binary Translator can try to optimize the code just as a compiler does, or it can establish a trace cache containing already scanned blocks of code to speed up translation. According to Kevin Lawton in his paper on “Running multiple operating systems concurrently on an IA32 PC using virtualization techniques”, a Binary Translator has to insert breakpoints to interrupt execution. Such a Binary Translator also has to deal with code that reads parts of its own code and thereby reads the inserted breakpoints.

Binary Translation is a very complex method to overcome the x86-penalties for virtualization. The source and destination instruction set is identical, but much work is needed to scan, filter and translate a small subset of the instruction set: the sensitive, unprivileged instructions. According to John Scott Robin et al. in their paper “Analysis of the Intel Pentium’s Ability to Support a Secure Virtual Machine Monitor” “the complexity of this approach may render a highly secure VMM unachievable”. Therefore, a VMM using Binary Translation is not as trustworthy as a VMM using Trap & Emulate.

Paravirtualization uses another idea to overcome x86. The set of 17 sensitive, unprivileged instructions is just a small subset of the instruction set of x86. These 17 instructions are called primarily by the Guest-OS. Therefore paravirtualization usually adapts each Guest-OS. Afterwards a Guest-OS does not use the problematic x86 instructions, and the Guest-OS can be run in an outer protection ring - where it can’t call sensitive instructions without VMM intervention. The Guest-OS runs on top of an idealized abstraction of x86, similar to the underlying architecture, but not the same. Any attempt to access hardware has to be permitted by the VMM first (fail-safe defaults!). A paravirtual Guest-OS is “enlightened” and therefore VMM implementation is much easier, because we don’t have to maintain the illusion that the virtual environment is the real one, and the TCB can remain small. The VMM just has to offer a small interface encapsulating the problematic x86 instruction subset. The VMM forces the guests to cooperate or, if they won’t, to fail. Therefore a paravirtual VMM is more trustworthy than a VMM using Binary Translation and similar to a VMM using Trap & Emulate.

A paravirtual Guest-OS usually runs unmodified User-Space-Applications. Security patches usually have to be written for each specific architecture the software runs on. Patches for User-Space-Applications will become public in time, but the Guest-OS runs on an additional, paravirtual architecture. Conceptually, security patches for these Guest-OS-Kernels are not guaranteed to become public in time, because the patches have to be written or compiled for this architecture explicitly. Therefore, the risk of Zero-Day-Exploits for paravirtual Guest-Kernels rises.

Newer generations of x86-CPUs contain technologies to support virtualization (Vanderpool/Pacifica). A design goal of these technologies was to eliminate the need for Binary-Translation and Paravirtualization on the x86-Architecture. The CPU creates containers that conceptually are virtual CPUs and introduce new modes of operation. The modes distinguish whether the CPU is virtual or real and which privileges are associated with it. The VMM runs within a container of highest privileges, conceptually Protection Ring: -1. Guests run within a lower privileged container, but conceptually this container is a protection ring 0. Therefore, Guest-OS-Kernels can run in their usual ring. The Guests sit in their virtual environment, which will be left (exited) when the guest tries to execute privileged or sensitive instructions and trap to the VMM. That is just another way of Trap & Emulate, therefore all security issues of Trap & Emulate apply to CPU-supported Virtualization as well.

But CPU-supported Virtualization does not actually solve all security issues, because we still have to deal with hardware emulation on x86, which is another big problem for the security of virtualization technologies. But more about this next time.

KVM Thrashing

Tuesday, July 14th, 2009

I discovered a thrashing behaviour of the Kernel-based Virtual Machine (KVM) during the research for my thesis on security within virtual OS environments. This is how it worked. With KVM it is possible to overcommit memory between multiple virtual machines. Memory overcommitment means that you assign more memory to virtual machines than is physically available.

Some background knowledge first. KVM is a kernel module. It has been included in the vanilla Linux kernel since version 2.6.20 (Status). KVM uses Qemu for hardware emulation. Qumranet, the leading company behind KVM, has been bought by Red Hat. Red Hat is planning to use KVM as a base technology for virtualization, for desktop virtualization as well as for server virtualization (RedHat). KVM establishes full virtualization through Intel’s Vanderpool or AMD’s Pacifica technology. KVM uses Shadow-Page-Tables to virtualize memory, so KVM can swap pages to disk if the virtual amount of memory does not fit into the physical memory.

The scenario was as follows.

  • QEMU PC emulator version 0.9.1 (kvm-72), Copyright (c) 2003-2008 Fabrice Bellard
  • 2 GB of physical memory
  • Intel Core 2 Duo E8400
  • 750 GB Hdd
  • 1 Gigabit ethernet controller
  • Debian 5.0 Lenny (18.03.2009) minimal
  • Kernel Linux version 2.6.26-1-amd64 (Debian 2.6.26-13)
  • 2 guests, good and evil, running same distribution
  • kvm evil.img -m 2048 -curses -k de
    -net nic,macaddr=52:54:00:12:34:56,model=rtl8139
    -net tap,ifname=tap1
  • kvm good.img -m 1024 -curses -k de
    -net nic,macaddr=52:54:00:12:34:55,model=rtl8139
    -net tap,ifname=tap0
  • Evil without additional services
  • Good runs an Apache/2.2.9 (Debian)
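The memory figures in the list above can be checked with a little arithmetic. The sketch only uses the `-m` values and the physical memory size from the scenario. It ignores the memory the host system itself needs, so the real pressure is somewhat higher.

```python
physical_mb = 2048                       # 2 GB of physical memory
guest_mb = {"evil": 2048, "good": 1024}  # the -m values of the two guests

committed_mb = sum(guest_mb.values())
overcommit_mb = committed_mb - physical_mb
ratio = committed_mb / physical_mb

print(f"committed: {committed_mb} MB")
print(f"overcommitted by: {overcommit_mb} MB")
print(f"overcommit ratio: {ratio}")
```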

1 GB more than is physically available has been committed to the guests, to force KVM to swap memory to disk when Evil is using all the memory committed to it. I expected the performance of Good to be affected to a high degree within this scenario. To measure the performance penalty I used the httperf benchmark utility, version: httperf-0.9.0 compiled Jun 23 2008. The benchmark queried a static HTML page of 3704 bytes located on the Apache webserver running on Good. I then forced Evil to use all its memory, and KVM to swap pages to disk, through a simple C program.

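#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MEMORY_MB 1950
#define MEMORY_MAX ((size_t)MEMORY_MB * 1048576)

int main(void) {
 /* Allocate nearly all of the memory committed to the guest. */
 char *ch = malloc(MEMORY_MAX);
 if (ch == NULL) {
  perror("malloc");
  return 1;
 }
 memset(ch, 0, MEMORY_MAX);
 printf("%s\n", "Memory allocated");
 /* Touch every page over and over, so no page can stay swapped out. */
 while (1) {
  memset(ch, 1, MEMORY_MAX);
 }
}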

With the thrashing program running, I executed the benchmark utility from another host located within the same subnet.

httperf --server good.local --uri /test.html --num-conn 3000 --num-call 10 --rate 100 --timeout 5

This is a very simple benchmark under normal circumstances, but within this scenario I got 3000 errors of 3000 tries through timeouts on my first attempt. The second time 2269 errors and 2295 errors on my third attempt. So I softened the benchmark even more.

httperf --server good.local --uri /test.html --num-conn 500 --num-call 10 --rate 10 --timeout 5

But again, 97, 184 and 153 errors out of 500 connection attempts. To be sure that this was due to memory overcommitment, swapping and thrashing, I then quit the thrashing utility - no errors occurred. So the webserver was unable to perform because of the thrashing behaviour of the VMM, and the attack is a DoS.
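Expressed as error rates, the numbers from the two benchmark configurations look as follows. The sketch only restates the error counts given above as percentages. The dictionary keys are my own labels.

```python
# (connection attempts, errors per run) as reported above
runs = {
    "rate 100, 3000 connections": (3000, [3000, 2269, 2295]),
    "rate 10, 500 connections": (500, [97, 184, 153]),
}


def error_rates(total, errors):
    """Percentage of failed connection attempts for each run."""
    return [round(100 * e / total, 1) for e in errors]


for label, (total, errors) in runs.items():
    print(label, error_rates(total, errors))
```

Even with a tenth of the request rate, between roughly a fifth and a third of all connections failed.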

Why does it happen?

Memory is overcommitted. KVM has to swap, but KVM can’t have as much information about the pages it wants to swap as a usual OS has. So swapping can result in thrashing behaviour when KVM chooses pages of heavily used workloads.

Well, how risky is this kind of vulnerability?

An attacker first has to get access to a VM, but this access does not have to be highly privileged - so you have to consider it a local DoS vulnerability. Memory has to be overcommitted. The KVM version used here is not the newest one available, but it is the one that ships with Debian’s default distribution. I will check out newer versions soon. KVM is not the most widely used VMM for server virtualization, but Red Hat’s plans to use it as a base for server virtualization suggest that its use is likely to grow.

Xen DOM-U Backup using LVM-Snapshots and SSH

Tuesday, July 7th, 2009

To back up Xen DomUs while they are running, you can use the following shell script. You have to use LVM on your server (to create live snapshots) and you have to use key authentication for ssh (to avoid password prompts). What is special about this script is that it runs on a remote machine. It executes commands through ssh and receives the complete LV dump through ssh. So you can create a central backup server that backs up remotely on a scheduled basis, and you can easily gain redundancy across different physical locations if your bandwidth and LV size allow it. If your bandwidth is rather limited, you can pipe the output of dd into gzip on the remote machine - or on the backup server if your connection is fast but your backup space is rather limited. After backing up, you can mount the image the usual way:

mount -o loop backup.img /mnt

Here is the script:

#!/bin/sh

if [ $# -ne 7 ]
then
  echo "usage: xen-lvm-backup [HOST] [DOM_U] [BACKUP_LV] [BACKUP_LV_SIZE] [VG] [LV] [DEST]"
  exit 1
fi

HOST="$1"
DOM_U="$2"
BACKUP_LV="$3"
BACKUP_LV_SIZE="$4"
VG="$5"
LV="$6"
DEST="$7"

if /usr/bin/ssh -o PasswordAuthentication=no -l root "$HOST" /bin/true; then # able to login?
  /usr/bin/ssh -o PasswordAuthentication=no -l root "$HOST" "/usr/sbin/xm pause $DOM_U"
  /usr/bin/ssh -o PasswordAuthentication=no -l root "$HOST" "/sbin/lvcreate -L$BACKUP_LV_SIZE -s -n $BACKUP_LV /dev/$VG/$LV"
  /usr/bin/ssh -o PasswordAuthentication=no -l root "$HOST" "/usr/sbin/xm unpause $DOM_U"
  /bin/echo "" > "$DEST"
  /usr/bin/ssh -o PasswordAuthentication=no -l root "$HOST" "/bin/dd if=/dev/$VG/$BACKUP_LV" > "$DEST"
  /usr/bin/ssh -o PasswordAuthentication=no -l root "$HOST" "/sbin/lvremove -f /dev/$VG/$BACKUP_LV"
fi
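The order of the remote commands, and the gzip variant mentioned in the text, can also be written down as a small Python sketch. It only builds the command lines and does not run anything. The function `backup_plan`, its parameter `compress_remote` and the example names below are my own assumptions and are not part of the script above.

```python
import shlex

SSH = ["/usr/bin/ssh", "-o", "PasswordAuthentication=no", "-l", "root"]


def remote(host, command):
    """Argument list that would run one command on the remote Xen host."""
    return SSH + [host, command]


def backup_plan(host, dom_u, backup_lv, size, vg, lv, compress_remote=False):
    q = shlex.quote  # protect names against the remote shell
    origin = f"/dev/{vg}/{lv}"
    snapshot = f"/dev/{vg}/{backup_lv}"
    dump = f"/bin/dd if={q(snapshot)}"
    if compress_remote:
        dump += " | gzip -c"  # compress before the data crosses a slow link
    return [
        remote(host, f"/usr/sbin/xm pause {q(dom_u)}"),
        remote(host, f"/sbin/lvcreate -L{q(size)} -s -n {q(backup_lv)} {q(origin)}"),
        remote(host, f"/usr/sbin/xm unpause {q(dom_u)}"),
        remote(host, dump),  # its stdout is what the backup server stores
        remote(host, f"/sbin/lvremove -f {q(snapshot)}"),
    ]


# Hypothetical host, domain and volume names
for argv in backup_plan("xenhost", "web", "snap", "1G", "vg0", "web-disk",
                        compress_remote=True):
    print(argv[-1])
```

The DomU is only paused for the short moment the snapshot is created. The slow part, the dump, reads from the snapshot while the guest is already running again. To actually run such a plan one could pass each list to `subprocess.run` and redirect the output of the dump step into the destination file.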

Use it without any warranty.