Major providers typically employ a (provider-specific) compute performance rating system, which indicates that supposedly identical instances should have the same performance level. However, they offer few, if any, guarantees regarding instance performance. In particular, there is no guarantee that supposedly identical instances will have equivalent performance, or even that they will be equivalent to within some degree of tolerance. Indeed, the SLA offered by AWS for EC2 is notable for its explicit exclusion of performance issues: ‘The Service Commitment does not apply to … Amazon EC2 or Amazon EBS performance issues’ (Amazon Web Services, 2017). However, as noted in section 1.1, delivering consistent and predictable performance has proven difficult in practice for providers.
Indeed, we even find that there are no stated service response times, that is, the time taken from request to an instance being available. Iosup et al. (2010) report differences on EC2 in resource acquisition times between C1 and M1 instance types, but note faster acquisition times on EC2 as compared to Flexiscale and GoGrid, the latter of which has the highest variation. Mao and Humphrey (2012), however, report no difference in start-up times between instance types on EC2, but do report differences between OS types, with Linux booting faster than Windows. They also report longer and more variable acquisition times on the spot market as compared to on-demand – this is most likely explained by the spot market auctioning process. Razavi et al. (2013) show that boot times vary by image size and so suggest pruning images of unnecessary packages, whilst Armstrong and Djemame (2011) consider the time taken for machine images, which are used to instantiate new instances, to propagate from a storage system to physical hosts.
Arguably, differences in service response times, availability and security features offered will diminish as providers offer increasingly similar services and compete on the basis of price. Indeed, in a commodity exchange, attributes such as boot times are likely to be part of the requirements that a provider must meet in order to demonstrate they are offering instances equivalent to those mandated by the exchange. As such, when we refer to the performance of resources obtained from an IaaS Cloud we mean solely the performance of the resource itself and not of the service that provides it.
Whilst providers do not guarantee consistent performance, they do stress the importance of measurement, with AWS stating that: ‘There is no substitute for measuring the performance of your full application since application performance can be impacted by the underlying infrastructure...’. Further, AWS note that elasticity makes the measurement process straightforward, as instances can be obtained, measured and returned. This does of course come at a cost: Lenk et al. (2011) report costs of $800 when conducting benchmarking experiments on the AWS Elastic Compute Cloud (EC2) and Akioka and Muraoka (2010) report costs of $600, whilst Stahl et al. (2013) note that although performance measurement is potentially time consuming and expensive, it helps to mitigate risk, and they advise users to ‘properly’ benchmark their applications. However, this advice is predicated upon consistent performance, and so assumes that the performance of unseen instances tomorrow will be the same as that of those measured today.
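To make the obtain-measure-return cycle concrete, the following is a minimal sketch using the AWS SDK for Python (boto3); the region, AMI ID and instance type are placeholder assumptions, and the benchmark step itself is omitted:

    # Minimal sketch of the obtain-measure-return cycle described above.
    # The AMI ID, instance type and region are placeholders, not values
    # taken from the text.
    import time
    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")

    def measure_instance(image_id="ami-XXXXXXXX", instance_type="m5.large"):
        # Obtain: request a single on-demand instance.
        requested_at = time.time()
        resp = ec2.run_instances(ImageId=image_id, InstanceType=instance_type,
                                 MinCount=1, MaxCount=1)
        instance_id = resp["Instances"][0]["InstanceId"]

        # Wait until the instance is reported as running and record the
        # acquisition time (request to availability).
        ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
        acquisition_time = time.time() - requested_at

        # Measure: in practice a benchmark would now be run on the instance
        # (e.g. over SSH) and its score recorded; omitted here.

        # Return: terminate the instance so it is only paid for whilst measured.
        ec2.terminate_instances(InstanceIds=[instance_id])
        return instance_id, acquisition_time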
In light of performance variation, arguably all instances need to be measured when first acquired, and then periodically through their lifetime. Indeed, Mouline (2009) suggests that the measuring of Cloud performance should be conducted ‘…before deployment and continually in production…’. Pro-actively measuring instance performance is the approach followed by Netflix, a large user of EC2 running, at times, tens of thousands of instances. They use performance measurement for selecting instance types for workloads, as well as for ensuring instances meet a minimum performance level before being deployed into production.
Performance variation is a clearly undesirable characteristic of Clouds, and managing performance incurs costs, both monetary and in time, for users. This raises the question of how variation arises, and how best to define and measure performance. The rest of this chapter is dedicated to answering these questions. As our focus is primarily on compute and memory bound workloads, we limit our discussion of hardware resources, metrics and benchmarks to those that are relevant to the question at hand. We begin with a discussion of virtualisation, unarguably one of the major technologies used to build Clouds.
3.3 Virtualisation
Virtualisation is the abstraction, or partitioning, of a physical resource into one or more independent, isolated and secure ‘virtual versions’ of that resource. Server virtualisation partitions a physical machine into multiple copies, potentially of different sizes, with each one being referred to as a Virtual Machine (VM). By the size of an instance we mean the number of virtual CPUs (vCPUs), amount of RAM and, possibly, storage assigned to it.
Notably, without further configuration, server virtualisation tools will not reserve resource capacity for a particular virtual machine, although the hypervisor will typically attempt to schedule each vCPU a ‘fair’ amount of CPU time. As such it is possible to overbook resources, and as we discuss later in this section, doing so can have serious performance and availability consequences.
Popek and Goldberg (1974) provide a more precise definition of a VM as 'An efficient and isolated duplicate of the real machine'. The Virtual Machine Monitor (VMM), or hypervisor, is the software responsible for creating and managing virtual machines and should have the following properties:
1. Equivalence: A program running under the VMM should exhibit behaviour essentially identical to that demonstrated when running on an equivalent machine directly (Fidelity requirement).
2. Resource Control: The VMM must have complete control of hardware being virtualised (Safety requirement).
3. Efficiency: A statistically significant fraction of the VM's machine instructions must execute without VMM intervention, i.e. directly on the underlying CPU (Performance requirement).
Interestingly, the 3 largest providers (EC2, GCE and Azure) all use different hypervisors: Xen, KVM and Hyper-V respectively, with Xen being particularly prevalent amongst other providers. An OS running inside an instance is known as the Guest OS, and the equivalence property allows providers to offer a wide range of Guest OS choice, making Clouds suitable for a large variety of workloads.
The resource control property allows the hypervisor to securely isolate instances running concurrently on the same host. For Cloud providers this is essential, as it allows them to implement a multi-tenant system whereby a physical server can be shared by multiple different users concurrently in a secure fashion. However, security concerns persist as various hypervisor exploits, or breakouts, have been demonstrated, such as CVE-2017-7228 and CVE-2014-7188, with the latter requiring large numbers of instances on EC2, Rackspace and IBM SoftLayer to be rebooted. In practice providers tend to address these issues quickly.
Whilst the resource control property is required for isolation, it does impact the performance of workloads running inside instances as the hypervisor mediates access to the physical machine. Popek and Goldberg (1974) require a hypervisor to be efficient, meaning that the majority of instructions issued by the instances are executed directly on an underlying CPU without hypervisor intervention. If every instruction issued resulted in hypervisor intervention, a workload running in an instance would be significantly slower than if executed directly on the underlying hardware. Indeed, this is why emulation, which allows for a machine of one architecture to appear as if it is of another, is slow.
Hypervisors implement the efficiency/performance requirement by distinguishing between different types of instructions. Privileged instructions, for example those used for memory management and I/O, result in hypervisor intervention, whilst unprivileged instructions execute without intervention. I/O bound workloads can therefore result in significant hypervisor intervention, whilst CPU bound workloads will run ‘efficiently’ as they largely run without intervention. Li et al. (2013) compare 3 different hypervisors, including KVM and Xen, used on GCE and EC2 respectively, by running a variety of CPU bound and I/O bound workloads. They report that CPU bound workloads have negligible variation, whereas I/O bound workloads may suffer significant variation, and Felter et al. (2015) report that write operations have more variation than reads. Interestingly, they also find that Docker has less variation for I/O bound workloads than a virtual machine.
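As an illustration of the privileged/unprivileged distinction, the following toy dispatch loop (with invented instruction categories, not any real hypervisor's) shows why an I/O-heavy instruction stream incurs far more hypervisor intervention than a CPU-bound one:

    # A toy dispatch loop with invented instruction categories, purely to
    # illustrate the trap-versus-direct-execution distinction discussed above.
    PRIVILEGED = {"io_out", "io_in", "update_page_table"}  # trap to the hypervisor
    # Everything else (arithmetic, ordinary loads/stores) runs directly on the CPU.

    def run_guest(instructions):
        traps = direct = 0
        for op in instructions:
            if op in PRIVILEGED:
                traps += 1    # hypervisor intervenes (trap-and-emulate)
            else:
                direct += 1   # executes natively, without VMM involvement
        return traps, direct

    # A CPU-bound stream traps rarely; an I/O-bound stream traps constantly.
    cpu_bound = ["add", "mul", "load", "store"] * 1000 + ["io_out"]
    io_bound = ["load", "io_out", "io_in", "store"] * 1000
    print(run_guest(cpu_bound))  # (1, 4000)
    print(run_guest(io_bound))   # (2000, 2000)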
A hypervisor, without further specific configuration and capacity planning, does not reserve capacity for instances. It is entirely possible to create, for example, 8 instances each with a specified 8 GB of RAM running on a host with only 8 GB. In this case, instances will almost certainly experience severe performance degradation as memory pages are swapped in and out from the on-disk swap space. Worse yet, an out of memory (OOM) condition will result in either a workload, or the Guest OS itself, crashing.
Overbooking (also called over-commitment) of memory resources in Cloud environments is typically undesirable, and whilst it is commonly assumed not to occur, none of EC2, GCE or Azure (the largest providers) make guarantees of memory capacity in their SLAs. We note however that Banerjee (2013) claims that overbooking memory on ESX servers does not necessarily lead to performance issues. To avoid overbooking, the total amount of memory consumed by all instances on a host must be less than the total physical memory present. Further, sufficient memory space should also be reserved for the hypervisor.
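This capacity condition can be expressed as a simple check; the following sketch uses the figures from the example above, with an assumed (purely illustrative) 2 GB hypervisor reservation:

    # A minimal sketch of the memory-capacity check described above.
    # The hypervisor reservation of 2 GB is an illustrative assumption.
    def is_overbooked(instance_memory_gb, host_memory_gb, hypervisor_reserve_gb=2):
        # A host is overbooked if the memory promised to all instances plus
        # the hypervisor's own reservation exceeds physical memory.
        return sum(instance_memory_gb) + hypervisor_reserve_gb > host_memory_gb

    # Eight 8 GB instances on an 8 GB host (the example above) are badly overbooked.
    print(is_overbooked([8] * 8, host_memory_gb=8))   # True
    # Eight 8 GB instances on a 72 GB host fit alongside the 2 GB reservation.
    print(is_overbooked([8] * 8, host_memory_gb=72))  # False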
CPU resources also require careful management. The CPU presented to a VM is known as a virtual CPU (vCPU), and appears no different to the Guest OS than would a ‘real’ CPU. However, to the hypervisor, a vCPU is a scheduling entity and one of its jobs is to schedule CPU time amongst the various vCPUs. In our example above, suppose each of the 8 instances was assigned 1 vCPU whilst the host has 2 CPUs, each with 4 cores. Fair, equal-weighted scheduling, the Xen default, would mean each of the 8 vCPUs being scheduled onto its own CPU core. However, should another 8 instances (of the same configuration) be started on the host we would have 16 vCPUs, and the same scheduling scheme would see each vCPU being reserved 50% of a core.
Providing a guaranteed minimum CPU allocation under a fair, equal-weighted scheduling scheme requires a limit to the number of concurrent vCPUs. This is most simply achieved by scheduling instances of equal size onto the same host and limiting the number of concurrent instances. As already noted, we would perhaps expect a limit of some sort so as to avoid overbooking memory. A host with k > 0 CPU cores and a limit of n > 0 concurrent vCPUs can provide a guaranteed minimum proportion of k/n of a CPU core for each vCPU. Whilst it is reasonable to expect minimum CPU allocations on instances, should we also expect a maximum? Under Xen it is possible to set a cap which defines a maximum CPU allocation, but by default this is not set, and so a vCPU can consume more than its allocated minimum whenever capacity is available – often referred to as bursting. If Cloud providers allow bursting we would expect a degree of performance variation across a set of instances. Does CPU bursting explain, either in part or in whole, reported performance variation? On EC2 at least this is implied not to be the case, as they have recently introduced specific types of instances called ‘burstable’ whilst referring to all others as constant performance instances.
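A minimal sketch of this guaranteed-minimum calculation, using the host from the example above (Xen's credit scheduler itself is not modelled; the figures are illustrative):

    # Guaranteed minimum share of a core per vCPU under fair,
    # equal-weighted scheduling, as argued above.
    def min_core_share(physical_cores, concurrent_vcpus):
        # Each vCPU is guaranteed at least cores/vCPUs of a core,
        # capped at one full core per vCPU.
        return min(1.0, physical_cores / concurrent_vcpus)

    # The 8-core host from the example: 8 vCPUs get a full core each,
    # 16 vCPUs get half a core each.
    print(min_core_share(8, 8))   # 1.0
    print(min_core_share(8, 16))  # 0.5

    # Without a cap, a vCPU may 'burst' above this minimum whenever other
    # vCPUs on the host are idle - one possible source of variation.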
Core sharing, which will occur whenever the number of vCPUs is higher than the number of physical CPU cores, can be problematic from a performance perspective. On Xen, each vCPU is scheduled a 30ms quantum during which it cannot be pre-empted. Lv (Xen, 2017) reports that this is good for CPU bound workloads, but latency-sensitive workloads may suffer as they wait for the guaranteed quantum of other instances sharing the core to finish.
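A back-of-the-envelope calculation illustrates the latency concern: in the worst case a runnable vCPU must wait for every other runnable vCPU sharing its core to finish a full, non-preemptible quantum first:

    # Worst-case scheduling delay implied by a non-preemptible quantum.
    # The 30ms value is the Xen default noted above; vCPU counts are illustrative.
    QUANTUM_MS = 30

    def worst_case_wait_ms(vcpus_sharing_core):
        return (vcpus_sharing_core - 1) * QUANTUM_MS

    print(worst_case_wait_ms(2))  # 30 ms
    print(worst_case_wait_ms(4))  # 90 ms, problematic for latency-sensitive work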
Despite known performance issues with overcommitting resources, Cohen et al. (2017) propose that providers can reduce costs by overcommitting, with only a correspondingly small chance of violating capacity constraints. However, whilst acknowledging that capacity constraint violations would lead to SLA violations, they explicitly exclude consideration of the financial liability arising from them.
Whilst providers could allow users to specify bespoke instance sizes, this is typically not offered, perhaps due to complexities with scheduling instances of different sizes onto hosts. Instead, providers offer instances in non-negotiable fixed sizes, referred to as instance types, which we now discuss.