In section 7.1 we critiqued our choice of research problem. Performance variation has been reported for a wide range of workloads, and so the problem can potentially affect a large number of users. Whilst some users may tolerate a small degree of variation, arguably few will tolerate variation of the magnitude demonstrated in sections 5.2 to 5.4. One of the main causes of variation is heterogeneity, and whilst instance types typically start as homogeneous, maintaining homogeneity appears difficult. Further, even within homogeneous instance types, a co-located instance can severely degrade the performance of its neighbours, as demonstrated in section 5.9. New CaaS and FaaS services are typically built atop infrastructure Clouds, and so performance variation in the latter has clear potential to affect the former. Indeed, in section 8.1 we note reports of performance variation in FaaS services offered by AWS, Google and Azure. Further, the advent of the IoT and the Cloud of Things model is likely to see computation offloaded onto C/FaaS providers, raising questions as to the SLAs such providers can offer. We argue that the problem is of sufficient scale, impact and permanence to justify the investigation conducted in this thesis, and that this is highly likely to remain the case.
In section 7.2 we critiqued the empirical work presented in chapter 5. As performance variation was insufficiently characterised in extant work, we conducted a series of experiments measuring the performance of Public Cloud instances. The majority of experiments were conducted on EC2; this is in common with the majority of extant work, making our results readily comparable. Further, due to EC2's geographical spread and range of instance types, we were able to measure performance in different global locations and with respect to a multitude of different instance types. This helps to ensure that we are not merely observing a localised problem.
Compute benchmarks were chosen from the SPEC CPU 2006 suite; these benchmarks were designed to avoid the various issues with benchmarking discussed in section 3.6, and show a preference for real-world workloads. We also noted that the number of benchmarks chosen is in line with extant work.
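Purely as an illustration of the kind of analysis such measurements support, the sketch below summarises per-location, per-instance-type variation as a coefficient of variation. The regions, instance type and timings are hypothetical and are not measurements from chapter 5.

```python
from statistics import mean, stdev

# Hypothetical benchmark completion times (seconds), grouped by
# (region, instance type); not actual measurements from chapter 5.
runtimes = {
    ("us-east-1", "m1.small"): [512.0, 498.5, 607.2, 530.1],
    ("eu-west-1", "m1.small"): [505.3, 501.8, 512.9, 509.4],
}

for (region, itype), samples in runtimes.items():
    cov = stdev(samples) / mean(samples)  # coefficient of variation
    print(f"{region} {itype}: CoV = {cov:.2%}")
```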
In section 7.3 we critiqued chapter 6, the subject of which was simulating the broker in order to investigate profitability. Simulation is a well-established technique for investigating problems; however, it requires the construction of a model that mimics reality. The pertinent questions are then validation (are we building the right system?) and verification (are we building the system right?). As discussed, we strive for high face validation as well as ensuring validation of data and structural assumptions. As noted, bar one extension presented in that chapter, the model has been accepted for publication in FGCS and so has been through a rigorous peer review process, providing high face validation. In addition, the model behaves as expected: for example, all else being equal, increased variation leads to increased revenues. We provide verification by replacing stochastic values with known ones, allowing us to compare simulation output with an expected answer. Further, we begin with a simplified model and add complexity only once we are sure the existing code is working as expected. As such, we have confidence that we have built the right system, and built the system right.
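As a minimal sketch of this style of verification, the toy example below replaces a stochastic draw with a known constant so that the simulation output can be checked against a hand-computed answer. The function name, prices and distribution are illustrative assumptions, not the thesis model.

```python
import random

def simulate_revenue(n_instances, price, performance_sampler):
    """Toy revenue model: each instance is re-sold at the base price scaled by
    its sampled performance score (purely illustrative, not the thesis model)."""
    return sum(price * performance_sampler() for _ in range(n_instances))

# Normal (stochastic) run: performance scores drawn from a distribution.
stochastic_revenue = simulate_revenue(100, 1.0, lambda: random.uniform(0.8, 1.2))

# Verification run: replace the stochastic draw with a known constant so the
# expected output can be computed by hand (100 * 1.0 * 1.0 = 100.0).
deterministic_revenue = simulate_revenue(100, 1.0, lambda: 1.0)
assert abs(deterministic_revenue - 100.0) < 1e-9
```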
In the next chapter we present concluding remarks and future work.
8 Conclusions and Future Work
Cloud providers offer on-demand access to a seemingly unlimited supply of compute resources, such as instances, storage and networks, for rent. The uptake of Cloud services has been significant, with Gartner suggesting many organisations are adopting a Cloud-first policy. The Cloud provider business model is one of the mass production of standardised resources. However, there are no guarantees regarding their performance beyond a best-effort basis. Variation in the performance of instances sold as equivalent has been widely reported, and leads to variation in the cost of executing the same workload, or variation in the number of instances required to complete a fixed amount of work in a given time period, again leading to variation in costs. Worse still, not only do users effectively pay more, at the same price, for work delivered more slowly, confounding the usual expectations of such services, but they also miss out on the concomitant benefits of completing work faster.
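As a simple worked illustration of how a performance shortfall translates into a cost increase (the workload size, price and throughput below are chosen purely for exposition and are not results from this thesis):

```latex
% Illustrative figures only; not results from the thesis.
% A workload of W = 1000 units, an hourly price p = \$0.10 and a nominal
% throughput of r = 100 units/hour give a cost of p * W / r. An instance
% running 20% slower raises the cost of the very same workload:
\[
  \text{cost} = p\,\frac{W}{r} = 0.10 \times \frac{1000}{100} = \$1.00,
  \qquad
  \text{cost}_{\text{slow}} = 0.10 \times \frac{1000}{0.8 \times 100} = \$1.25 .
\]
```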
Performance variation is a result of both heterogeneity and contention for shared resources. The former occurs when providers use different hardware to sell instances that are supposedly identical. Heterogeneity is widespread and appears to be inevitable to a degree, as providers expand, bring new locations on-line and refresh existing hardware. Unless otherwise specified (and a premium paid), the instances of one user may be co-located on the same hosts as those owned by other users. This multi-tenancy is key to the high utilisation rates of resources that providers are commonly believed to achieve. However, due to limitations in hardware architecture, resources such as low-level caches and memory bandwidth cannot be fairly allocated by the hypervisor, and contention for them leads to performance variation.
To address performance risk, we proposed a performance broker: a broker that re-sells instances based on their performance, and we say that such a broker provides performance-assured instances. Implicitly, the broker has obtained instances from some Cloud marketplace, rated them with respect to some performance metric(s), re-priced them and offered them for sale; a simple sketch of this workflow is given after the research question below. The research question we addressed is the following:
To what extent can a performance broker profitably address performance variation in commodity Infrastructure Cloud marketplaces through offering performance-assured instances?
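To make the obtain, rate, re-price and re-sell workflow concrete, the following hedged sketch illustrates one possible rating and re-pricing step. The class, baseline, markup and prices are hypothetical and are not taken from the broker model in chapter 6.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    instance_id: str
    hourly_price: float      # price paid to the Cloud provider
    benchmark_score: float   # measured performance, e.g. a SPEC-like score

def rate_and_reprice(instances, baseline_score, markup=1.2):
    """Illustrative broker step: keep instances that meet a performance baseline
    and re-price them relative to how far they exceed it (hypothetical rule)."""
    offers = []
    for inst in instances:
        if inst.benchmark_score >= baseline_score:
            premium = inst.benchmark_score / baseline_score
            offers.append((inst.instance_id,
                           round(inst.hourly_price * markup * premium, 4)))
    return offers

# Example: two acquired instances, only one of which meets the baseline of 100.
acquired = [Instance("i-aaa", 0.10, 110.0), Instance("i-bbb", 0.10, 85.0)]
print(rate_and_reprice(acquired, baseline_score=100.0))  # [('i-aaa', 0.132)]
```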
In the next section we discuss how the thesis has been structured in order to address the research problem.