Chapter 2: We provided definitions of Cloud offerings and deployment models so as to establish a common understanding of them, with particular attention to elasticity and how it can be used to improve performance. We also discussed the notions of utility and commodity, which are prevalent in Cloud Computing, and noted how performance links these concepts when applied to the Cloud.
Chapter 3: We discussed some hypothetical examples of how performance variation may lead to cost variation, providing motivation for the proposed broker. We discussed server hardware and server virtualisation, with particular reference to how performance properties arise and how providers make use of virtualisation to partition physical servers into multiple, secure and isolated instances – the primary object of study in this thesis. We reviewed literature related to performance metrics and benchmarks, and noted the preference for so-called task progression metrics, such as workload execution time or work done in a specified period, over metrics that describe machine operations, such as the number of floating point operations per second.
Chapter 4: We reviewed work related to Cloud performance, performance improvement strategies, Cloud brokers, proposals for commodity exchanges for Cloud resources, and Cloud pricing. We found that performance variation is reported for a wide range of workloads and instance types. We identified CPU bound workloads as readily offering an opportunity for performance brokerage, as their variation is tightly coupled to variation in the underlying hardware. However, the extant data did not characterise performance sufficiently to allow a realistic model to be constructed. The review of brokers revealed that supposed performance based services do not consider how the costs incurred will be covered, and so we questioned their viability. We found one broker to be profitable, but even in this case the opportunity appears to be now closed. A review of markets noted the prevalence of the continuous double auction (CDA), which has been shown to rapidly produce an equilibrium price.
Chapter 5: The primary purpose of this chapter was to sufficiently characterise performance so as to allow for a realistic model of it to be constructed. We extensively benchmarked a number of instance types on EC2 using a variety of workloads. From the data collected we determined a set of performance model features.
Chapter 6: This chapter was dedicated to elaborating a model of the broker and determining the conditions under which it is profitable. The broker is profitable if there is a minimum rental fee of 60 minutes at the exchange and instances may run on 2 different types of hardware, with an average degradation in performance of 26% from best to worst. The highest achievable profit margin was just 4%, and this requires 4 different types of hardware, producing an average performance degradation of 52% from best to worst. A loss is made with homogeneous instances with an average performance degradation of 20%, unless demand is increased to a level similar to that found in the Google workload trace. Further, a loss is always made with a minimum rental fee of 10 minutes.
Chapter 7: In this chapter we critiqued the work presented in chapters 5 – 6. In the critique of chapter 5 we justified our extensive use of EC2, due to its global spread, the wide range of instance types on offer, and its prevalence within the literature, which readily allows comparisons to be made. For chapter 6, our critique focused on the key concerns in the use of simulation: validation (are we building the right system?) and verification (are we building the system right?). We argued that the model has high face validity and that we have confidence in our data and structural assumptions. Further, simplification and walkthroughs provide verification for the model. We are confident in the results produced by the model.
To structure and guide the investigation of our research problem we developed a number of research questions, and in the next section we consider whether or not we have been successful in addressing them.
8.2 Research Problem and Questions
In order to structure and guide the investigation of the research problem we posed a number of research questions. In this section we consider to what extent we have addressed them.
RQ0: How should we specify and measure instance performance?
In sections 3.5 and 3.6 we discussed performance metrics and measures. As computer systems develop, it is imperative that the metrics and measurements used to define and measure performance remain relevant. Metrics defined in terms of what a machine is physically doing, such as clock rates and MIPS, lead to a number of difficulties; in particular, they are typically unsuitable for making comparisons with machines that work in a different manner. Indeed, they are typically poor at distinguishing the likely performance of different workloads on the same machine. It is generally considered that the most useful and informative metrics for users are execution times and throughput. As such, we chose execution time as our metric for specifying performance, and make use of it in our empirical work and instance performance model.
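By way of a minimal sketch (with hypothetical timings), the following shows how execution time supports direct comparison of machines and the derivation of throughput, without reference to how either machine carries out the work internally:

```python
# Hypothetical execution times (seconds) for the same workload on two machines.
exec_time = {"machine_A": 120.0, "machine_B": 90.0}

# Relative performance follows directly from the ratio of execution times.
speedup_b_over_a = exec_time["machine_A"] / exec_time["machine_B"]

# Throughput: completed runs per hour on each machine.
throughput = {m: 3600.0 / t for m, t in exec_time.items()}

print(f"machine_B is {speedup_b_over_a:.2f}x faster than machine_A on this workload")
print(throughput)
```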
Applications used for ‘real work’ are generally considered preferable as benchmarks to specially crafted ones. The latter tend to suffer from a number of issues, including small working sets that can typically fit into a CPU cache. Further, they usually have simple code dependencies, and so do not result in cache misses or pipeline flushes due to branch mis-prediction, both of which have an effect on performance. As such, we chose to use benchmarks from the SPEC suite, which are designed to avoid these issues.
Contributions regarding suitable metrics for performance brokers have been published in: John O’Loughlin and Lee Gillam (2014) "Good Performance Metrics for Cloud Service Brokers". 5th International Conference on Cloud Computing, Grids and Virtualisation (CLOUD COMPUTING 2014).
RQ1: How do brokers offering performance based services address variation at the same price?
We categorise brokers that offer performance services into comparison services and acquisition services. The former are typified by Gottschlich et al. (2014), Li et al. (2010) and Lenk et al. (2011), and operate by measuring the performance of providers in order to rank them. The focus is on price/performance differences across providers and not on variation amongst supposedly identical instances from the same provider, although repeated measurement of the same provider will of course reveal this. Such brokers suffer from a number of difficulties. In the case of no (or minimal) variation in instances of the same type, it is hard to see how repeat business can be generated, as once a performance evaluation has been conducted there is no need for a client to pay for another one. This brings into question the viability of the broker. Where large variation is present, the broker cannot provide any assurance over the performance level obtained, beyond the likelihood of it being within the measured historical range. This reduces the usefulness of the broker, again raising questions as to viability.
Proposals for acquisition brokers, such as that of Pawluk et al. (2012), do not take into account that the performance of instances of the same type may vary. As such, multiple instances may need to be acquired before one with the desired performance is obtained, if indeed it is obtained at all. To be viable the broker must recoup these additional costs, and yet viability is not considered.
Contributions in the area of performance brokerage have been published in:
- John O’Loughlin and Lee Gillam (2017) “A Performance Brokerage for Heterogeneous Clouds”. Future Generation Computer Systems (FGCS). In press. https://doi.org/10.1016/j.future.2017.05.005
- John O’Loughlin and Lee Gillam (2014) "Performance Evaluation for Cost-Efficient Public Infrastructure Cloud Use". In J. Altmann et al. (Eds.): International Conference on Economics of Grids, Clouds, Systems and Services (GECON) 2014, LNCS 8914, pp. 133–145, 2014.
RQ2: What are the risks for users of Cloud systems when seeking to take advantage of performance variation?
Performance variation does offer opportunities for performance improvement strategies, as we discuss in section 4.2: insufficiently performing instances can be returned to the provider at any time and replacements ordered in the hope that they will have better performance. This observation forms the basis of performance improvement strategies on Clouds, variously called ‘Ditch and Deploy’, ‘Instance Seeking’ or ‘Placement Gaming’ (Ou et al., 2013; Farley et al., 2012). However, a notable feature of the analysis is the assumption that in a heterogeneous environment the CPU model allocated to an instance is independent of any previous allocations. A priori, there is no reason why this should be the case. Indeed, we show in section 5.5 that CPU allocation is far more irregular than would be the case if this assumption were true. Simulations show that when we assume a more irregular allocation, whilst the expected cost of obtaining particular performance levels is the same, the standard deviation increases. That is, the risk of performance improvement strategies is greater than reported.
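The following toy Monte Carlo sketch illustrates how the cost of instance seeking can be compared under an independent-allocation assumption and under a cruder ‘batched’ allocation; the CPU mix, probabilities and batching scheme are illustrative assumptions, not the empirically derived allocation model or the results of chapter 6:

```python
import random
import statistics

P_FAST = 0.4  # assumed marginal probability that a rented instance has the desirable CPU model

def seek_independent():
    """Number of rentals until a 'fast' CPU, assuming each allocation is independent."""
    rentals = 1
    while random.random() >= P_FAST:
        rentals += 1
    return rentals

def seek_batched(mean_run_length=4.0):
    """Same search, but the provider hands out runs of the same CPU model,
    a crude stand-in for irregular (non-independent) allocation."""
    rentals = 0
    while True:
        fast = random.random() < P_FAST
        run_length = max(1, round(random.expovariate(1.0 / mean_run_length)))
        for _ in range(run_length):
            rentals += 1
            if fast:
                return rentals  # stop at the first acceptable instance

def summarise(sample, trials=20_000):
    costs = [sample() for _ in range(trials)]
    return statistics.mean(costs), statistics.stdev(costs)

print("independent allocation: mean=%.2f, sd=%.2f rentals" % summarise(seek_independent))
print("batched allocation:     mean=%.2f, sd=%.2f rentals" % summarise(seek_batched))
```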
Contributions regarding instance seeking have been published in John O’Loughlin and Lee Gillam (2015) "Re-Appraising Instance Seeking in Public Clouds".
RQ3: How should we design performance based experiments?
A common approach in statistics is to take measurements from a random sample (of some size) of the population and use these to estimate population parameters. However, this approach typically assumes that the properties of the sample do not vary with time. In medical statistics, a cross-sectional study takes a sample from a population and measures a particular value, giving, in essence, a snapshot in time. A cross-section allows for differences, or variation, amongst the sample to be measured. A longitudinal study involves repeated measurements of a particular representative of the population, with the objective of understanding how its characteristics of interest vary over time. A panel study combines the two: it repeatedly measures a cross-section of a random sample over time.
We are interested in variation amongst supposedly identical instances, as well as in how the performance of individual instances varies over time. As such, we conducted both cross-sectional and panel studies. The former allowed for the measurement of variation across a set of instances at a point in time, whilst the latter allowed for the longitudinal variation of instances to be measured.
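A sketch of how such a panel study can be organised is given below; `run_benchmark` is a placeholder for remotely invoking a workload (e.g. a SPEC benchmark) on an instance and returning its execution time:

```python
import csv
import time

def run_benchmark(instance_id: str, workload: str) -> float:
    """Placeholder: invoke `workload` on the given instance and return its execution time (s)."""
    raise NotImplementedError

def panel_study(instances, workload, epochs, interval_s, out_path="panel.csv"):
    """Benchmark the same fixed set of instances at every epoch.
    One epoch's rows form a cross-section; one instance's rows across all
    epochs form a longitudinal series."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["epoch", "timestamp", "instance_id", "exec_time_s"])
        for epoch in range(epochs):
            for instance_id in instances:          # cross-section at this epoch
                exec_time = run_benchmark(instance_id, workload)
                writer.writerow([epoch, time.time(), instance_id, exec_time])
            time.sleep(interval_s)                 # gap before the next epoch
```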
RQ4: What are the major characteristics of performance variation?
We begin by characterising the performance variation across a set of instances: per-CPU distributions of workload execution times are highly peaked close to best performance but have a long tail. As such, they are positively skewed and have excess kurtosis, the latter meaning there is a propensity for producing outliers. We showed that per-CPU distributions are consistent over time and locations. In a heterogeneous environment the distribution as a whole (the inter-CPU distribution) is multi-modal, with clustering based on CPU model, and we find: (1) for the same instance type, different workloads have different variation; (2) for the same workload, its variation on one instance type cannot be used to predict its variation on another; and indeed (3) for the same instance type, different CPUs are better/worse for different workloads, and so there is no best/worst CPU for all workloads.
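The per-CPU and inter-CPU characteristics can be summarised with standard sample statistics, as in the following sketch; the CPU model names and synthetic execution times are illustrative stand-ins for measured data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic per-CPU samples of execution times (seconds): peaked near best
# performance with a long right tail, as observed empirically.
samples = {
    "E5-2650": 100 + rng.gamma(shape=2.0, scale=2.0, size=200),
    "E5-2670": 112 + rng.gamma(shape=2.0, scale=2.0, size=200),
}

for cpu, times in samples.items():
    skew = stats.skew(times)                    # > 0: positively skewed
    xkurt = stats.kurtosis(times, fisher=True)  # > 0: excess kurtosis, outlier-prone
    print(f"{cpu}: mean={times.mean():.1f}s skew={skew:.2f} excess kurtosis={xkurt:.2f}")

# Pooling the per-CPU samples gives the inter-CPU distribution for the instance
# type as a whole, which is multi-modal with one cluster per CPU model.
pooled = np.concatenate(list(samples.values()))
print(f"pooled: mean={pooled.mean():.1f}s std={pooled.std():.1f}s")
```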
We characterise the longitudinal variation of instances as follows: the time series generated by repeatedly benchmarking an instance is typically stationary; however, instances on supposedly identical hardware have different performance levels over the period. Further, we find examples of locally-stationary performance, characterised by ‘instantaneous’ jumps to new mean levels, and we even find a somewhat pathological instance.
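A sketch of how the longitudinal behaviour of a single instance might be assessed is shown below, using an augmented Dickey–Fuller test for stationarity and a rolling mean to expose level shifts; the series is synthetic (stationary noise with one jump in mean level), standing in for repeated measurements from the panel study:

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(42)

# Synthetic benchmark series: stationary noise with one jump to a new mean level.
series = np.concatenate([
    100.0 + rng.normal(0, 1, 150),
    104.0 + rng.normal(0, 1, 150),
])

# Augmented Dickey-Fuller test: the null hypothesis is a unit root, so a small
# p-value is evidence against non-stationarity (though not proof of stationarity).
adf_stat, p_value, *_ = adfuller(series)
print(f"ADF statistic = {adf_stat:.2f}, p-value = {p_value:.3f}")

# A rolling mean exposes jumps between locally-stationary mean levels.
window = 20
rolling_mean = np.convolve(series, np.ones(window) / window, mode="valid")
print(f"rolling mean range: {rolling_mean.min():.1f}s to {rolling_mean.max():.1f}s")
```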
Our contributions have helped to: (1) quantify performance variation; (2) identify its causes and extent; and (3) characterise it. Some of this work has already been cited in a major survey of the performance literature by Leitner and Cito (2016), who evaluated 54 papers contributing to the empirical evaluation of Public Infrastructure Clouds. Leitner and Cito’s citations included:
- John O’Loughlin and Lee Gillam (2014) "Performance Evaluation for Cost-Efficient Public Infrastructure Cloud Use". In J. Altmann et al. (Eds.): International Conference on Economics of Grids, Clouds, Systems and Services (GECON) 2014, LNCS 8914, pp. 133–145, 2014.
- John O’Loughlin and Lee Gillam (2013) "Performance Prediction for Public Infrastructure Clouds: an EC2 Case Study". 2013 IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2013).
- Lee Gillam, Bin Li, John O’Loughlin and Anuz Pratap Singh Tomar (2013) "Fair Benchmarking for Cloud Computing Systems". Springer Open Journal of Cloud Computing 2:6. DOI: 10.1186/2192-113X-2-6
RQ5: Can we find real-world examples of Cloud like systems whose users express performance needs and pay for different levels of performance?
The workload trace released by Google (Reiss and Wilkes, 2011) is the only example we can find where different performance levels are priced differently, albeit that performance is indirectly expressed as a priority level and we do not know Google’s internal charging. The scale of the data set and the mixture of workloads (from production level services to free test and development) mean it is likely to be representative of the type of workloads we would expect to see on the Cloud. Further, priority can be considered a performance requirement: higher priority jobs have a more urgent need of resource access than lower priority ones in order to perform some useful work. Indeed, as resource quota at a given priority must be purchased in advance, the trace is an example of performance pricing, as it costs more to run a job at a higher priority than at a lower one.
RQ6: How should we model the broker?
On-demand provisioning is arguably the unique and compelling Cloud characteristic, and so we choose to model the broker as providing an on-demand performance service offering performance-assured instances. In an on-demand service requests must be satisfied immediately, and so we implement an instance pool: a set of instances obtained from a compute marketplace in anticipation of future demand and from which client requests are satisfied. To reflect the commodity view of Cloud resources we model a commodity marketplace (CeX) that uses a CDA, in line with the majority of the world’s exchanges. We construct a population of clients for the broker using data from the Google trace, as described in section 6.6. Sellers on the CeX are potentially heterogeneous; they allocate hardware to instances in an irregular manner, and they provide instances from which the broker can determine the CPU model and measure performance. Our model does not impose any change to a provider’s business model. Instance performance takes into account the empirical findings of section 5.10, described in detail under RQ4 above.
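A minimal sketch of the instance-pool element of the model is given below; provisioning from the CeX, benchmarking and pricing are abstracted behind the assumed `acquire_instance` callable, and the ‘performance-assured’ criterion is simplified to a bound on measured execution time:

```python
from collections import namedtuple

# A provisioned instance with its determined CPU model and measured performance.
Instance = namedtuple("Instance", "instance_id cpu_model exec_time_s")

class InstancePool:
    """Instances provisioned from the marketplace ahead of demand, from which
    on-demand requests for performance-assured instances are served."""

    def __init__(self, acquire_instance, target_size):
        self.acquire_instance = acquire_instance  # callable returning an Instance
        self.target_size = target_size
        self.pool = []

    def replenish(self):
        """Provision instances in anticipation of future demand."""
        while len(self.pool) < self.target_size:
            self.pool.append(self.acquire_instance())

    def request(self, max_exec_time_s):
        """Serve a request for an instance whose measured execution time is
        within the requested bound; return None if the pool cannot satisfy it."""
        for i, inst in enumerate(self.pool):
            if inst.exec_time_s <= max_exec_time_s:
                return self.pool.pop(i)
        return None
```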
A notable contribution is a model of instance performance capable of generating performance data for a group of instances with the following properties: (1) over time it produces time-plots qualitatively similar to those empirically observed, and (2) a cross-section at any given moment in time has the same characteristics as the cross-sectional variation empirically observed. The presentation in this thesis of a consistent model of instance performance is, as far as we are aware, a first. A model satisfying property (2) was presented in:
- John O’Loughlin and Lee Gillam (2017) “A Performance Brokerage for Heterogeneous Clouds”. Future Generation Computer Systems (FGCS). In press. https://doi.org/10.1016/j.future.2017.05.005
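To illustrate how a generator with properties (1) and (2) might be structured, the sketch below draws each instance's mean performance level from a skewed per-CPU distribution (giving a multi-modal, long-tailed cross-section) and then generates stationary noise around that level (giving the longitudinal behaviour); the CPU models, mix and parameters are illustrative assumptions, and refinements such as jumps to new mean levels are omitted:

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed CPU models: (base execution time in seconds, allocation probability).
CPU_MODELS = {
    "E5-2650": (100.0, 0.5),
    "E5-2665": (110.0, 0.3),
    "E5-2670": (125.0, 0.2),
}

def new_instance():
    """Draw a CPU model (multi-modal clustering across the fleet) and a skewed,
    long-tailed offset giving this instance's own mean performance level."""
    names = list(CPU_MODELS)
    probs = [CPU_MODELS[n][1] for n in names]
    model = rng.choice(names, p=probs)
    level = CPU_MODELS[model][0] + rng.gamma(shape=1.5, scale=2.0)
    return model, level

def time_series(level, length=200, noise_sd=0.5):
    """Longitudinal behaviour: stationary measurement noise around the instance's level."""
    return level + rng.normal(0.0, noise_sd, size=length)

instances = [new_instance() for _ in range(500)]
cross_section = np.array([level for _, level in instances])   # property (2)
one_series = time_series(instances[0][1])                     # property (1)
print(f"cross-section mean = {cross_section.mean():.1f}s, one series sd = {one_series.std():.2f}s")
```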
RQ7: What are the risks the broker is exposed to when offering an on-demand performance service?
Perhaps the most obvious risk to the broker arises from the need to provision instances in anticipation of future demand, as predicting the future is always fraught with difficulty. Should the broker overestimate future demand, then instances will be provisioned into the pool, and costs incurred, for which there is insufficient demand. To be viable, these additional costs will need to be recouped, likely requiring an increase in the price the broker charges for instances. However, if the size of the pool the broker maintains is sufficient to satisfy only a small proportion of requests, then its usefulness as an on-demand service becomes questionable.
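A back-of-envelope sketch of this trade-off follows: for an assumed Poisson demand per period and a chosen pool size, it estimates the fraction of demand that can be served from the pool and the expected number of idle, but still paid for, instances; the demand model and numbers are illustrative assumptions only:

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def pool_trade_off(pool_size, mean_demand, k_max=60):
    """Expected fraction of demand served and expected idle instances per period,
    truncating the Poisson tail at k_max (ample for the assumed mean)."""
    served = idle = total_demand = 0.0
    for k in range(k_max):
        p = poisson_pmf(k, mean_demand)
        served += p * min(k, pool_size)
        idle += p * max(pool_size - k, 0)
        total_demand += p * k
    return served / total_demand, idle

for pool_size in (5, 10, 20):
    frac_served, idle = pool_trade_off(pool_size, mean_demand=10)
    print(f"pool={pool_size}: ~{frac_served:.0%} of demand served, ~{idle:.1f} instances idle")
```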
The small profit margin the broker is able to generate means that, at best, such a broker would support a low-margin, high-volume business. The broker would need to consider where opportunities for such a high-volume business may arise. The broker is sensitive to market competition, and in particular pricing competition from other brokers offering the same service who have a more cost-efficient means of managing their resources. Further, the broker is exposed to vagaries in demand, exchange transaction fees and gaming strategies that clients may be able to employ, all of which may serve to close the opportunity.
A further risk is the problem of mutual performance degradation for co-locating instances, as demonstrated in section 5.9. We developed a strategy that two instances can co-operatively employ to positively determine whether they are host separated; a failure to show host separation suggests co-location, and repeated failures strengthen this suggestion. The strategy is suitable for Xen based Clouds, and is addressed by the publications listed below.
- John O'Loughlin and Lee Gillam (2016) "Sibling Virtual Machine Co-location Confirmation and Avoidance Tactics for Public Infrastructure Clouds". Journal of Supercomputing Volume 72, Issue 3, pp 961-984: DOI 10.1007/s11227-016-1627-9
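By way of a generic illustration only (and not the Xen-specific strategy of the publications above), a co-operative probe can be built from resource contention: one instance runs a heavy stressor during an agreed window while the other times a fixed unit of work before and during that window; a marked slowdown suggests shared hardware, while no slowdown is weaker evidence of host separation. The threshold and workload below are assumptions:

```python
import time

def unit_of_work(n=200_000):
    """Fixed CPU-bound unit of work, timed on the measuring instance."""
    start = time.perf_counter()
    total = 0
    for x in range(n):
        total += x * x
    return time.perf_counter() - start

def co_location_probe(baseline_runs=5, stressed_runs=5, threshold=1.15):
    """Return the observed slowdown and whether it stays under `threshold`
    (True suggests host separation; the threshold is an assumed tolerance)."""
    baseline = min(unit_of_work() for _ in range(baseline_runs))
    # By prior agreement, the partner instance starts its stressor here.
    stressed = min(unit_of_work() for _ in range(stressed_runs))
    slowdown = stressed / baseline
    return slowdown, slowdown < threshold
```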