A Performance Brokerage for Heterogeneous Clouds




4.3 Brokers

Buyya et al. (2009) define a broker as an entity that operates by '…buying capacity from the provider and sub-leasing these to the consumers', and profits through '…the difference between the price paid by the consumers for gaining resource shares and that paid to the providers for leasing their resources'. This is perhaps the most commonly understood definition of how brokers work. However, brokers need not be limited to this type of operation, as they can also mediate between providers and users to solve specific problems. Ficco and Rak (2011), for example, consider a broker offering 'Intrusion Tolerance as a Service', which focuses on maintaining some degree of degraded performance during a cyber-attack. Mell and Grance (2011) provide the official NIST definition of a Cloud Service Broker (CSB) as '…an entity that manages the use, performance and delivery of cloud services, and negotiates relationships between Cloud Providers and Cloud Consumers'. They further identify a number of ways in which a CSB may operate:


Service Intermediation: A CSB enhances an existing service to add value to it.
Service Aggregation: A CSB integrates multiple Cloud services into one or more new services.
Service Arbitrage: A CSB can dynamically replace components of an aggregated service and so take advantage of price dynamics in the market.
We note that service arbitrage assumes the various components offered by Cloud providers are fungible, meaning that resources from one provider can be substituted with suitably equivalent resources from another.
The original NIST definition was not without criticism, primarily due to its emphasis on technical services at the expense of business or relationship services (Daconta, 2013). In an update, NIST now categorises CSB services as either technical (intermediation, aggregation and arbitrage) or business, with the latter more akin to the traditional role of a broker in financial and other markets. Our broker falls into the category of Service Intermediation, as it adds value to existing services by reducing performance risk. However, we may also consider the broker to be in the category of Service Aggregation, as instances from multiple providers are aggregated to form a new service based on performance. Indeed, as an instance from one provider may be substituted with one from another, so long as it has the same performance properties, we may also consider the broker to be in the category of Service Arbitrage.

The rest of this section considers related work on brokers and, for ease of exposition, we categorise brokers into those that: (1) offer performance-based solutions; (2) offer risk-aware services; and (3) operate reservation-based systems.


4.3.1 Performance Related Brokers
By a performance broker we understand a broker that makes comparisons between providers, and acquires resources from them, based on the criteria of performance and price only.
Gottschlich et al. (2014) propose a matchmaking service that operates by '…continuous monitoring and benchmarking of provider performance and use this information to redirect a customer demand to the best matching offer'. The UnixBench benchmark utility is used to measure CPU performance, and the best matching offer is determined by calculating the price/performance ratio. The choice of UnixBench as a measure of compute performance is not without issue. As we have already noted, due to its size its working set can fit entirely into cache, and so it does not stress the memory hierarchy. In practice such workloads are rare, and so UnixBench is not representative of typical CPU bound workloads. Further, a performance-based service relying on measurements is only useful to the extent to which those measurements correlate with the workloads users wish to run. Consequently, there will be a limited audience for any service that uses only one benchmark.
In a similar vein, Li et al. (2010) propose CloudCmp, a service which measures the compute, disk and network performance of a number of different Clouds and uses this to guide provider choice based on the user's performance objectives. Interestingly, they note the limitations of using a small set of benchmarks, and indeed the need for workload-specific brokerage. Similarly, El Zant and Gagnaire (2015) propose a broker that selects a Cloud provider by measuring performance with respect to a number of different criteria, including compute, disk I/O, memory bandwidth and provisioning time. The selection decision is then based on a price/performance calculation. We again note that a limited set of benchmarks will limit the audience for the broker.
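To make the selection rule concrete, the following minimal sketch ranks offers by their price/performance ratio, as both Gottschlich et al. (2014) and El Zant and Gagnaire (2015) do; the provider names, prices and benchmark scores are invented purely for illustration.

    # Illustrative sketch of price/performance matching as described above.
    # The offers and benchmark scores below are invented for illustration only.
    offers = [
        # (provider, instance type, hourly price in USD, benchmark score - higher is better)
        ("ProviderA", "small", 0.10, 1200),
        ("ProviderB", "small", 0.12, 1600),
        ("ProviderC", "small", 0.09,  950),
    ]

    def price_per_unit_performance(offer):
        _, _, price, score = offer
        return price / score  # lower ratio means a better matching offer

    for provider, itype, price, score in sorted(offers, key=price_per_unit_performance):
        print(f"{provider:10s} {itype:6s} ${price:.2f}/hr score={score} "
              f"price/perf={price / score:.6f}")
    best = min(offers, key=price_per_unit_performance)
    print("Best matching offer:", best[0])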
Lenk et al. (2011) propose a performance comparison CSB that allows users to compare instance performance between providers using a range of different benchmarks. A customer begins by selecting a benchmark from the range on offer, and the CSB then reports the results of that benchmark across providers. By being able to choose from a range of benchmarks, the customer has more confidence that the reported performance correlates with the workloads they intend to run. However, Lenk et al. do not discuss the frequency of measurement. In our initial investigation on EC2 we found 4 different CPU models for the M1 class, whilst later investigations uncovered an additional 2. More recently, the M4 has become heterogeneous, so any such service would have to incur costs in measuring instances on the new CPU model. Infrequent performance measurements make the data less representative of current performance and hence of less value to users.
The Lenk et al. (2011) proposal is similar to functionality previously offered by the performance comparison site CloudHarmony (CloudHarmony, 2017), where a user could choose a benchmark, instance type, CPU model and provider, and CloudHarmony would return a performance measurement. However, it quickly became apparent that performance data was captured from very few instances and was infrequently updated, quite possibly due to the costs involved, and this particular aspect of the service (CloudScores) no longer appears to be available as of 10/03/2017, with CloudHarmony now offering a generic comparison of offerings and service availability data.
Tordsson et al. (2012) consider a broker that makes placement decisions for virtual machines across different providers based on performance and price considerations, together with constraints such as the maximum number of providers, the number of instances of a particular type and load balancing requirements. However, we note that their model assumes instances of the same type are homogeneous and have an identical, constant level of performance. Such assumptions are somewhat at odds with the variation reported, and arguably the findings presented need to be re-evaluated.
Performance is just one criterion of interest when comparing providers. Amato et al. (2013) propose a CSB that makes use of the Open Cloud Computing Interface (OCCI) VM specification (Open Cloud Computing Interface, 2017) whereby users submit an SLA template containing multiple criteria including availability, reliability and performance. The CSB then finds a ‘best’ match from amongst offers made by providers. The OCCI specification only defines instance performance in terms of CPU clock speed, and yet this is known to be an unsuitable metric for compute performance, as discussed in section 3.5. Similarly, Mehta et al. (2016) and Sundareswaran et al. (2012) consider brokers that offer a service selection based on a set of expressed criteria.
In addition to provider comparison services, a number of resource acquisition brokers have been proposed. Given a set of requirements, these services choose the best provider(s) for the workload deployment and then acquire the resources. Pawluk et al. (2012) propose a CSB that acquires resources on behalf of users based on a set of 'higher level objectives' together with descriptions of 'nodes', which represent virtual machines available from different providers. Based on the expressed requirements, the CSB attempts to deploy the workload with an objective of minimising costs. However, minimising costs requires knowledge of performance, and we note that the notion of performance is left undefined. In future work they intend to implement just-in-time benchmarking (JITB) to allow for application-specific deployment decisions. In this approach, resources are acquired from providers, benchmarked (using benchmarks that correlate with users' workloads), and the results are then used in the decision-making process.
As we have noted, Guzek et al. (2015) state that for a broker to be a sustainable business it must create a 'value-add' for which it can charge. Performance comparison/selection would seem to suffer from a common problem: what is the 'value-add' that can be charged for? Arguably, tools such as Expertus (Jayasinghe et al., 2012) and Cloud WorkBench (Scheuner et al., 2014) ease the difficulties of automating large-scale benchmarking of Clouds. Indeed, the emergence of application containers such as Docker should ease the problem further.
Perhaps a more significant issue for comparison/selection type proposals is the cost incurred in benchmarking/measuring, which has to be recouped otherwise the broker will run at a loss. Measuring performance with respect to a small set of benchmarks may limit costs, but will also limit the potential audience for the service. Indeed, Gustafson and Snell (1995) note that, given any two machines A and B, it is nearly always possible to find workloads w1 and w2 such that A outperforms B on w1, whilst B outperforms A on w2. Appealing to a wide range of needs will therefore require a sufficiently large benchmark set, driving up measurement costs.
However, in light of the performance variation reported in section 4.1, the value of such measurements is arguably limited: they can only describe the performance of instances obtained in the past, and cannot tell you precisely the performance of instances yet to be rented. Further, performance variation will present some interesting challenges for JITB, one of which is escalating service costs: in order to inform the choice of provider, instances need to be acquired and benchmarked, and some will be unsuitable for deployment but will have incurred a cost. Having made a choice of provider, the broker will then need to acquire n > 0 instances at a particular performance level, in effect becoming an instance seeker on behalf of its users, with the associated costs and risks discussed in section 3.2.
Whilst brokers may offer a range of performance-based services, they cannot provide a guarantee of a particular performance level, and it is common to assume explicitly that providers may offer SLAs guaranteeing performance; for example, Youn et al. (2017) make such an assumption when considering a broker that matches user requirements to provider SLAs. However, given performance variation such as that reported in section 4.1, it is far from clear that providers can make such guarantees. As such, there is a possibility that performance-based SLAs, and indeed other SLAs, may fail. In the next section we consider work that incorporates risk into SLAs.
4.3.2 Risk-Aware Brokers
Djemame et al. (2011) consider the brokering of risk-aware SLAs in Grids, in which the SLAs issued by Grid providers include a probability of failure (PoF). The broker maintains a historical database of SLAs and uses this to determine a provider's reliability. Providers whose stated PoF differs significantly from their historical record are deemed unreliable. Note that a provider may have an unreliable service, but so long as the SLA PoF accurately reflects this unreliability they are deemed reliable. Further, we note that whilst this helps with identifying unreliable providers, the risk of SLA failure is still unmanaged, as there is no risk pricing or insurance associated with the SLAs.
Li (2012) also comments that whilst Djemame et al. (2011) provide a probability of SLA failure for a provider, correlated risks are not considered. That is, the probability of failure of an SLA from a Grid provider is assumed to be independent of the failure of SLAs from other providers. However, when brokering Cloud SLAs with respect to resource performance, there is no guarantee that the performance of the instances covered by an SLA is independent across instances. For example, the network performance of multiple instances running on hosts connected to the same switch will likely show correlation. Further, co-located instances may be expected to show correlated performance – if they are on a 'noisy' host one would expect this to affect the performance of all instances.
Li (2012) applies a model of synthetic collateralised debt obligations (CDOs) to construct risk-aware SLAs for Clouds. In particular, use is made of one-factor Gaussian copulas for pricing. Copulas allow for the modelling of dependency between random variables, and so we can consider a correlated probability of SLA failure. However, Gaussian copulas are symmetric, and so the strength of the correlation is the same on the 'upside' as on the 'downside'. This proved controversial, particularly during the financial crisis of 2007-2008, when downside correlation was observed to be significantly stronger than upside correlation, and as a consequence the risk associated with CDOs priced by this method was underestimated. In section 4.8 we demonstrate asymmetric correlation between m4.xlarge instances; further, the degree of association differs between workloads, and so a one-factor model may not fully capture the dependency between instances.
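To make the dependency modelling concrete, the following sketch simulates correlated SLA failures under a one-factor Gaussian copula: each SLA has a latent variable driven partly by a shared factor, and fails when that variable falls below a threshold set by its marginal probability of failure. The marginal PoF and factor loading used here are illustrative values, not those of Li (2012).

    # Sketch of correlated SLA failure under a one-factor Gaussian copula.
    # Each SLA i has latent variable X_i = sqrt(rho)*M + sqrt(1-rho)*Z_i, where M is a
    # common factor and Z_i is idiosyncratic; SLA i fails if X_i < Phi^{-1}(p).
    # The values of p and rho below are illustrative, not those used by Li (2012).
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(0)
    n_slas, n_trials = 100, 10_000
    p, rho = 0.05, 0.3                      # marginal PoF and factor loading (illustrative)
    threshold = norm.ppf(p)

    M = rng.standard_normal((n_trials, 1))          # common factor, one draw per trial
    Z = rng.standard_normal((n_trials, n_slas))     # idiosyncratic factors
    X = np.sqrt(rho) * M + np.sqrt(1 - rho) * Z
    failures = (X < threshold).sum(axis=1)          # number of failed SLAs per trial

    print("mean failures per trial:", failures.mean())        # close to n_slas * p
    print("P(more than 20 SLAs fail):", (failures > 20).mean())  # tail risk driven by rho

With rho = 0 the failures are independent and large clusters of failures are vanishingly rare; increasing rho leaves the mean unchanged but fattens the tail, which is precisely the correlated risk that an independence assumption misses.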
A specific risk to which the performance broker, and indeed any provider, is exposed is unknown future demand. We propose an on-demand performance service, which implies that the broker must have already acquired and measured instances in anticipation of future demand for them. Should this demand fail to materialise, the broker will make a loss on any instances it has rented but been unable to sub-let. However, the broker can limit these losses, as instances can be returned to the provider. For providers, investing in hardware for an anticipated demand that fails to materialise may have more serious financial consequences. To alleviate this risk, many providers make use of reservation-based systems, which we discuss next.
4.3.3 WZH and Reservation Based Brokers
Reservation-based systems, whereby a user reserves future capacity for a fixed period, can be useful for providers as they aid resource planning. To provide an incentive for users, providers typically offer a discount compared to the on-demand price. However, uncertainty over future requirements makes standard reservation systems problematic for users. To illustrate the difficulty, Wu et al. (2005) provide the following example. Suppose a provider allows users to reserve future capacity as follows: in period 1 a user can reserve a resource for use in period 2 and pay a discounted price of 1, or they can purchase the resource in period 2 at a spot price of C > 1. Suppose that C = 5, and that the user estimates their probability of requiring a resource in period 2 as p = 0.1. Further, suppose that the user places a value of 4 on the resource. The user would never purchase the resource on the spot in period 2, as the price of 5 is above their value of 4: spot pricing is too expensive. However, in period 1, the expected value of the resource in period 2 is 4*0.1 = 0.4. This is below the reserved price of 1 – so the user would not reserve either! Uncertainty over expected usage prevents the user from purchasing.
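Writing v for the user's value, p for the probability of requiring the resource, C for the spot price and r for the discounted reserved price (here v = 4, p = 0.1, C = 5 and r = 1), the two decisions in this example reduce to:

    \text{buy on the spot in period 2 iff } v > C:\quad 4 > 5 \;\text{(false)},
    \text{reserve in period 1 iff } p \cdot v \geq r:\quad 0.1 \times 4 = 0.4 \geq 1 \;\text{(false)},

so the user does neither, even though their value of 4 is well above the reserved price of 1.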
Wu et al. (2005) propose a reservation system (the WZH system) based on financial options, which helps to alleviate problems for users facing uncertainty over future requirements in general compute markets. Their system introduces a broker whose role is to absorb risk whilst making a profit. In period 1 a user can purchase an option from the broker on a resource for use in period 2, submitting their estimated probability of requiring the resource. The broker charges the user a premium for the option, priced according to the submitted probability. The option gives the user the right, but not the obligation, to take delivery of the resource. The broker sums the submitted probabilities to obtain an estimate of resource demand in period 2, and then reserves a quantity of resource from the provider. In period 2, should the user exercise their right and take delivery of the resource, they are charged an additional exercise fee, again based on their submitted probability. If the broker has not reserved sufficient resource to cover all claims, it purchases additional resources from the provider at the on-demand price. Finally, at the end of period 2 all resources are reclaimed by the provider.
As the broker uses the submitted probabilities to estimate demand in period 2, the more 'truthful' they are – that is, the closer they are to the actual probability of a user requiring a resource – the more accurate the estimate. WZH construct premium and exercise charge functions which encourage truthful submission from users and allow the broker to make a profit. In the WZH system, the user benefits from a lower expected cost of a resource than if purchasing directly from the provider, the broker makes a profit, and the provider benefits by selling reservations to the broker and so can better estimate its future demand.
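A minimal sketch of this flow is given below. The premium and exercise-fee functions shown are hypothetical placeholders used only to illustrate the mechanics; WZH derive specific pricing functions with the truthfulness and profitability properties described above, and the rounding-up of the demand estimate is likewise a simple illustrative choice.

    # Sketch of the WZH-style broker flow described above. The functions premium()
    # and exercise_fee() are hypothetical placeholders, not the WZH pricing functions.
    import math

    RESERVED_PRICE = 1.0    # discounted reservation price paid to the provider in period 1
    ON_DEMAND_PRICE = 5.0   # on-demand (spot) price in period 2, as in the example above

    def premium(p):
        # Placeholder: premium scales with the submitted probability.
        return 0.6 * p * RESERVED_PRICE

    def exercise_fee(p):
        # Placeholder: users who submitted a low probability pay more on exercise.
        return (2.0 - p) * RESERVED_PRICE

    # Period 1: users submit probabilities and buy options; broker reserves capacity.
    submitted = [0.1, 0.8, 0.5, 0.9, 0.2]            # illustrative user probabilities
    revenue = sum(premium(p) for p in submitted)
    reserved = math.ceil(sum(submitted))              # demand estimate = sum of probabilities
    cost = reserved * RESERVED_PRICE

    # Period 2: some users exercise; any shortfall is covered at the on-demand price.
    exercised = [True, True, False, True, False]      # illustrative outcomes
    claims = sum(exercised)
    revenue += sum(exercise_fee(p) for p, e in zip(submitted, exercised) if e)
    if claims > reserved:
        cost += (claims - reserved) * ON_DEMAND_PRICE

    print(f"reserved={reserved}, claims={claims}, broker profit={revenue - cost:.2f}")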
Rogers and Cliff (2012) propose a CSB based on the WZH system that makes use of EC2 reserved instances. In period 1, a user purchases an option from the CSB giving them the right, but not the obligation, to have exclusive access to one instance for a one month period starting in period 2. The CSB charges the user a premium based on the submitted probability, and must then decide whether it needs to purchase additional reserved instances. In period 2, users who claim their instances are charged an exercise fee based on their submitted probability. If the CSB has insufficient instances to cover demand, additional on-demand instances are purchased. CSB pricing must be below the on-demand price, so providing an incentive for users, but above the monthly reserved price, else it will make a loss.
In the standard WZH model the duration of the resource purchased from the provider is equal to the length of the period. However, in the Rogers and Cliff (2012) model the CSB purchases either 12 or 36 month reserved instances, whilst the submitted probabilities only provide an estimate of demand for the next month. If the estimated demand for the next month is greater than available capacity, the CSB decides whether or not to purchase an additional reserved instance. The CSB calculates the so-called Marginal Resource Utilisation (MRU) of adding an additional reserved instance, a measure of the expected usage of an additional reserved instance over the next 12/36 months, and will only purchase additional reserved instances if the MRU is above a specified threshold.
Note that 0 ≤ threshold ≤ 1. Setting the threshold to 0 means that in period 1 the CSB will always purchase additional reserved instances until there is sufficient capacity to cover estimated demand, whilst setting the threshold to 1 means the CSB will never purchase reserved instances, that is, it will always be forced to buy on-demand and so will always make a loss. A value between 0 and 1 means the CSB will sometimes make additional purchases, whilst at other times it will 'wait and see'.
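The purchase rule can be sketched as follows. The MRU calculation shown – the expected fraction of the reservation term in which a marginal reserved instance would be in use – is an illustrative reading of the description above rather than the exact formulation of Rogers and Cliff (2012), and the demand forecast is invented.

    # Illustrative sketch of the MRU-based purchase rule described above. The MRU here
    # is the fraction of the reservation term in which a marginal reserved instance is
    # expected to be used; this is an illustrative reading, not the exact formulation
    # of Rogers and Cliff (2012).

    def marginal_resource_utilisation(demand_forecast, current_capacity, term_months):
        """Expected utilisation of one additional reserved instance over the term."""
        months_used = sum(
            1 for demand in demand_forecast[:term_months] if demand > current_capacity
        )
        return months_used / term_months

    def should_purchase(demand_forecast, current_capacity, term_months, threshold):
        mru = marginal_resource_utilisation(demand_forecast, current_capacity, term_months)
        return mru >= threshold

    # Example: invented monthly demand (in instances) over a 12 month reservation term.
    forecast = [10, 12, 11, 13, 14, 12, 11, 10, 9, 12, 13, 14]
    capacity = 11   # reserved instances currently held

    print(should_purchase(forecast, capacity, term_months=12, threshold=0.25))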
Rogers and Cliff (2012) investigate their CSB through use of a discrete event simulation (DES) proceeding through discrete points in time a month apart. As the CSB is purchasing instances in anticipation of future demand, it is subject to demand-side variation, i.e. it may purchase instances in anticipation of a level of demand that never materialises. In order to examine demand-side variation, a simulation is given one of 4 possible demand profiles; as there is limited public data concerning demand for Cloud services, they use sales data for various market segments obtained from the UK Office for National Statistics as a proxy. In particular, they use the normalised Non-Seasonally Adjusted Index of Sales at Current Prices for the following 4 sectors: (1) Non-Store Retailing: All Businesses; (2) Non-Store Retailing: Large Businesses; (3) Non-Store Retailing: Small Businesses; and (4) Retail of Computer and Telecoms Equipment (Val NSA): All Business Index. The data starts from January 1998 and, to date, provides 270 monthly data points for each sector.
A population of 1000 users is constructed, and in each month all users purchase an option on an instance. The proportion of users chosen to exercise their options in the next month is determined by the normalised demand pattern. For each demand profile and each value of the threshold from 0 to 1, in increments of 0.01, they conduct 100 runs of the simulation, from which they calculate a 95% confidence interval (CI) for profit (revenue generated from selling instances to users minus rental charges payable to AWS). The headline result of this work is that the CSB is profitable under all market profiles and for all values of the threshold strictly between 0 and 1. Further, 36 month reserved instances are more profitable than 12 month ones.
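As an aside on methodology, the threshold sweep and the 95% CI over 100 runs can be sketched as follows; run_simulation is a hypothetical stand-in for the full DES and simply returns random profit figures, so the numbers produced are meaningless and a coarse threshold grid is used for brevity.

    # Sketch of the experimental design described above: for each threshold value,
    # run the simulation 100 times and compute a 95% CI for mean profit.
    # run_simulation() is a hypothetical stand-in returning random profit figures.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def run_simulation(threshold):
        return rng.normal(loc=5000, scale=800)   # stand-in profit for one run

    for threshold in (0.0, 0.25, 0.5, 0.75, 1.0):
        profits = np.array([run_simulation(threshold) for _ in range(100)])
        mean, sem = profits.mean(), stats.sem(profits)
        low, high = stats.t.interval(0.95, df=len(profits) - 1, loc=mean, scale=sem)
        print(f"threshold={threshold:.2f} mean profit={mean:.0f} 95% CI=[{low:.0f}, {high:.0f}]")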
However, Cartlidge and Clamp (2014), when analysing the original simulation code, discovered 2 bugs which significantly inflated CSB profit – although it is still the case that the CSB is profitable except for values of the threshold less than 0.07 for 12 month reserved instances in the recovery and recession market. Clamp and Cartlidge (2013) conduct a sensitivity analysis of the Rogers and Cliff (2012) CSB, by adding white noise to the essentially static demand profiles. They show that the optimal value of the threshold, i.e. the value which generates most profit, varies in a non-linear manner with the amount of noise.
Further, Clamp and Cartlidge (2013) demonstrate sensitivity to changes in provider pricing. This makes determining an optimal value for the threshold a priori somewhat difficult. They introduce an adaptive mechanism which allows the optimal value of the threshold to be learnt – but at the expense of introducing new parameters for the CSB. Further, they argue that the introduction of the Amazon Reserved Instance Marketplace (ARIM), where users can sell unwanted reserved instance capacity, for example the remaining 6 months on a 12 month reserved instance, significantly impacts the opportunity identified by Rogers and Cliff (2012). Indeed, when the ARIM becomes a liquid market the opportunity is closed.
We note a further difficulty with the Rogers and Cliff (2012) model. A user is required to submit, in any given month, the probability that they will require an instance in the following month. The WZH pricing scheme is designed to encourage truthful submission of probabilities, but this places the onus on users to estimate the probability accurately. In the Rogers and Cliff (2012) simulation, the same population of users lived with the CSB for 276 months – 23 years – and users estimate their probability based on past needs. This of course assumes future needs mirror past needs. More problematically, in a dynamic population of users, where new users enter the population and extant ones leave, accurate estimation of the probability may be difficult. As a consequence, users may not gain the full benefit, and neither does the CSB, as its estimate for the following month is less accurate. Further, long-lived users can estimate their expected usage over the next 12 months and, based on this, may decide to trade with the provider directly and purchase their own reserved instances. Finally, we also note that transaction costs, which are common on commodity exchanges, are not considered.
When replicating the Rogers and Cliff (2012) model we obtain results which are both qualitatively and quantitatively close to those obtained by Cartlidge and Clamp (2014). Whilst we were unable to replicate the small loss made by the CSB in the recovery and recession market for 12 month reserved instances at threshold values below 0.07, our results confirm that the original Rogers and Cliff (2012) results report significantly inflated profit.
The Rogers and Cliff (2012) model, and extensions of it, is notable as it not only proposes a method of operation for a CSB but also seeks to demonstrate profitability under market conditions – and indeed it is the only convincing example of such work we can find. However, whilst they consider different market conditions for demand, they assume a monopoly supplier, a limitation inherited from the WZH model.
4.3.4 Summary of Brokers
A commonly considered form of broker operates as follows: there is a set of criteria, such as performance, availability and security, with which comparisons between providers can be made. Clients of the broker submit requests for resources specifying constraints based on, for example, provider availability, and objectives such as best price. In response, the broker either returns a ranking of providers, based on which the client can obtain resources, or acquires resources on behalf of the client. By a performance broker we understand a broker whose criteria of interest are performance and price only.
Can a performance broker be sustainable? Suppose performance variation does not exist. In this case, the broker will measure the performance of different instance types across a number of providers with respect to a set of benchmarks, producing a static set of performance measurements. The broker needs to recoup the costs incurred in measurement, which include, amongst other things, instance rental costs. However, new developments have made benchmarking the Cloud easier, reducing the value of a broker offering this as a service. Further, as performance is constant, it is not clear how the broker would generate repeat business. A priori, it appears difficult for a performance broker to be sustainable in the absence of performance variation.
However, performance variation is challenging for brokers that operate either by ranking providers or by acquiring resources on behalf of clients. In the former case, such brokers must re-measure performance periodically to ensure their data accurately reflects current performance. This drives up costs for the broker, and yet they cannot guarantee the actual performance of instances obtained on the basis of the rankings they provide. In the latter case, performance variation arguably presents a sustainable opportunity, as it introduces a variable cost of Cloud use: the broker can offer instances at a price that is attractive to users compared to the expected cost a user would incur in obtaining instances with the same performance themselves. However, we find no examples in the literature proposing how such a performance broker might operate, determining its expected operational costs, or investigating its margin of profitability. Indeed, as we have noted, we find only one example of a profitable broker, and in this case the opportunity appears to have been closed, which serves to highlight the difficulty of proposing and demonstrating sustainable brokers.
