A performance Brokerage for Heterogeneous Clouds

Date: 09.01.2019

8.3 Conclusions

To what extent can a performance broker profitably address performance variation in commodity Infrastructure Cloud marketplaces through offering performance-assured instances?
The broker can be profitable but requires a variety of market conditions to coalesce. The highest profit margin that could be achieved was just 4%, and achieving this requires: (1) 4 different types of hardware that exhibit an average performance degradation of 52% from best to worst; (2) a minimum charge at the exchange of 60 minutes; (3) order arrivals that are predictable and follow a homogeneous Poisson process with an average arrival rate of 1 per minute; and (4) a set of orders as described by the Google workload trace in section 6.6.1.
In our model we explicitly assume that increasing heterogeneity leads to increased performance variation, raising the question of the potential for increased profitability when NUM_CPUS > 4. Increased variation will lead to increased markups the broker can charge. However, an increase in broker prices will not necessarily lead to increased profit, as the price the clients will accept is limited by instance seeking costs. Indeed, increased variation may potentially lead to less profit as the broker is priced out of the market. We note that we have not considered the question of identifying a set of market conditions which produces maximum profit.
That a minimum charge of 60 minutes is required for the highest achieved profit is problematic: as of 22/10/2017 EC2, GCE and Azure all have a minimum charge of no more than 1 minute. Indeed, with all else being equal, setting min_charge to 10 minutes led to a loss. Increasing the order rate reduced the loss but did not produce a profit. We did not try an order arrival rate above that found for submissions in the Google workload trace for jobs of the same durations. As such, demand would need to be in excess of this to produce a profit, if indeed it does at all.
We assumed that demand is predictable and follows a homogeneous Poisson process. The low margin means the profitability of the broker is potentially sensitive to changes in the market. As with any business, there are likely to be periods where demand ‘waxes and wanes’, or indeed where abrupt changes in demand occur. Arguably, there is a need for a sensitivity analysis of profit to changes in demand, similar to that conducted by Clamp and Cartlidge (2013) on the Rogers and Cliff (2012) broker. We also note the broker may be sensitive to changes in the order durations, which were derived from the Google workload trace.
A loss is made under a variety of conditions, one of which is min_charge = 10, as already noted. However, the broker also makes a loss whenever NUM_CPUS = 1, as would occur if the exchange mandated a particular CPU model for a given instance type. Although this is likely to lead to reduced liquidity, we cannot rule out such a possibility on future exchanges.
Even at its most profitable the broker is operating on a 4% margin. Suppose the broker is able to sell one instance hour of a c4.xlarge, as benchmarked in section 5.6, every minute for a year. In total the broker sells 525,600 instance hours, incurring a cost of at least $104,594 at current prices⁴². At a 4% margin the broker makes a profit of just $4,183. However, we define profit as revenue minus costs incurred in renting instances, and exclude all other costs. Arguably, at this level of demand the broker would be unlikely to meet other standard business costs. Further, transaction fees, as typically found on commodity exchanges, may wipe out any profit.
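The arithmetic behind this illustration can be checked with a short sketch. It uses only the figures quoted above (one instance hour sold every minute for a year, a rental cost of $104,594 and a 4% margin). The implied hourly price is derived from those figures and is not quoted in the text, the margin is assumed to be a fraction of rental cost (which is what reproduces the quoted $4,183), and the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # one instance hour is sold every minute


def broker_profit(total_cost: float, margin: float) -> float:
    """Profit when the margin is a fraction of rental cost (assumption)."""
    return total_cost * margin


instance_hours = MINUTES_PER_YEAR                   # instance hours sold in a year
total_cost = 104_594.0                              # USD, figure quoted in the text
implied_hourly_price = total_cost / instance_hours  # derived, not quoted in the text
profit = broker_profit(total_cost, 0.04)

print(instance_hours, round(implied_hourly_price, 3), int(profit))
```

Under these assumptions the profit is roughly 4% of the rental bill, before any of the other business costs or transaction fees discussed above are taken into account.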
The small profit margin leaves the broker sensitive to market competition. Indeed, if a performance market is viable we would expect multiple sellers, which will bring price pressure. Further, gaming opportunities can also drive down profit, as users order and pay for tranche C instances, but make use of them for tranche A work. The broker may be able to increase profit by offering tranche A instances only, so in effect operating a ‘top end’ service. However, the increase in profit assumes the same level of total demand as for a multi-tranche offering, and it is not clear that this would be the case. At best, the performance broker would support a low margin, high volume business. We do note that the IoT and the emergence of the Cloud of Things, as discussed in section 8.1, may provide an opportunity for a high volume, low margin business.
Our final conclusion is to simply note the apparent difficulty of demonstrating profitable brokers. This does raise the question as to where the profitable opportunities are for brokers in a commodity marketplace.

8.4 Future Work

Future work, as would extend naturally from the research presented, is focused in the following areas:

8.4.1 Sensitivity Analysis
The broker operates an on-demand model, but in order to do so requires sufficient instances in the pool. The broker must estimate demand in advance, and this introduces risk for the broker as overestimating demand that fails to materialise increases costs. However, underestimating demand may lead to lower client satisfaction as the broker will not be able to satisfy all, or even most, requests.

We have assumed that demand follows a homogeneous Poisson process. However, to understand the sensitivity of profit to different demand profiles we would simulate the broker with demand modelled as various inhomogeneous Poisson processes. We also note the broker may be sensitive to job durations which differ from those found in the Google workload trace, and we would also conduct an analysis of this.
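As an illustration of the kind of demand model this analysis would need, the sketch below generates order arrival times from an inhomogeneous Poisson process using the standard thinning technique. It is a minimal sketch and not the simulator used in this work: the function names, the sinusoidal daily demand profile and its parameters are all hypothetical, chosen only so that the mean rate matches the 1 order per minute assumed earlier.

```python
import math
import random


def simulate_arrivals(rate_fn, rate_max, horizon, rng):
    """Arrival times on [0, horizon) for an inhomogeneous Poisson process.

    Thinning: draw candidate arrivals from a homogeneous process with rate
    rate_max and keep a candidate at time t with probability
    rate_fn(t) / rate_max. Requires rate_fn(t) <= rate_max on the horizon.
    """
    arrivals = []
    t = 0.0
    while True:
        t += rng.expovariate(rate_max)  # gap to the next candidate arrival
        if t >= horizon:
            return arrivals
        if rng.random() < rate_fn(t) / rate_max:
            arrivals.append(t)


def daily_cycle(t):
    """Hypothetical profile: mean 1 order per minute with a 24 hour cycle."""
    return 1.0 + 0.5 * math.sin(2 * math.pi * t / 1440.0)


rng = random.Random(42)
horizon = 1440.0  # one day, in minutes
orders = simulate_arrivals(daily_cycle, rate_max=1.5, horizon=horizon, rng=rng)
print(len(orders))
```

Passing a constant rate function recovers the homogeneous case, so the same routine could drive both the baseline and the perturbed demand scenarios.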

8.4.2 Alternative Operational Model for the Broker
An alternative to an on-demand service is a reservation based system. In this case, a client specifies a future performance requirement. As with all reservation based systems, the advantage for the broker is that demand is known precisely in advance. A reservation based system appears similar to future/forward contracts. Of course, such contracts may be problematic for clients that can only estimate their future performance requirements. As Weinerman (2015) notes, the ability to purchase call options on future performance may be useful. We also note that as levels of available performance on an exchange vary, the contract itself has value derived from this variation. As a consequence, it appears that a genuine derivatives market is now possible. Initially, we would further investigate extant reservation based systems as well as the structure and pricing of various derivative contracts.
8.4.3 Security Concerns in a Practical Implementation
In our proposal, the same instance may be rented to different customers at different points in time, although never concurrently. Further, clients have root access to instances allowing them to install and run workloads. This does pose a security issue: How can we be sure that once a client has terminated their lease they can’t gain access to the instance again in the future, by, for example, installing a backdoor? Given the concerns raised about sharing hardware where each tenant is within a secure by design (if not always in practice) instance, this is an issue that must be resolved satisfactorily.

A potential solution lies in recursive or nested virtualisation, where a hypervisor can be installed into a virtual machine, from which new virtual machines under the control of the new hypervisor can be created. Notably, in their seminal paper Popek and Goldberg (1974) derive conditions on the ISA which, if met, allow for recursive virtualisation. Such a solution would allow the broker to create new instances within extant ones, to which clients are given access. When leases expire the instances are destroyed. Whilst there is still the possibility that the client can break out of their instance into the broker’s instance, this is the same risk as is present on all Cloud platforms.

Running an instance inside an instance is likely to throw up a host of practical, and indeed other, issues. For example, what is the performance overhead? Spillner et al. (2012) report minimal overhead for CPU bound workloads, but we note the benchmarks used are likely to have problems similar to the ones described in section 3.6. For future work we would investigate recursive virtualisation with respect to how a broker can securely offer access to instances. In addition, we would identify key concerns that need to be addressed before a viable practical implementation could be built.

8.4.4 Correlated Risks and SLAs
Cloud systems concentrate vast quantities of physical resources at various locations globally known as Regions, which are predominately but not entirely independent of each other. On EC2 for example, Regions have a common authentication system allowing a single account to access multiple Regions with the same credentials. Regions are typically comprised of multiple AZs, which are distinct locations within a Region with their own independent power supply and network connections. However, problems in one AZ have been known to cascade, resulting in multiple concurrent AZ failures.
The AZ is the ‘lowest’ level of infrastructure within which resources can be placed. AZs themselves are comprised of one or more data centres, each of which has many physical servers. However, once instances have been placed within an AZ no information is provided regarding the degree of separation between them and correlated risks abound in regard to both availability and performance.
Indeed, instances on servers within the same rack typically share networking infrastructure such as switches and cabling, and so problems with these will affect all instances using them. With regards to performance, we have already demonstrated in section 5.9 how instances on the same host may have correlated performance. However, even instances which are host separated may have correlated performance. For example, instance families such as the M4 and C4 families on EC2 have no local storage, and instances from these families have their root disk on network storage. Host-separated instances may well have their root disks on the same storage components, and variation in the performance of the latter will affect the former.
It is vital, then, that the degree to which correlated risks exist within Cloud systems is understood, as this allows for the estimation and hence pricing of risk. We note again how availability SLOs within Cloud SLAs apply at the Region level. As a starting point, co-location detection, of the type discussed in chapter 7, can be used to estimate the probability of instances being on the same host.
With regards to the pricing of SLAs we note the work of Li (2012), who uses synthetic CDOs to construct SLAs that incorporate performance risk. However, the model used is based on the one factor Gaussian copula which, due to its symmetry, means the degree of correlation in the upside is the same as in the downside. Our empirical work in section 5.8, however, demonstrates asymmetric correlation. Further, in a one factor model the degree of correlation is the same for all instances, whereas the degree to which different instances on the same host will correlate depends upon a number of factors, such as the number of co-locating instances and the workloads being run. We would investigate alternative CDO pricing models, as described in Burtschell et al. (2008), to address these concerns. In addition, we would investigate actuarial approaches to SLA pricing in the presence of correlated risks, and compare and contrast them with a CDO approach.
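The symmetry of the one factor Gaussian copula can be illustrated with a small simulation. The sketch below uses the textbook construction X_i = sqrt(rho) * M + sqrt(1 - rho) * Z_i, where M is a common factor and Z_i are independent standard normal variables, and compares how often two variables are jointly extreme on the upside and on the downside. It is not the model of Li (2012) or of this work; the function names, the value rho = 0.5 and the threshold are illustrative assumptions.

```python
import math
import random


def one_factor_gaussian(n_samples, rho, rng):
    """Sample pairs (x1, x2) with X_i = sqrt(rho) * M + sqrt(1 - rho) * Z_i."""
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    pairs = []
    for _ in range(n_samples):
        m = rng.gauss(0.0, 1.0)  # common factor shared by both variables
        x1 = a * m + b * rng.gauss(0.0, 1.0)
        x2 = a * m + b * rng.gauss(0.0, 1.0)
        pairs.append((x1, x2))
    return pairs


def joint_tail_frequencies(pairs, threshold):
    """Fractions of pairs jointly above +threshold and jointly below -threshold."""
    n = len(pairs)
    upper = sum(1 for x1, x2 in pairs if x1 > threshold and x2 > threshold) / n
    lower = sum(1 for x1, x2 in pairs if x1 < -threshold and x2 < -threshold) / n
    return upper, lower


rng = random.Random(7)
pairs = one_factor_gaussian(100_000, rho=0.5, rng=rng)
upper, lower = joint_tail_frequencies(pairs, threshold=1.0)
print(round(upper, 3), round(lower, 3))
```

Up to sampling noise the two frequencies agree, which is the symmetry referred to above. A model able to reproduce asymmetric correlation would need the two to differ.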
8.4.5 Spot Market
Our primary consideration in this thesis is with on-demand instances and how a performance broker may provide a value-add to this market. However, for certain use cases the spot market may well offer the best price for a workload, but it is somewhat idiosyncratic. In particular, as it is not on-demand, there is no guarantee of resource availability and users must bid for instances. Whilst there is a current spot price, i.e. the last agreed price, there is no visible order book and so no visibility over current bids. Further, there is no visibility over available supply, and so users may be bidding for a quantity of resource that is not available. This means the spot market is not suitable for users with immediate, on-demand needs. Further, resources may be claimed back if the current spot price rises above a user’s bid. As such, spot instances are not suitable for long running services, or for users whose jobs are not pre-emptible.
Arguably, there is still a significant use case for spot instances, in particular for workloads which can be interrupted, or for Monte Carlo simulations where multiple independent runs across a number of different machines are conducted. How best then to make use of the spot market? There is potential for the broker to operate a mixed pool of spot and on-demand instances for use with suitable workloads, for example a stateless service. In this case, the broker can simply guarantee a replacement with the same performance should a spot instance be reclaimed. Future work would initially focus on investigating the relationship between time to live and bid prices.
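A first step in relating time to live to bid price could look like the sketch below. It applies the reclaim rule described above (an instance is claimed back once the spot price rises above the bid) to a price series and reports how many periods an instance survives for a range of bids. The price series is hypothetical and not taken from any real market, and the function name is illustrative.

```python
def time_to_live(prices, bid):
    """Number of periods an instance survives before the spot price exceeds the bid.

    Returns len(prices) if the instance is never reclaimed within the series.
    """
    for period, price in enumerate(prices):
        if price > bid:
            return period
    return len(prices)


# Hypothetical hourly spot prices in USD; not taken from any real market.
spot_prices = [0.050, 0.052, 0.049, 0.061, 0.055, 0.090, 0.058, 0.120]

for bid in (0.055, 0.070, 0.100, 0.150):
    print(bid, time_to_live(spot_prices, bid))
```

Repeating this over many start times in a real price history would give an empirical distribution of time to live for each bid, which the broker could weigh against the cost of guaranteeing a replacement.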
8.4.6 Impact of Performance Variation on Loosely Coupled Services
Whilst instances running stateless services are independent of each other, this is not the case for all services, and Kang et al. (2016) note that an increasingly common design pattern found on the Cloud is the so-called micro-services architecture. Applications are broken down into a number of independent tasks, each of which is implemented as a self-contained service, or micro-service, and these tasks work together to deliver the application’s functionality. However, this loose coupling may lead to complex performance interactions, as some instances have to wait on others before they can complete their portion of a task. There is potential therefore for a small number of instances to have a disproportionate effect on the performance of the system as a whole. Indeed, complex systems have the potential to be very sensitive to initial conditions, and variation in the performance of individual instances may lead to unpredictable performance of the service as a whole. We intend to measure and quantify the impact of performance variation amongst loosely coupled services.
