1 Introduction
1.1 The Cloud Performance Problem
Cloud Computing services offer access to a seemingly unlimited supply of compute resources such as instances (running virtual machines), storage and networks for rent. These resources can be obtained on-demand, that is, without need for prior reservations, usage commitments or up-front fees, and on a pay-per-use basis. The uptake of these services sees computing resources move from on-premise facilities to large scale data centres owned and operated by major providers, such as Amazon Web Services (AWS). Due to their scale, ubiquity and standardised offerings, Cloud providers often draw comparisons with utility providers, and Armbrust et al. (2009) suggest that ‘…Cloud Computing is the long held dream of computing as utility…’.
Due to the flexibility with which resources can be obtained and released back to providers, Cloud services are often described as elastic, or as elastic infrastructure. Sommerville (2013) considers elasticity to be the defining characteristic of Cloud systems that differentiates them from other forms of distributed computing, such as Grids. Elasticity makes Clouds particularly suitable for workloads whose demand for resources varies, as resources can be rapidly acquired and released according to need. This includes, for example, long running services with variable demand on them, or periodic execution of batch jobs. Further, elasticity makes possible the running of massive simulations on an ad-hoc basis, where the ability to do so would previously have been the preserve of a select few well-funded organisations. It is no surprise that such usage is prevalent in Cloud use case studies (Amazon Web Services, 2017). More generally, Cowhey and Kleeman (2012) discuss the positive impact Clouds may have on emerging economies, whilst Nikolova (2012) notes the potential for Cloud to reduce the cost of computing across Government through consolidation.
However, Cloud usage is not without issue, and a number of problems regarding Cloud use have been raised. These include, but are not limited to: data security, privacy and sovereignty, service availability and the potential for provider failure, vendor lock-in, and performance. We initially define performance as ‘the amount of useful work a system can do in a given time period’. With regard to performance problems on Clouds, the issue is one of variation in the performance of supposedly identical resources which attract the same price within the same geographical location. This problem has been widely reported: Leitner and Cito (2016), O'Loughlin and Gillam (2014), Farley et al. (2012), Phillips et al. (2011) and Osterman et al. (2010). Indeed, the degree of variation found is sufficiently large for Armbrust et al. (2009) to identify performance unpredictability as one of the 10 key obstacles to widespread adoption of Cloud, whilst an IDC survey (Gens, 2009) reports performance as the third major concern, after security and availability.
Performance variation across instances at the same price causes a number of problems for users including:
1. Prices are the same irrespective of performance, but costs will vary with the amount of time required to undertake a task;
2. As a consequence, when scaling an application, the number of instances required to complete a certain amount of work in a given period of time may differ, and so the cost of completing the work again varies.
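The distinction between fixed price and variable cost in point 1 can be made concrete with a small, hypothetical calculation: two instances of the same type, at the same hourly price but with different throughputs, yield different costs for the same job. All figures below are illustrative, not drawn from any provider's actual pricing.

```python
# Hypothetical illustration: identical hourly price, different performance,
# hence different total cost for the same fixed amount of work.

HOURLY_PRICE = 0.10   # same price for both instances (illustrative)
WORK_UNITS = 1200     # size of the job, in arbitrary work units

def job_cost(throughput_per_hour: float) -> float:
    """Cost of completing the job on an instance with the given throughput."""
    hours = WORK_UNITS / throughput_per_hour
    return hours * HOURLY_PRICE

fast = job_cost(throughput_per_hour=120)  # a 'fast' copy of the instance type
slow = job_cost(throughput_per_hour=80)   # a 'slow' copy, same type, same price

print(f"fast instance: ${fast:.2f}")  # $1.00
print(f"slow instance: ${slow:.2f}")  # $1.50
```

The slower instance costs 50% more for identical work, despite attracting the same hourly price, which is precisely the cost variability described above.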
Due to variation in execution times, and hence costs, users cannot be sure that workloads will complete either on time or within budget. Worse still, for the same price, slower instances not only lead to higher costs but also deprive users of the concomitant benefits of completing work faster. Similarly, when scaling an application, users cannot be sure that a provisioned quantity of resources will have sufficient performance to meet the expected demand on it. The problem is exacerbated by the fact that there is presently no guarantee of worst-case performance. We refer to the various issues brought by performance variation amongst supposedly identical resources sold at the same price as performance risk. This thesis is concerned with addressing performance risk, in particular that found amongst instances (running virtual machines), and we propose a broker that can alleviate it for users by operating an on-demand service offering performance-assured instances, that is, instances with a known performance and priced accordingly.
The use of a broker, which acts as an intermediary (of some form) between providers of Cloud systems and their users to address Cloud problems, is commonplace in the literature, with Rogers and Cliff (2012), Gottschlich et al. (2014) and Lenk et al. (2011) proposing various brokers. Brokers may absorb various risks or add value by solving specific problems. However, performance variation results in a variety of different problems for users. For example, a user may want to scale out the infrastructure underlying their service and need to ensure an estimated work requirement is met for a ‘best’ price, or may want to run a simulation and need to ensure a sufficient number of jobs will complete within a specified time.
A broker can work with clients offering bespoke solutions to their problems; alternatively a broker could advertise instances rated at particular performance levels with a particular price. From performance and pricing information a client of the broker can determine whether or not their needs can be met. This latter approach is arguably preferable to the bespoke solutions as it does not require a client to supply information that may be commercially sensitive, such as deadlines. It also allows for clients who may not have explicit requirements but are simply ‘in the market’ for best possible performance as they can derive value from work being done faster. Further, a natural development is a performance marketplace consisting of multiple brokers selling instances whose price is based on performance.
There is still risk for the client, as they may pay for an instance whose performance subsequently declines (a so-called straggler). However, because the broker operates an on-demand service, there are no usage commitments and instances can be returned to the broker at any time. This limits the downside risk for the client. Further, if performance improves, the client benefits from this with no increase in price.
For the rest of this thesis, by the performance broker we mean a broker that re-sells instances based on their performance, and we say that the broker provides performance-assured instances. Implicitly, we understand that the broker has obtained instances from some Cloud marketplace, rated them with respect to some performance metric(s), re-priced them and offered them for sale. For reasons we discuss in chapter 2, we consider the marketplace to be a commodity marketplace for Cloud resources.
The performance broker is also a broker in the more general and widely used sense of an agent that purchases goods or services in one market for sale in another. The primary market is dominated by a small number of large providers offering mass-produced standardised resources, whilst the performance broker(s) and users with specific performance needs come together to create a secondary marketplace. From this viewpoint, performance variation creates arbitrage opportunities: in the primary market, different levels of performance on the same instance types are sold at the same price; on the secondary market price varies by performance.
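The arbitrage described above can be sketched in a few lines: the broker buys identically priced instances in the primary market, rates each one, and re-prices them in the secondary market according to measured performance. The benchmark scores, uniform buy price, and proportional pricing rule below are all hypothetical, chosen only to illustrate how performance-based re-pricing could yield a margin.

```python
# Illustrative sketch of the performance broker's arbitrage (all numbers
# hypothetical): buy identically priced instances, rate them, then re-price
# each according to its measured performance.

BUY_PRICE = 0.10   # uniform primary-market price per instance-hour
BASE_RATE = 0.0012 # secondary-market price per unit of benchmark score

# Measured benchmark scores for five supposedly identical instances.
scores = [95, 110, 80, 120, 100]

def resale_price(score: float) -> float:
    """Hypothetical pricing rule: price proportional to rated performance."""
    return BASE_RATE * score

# Broker's margin: secondary-market revenue minus primary-market cost.
margin = sum(resale_price(s) - BUY_PRICE for s in scores)
print(f"margin per hour over {len(scores)} instances: ${margin:.3f}")
```

Under this rule, instances rated above the break-even score sell at a premium and those below at a discount; whether the aggregate margin is positive depends on the distribution of performance the broker actually obtains, which is exactly the sustainability question raised below.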
Guzek et al. (2015) note that for a broker to be a sustainable business it must create a ‘value-add’ for which it can charge. Demonstrations of broker profitability under varying market conditions are rare: indeed, Rogers and Cliff (2012) is the only example we can find. Notably, Cartlidge and Clamp (2014) demonstrate that the opportunity for profit identified by Rogers and Cliff (2012) may already have been closed. Can the broker create a sustainable business by addressing performance requirements?