5.9 Performance Interference from Co-location

The results in sections 5.7 and 5.8 demonstrate performance variation on homogeneous instance types. Our best explanation for this is the noisy neighbour effect, i.e. the performance of an instance is affected by the resource-consuming actions of co-located instances. Other explanations include differences in underlying hardware that we cannot detect and that are not advertised, as well as possible differences in hardware quality: we know, for example, that the CPU manufacturing process does not produce identical units. To investigate the noisy neighbour problem we formulate the following hypothesis:


H6: The actions of an instance can degrade the performance of its co-locating neighbour, the effect of which varies by workload.
Dedicated instances on EC2, where the user has full control over the number of instances on a host and is guaranteed no neighbours owned by other users, allow us to investigate noisy neighbours on the Cloud. They also answer the question of whether or not the observed best on-demand performance is indeed the best possible, and we address this first: when repeatedly launching dedicated instances of type m4.large we find a negligible difference between their performance and the best performance from the on-demand pool. This shows that best observed performance is close to the best possible performance, as determined by a dedicated instance.
To investigate H6, we start 16 m4.large instances which are co-located on the same host. According to AWS documentation a maximum of 22 m4.large instances will co-locate on a host, and so the host is at ~73% capacity. One of these instances we regard as our primary instance. Our objective is to discover the degree to which we can degrade the performance of our primary instance by running various workloads in its neighbour instances.
We proceed as follows: in our primary instance we repeatedly execute pbzip2, with one minute of wait time between one execution finishing and the next starting, for the whole of the primary instance's duration (the measurement loop is sketched in code after the list below). We first allow this to run for a period during which we execute no workloads in its neighbours. We call this period 1; there are 7 periods in total, as follows:


  • Period 1: Neighbours not executing any workload

  • Period 2: All neighbours executing pbzip2

  • Period 3: Neighbours not executing any workload

  • Period 4: All neighbours executing sa-learn

  • Period 5: Neighbours not executing any workload

  • Period 6: All neighbours executing STREAM

  • Period 7: All neighbours executing 2 STREAM processes
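
The measurement loop running in the primary instance can be summarised in code. The following is a minimal Python sketch, not the exact harness used in the study: the pbzip2 flags, input file and log file name are illustrative assumptions.

    import subprocess
    import time

    # Illustrative invocation: compress a fixed input file, keeping it (-k)
    # and overwriting any previous output (-f). The file name is assumed.
    BENCH = ["pbzip2", "-k", "-f", "input.file"]
    WAIT_SECONDS = 60  # one minute between one execution finishing and the next starting

    def run_benchmark_loop(log_path="pbzip2_times.csv"):
        """Repeatedly execute the benchmark, logging wall-clock execution times."""
        with open(log_path, "a") as log:
            while True:
                start = time.time()
                subprocess.run(BENCH, check=True)  # one timed execution
                elapsed = time.time() - start
                log.write(f"{int(start)},{elapsed:.2f}\n")
                log.flush()
                time.sleep(WAIT_SECONDS)

    run_benchmark_loop()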

Below is the time-plot for pbzip2 within the primary instance.



Figure: Time-plot of pbzip2 within the primary instance. Jumps in performance correspond to activity on co-located instances, and different workloads have different effects on pbzip2 performance; pbzip2 is particularly sensitive to memory bandwidth contention, as demonstrated in period 7.


In period 1 the primary instance executes the benchmark at an average of 72s with essentially no variation. In period 2 we start executing pbzip2 in all neighbours and observe a degradation in the primary instance's performance: it now executes the benchmark at an average of 87s, again with negligible variation, a ~21% increase in execution time over period 1. At the end of period 2 we stop executing pbzip2 in the neighbours and performance returns to its previous level for the duration of period 3. We then execute sa-learn in all neighbours and the average execution time rises to 77s, an increase of ~7%; the instance remains at this level with negligible variation throughout period 4, at the end of which we stop executing sa-learn and performance returns to 72s for period 5. For period 6 we run STREAM in all neighbours and see a large degradation as execution time jumps from 72s to 143s, and in period 7 we run 2 STREAM processes in all neighbours, at which point execution time jumps from 143s to 325s, roughly 4.5 times the best observed execution time.
This result shows how resource contention causes performance degradation, and also that different workloads have different effects, confirming H6. We reiterate our assertion that there is no reason to believe instances with the same CPU model will have the same performance over a given time period, as this will depend upon the degree of contention being experienced. Clearly, pbzip2 is sensitive to memory bandwidth contention, given the significant degradation in periods 6 and 7.
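
For reference, the slowdown figures quoted above follow directly from the period means; a minimal sketch using the reported values:

    baseline = 72.0  # period 1 mean execution time (s), no neighbour activity
    period_means = {
        "period 2 (pbzip2)": 87.0,
        "period 4 (sa-learn)": 77.0,
        "period 6 (STREAM)": 143.0,
        "period 7 (2x STREAM)": 325.0,
    }

    for period, mean in period_means.items():
        increase = 100 * (mean - baseline) / baseline  # % increase in execution time
        print(f"{period}: {mean / baseline:.1f}x baseline (+{increase:.0f}%)")

This prints 1.2x (+21%), 1.1x (+7%), 2.0x (+99%) and 4.5x (+351%) respectively.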

5.10 Summary and Discussion

In this section we summarise the experiments conducted with a view to establishing the hypotheses stated in section 4.7; we also describe the features that a model of instance performance must have.


H1: Performance variation is commonplace across different instance types and providers, and is found on homogeneous and heterogeneous instance types. Heterogeneity of instance types leads to variation and increased heterogeneity typically leads to increased variation.
In section 5.2 we demonstrate that performance variation exists across multiple instance types and providers, for both homogeneous and heterogeneous instance types. In the latter case we find no examples of different CPU models backing the same instance type with essentially indistinguishable performance, and indeed we find no reports of this in the literature. This confirms H1.
H2: Heterogeneity typically produces consistent ranges of variation.
In section 5.3 we demonstrate that performance variation is consistent across different cross-sections. That is, the range of both per-CPU variation and variation between CPU models is consistent across different sets of instances of the same type acquired at different times. This confirms H2.
H3: Different workloads have different levels of variation due to differences in how they utilise underlying hardware.
In section 5.4 we find that there is not necessarily a best CPU model for all workloads on a heterogeneous instance type, as different workloads perform better or worse on different CPU models. Further, newer CPU models are not necessarily better than older ones. For the same instance type, different workloads will have different degrees of variation, whilst the same workload will have different degrees of variation on different instance types, and so one cannot readily be used to predict the other. This confirms H3.
H4: The allocation of CPU models to instances made within the same request is more irregular than extant assumptions allow for.
In section 5.5 we find that for heterogeneous instance types performance varies by location, as the hardware backing the type differs between locations. However, within a given location the allocation of instances to CPU models is typically more irregular than would be the case if they were allocated independently. Further, at times the allocation appears 'erratic', as we find new models not previously seen. This confirms H4.
H5: Instances running on supposedly identical hardware, as identified by having the same CPU model, do not necessarily have the same performance levels over a given period of time.
In section 5.7 we conducted a longitudinal study of 50 instances, which finds that performance in the short term (48 hours) is typically stationary, and so has a constant mean and variation, making it predictable. Further, the variation of individual instances over time was typically far smaller than the variation across the set of instances. When not stationary, performance is often locally stationary, exhibiting jumps from one stationary level to another; however, we did observe a somewhat pathological instance with large and varying deviations. As such, performance risk primarily exists at the point of sale: once an instance has been obtained, we can predict its future performance with a degree of confidence. This confirms H5.
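A short-term stationarity check of this kind is straightforward to automate. The following is a minimal sketch assuming the statsmodels library, with the Augmented Dickey-Fuller test standing in for whichever test was used in the study:

    import numpy as np
    from statsmodels.tsa.stattools import adfuller

    def is_stationary(times, alpha=0.05):
        """ADF test: the null hypothesis is non-stationarity (a unit root),
        so a p-value below alpha suggests the series is stationary."""
        p_value = adfuller(np.asarray(times))[1]
        return p_value < alpha

    # Illustrative 48-hour series: constant mean with small noise, as typically observed.
    rng = np.random.default_rng(0)
    series = 72 + rng.normal(0, 0.5, size=288)  # e.g. one measurement every 10 minutes
    print(is_stationary(series))  # True for a flat, low-variance series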
H6: The actions of an instance can degrade the performance of its co-locating neighbour, the effect of which varies by workload.
In section 5.9 we considered how co-located instances may degrade each other's performance through resource contention. The effect is workload-specific, and in the worst case, when running two STREAM processes in each co-located neighbour, execution time rises to roughly 4.5 times the best observed level. This confirms H6.
In addition to establishing our hypotheses, the experimental data allows us to further characterise performance. Heterogeneity results in multi-modal histograms with results clustered by CPU model, and the degradation from best to worst is typically large, observed to be as high as 100%. Considering the workload/CPU distribution, we find it is typically highly peaked close to the best possible, with a long tail. Indeed, we regularly find a degradation of less than 5% from minimum execution time (i.e. best performance) to the median, whilst we have observed a degradation of 58% from best to worst.
In section 5.8 we consider correlation amongst instances. Computing Kendall's Tau correlation coefficient, we measure the degree of association between the rankings of instances based on their performance. For some workloads we find strong correlation, whilst for others we find weak to negligible correlation. As a consequence, the performance of one workload cannot necessarily be used to predict the performance of others. Further, even when we do find strong correlation it is typically asymmetric, with stronger correlation amongst the better-performing instances. As such, for correlated workloads, good performance on one workload implies likely good performance on another; from poor performance, however, we cannot necessarily make inferences.
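This ranking comparison is readily reproduced with standard tools; a minimal sketch assuming scipy, with illustrative timing values:

    from scipy.stats import kendalltau

    # Execution times for two workloads across the same five instances (illustrative values).
    workload_a = [70.2, 71.5, 75.3, 80.1, 91.4]
    workload_b = [33.0, 34.1, 36.8, 35.2, 45.9]

    # Kendall's Tau measures agreement between the two performance rankings:
    # +1 is identical ordering, 0 is no association, -1 is reversed ordering.
    tau, p_value = kendalltau(workload_a, workload_b)
    print(f"tau = {tau:.2f}, p = {p_value:.3f}")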
STREAM performance in heterogeneous environments is also primarily determined by CPU model, and this benchmark provides an interesting example of how newer CPU models are not necessarily better: the E5-2651 has almost half the memory bandwidth of the previous-generation E5-2650. We find the largest variation is in I/O performance, adding support to Leitner and Cito's (2016) finding that I/O performance is highly inconsistent, with large deviations occurring within short periods of time. Notable here is EC2's response to I/O variation: charging more for assurances of better performance, so that one can obtain assurances in terms of either IOPS or MB/s (latency and throughput). From the point of view of a performance broker, it would appear difficult to add further value for I/O performance.
The findings presented here characterise compute performance sufficiently to allow us to produce a realistic model of it, and we list below the features such a model must contain (a sketch of such a model follows the list):

  • A heterogeneous instance type will have performance variation due to differences in CPU model.

  • In a heterogeneous instance type performance variation is workload-specific, and so there is not necessarily a best CPU model for all workloads.

  • The workload/CPU histogram is typically highly peaked close to the best possible, with a long tail.

  • The longitudinal performance of instances is typically stationary and, if not, locally stationary.

  • Instances with the same CPU model may have different mean levels of performance over a given period.
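
Putting these features together, a generative model can be sketched as follows. This is a minimal illustration in Python: the allocation weights, per-model best times and the exponential tail are illustrative assumptions, not fitted values.

    import numpy as np

    rng = np.random.default_rng(1)

    # Illustrative heterogeneous instance type: (allocation weight, best-case
    # execution time in seconds) per CPU model, for one workload.
    CPU_MODELS = {"E5-2650": (0.5, 72.0), "E5-2651": (0.3, 78.0), "E5-2676": (0.2, 69.0)}

    def sample_instance():
        """Draw an instance: a CPU model, then its mean performance level.
        Degradation from the per-model best is exponential, giving the
        highly peaked, long-tailed shape observed in the data."""
        names = list(CPU_MODELS)
        model = rng.choice(names, p=[CPU_MODELS[n][0] for n in names])
        best = CPU_MODELS[model][1]
        return model, best * (1 + rng.exponential(scale=0.03))

    # Longitudinal performance is stationary: a constant per-instance mean plus noise.
    model, mean_time = sample_instance()
    series = mean_time + rng.normal(0, 0.3, size=48)  # e.g. hourly runs over 48 hours
    print(model, round(mean_time, 1))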

Arguably, the most important finding of this chapter is that performance variation primarily exists at the point of sale, making it feasible for the broker to offer performance-assured instances for compute-bound and memory-bandwidth-bound workloads. Whether or not the broker can address performance needs and make a profit is the subject of the next chapter.




