A performance Brokerage for Heterogeneous Clouds


Performance Variation on EC2, Rackspace and GoGrid




5.2 Performance Variation on EC2, Rackspace and GoGrid

Mindful of being financially constrained26, when investigating performance variation across a variety of instance types on different providers we initially restrict the number of benchmarks used to two: one for compute, where we chose bzip2, and one for memory bandwidth, where we chose STREAM. This ensures that all benchmarking, collection and reporting of results could take place within an hour, so as not to incur additional costs. However, based on the extant results presented in section 4.2, we expect this to be a sufficient number of benchmarks to establish hypothesis H1, namely:
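The measurement itself is straightforward to sketch. The following is an illustrative harness only, using Python's built-in bz2 module as a stand-in for the actual bzip2 benchmark binary; the payload, compression level and repetition policy are assumptions, not the configuration used in our experiments:

```python
import bz2
import time

def time_bzip2(data: bytes, runs: int = 3) -> float:
    """Time bzip2 compression of `data`, returning the best of `runs` wall-clock times."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        bz2.compress(data, compresslevel=9)  # CPU-bound work being timed
        best = min(best, time.perf_counter() - start)
    return best

# Illustrative payload: ~1 MB of repetitive data
payload = bytes(range(256)) * 4096
print(f"best of 3 runs: {time_bzip2(payload):.3f}s")
```

Taking the best of several runs reduces the influence of transient interference on any single measurement.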



H1: Performance variation is commonplace across different instance types and providers, and is found on homogeneous and heterogeneous instance types. Heterogeneity of instance types leads to variation and increased heterogeneity leads to increased variation.
Our primary focus is on EC2, for reasons of both system popularity and research comparability as stated previously, where we benchmark ~1150 instances from the C1, M1, M2 and M3 classes27. In particular, we benchmark 200 instances each from the C1, M2 and M3 classes, all of which completed and returned results. For the M1 class we obtained 540 results. We excluded the cc2.8xlarge and cr1.8xlarge for reasons of cost management: the latter cost $3.50 per hour, compared to $0.13 for a c1.medium, as of 13/06/2017 in the us-east-1 Region. Similarly, we exclude special-purpose instance types, such as those with a GPU or large optimized local storage. Mindful of reported variation by location on EC2, we run instances over a wide geographical spread, using 13 different AZs in 5 Regions. The study is a cross-sectional one; that is, we acquire a set of instances in a specific location, measure performance and then release the instances back to the provider.
For each instance, we detected and recorded its CPU model via /proc/cpuinfo. The table below records the CPU models (all Intel Xeon) detected for each instance class:
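The detection step can be sketched as follows; this is a minimal parser for the /proc/cpuinfo format, not the exact script used in the experiments:

```python
def cpu_model(cpuinfo_text: str) -> str:
    """Extract the first 'model name' entry from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.split(":")[0].strip() == "model name":
            return line.split(":", 1)[1].strip()
    return "unknown"

# On a running instance:
# with open("/proc/cpuinfo") as f:
#     print(cpu_model(f.read()))
```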

Table 3: CPU models found per class on EC2 during various experiments conducted in 2013–2016.

| Class | CPU Models |
|-------|------------|
| M1 | E5430, E5645, E5-2650, E5507 |
| M2 | X5550, E5-2665 |
| M3 | E5-2670 |
| C1 | E5506, E5410, E5-2650, E5345 |

Below we present histograms of the results for both bzip2 and STREAM, followed by summary statistics. Given the multi-modal nature of the results, we summarise performance through the use of percentiles.












Figure 1: Performance histograms for bzip2 on M1, M2, M3 and C1 respectively. As these are execution times, lower is better. We note that for heterogeneous instance types the histograms are multi-modal and we can observe a clustering of results by CPU model. Further, we also observe that the variation across instances with the same CPU model can be significantly less than the overall variation.









Figure 2: Performance histograms for STREAM on M1, M2, M3 and C1 respectively. As STREAM measures MB/s, higher is better. For heterogeneous instance types we observe significant differences between CPU models. We also observe minor variation amongst instances with the same CPU model, resulting in a highly peaked multi-modal histogram. The performance of M2 instances backed by the X5550 CPU is somewhat of an anomaly as it is split into 3 distinct ranges.

Table 4: Minimum, 25th percentile, median, 75th percentile, 95th percentile and maximum value of bzip2 on M1, M2, M3 and C1 respectively. The large range (distance from best to worst) is noticeable.

| Instance Class | Min (s) | 25th Perc (s) | Median (s) | 75th Perc (s) | 95th Perc (s) | Max (s) |
|----------------|---------|---------------|------------|---------------|---------------|---------|
| M1 (m1.small) | 425 | 460 | 502 | 534 | 642 | 716 |
| M2 | 163 | 165 | 170 | 183 | 190 | 236 |
| M3 | 130 | 131 | 135 | 137 | 147 | 206 |
| C1 | 175 | 218 | 247 | 262 | 288 | 357 |

We discuss the bzip2 results first, the histograms of which are presented in Figure 1, recalling that for execution times lower is better. An immediate feature of note is the size of the range, i.e. the distance from min to max. We find, for example, an increase in execution time from min to max of 104% on the C1 family. Indeed, the lowest increase, found on the M2 family, is 45%. From a cost perspective, on the C1 family, such differences mean that the same workload costs twice as much to run on the worst instance as on the best.


In order to better understand the cost implications of performance variation across the range as a whole (for each class), we consider the degradation, or equivalently cost increase, relative to the minimum. We recall that in Equation 3.2 we defined degrade(A,B) := execution_time(A)/execution_time(B) and if degrade(A,B) > 1 then machine A executes the benchmark in a slower time than B. By degrade(p) we understand the degrade of the pth percentile of performance relative to the minimum. For example, degrade(25) is the execution time corresponding to the 25th percentile of performance divided by the minimum (best) performance. In Table 5 below we present degradations with respect to 25th percentile, median, 75th percentile, 95th percentile and maximum value.
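The computation of degrade(p) can be sketched as follows, using a simple nearest-rank style percentile convention (percentile conventions differ slightly at small sample sizes, so this is one reasonable choice rather than the definition used by any particular tool):

```python
def degrade(times: list[float], p: float) -> float:
    """degrade(p): execution time at the pth percentile divided by the minimum (best) time."""
    s = sorted(times)
    idx = round(p / 100 * (len(s) - 1))  # nearest-rank style index into the sorted sample
    return s[idx] / s[0]

# Illustrative (not measured) execution times in seconds:
times = [130, 131, 135, 137, 147, 206]
print(degrade(times, 50))   # median relative to best
print(degrade(times, 100))  # worst relative to best
```

By construction degrade(0) is always 1, and degrade(100) is the worst-to-best ratio, i.e. the full range expressed as a slowdown.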

Table 5: 25th percentile, median, 75th percentile, 95th percentile and maximum value of bzip2 expressed as a degrade (slowdown) with respect to the minimum on M1, M2, M3 and C1 respectively.

| Instance Class | degrade(25) | degrade(50) | degrade(75) | degrade(95) | degrade(100) |
|----------------|-------------|-------------|-------------|-------------|--------------|
| M1 (m1.small) | 1.08 | 1.18 | 1.26 | 1.51 | 1.68 |
| M2 | 1.01 | 1.03 | 1.12 | 1.17 | 1.45 |
| M3 | 1.01 | 1.04 | 1.05 | 1.13 | 1.58 |
| C1 | 1.25 | 1.40 | 1.49 | 1.65 | 2.04 |

Notably, whilst degrade(100) is high for all 4 classes, for both M2 and M3 we find 50% of instances within a 3% and 4% slowdown relative to best performance respectively; indeed, for M3 we find 75% of instances are within a 5% slowdown.


The M3 instances were, as far as we could detect, entirely homogeneous; that is, all instances of this type had the same CPU model. In this case we find 75% of instances within 5% of the best observed and 95% within 13%. Arguably, this is closer to what we would reasonably expect of Cloud performance, where most, if not all, instances should be within a small slowdown of the best available. However, even though we have, prima facie, identical instances, we still find 5% of them are between a 13% and a 58% slowdown. We describe this by saying that performance has a long tail, simply meaning that the degree of variation across the worst 50% of instances is significantly greater than across the best 50%. We conclude that differences amongst homogeneous instances are one of the causes of performance variation as a whole.
Superficially, the bzip2 results for the M3 and M2 classes are qualitatively similar. However, for the M2 results we can clearly observe a bi-modal histogram, i.e. one with 2 peaks. These peaks are due to instances with the same CPU model, either an X5550 or an E5-2665, having performance that peaks close to the per-CPU minimum. Again, we observe each CPU to have a long tail. There are, however, differences in performance due to CPU model, with almost 95% of instances on an E5-2665 having better performance than the best instance on an X5550. Variation as a whole, then, is a function of: (1) per-CPU variation, i.e. slowdown relative to the per-CPU minimum; and (2) differences between the per-CPU minimums.
We find the largest slowdowns overall on the m1.small and C1 instances, both of which are heterogeneous with 4 different CPU models each. We can again observe multi-modal (multiple peaks) histograms with each CPU having performance that peaks close to the per CPU minimum, but with a long tail. Further, there are differences between the per CPU minimums. As such, as an instance class becomes increasingly heterogeneous it appears likely that the overall variation will increase due to per CPU differences as well as per CPU degradation.
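This decomposition can be checked arithmetically against the C1 figures reported in the tables below: the worst overall degrade factors into a between-CPU term (the ratio of per-CPU minimums) and a within-CPU term (degradation relative to the per-CPU minimum).

```python
overall_min = 175  # best C1 time overall (on an E5410), seconds
cpu_min = 241      # best time on the E5506, seconds
cpu_max = 357      # worst time on the E5506, seconds

between_cpu = cpu_min / overall_min  # difference between per-CPU minimums
within_cpu = cpu_max / cpu_min       # long tail within the E5506
overall = between_cpu * within_cpu   # equals cpu_max / overall_min

print(round(between_cpu, 2), round(within_cpu, 2), round(overall, 2))
```

So the worst C1 slowdown of 2.04 arises from a ~1.38x between-CPU gap compounded with a ~1.48x within-CPU tail.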
In the tables below, we break the bzip2 results down by CPU model, presenting first the raw results followed by the degrade relative to the overall minimum.

Table 6: The minimum, median and maximum of bzip2 per CPU model for the M1, M2, M3 and C1 classes respectively. We note how the distance from min to median is typically significantly smaller than the distance from median to max. Visually, this manifests as a distribution highly peaked close to the min, as we observe in Figure 1.

| Instance Class | CPU | Min (s) | Median (s) | Max (s) |
|----------------|-----|---------|------------|---------|
| M1 (m1.small) | E5430 | 425 | 444 | 482 |
| | E5-2650 | 443 | 469 | 519 |
| | E5645 | 488 | 507 | 544 |
| | E5507 | 578 | 612 | 716 |
| M2 | E5-2665 | 163 | 166 | 236 |
| | X5550 | 180 | 183 | 208 |
| M3 | E5-2670 | 130 | 135 | 206 |
| C1 | E5410 | 175 | 197 | 246 |
| | E5345 | 215 | 223 | 236 |
| | E5-2650 | 217 | 230 | 250 |
| | E5506 | 241 | 256 | 357 |


Table 7: The minimum, median and maximum of bzip2 per CPU model for the M1, M2, M3 and C1 classes respectively, expressed as a degrade relative to the lowest minimum. For each class we highlight the lowest and highest degrade; for example, for the M1 class the lowest degrade (which is always 1) is on an E5430 and the highest, 1.68, on an E5507.

| Instance Class | CPU | degrade(min) | degrade(median) | degrade(max) |
|----------------|-----|--------------|-----------------|--------------|
| M1 (m1.small) | E5430 | 1.00 | 1.04 | 1.13 |
| | E5-2650 | 1.04 | 1.10 | 1.22 |
| | E5645 | 1.15 | 1.19 | 1.28 |
| | E5507 | 1.36 | 1.44 | 1.68 |
| M2 | E5-2665 | 1.00 | 1.02 | 1.45 |
| | X5550 | 1.10 | 1.22 | 1.28 |
| M3 | E5-2670 | 1.00 | 1.04 | 1.58 |
| C1 | E5410 | 1.00 | 1.13 | 1.40 |
| | E5345 | 1.23 | 1.27 | 1.35 |
| | E5-2650 | 1.24 | 1.31 | 1.43 |
| | E5506 | 1.37 | 1.46 | 2.04 |

We recall that STREAM measures MB/s and so higher is better. We find that the overall variation observed within an instance type is predominantly accounted for by differences in per-CPU performance, and these are surprisingly large. Indeed, on the C1 family, instances running on an E5-2650 have memory bandwidth 2 to 4 times that of instances on the other CPU models. For workloads whose performance is sensitive to memory bandwidth, such differences will result in significant variations in workload execution costs. A notable exception to the typical STREAM performance, which is highly peaked with small variation, is M2 instances on an X5550. In this case we find a range of performance from ~5,000 MB/s to close to 9,000 MB/s, and indeed we observe 3 peaks across this range. Potentially, this indicates underlying hardware differences amongst the X5550 instances that we have not investigated.


Finally, we also note that high performance for bzip2 does not necessarily imply high performance for STREAM, or vice versa. For example, on the C1 family, the best performing model for bzip2 is E5506 and yet this is the worst performing one for STREAM. Similarly, whilst E5-2650 is the best performing for STREAM it is only the third best for bzip2. On the m1.small we also find that the worst performing for STREAM, the E5430, is the best for bzip2! We describe this simply by saying that observed workload performance is CPU specific.
To find out whether variation is specific to EC2, or is present on other providers, we benchmark GoGrid and Rackspace. On the GoGrid Cloud28, we ran 30 instances across 2 zones using their standard type. We discovered 2 different CPU models: the Intel Xeon X5650 and the Intel Xeon E5520. Using the bzip2 benchmark as described above, we find a range of performance of 179s to 261s, with means of 193s and 216s on the X5650 and E5520 respectively. At the same price, then, we find an increase of 12% in the mean execution time of instances running on the E5520 as compared to the X5650.
On the Rackspace Cloud29 we ran 50 instances of type large across 5 zones, and discovered 2 different CPU models: the AMD 4332 HE and the Intel Xeon E5-2670; mean performance with respect to the bzip2 benchmark was 208s and 190s respectively. At the same price, we find an increase of 10% in the mean execution time of instances running on the AMD 4332 HE as compared to the Intel E5-2670.
In summary, the results in this section demonstrate that performance variation is commonplace across a range of instance types and exists for multiple providers, and so we would expect it to exist for all providers of sufficient size. This confirms H1. In the next section we address performance consistency, that is, the degree to which past performance gives assurances over future performance.
