In previous sections we demonstrated that the performance of an instance is primarily determined by its CPU model. In this section, we consider how the proportions of CPU models differ across locations within a Cloud, and how they differ across multiple requests. The latter concern addresses H4. In particular, the 540 m1.small instances whose performance we report on in section 5.2 run across 14 different AZs within EC2. In Table 15, below, we present the proportions of each CPU model in each AZ.
Table 15: Proportions of CPU models E5430, E5-2650, E5645 and E5507 across 14 different AZs in EC2. We note how proportions vary by AZ, indicating differences in hardware composition across AZs. Indeed, we note the absence of older models, such as the E5430, in newer AZs. As a consequence, performance varies across AZs. For each CPU we highlight the smallest and largest proportion found. For example, the largest proportion found of the E5-2650 is 86% in sa-east-1b.
Zone | E5430 | E5-2650 | E5645 | E5507
us-east-1a | 31% | 0% | 25% | 44%
us-east-1b | 5% | 59% | 29% | 7%
us-east-1c | 0% | 47% | 52% | 1%
us-east-1d | 18% | 31% | 44% | 7%
us-west-1b | 0% | 0% | 13% | 87%
us-west-1c | 8% | 0% | 18% | 74%
eu-west-1a | 4% | 75% | 19% | 2%
eu-west-1b | 28% | 0% | 44% | 28%
eu-west-1c | 4% | 0% | 63% | 33%
ap-southeast-2a | 0% | 64% | 36% | 0%
ap-southeast-2b | 0% | 75% | 25% | 0%
sa-east-1a | 0% | 81% | 19% | 0%
sa-east-1b | 0% | 86% | 14% | 0%
us-west-2b | 0% | 73% | 27% | 0%
As we can clearly see from Table 15, the proportions of CPU models found vary by AZ. Indeed, in 5 AZs we find only E5645 and E5-2650 models. Notably, AZs within longer established Regions such as us-east-1 and eu-west-1 are typically more heterogeneous, and we find a mix of E5430, E5645, E5-2650 and E5507. However, even within a Region we find differences across AZs. In us-east-1 we find, for example, that 44% of instances within the AZ us-east-1a were backed by an E5507, whilst for all other AZs within the Region this drops to less than 10%; in us-east-1c we find just 1% of instances backed by an E5507.
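The observations above can be checked mechanically against the Table 15 figures. The following is a minimal sketch, assuming a reported 0% means the model was not found in that AZ; the names `TABLE_15`, `MODELS` and `models_present` are illustrative and not part of the original study.

```python
# Proportions (%) transcribed from Table 15.
# Column order: E5430, E5-2650, E5645, E5507.
MODELS = ("E5430", "E5-2650", "E5645", "E5507")
TABLE_15 = {
    "us-east-1a": (31, 0, 25, 44),
    "us-east-1b": (5, 59, 29, 7),
    "us-east-1c": (0, 47, 52, 1),
    "us-east-1d": (18, 31, 44, 7),
    "us-west-1b": (0, 0, 13, 87),
    "us-west-1c": (8, 0, 18, 74),
    "eu-west-1a": (4, 75, 19, 2),
    "eu-west-1b": (28, 0, 44, 28),
    "eu-west-1c": (4, 0, 63, 33),
    "ap-southeast-2a": (0, 64, 36, 0),
    "ap-southeast-2b": (0, 75, 25, 0),
    "sa-east-1a": (0, 81, 19, 0),
    "sa-east-1b": (0, 86, 14, 0),
    "us-west-2b": (0, 73, 27, 0),
}


def models_present(row):
    """Return the set of CPU models with a non-zero proportion in one AZ."""
    # Assumption: a reported 0% is treated as "model not found".
    return {model for model, pct in zip(MODELS, row) if pct > 0}


# AZs in which only the E5645 and E5-2650 were found.
two_model_azs = sorted(
    az for az, row in TABLE_15.items()
    if models_present(row) == {"E5645", "E5-2650"}
)

# E5507 share in each AZ of the us-east-1 Region.
e5507_us_east = {
    az: row[MODELS.index("E5507")]
    for az, row in TABLE_15.items()
    if az.startswith("us-east-1")
}

print(two_model_azs)
print(e5507_us_east)
```

Keeping the table as a plain dictionary keeps the check transparent: each claim in the text corresponds to one filter over the rows.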
The most likely explanation for the variation in CPU model distribution is that as Amazon adds new Regions, adds and expands zones in existing Regions, and performs hardware refreshes, the distribution of CPU models increasingly differs from zone to zone. Arguably, some degree of heterogeneity is, if not inevitable, highly likely for popular instance types.
We also find differences in CPU model distribution (for the same instance type) on GoGrid and Rackspace. In the latter case, for example, all instances in the Hong Kong Region were backed by an AMD Opteron 4332 HE, whilst in the Sydney Region we find a mix of Opteron 4332 and 4170. However, in the Chicago, North Virginia and Dallas Regions we find instances running on either of these AMD models or an Intel E5-2670. The GoGrid Regions us-east-1 and us-west-1 have similar proportions of Intel E5520 and X5650, perhaps an indication that they were established at the same time.
Given that (for heterogeneous instance types) performance varies by CPU model, the available performance within different locations varies, and so workload costs will vary by location. However, it is reasonable to question the consistency of proportions, and indeed the permanence, of CPU models within AZs and we recall hypothesis H4:
H4: The allocation of CPU models to instances made within the same request is more irregular than extant assumptions allow for.
We investigated this by conducting two resource allocation experiments on EC2, within us-east-1a and ap-southeast-2b respectively.
A request for 20 instances was made, the CPU models found were recorded and the instances were released. This was repeated 5 times in us-east-1a over a 24-hour period, and 10 times in ap-southeast-2b over the course of a week. In ap-southeast-2b all instances were backed by either the E5645 or the E5-2650, whilst in us-east-1a instances were backed by the E5645, E5507 or E5-2650. Tables 16 and 17 below show how many of each model we find. As we can see, there is variation in the number of each CPU model obtained per request.
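The recording step of this procedure can be sketched as follows. This section does not state how the CPU model was read from each instance, so the sketch assumes it is taken from the `model name` line of `/proc/cpuinfo` on a Linux guest. The helper names (`cpu_model`, `tally_request`) and the sample strings are illustrative only and are not data from the experiments.

```python
import re
from collections import Counter

# Captures the text after "model name :" in /proc/cpuinfo output.
MODEL_NAME_RE = re.compile(r"^model name\s*:\s*(.+)$", re.MULTILINE)
# Short Xeon model token, e.g. "E5645", "E5507" or "E5-2650".
XEON_TOKEN_RE = re.compile(r"\b(E5-\d{4}|[EX]\d{4})\b")


def cpu_model(cpuinfo_text):
    """Return the short model token from the first 'model name' line.

    Falls back to the full model name if no Xeon token is recognised,
    and to 'unknown' if there is no 'model name' line at all.
    """
    match = MODEL_NAME_RE.search(cpuinfo_text)
    if match is None:
        return "unknown"
    full_name = match.group(1).strip()
    token = XEON_TOKEN_RE.search(full_name)
    return token.group(1) if token else full_name


def tally_request(cpuinfo_texts):
    """Count CPU models across the instances returned by one request."""
    return Counter(cpu_model(text) for text in cpuinfo_texts)


# Illustrative /proc/cpuinfo fragments for a request returning 3 instances.
sample = [
    "processor\t: 0\nmodel name\t: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz\n",
    "processor\t: 0\nmodel name\t: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz\n",
    "processor\t: 0\nmodel name\t: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz\n",
]
counts = tally_request(sample)
print(counts)
```

One `Counter` per request corresponds to one column of the per-sample tables that follow.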
Table 16: Number of E5645 and E5-2650 per sample, across 10 samples of 20 instances each obtained in ap-southeast-2b. We note the range is larger than we would expect if CPU models were allocated independently to instances. For each CPU we highlight the smallest and largest number found across the 10 samples; for example, the highest number of E5645 was 12 in sample 10.
Sample | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
E5645 | 5 | 7 | 7 | 5 | 3 | 9 | 8 | 5 | 7 | 12
E5-2650 | 15 | 13 | 13 | 15 | 17 | 11 | 12 | 15 | 13 | 8
Table 17: Number of E5645, E5-2650 and E5507 per sample, across 5 samples obtained in us-east-1a (four of 20 instances and one, sample 3, of 17). We note the irregularity of CPU distributions across samples, and we highlight the difference between samples 4 and 5.
Sample | 1 | 2 | 3 | 4 | 5
E5645 | 12 | 15 | 11 | 11 | 17
E5-2650 | 6 | 0 | 3 | 6 | 0
E5507 | 2 | 5 | 3 | 3 | 3
From Table 16, of the 200 instances started in 10 batches of 20, we find the proportion of E5645 to be 0.34, and so we estimate the probability of obtaining an instance backed by an E5645 within ap-southeast-2b as 0.34. According to the assumptions made by Farley et al. (2012) and Ou et al. (2013), if we let X denote the number of E5645 returned by a request for 20 instances, we would have X ~ Bin(20, 0.34). Then E[X] = 6.8, and indeed in 8 of the samples we have X within 2 standard deviations of the mean. However, by calculation, P[X = 3] = 0.038, and so we would expect to see this value in (less than) one in 25 samples, whilst P[X = 12] = 0.01, and so we would expect to see this value in one in 100 samples. From this small set of samples, we can see bunching around the mean but with more variation in X than one would expect to see if X were binomial.
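The point probabilities quoted above can be reproduced directly from the binomial probability mass function. The following is a minimal sketch using only the Python standard library; the counts are transcribed from Table 16 and the names `binom_pmf` and `E5645_COUNTS` are illustrative.

```python
from math import comb

# Number of E5645 instances in each of the 10 samples (Table 16).
E5645_COUNTS = [5, 7, 7, 5, 3, 9, 8, 5, 7, 12]
N = 20  # instances per request


def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Estimated probability of obtaining an E5645: 68 / 200 = 0.34.
p_hat = sum(E5645_COUNTS) / (N * len(E5645_COUNTS))

mean = N * p_hat                 # E[X] under the binomial assumption
p3 = binom_pmf(3, N, p_hat)      # probability of the smallest observed count
p12 = binom_pmf(12, N, p_hat)    # probability of the largest observed count

print(f"p_hat = {p_hat:.2f}, E[X] = {mean:.1f}")
print(f"P[X = 3]  = {p3:.3f}")
print(f"P[X = 12] = {p12:.3f}")
```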
In our next example, from us-east-1a, we had 97 instances, obtained in 4 batches of 20 and one of 17. Of these, 15 are E5-2650, and so we estimate the probability of obtaining an E5-2650 as p = 15/97 = 0.15. This time, let X denote the number of E5-2650 obtained in a request for 20 instances, so that X ~ Bin(20, 0.15). In our 5 samples we observe X = 0 twice and X = 6 twice, and from a direct calculation using the Binomial, we find the probability of obtaining either X = 0 or X > 6 is less than 0.05 in both cases. We should see such values for X less than once in 20 requests.
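The two probabilities referred to above follow from the same binomial model. The following is a minimal sketch, assuming the rounded estimate p = 0.15 used in the text; the function names are illustrative.

```python
from math import comb


def binom_pmf(k, n, p):
    """P[X = k] for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k, n, p):
    """P[X <= k] for X ~ Bin(n, p), by direct summation of the pmf."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))


# 15 of the 97 instances were E5-2650; the text rounds 15/97 to 0.15.
p_hat = 15 / 97
N, P = 20, 0.15

p_zero = binom_pmf(0, N, P)           # P[X = 0]
p_above_six = 1 - binom_cdf(6, N, P)  # P[X > 6]

print(f"15/97 = {p_hat:.4f}")
print(f"P[X = 0] = {p_zero:.4f}")
print(f"P[X > 6] = {p_above_six:.4f}")
```

Summing the mass function directly is sufficient for n = 20 and avoids a dependency on a statistics library.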
The results of these experiments confirm H4, and suggest that the assumptions made by Farley et al. (2012) and Ou et al. (2013) are not supported by empirical evidence, and that they fail to adequately describe variation in CPU model distribution. Informally, we say that CPU model distribution is ‘lumpy’. Any performance model based on these assumptions will underestimate variation, and hence risk.