A performance Brokerage for Heterogeneous Clouds






5.6 Performance Refresh: The C4 family

Providers such as EC2 periodically add new instance types, either to offer features not available on existing types or to provide a ‘next generation’ of an existing family. At the time of writing, AWS had launched the M4 and C4 families, the latest generation of the ‘general purpose’ and ‘high CPU’ families respectively. To check that our previous conclusions, and in particular our characterisations, remain valid, we benchmark 200 C4 instances and discuss the results in this section.


We take the opportunity to add two CPU workloads to those used previously, sa-learn and Hmmer, as described in section 5.1, allowing us to further our understanding of workload-specific characteristics. We update the input set of POV-Ray to include additional scenes from the ‘advanced’ set as well as the standard benchmark.pov. Further, we make use of pbzip2 which, as noted, detects all available vCPUs and runs the compression in parallel; it is given an enlarged input formed by combining the input file used in previous experiments with a range of additional files. Finally, we add the I/O benchmark Iozone and the general system benchmark pgbench, giving us a broader range of workloads.
We begin by presenting histograms and summary statistics for CPU bound workloads on C4 instances.













Figure : Histograms of sa-learn, POV-Ray, NAMD, pbzip2, Hmmer and GNUGO on C4 respectively. We note high peaks indicating performance is typically close to best possible. POV-Ray and NAMD show negligible variation, although for other workloads, such as sa-learn, we note the long tail and positive skew.

Table : Minimum, 25th percentile, median, 75th percentile, 95th percentile and maximum run time, in seconds, of pbzip2, GNUGO, POV-Ray, NAMD, sa-learn and Hmmer on C4 respectively.



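| Benchmark | Min (s) | 25th Perc (s) | Median (s) | 75th Perc (s) | 95th Perc (s) | Max (s) |
|---|---|---|---|---|---|---|
| pbzip2 | 62 | 65 | 67 | 68 | 72 | 77 |
| GNUGO | 158 | 161 | 162 | 164 | 167 | 172 |
| POV-Ray | 453 | 454 | 455 | 456 | 456 | 469 |
| NAMD | 204 | 205 | 205 | 206 | 207 | 214 |
| sa-learn | 72 | 74 | 75 | 76 | 79 | 84 |
| Hmmer | 1.48 | 1.51 | 1.53 | 1.6 | 1.7 | 2.04 |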

Table : Minimum, 25th percentile, median, 75th percentile, 95th percentile and maximum values of pbzip2, GNUGO, POV-Ray, NAMD, sa-learn and Hmmer expressed as a degrade relative to minimum respectively. We note the distance from min to median is typically small, resulting in high peaks in histograms, whilst the distance to max is typically much bigger. We highlight the minimum, median and maximum.

| Benchmark | Min | 25th Perc | Median | 75th Perc | 95th Perc | Max |
|---|---|---|---|---|---|---|
| pbzip2 | 1.0 | 1.05 | 1.08 | 1.1 | 1.16 | 1.24 |
| GNUGO | 1.0 | 1.02 | 1.03 | 1.04 | 1.06 | 1.09 |
| POV-Ray | 1.0 | 1.0 | 1.0 | 1.01 | 1.01 | 1.04 |
| NAMD | 1.0 | 1.0 | 1.0 | 1.01 | 1.01 | 1.05 |
| sa-learn | 1.0 | 1.03 | 1.04 | 1.06 | 1.1 | 1.17 |
| Hmmer | 1.0 | 1.02 | 1.03 | 1.08 | 1.15 | 1.38 |
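
The degrade figures can be reproduced from the raw timings reported earlier by dividing each quantile by the minimum for that benchmark. The following is a minimal sketch of that arithmetic, not the tooling used for the experiments; the names `TIMINGS`, `degradation` and `DEGRADATION` are ours, and rounding to two decimal places is an assumption that happens to match the reported values.

```python
# Summary statistics in seconds, copied from the table of C4 run times.
# Column order: min, 25th, median, 75th, 95th, max.
TIMINGS = {
    "pbzip2":   [62, 65, 67, 68, 72, 77],
    "GNUGO":    [158, 161, 162, 164, 167, 172],
    "POV-Ray":  [453, 454, 455, 456, 456, 469],
    "NAMD":     [204, 205, 205, 206, 207, 214],
    "sa-learn": [72, 74, 75, 76, 79, 84],
    "Hmmer":    [1.48, 1.51, 1.53, 1.6, 1.7, 2.04],
}


def degradation(values):
    """Divide every quantile by the minimum (the best time).

    Rounding to 2 decimal places is an assumption made for display.
    """
    best = values[0]
    return [round(v / best, 2) for v in values]


DEGRADATION = {name: degradation(vals) for name, vals in TIMINGS.items()}

if __name__ == "__main__":
    for name, ratios in DEGRADATION.items():
        print(f"{name:9s}", ratios)
```

A value of 1.38 for the Hmmer maximum, for example, means the slowest instance took 38% longer than the fastest.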

The most noticeable results are for NAMD and POV-Ray, where we see virtually no variation across 95% of the instances: the degrade at the 95th percentile is just 1.01. Arguably, this is what we should expect irrespective of workload. Results for the other benchmarks are broadly in line with those reported in sections 5.1 to 5.4: we again observe small differences from minimum to median, producing a peak that is visually close to the best possible. Further, the difference from median to maximum is greater than that from minimum to median, and so we again have a long tail.


It should be noted that our methodology is to benchmark each instance three times and then take the average. Each point in the analysed data therefore represents a single instance, and so the histograms and summary statistics show variation between instances or, more precisely, variation between average instance performances. Averaging does, however, ‘smooth away’ some of the variation, as the quantiles over all 600 individual data points for NAMD and pbzip2 show:

Table : Minimum, 25th percentile, median, 75th percentile, 95th percentile and maximum of pbzip2 and NAMD on C4 respectively, computed over all individual results without smoothing. We note an increase in overall variation, and highlight the minimum and maximum values.

| Benchmark | Min (s) | 25th Perc (s) | Median (s) | 75th Perc (s) | 95th Perc (s) | Max (s) |
|---|---|---|---|---|---|---|
| pbzip2 | 62 | 63 | 65 | 69 | 78 | 87 |
| NAMD | 201 | 202 | 202 | 211 | 213 | 222 |

In both cases the distance from median to maximum increases, and for pbzip2 the distance from minimum to median is now narrower. For pbzip2 the degrade at the 95th percentile and maximum is now 1.26 and 1.4 respectively, whilst for NAMD it is 1.06 and 1.1.
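
The smoothing effect of averaging three runs per instance can be illustrated with a small simulation. The sketch below uses synthetic timings drawn from an assumed model (a per-instance baseline plus non-negative noise on each run); the model, its parameters and the names `simulate` and `summary` are illustrative assumptions and are not the measured C4 data.

```python
import random
import statistics


def summary(data):
    """Return (min, 25th, median, 75th, 95th, max) for a list of timings."""
    # quantiles(n=100) yields 99 cut points; index p - 1 is the p-th percentile.
    cuts = statistics.quantiles(data, n=100, method="inclusive")
    return (min(data), cuts[24], cuts[49], cuts[74], cuts[94], max(data))


def simulate(instances=200, runs=3, seed=0):
    """Synthetic timings from an illustrative model, NOT the measured data."""
    rng = random.Random(seed)
    timings = []
    for _ in range(instances):
        base = rng.uniform(62.0, 66.0)  # hypothetical per-instance baseline (s)
        # Each run adds non-negative noise, which gives a long right tail.
        timings.append([base + rng.expovariate(0.5) for _ in range(runs)])
    return timings


runs_per_instance = simulate()
# 600 individual results, no smoothing.
all_runs = [t for runs in runs_per_instance for t in runs]
# 200 per-instance averages, as used for the histograms.
instance_means = [statistics.mean(runs) for runs in runs_per_instance]

if __name__ == "__main__":
    print("all runs      :", [round(v, 1) for v in summary(all_runs)])
    print("instance means:", [round(v, 1) for v in summary(instance_means)])
```

Whatever the underlying distribution, the largest per-instance average can never exceed the largest individual run, and the smallest average can never fall below the smallest individual run, so the range of the averaged data is at most that of the raw data.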


We next present the results of pgbench, the general purpose Postgres benchmark:

Figure : Histogram of pgbench (Postgres) on C4. We note the large variation.

The degree of variation is evident from the histogram: the best result is almost double the worst (1550 versus 815 transactions per second, a difference of roughly 90%). Summary statistics are presented below; the metric is transactions per second, so higher is better.
Table : Minimum, 5th percentile, 25th percentile, median, 75th percentile and maximum value of pgbench on C4. We note the large range and highlight the minimum and maximum values.

| Benchmark | Min | 5th Perc | 25th Perc | Median | 75th Perc | Max |
|---|---|---|---|---|---|---|
| pgbench (transactions/s) | 815 | 1015 | 1265 | 1371 | 1456 | 1550 |

The postgres benchmark stresses different components of an instance’s sub-systems. A potential explanation for the degree of variation found lies in variation in I/O performance, which we measured using Iozone, and we report summary statistics below (recall: Iozone reports in MB/s and so higher is again better).


Table : Minimum, 5th percentile, 25th percentile, median, 75th percentile and maximum of Iozone read and write throughput (MB/s) on C4 respectively. We note the large range and highlight the minimum and maximum values.

| Iozone (MB/s) | Min | 5th Perc | 25th Perc | Median | 75th Perc | Max |
|---|---|---|---|---|---|---|
| Read | 113 | 155 | 244 | 878 | 2295 | 2928 |
| Write | 298 | 402 | 642 | 927 | 1252 | 1509 |

We see a large degree of variation, particularly for read performance. Interestingly, whilst we find a high degree of linear correlation between read and write performance, with a Pearson correlation coefficient of 0.81, we find no correlation between either read or write performance and pgbench throughput. This confounds our expectations, as a priori we expected Postgres to be predominantly I/O bound. We have no explanation for the degree of Postgres variation other than resource contention on the host.
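
For reference, the Pearson coefficient used above is the covariance of two series divided by the product of their standard deviations. A minimal sketch of the calculation follows; the function name `pearson` is ours, and the per-instance values in the example are hypothetical numbers chosen only to exercise the function, not measurements from these experiments.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-instance measurements, for illustration only.
read_mbps = [150, 240, 880, 2300, 2900]
write_mbps = [400, 640, 930, 1250, 1500]
pgbench_tps = [1400, 900, 1500, 1000, 1300]

if __name__ == "__main__":
    print("read vs write  :", round(pearson(read_mbps, write_mbps), 2))
    print("read vs pgbench:", round(pearson(read_mbps, pgbench_tps), 2))
```

A coefficient near 1 indicates a strong positive linear relationship, whilst a value near 0 indicates no linear relationship; it says nothing about non-linear dependence.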


The results in this section show that the per-CPU performance characterisation discussed in sections 5.2 to 5.4 remains valid on the latest generation of instance types.

