5.6 Performance Refresh: The C4 family
Providers such as EC2 periodically add new instance types, either with features not available on extant types, or to provide a ‘next generation’. At the time of writing, AWS has launched the M4 and C4 families, which are the latest generation in the ‘general purpose’ and ‘high CPU’ families respectively. To ensure that our previous conclusions, and in particular our characterisations, are still valid, we benchmark 200 C4 instances and discuss the results in this section.
We take the opportunity to add two CPU workloads to those used previously: sa-learn and Hmmer, as described in section 5.1, allowing us to further our understanding of workload specific characteristics. We update the input set of POV-Ray to include additional scenes from the ‘advanced’ set, in addition to the standard benchmark.pov. Further, we make use of pbzip2; as noted, this detects all available vCPUs and runs the compression in parallel. It is also provided with an enhanced input, formed by combining the input file used in previous experiments with a range of additional files. Finally, we add the I/O benchmark Iozone, as well as the general system benchmark pgbench, giving us a broader range of workloads.
We begin by presenting histograms and summary statistics for CPU bound workloads on C4 instances.
Figure : Histograms of sa-learn, POV-Ray, NAMD, pbzip2, Hmmer and GNUGO on C4 respectively. We note high peaks indicating performance is typically close to best possible. POV-Ray and NAMD show negligible variation, although for other workloads, such as sa-learn, we note the long tail and positive skew.
Table : Minimum, 25th percentile, median, 75th percentile, 95th percentile and maximum value of pbzip2, GNUGO, POV-Ray, NAMD, sa-learn and Hmmer on C4 respectively.
Benchmark | Min (s) | 25th Perc (s) | Median (s) | 75th Perc (s) | 95th Perc (s) | Max (s)
pbzip2 | 62 | 65 | 67 | 68 | 72 | 77
GNUGO | 158 | 161 | 162 | 164 | 167 | 172
POV-Ray | 453 | 454 | 455 | 456 | 456 | 469
NAMD | 204 | 205 | 205 | 206 | 207 | 214
sa-learn | 72 | 74 | 75 | 76 | 79 | 84
Hmmer | 1.48 | 1.51 | 1.53 | 1.6 | 1.7 | 2.04
Table : Minimum, 25th percentile, median, 75th percentile, 95th percentile and maximum values of pbzip2, GNUGO, POV-Ray, NAMD, sa-learn and Hmmer expressed as a degrade relative to minimum respectively. We note the distance from min to median is typically small, resulting in high peaks in histograms, whilst the distance to max is typically much bigger. We highlight the minimum, median and maximum.
Benchmark | Min | 25th Perc | Median | 75th Perc | 95th Perc | Max
pbzip2 | 1.0 | 1.05 | 1.08 | 1.1 | 1.16 | 1.24
GNUGO | 1.0 | 1.02 | 1.03 | 1.04 | 1.06 | 1.09
POV-Ray | 1.0 | 1.0 | 1.0 | 1.01 | 1.01 | 1.04
NAMD | 1.0 | 1.0 | 1.0 | 1.01 | 1.01 | 1.05
sa-learn | 1.0 | 1.03 | 1.04 | 1.06 | 1.1 | 1.17
Hmmer | 1.0 | 1.02 | 1.03 | 1.08 | 1.15 | 1.38
The most noticeable results are for NAMD and POV-Ray, where we have virtually no variation across 95% of the instances. Arguably, this is what we should expect irrespective of workload. Results for the other benchmarks are broadly in line with those reported in sections 5.1 – 5.4, where we again observe small differences from minimum to median, producing a peak that is, visually, close to the best possible. Further, the difference from median to maximum is greater than that from minimum to median, and so we again have a long tail.
It should be noted that our methodology is to benchmark each instance three times and then take the average. This means that each point in the analysed data represents a single instance, and so the histograms and summary statistics show variation between instances, or rather, variation between average instance performances. However, this does mean that we ‘smooth away’ some of the variation, as the quantiles over all 600 data points for pbzip2 and NAMD show:
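The degrade factors reported above are simply each quantile divided by the minimum time for that benchmark. As a minimal sketch, assuming the timings are copied by hand from the first table of this section (the names `TIMINGS` and `degradation` are ours, for illustration only), the calculation can be reproduced as follows:

```python
# Timings in seconds, copied from the C4 summary table in this section
# (min, 25th, median, 75th, 95th, max).
TIMINGS = {
    "pbzip2": [62, 65, 67, 68, 72, 77],
    "GNUGO": [158, 161, 162, 164, 167, 172],
    "POV-Ray": [453, 454, 455, 456, 456, 469],
    "NAMD": [204, 205, 205, 206, 207, 214],
    "sa-learn": [72, 74, 75, 76, 79, 84],
    "Hmmer": [1.48, 1.51, 1.53, 1.6, 1.7, 2.04],
}


def degradation(quantiles):
    """Express each quantile as a multiple of the minimum (first entry)."""
    best = quantiles[0]
    return [round(q / best, 2) for q in quantiles]


# One row of degrade factors per benchmark, rounded to two decimal places.
DEGRADATION = {name: degradation(q) for name, q in TIMINGS.items()}

for name, row in DEGRADATION.items():
    print(f"{name:9s}", row)
```

Because lower times are better for these workloads, a factor of 1.0 is the best observed performance and larger factors indicate slower instances.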
Table : Minimum, 25th percentile, median, 75th percentile, 95th percentile and maximum of pbzip2 and NAMD on C4 respectively, computed over all results without smoothing. We note an increase in overall variation, and highlight the minimum and maximum values.
Benchmark | Min (s) | 25th Perc (s) | Median (s) | 75th Perc (s) | 95th Perc (s) | Max (s)
pbzip2 | 62 | 63 | 65 | 69 | 78 | 87
NAMD | 201 | 202 | 202 | 211 | 213 | 222
In both cases we find an increase in the distance from median to maximum, and indeed for pbzip2 we find that the distance from minimum to median is now narrower. For pbzip2 we have a degrade of 1.26 and 1.4 to the 95th percentile and maximum respectively, whilst for NAMD this is now 1.06 and 1.1.
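The smoothing effect of averaging can be illustrated with a short sketch. The run times and instance identifiers below are hypothetical and chosen purely for illustration; they are not measured data. Only the final two lines use values taken from the unsmoothed table above.

```python
from statistics import mean


def per_instance_means(runs_by_instance):
    """Average the repeated runs of each instance into a single data point."""
    return {inst: mean(runs) for inst, runs in runs_by_instance.items()}


def spread(values):
    """Ratio of the slowest to the fastest time (1.0 means no variation)."""
    values = list(values)
    return max(values) / min(values)


# HYPOTHETICAL timings in seconds, three runs per instance (not measured data).
runs = {
    "i-a": [62, 63, 64],
    "i-b": [63, 65, 87],  # one slow run out of three
    "i-c": [64, 66, 65],
}

averaged = per_instance_means(runs)
all_runs = [t for r in runs.values() for t in r]

raw_spread = spread(all_runs)                # over all nine individual runs
smoothed_spread = spread(averaged.values())  # over three per-instance means

print(f"raw spread:      {raw_spread:.2f}")
print(f"smoothed spread: {smoothed_spread:.2f}")

# Degrade to the 95th percentile and maximum, from the unsmoothed table above.
pbzip2_degrade = (round(78 / 62, 2), round(87 / 62, 2))
namd_degrade = (round(213 / 201, 2), round(222 / 201, 2))
```

A single slow run raises the maximum of the raw data directly, but contributes only a third of its excess to the instance average, which is why the averaged quantiles understate run-to-run variation.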
We next present the results of the general purpose postgres benchmark:
Figure : Histogram of pgbench (Postgres) on C4. We note the large variation.
The degree of variation is evident from the histogram: the best instance achieves almost twice the throughput of the worst (1550 versus 815 transactions per second, a difference of roughly 90%). Summary statistics are presented below, with the metric being transactions per second. Note that higher is better.
Table : Minimum, 5th percentile, 25th percentile, median, 75th percentile and maximum value of pgbench on C4. We note the large range and highlight the minimum and maximum values.
Min | 5th Perc | 25th Perc | Median | 75th Perc | Max
815 | 1015 | 1265 | 1371 | 1456 | 1550
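Since pgbench reports throughput, where higher is better, the natural reference point is the maximum rather than the minimum. The following minimal sketch expresses each quantile in the table above as a fraction of the best observed throughput; the names `PGBENCH_TPS` and `relative_to_best` are ours, for illustration only.

```python
# pgbench throughput in transactions per second, from the table above
# (min, 5th, 25th, median, 75th, max).
PGBENCH_TPS = [815, 1015, 1265, 1371, 1456, 1550]


def relative_to_best(quantiles):
    """Express each quantile as a fraction of the best (largest) value."""
    best = max(quantiles)
    return [round(q / best, 2) for q in quantiles]


fractions = relative_to_best(PGBENCH_TPS)
best_over_worst = round(max(PGBENCH_TPS) / min(PGBENCH_TPS), 2)

print("fraction of best:", fractions)
print("best / worst:", best_over_worst)
```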
The postgres benchmark stresses different components of an instance’s sub-systems. A potential explanation for the degree of variation found lies in variation in I/O performance, which we measured using Iozone, and we report summary statistics below (recall: Iozone reports in MB/s and so higher is again better).
Table : Minimum, 5th percentile, 25th percentile, median, 75th percentile and maximum of Iozone read and write on C4 respectively. We note the large range and highlight the minimum and maximum values.
Iozone (MB/s) | Min | 5th Perc | 25th Perc | Median | 75th Perc | Max
Read | 113 | 155 | 244 | 878 | 2295 | 2928
Write | 298 | 402 | 642 | 927 | 1252 | 1509
We see a large degree of variation, particularly so for read performance. Interestingly, whilst we find a high degree of linear correlation between read and write performance, with a Pearson correlation coefficient of 0.81, we find no correlation between either read or write performance and pgbench throughput. This somewhat confounds our expectations, as a priori we expected Postgres to be predominantly I/O bound. We have no explanation for the degree of Postgres variation other than resource contention on the host.
The results in this section show that per-CPU performance characterisation, as discussed in sections 5.2 – 5.4, is still valid on the latest generation of instance types.
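For reference, the Pearson coefficient used above measures the strength of the linear relationship between two paired samples. A minimal sketch of the calculation follows; the function name `pearson` and the sample values are ours and are hypothetical, since the per-instance Iozone measurements are not reproduced in this section.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length paired samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two paired samples of length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# HYPOTHETICAL per-instance throughputs in MB/s (not measured data).
read_mbps = [120, 250, 900, 2300, 2900]
write_mbps = [300, 650, 900, 1250, 1500]

r = pearson(read_mbps, write_mbps)
print(f"r = {r:.2f}")
```

A value close to 1 indicates that instances with fast reads also tend to have fast writes, whilst a value close to 0 indicates no linear relationship.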