In this section, we consider the problem of how to model the performance of a set of instances over time, a vital consideration when simulating a broker that maintains a pool of instances. We let I denote a set of instances and inst ϵ I. Looking ahead, the broker advertises performance with respect to q > 0 different workloads which we denote by w1, w2,…,wq, and so for each inst ϵ I we must be able to specify the performance of inst with respect to wm at time t > 0.
We let perf(inst,m)(t) denote the performance of inst measured with respect to wm, where 1 <= m <= q, at time t > 0. Our objective is two-fold: (1) an instance’s time series perf(inst)(t0), perf(inst)(t1),…, perf(inst)(tn) measured at times t0, t1,...tn should be qualitatively similar to that found in section 5.7 and (2) a cross-section of performance taken for each inst ϵ I at time t > 0 should be qualitatively similar to the histograms presented in sections 5.2 – 5.4. To achieve this we incorporate the following as discussed in section 5.10.
-
A heterogeneous instance type will have performance variation due to differences in CPU model.
-
In a heterogeneous instance type performance variation is workload specific, and so there is not necessarily a best CPU model for all workloads.
-
Workload/CPU histogram is typically highly peaked close to the best possible with a long tail.
-
The longitudinal performance of instances is typically stationary, and if not locally-stationary.
-
Instances with the same CPU model may have different mean levels of performance over a given period.
We begin by considering the problem of how, for an inst ϵ I, perf(inst,m)(t) should vary as t varies. From the results in section 5.7 we observe that over a 48 hour period the performance time series of the majority of instances was either stationary or locally-stationary. For simplicity we therefore model an instance as having stationary performance. However, as we shall note shortly, it is straightforward to extend to locally-stationary.
The simplest time series model of a stationary process is a moving-average of order 1, MA(1), and so modelling the performance of a stationary instance as an MA(1) process with Gaussian noise we have :
perf(inst,m)(t) := constant + noise noise ~ N(0, σ2) 6.1
Note that we have E[perf(inst,m)(t)] = constant and Var[perf(inst,m)(t)] = σ2.
However, for the same workload wm we would expect different instances to have both a different mean performance and be subject to differing amounts of noise – else all instances will have the same performance. To reflect this we update eq. 6.1:
perf(inst,m)(t) := constant(inst,m) + N(0, σ(inst,m)2). 6.2
How do we generate values for constant(inst,m) and σ(inst,m) ? We begin with the former, and our strategy is as follows: for each CPU model an instance may run on we define a range of possible values for constant(inst,m) by specifying a minimum, median and a maximum. The minimum is the best possible constant(inst,m) an instance may be assigned, maximum is the worst possible whilst median is the 50th percentile for constant(inst,m). We begin by generating these values for a best CPU for the workload. The minimum, median and maximum for each of the remaining CPUs is obtained by applying a degradation/slowdown factor.
In detail: Suppose we have an instance that is heterogeneous of degree k i.e. there are k possible CPU models an instance may run on, which we denote by CPU1, CPU2, …, CPUk. We begin by choosing a best CPU model for wm at random from CPU1, CPU2,…, CPUk. Suppose that CPUj was chosen. Next, we set a min(j,m) for wm on CPUj by randomly selecting U[200,1000], that is min(j,m) ~ U[200,1000], values chosen on the basis of execution times observed. This is the best possible mean performance an instance, irrespective of CPU model, can achieve.
For each of the remaining CPU models CPUi ϵ {CPU1, CPU2,…, CPUk}\CPUj, we determine its min(i,m) for wm on CPUi by degrading minbest as follows:
min(i,m) := degrade_2_min*min(j,m) . 6.3
where degrade_2_min ~ [1.0, spread] and spread > 1.0 is specified as a parameter of the model. In this way we set a min value for each CPU by degrading the minimum value of the best CPU. Given a min(i,m) for CPUi we determine its median and max values as follows:
median(i,m) := min(i,m)*degrade_2_median. 6.4
max(i,m):= min(i,m)* degrade_2_max. 6.5
where degrade_2_median ~ [1.0, peak] and degrade_2_max ~ [ peak, tail ] where peak and tail are parameters of the model. We do this for each CPU.
Given an inst ϵ I running workload wm we generate constant(inst,m) as follows: First we determine the CPU model that inst is running on – and suppose this is CPUj. Next, constantm will be sampled from either U[min(j,m), median(j,m)] or U[median(j,m), max(j,m)], and with an equal probability.
Finally, and based on observed values, we assign inst a coefficient of variation: CoV ~ U[0.01,0.05] and from this we calculate:
σ(inst,m)2 := constant(inst,m)/ CoV . 6.6
The parameter spread controls the degree of variation across different CPUs, as each CPU will be degraded relative to best possible min by at most this factor, and so the higher the value the larger the variation. The parameters peak and tail control the degree of variation amongst instances with the same CPU.
Finally, we note that some instances in the results in section 5.7 were locally-stationary instances. That is, they are stationary for a period and then jump to a new stationary process with either a different mean or variance or both. To model this we simply re-generate constant(inst,m) and σ(inst,m)2 at random intervals. The frequency with which we do this is a parameter of the simulation when considering this type of performance.
We simulate the performance of inst at times t0, t1,...tn, by ‘measuring’ the instance i.e. perf(inst,m)(t0), perf(inst,m)(t1),…, perf(inst,m)(tn). Note that different runs over the same time periods will produce a different performance time series, and so each run is a realisation from the ensemble of all possible time series. Below we present a selection of cross-sections of performance for various set of instances.
Figure : Example histograms of cross-sectional results produced by the performance model.
We recall that our objective in this section is two-fold: (1) an instance’s time series perf(inst)(t0), perf(inst)(t1),…, perf(inst)(tn) measured at times t0, t1,...tn should be qualitatively similar to that found in section 5.7 and (2) a cross-section of performance taken for each inst ϵ I at time t > 0 should be qualitatively similar to the histograms presented in sections 5.2 – 5.4.
With regards to our first objective, we model an instance’s performance using a MA(1) time series. As this is always stationary it produces qualitatively similar series to those discussed in section 5.7, whilst we can model locally-stationary series by introducing jumps. With regards to our second objective we can observe highly peaked histograms which are characteristic of observed performance. Interestingly, for the case NUM_CPUS = 4 the histogram ‘splits’ into 2 clusters either side of execution times of 900 seconds, similar to the bzip2 histogram for the C1 class. Looking ahead to section 6.10, we consider workloads defined by the following parameters: NUM_CPUS := 1, 2, 4, spread := 1.1, 1.37, peak := 1.03, 1.1 and tail := 1.1, 1.25. For various combinations we generate 1000 different workloads and calculate degrade from best to worst. In the table below we present average degrade of a 1000 randomly generated per combination of parameters.
Table : Average degrade of 1000 workloads for the following combinations of parameters.
NUM_CPUS
|
spread
|
peak
|
tail
|
Avg degrade
|
1
|
N/A
|
1.03
|
1.1
|
10%
|
1.1
|
1.25
|
20%
|
2
|
1.1
|
1.03
|
1.1
|
15%
|
1.1
|
1.25
|
26%
|
3
|
1.37
|
1.03
|
1.1
|
40%
|
1.1
|
1.25
|
52%
|
Our model produces highly peaked multi-modal histograms, and the range of variation produced is also within the range of those empirically observed. As the model produces qualitatively similar results to that obtained empirically, this ensures the validity of data assumptions with regard to performance in the model. In the next section we discuss how the broker divides workload performance into tranches and how these are priced.
Dostları ilə paylaş: |