6.4Simulated annealing
The SA started with the heuristically selected initial solution that had the smallest total net benefits in enumeration. A random choice was made between available moves from the neighbourhood of the current solution N(x,σ) (i.e., a set of solutions that can be reached from the current solution x by a simple operation σ). For a discrete parameter, such an operation σ might be swapping, replacing, moving or combining two or more objects in a solution[102, 209, 210, 362]. The operator σ used in the hypertension SDDP model allowed moving to a new solution based on the randomly generated policy number within the neighbourhood of the current solution. As there was no information about the most appropriate size of the neighbourhood for the hypertension SDDP, different sizes of the neighbourhood (i.e., ±25, 50 and 100 from the current solution) were tested in the prior tests and found that the final solutions of SA were not sensitive to the size of the neighbourhood in the range between ±25 and ±100, whereas the number of total iteration was smallest where the size of the neighbourhood was ±50 (see Appendix 8). Therefore the hypertension SDDP model defined the neighbourhood ±50 from the current solution.
In order to avoid premature convergence, a random jump to another area was allowed after 30% of the neighbourhood was searched. This happened approximately every 30 iterations. The whole search space can be divided into four sub-spaces having 1,032 policies depending on the initial drug. The range of allowing the random jump was set to ±1000 so that the areas having a different initial drug were searched. The process re-started from the new solution and continued to iterate in this way until the following stopping criteria were met:
-
Temperature at which to stop is 1e-8.
-
Maximum number of consecutive rejections is 1,000.
If the incremental reward between the new candidate solution and the current solution was greater than £1e-6 of total net benefit, the current solution was replaced with the new solution. SA also allowed moving to worse solutions with a probability obtained from the Equation 3.3. That is, if a random number was greater than the probability to accept the worse solutions, the current solution was replaced with the new solution despite the current solution being better than the new solution: this is one of the key mechanisms of SA to avoid getting trapped in a local optimum.
The temperature parameter T was controlled by the initial temperature and cooling rate. It is generally accepted that the initial temperature should be high enough to allow a gradual annealing procedure and to avoid getting trapped into the local optimum. In the hypertension SDDP model, the initial temperature, InitTemp, was set to 1.
The initial temperature slowly and gradually decreased during the search process by a temperature reduction function Q(β,Tc-1). The cooling rate β was assumed to be constant (i.e., the temperature Tc-1 was decreased linearly by β) during the search like many other studies[60, 102, 209, 363]. A careful annealing through a series of temperature levels is desirable to reach a stationary distribution before temperature reduction. This model tested different combinations of cooling rates (β = 0.5, 0.7 and 0.9) and the maximum number of attempts within one temperature (MaxTries = 10, 30 and 50). As the initial temperature decreased, the probability of accepting worse solutions also decreased.
During the search process, the evaluation function ‘EvModel’ described a trajectory of the total net benefit of the optimal solution identified in each temperature. The total net benefit of the optimal solution was estimated based on the mean of 100 replications of the underlying evaluation model. Parallel computation was implemented to allow multiple replications to be tested at the same time. When the searching stopped, the SA algorithm provided a solution with its associated expected total net benefit.
The SA algorithm written in Matlab by Vandekerckhove[364] was modified for the hypertension SDDP model. Modifications were made for the neighbourhood structure and parallel computation. Key parameters for SA were also altered so that they were appropriated to the context of the hypertension SDDP. Figure 6. illustrates the pseudo-code for the SA algorithm used in this study.
% Select the parameters for simulated annealing.
Def = struct(
‘CoolSched’, @(T) (β*T), % Cooling schedule (β=0.5, 0.7 or 0.9).
‘Generator’, @(x) randi([x-50,x+50]), % Definition of the neighbourhood of the current solution x.
‘InitTemp’, 1, % Initial temperature.
‘MaxConsRej’, 1000, % Maximum no. of consecutive rejections.
‘MaxSuccess’, 100, % Maximum no. of successes within one temperature.
‘MaxTries’, 100, % Maximum number of tries within one temperature.
‘StopTemp’, 1e-8, % Temperature at which to stop.
'k', 1); % Boltzmann constant.
T = InitTemp; % Initialize the temperature.
Parent = Policy(3257,:); % Select the initial solution.
OldReward = EvModel(Parent); % Evaluate the initial solution.
|
Dostları ilə paylaş: |