In primary hypertension, the optimal solution identified by enumeration was to start with ACEIs/ARBs, followed by Ds+ACEIs/ARBs, Ds+CCBs+ACEIs/ARBs and Ds+BBs+ACEIs/ARBs as the second, third and fourth-line treatments. The total expected net benefit of this optimal sequential treatment policy was £330,080 (95% CI £330,013-£330,147). Seven policies were not significantly different from the optimal solution at the 5% significance level. These policies used Ds or ACEIs/ARBs as the initial drug, Ds+ACEIs/ARBs as the second-line drug and Ds+CCBs+ACEIs/ARBs as the third-line drug, whereas the fourth-line drug varied.
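The comparison of policies against the optimum can be illustrated with a minimal sketch. It assumes that each policy has a set of simulated net benefit samples and that significance is judged with a two-sided z-test on the difference in means; the section does not state which test was used, so the test, the policy names and the sample values below are all hypothetical.

```python
import math
import random
import statistics


def mean_ci(samples, z=1.96):
    """Return (mean, lower, upper) for a normal-approximation 95% CI."""
    m = statistics.fmean(samples)
    se = statistics.stdev(samples) / math.sqrt(len(samples))
    return m, m - z * se, m + z * se


def not_significantly_different(best, other, z=1.96):
    """Two-sided z-test on the difference in means at the 5% level (assumed test)."""
    diff = statistics.fmean(best) - statistics.fmean(other)
    se = math.sqrt(statistics.variance(best) / len(best)
                   + statistics.variance(other) / len(other))
    return abs(diff) <= z * se


# Hypothetical simulated net benefits (GBP) for three made-up policies.
rng = random.Random(0)
policies = {
    "A": [rng.gauss(330_080, 1_000) for _ in range(1_000)],
    "B": [rng.gauss(330_070, 1_000) for _ in range(1_000)],
    "C": [rng.gauss(329_000, 1_000) for _ in range(1_000)],
}

# Pick the policy with the highest mean, then list those statistically tied with it.
best_name = max(policies, key=lambda k: statistics.fmean(policies[k]))
ties = [k for k in policies
        if k != best_name
        and not_significantly_different(policies[best_name], policies[k])]
mean, lower, upper = mean_ci(policies[best_name])
```

With these made-up inputs, policy C lies far below the best policy and is excluded from `ties`, while A and B may or may not be separable depending on the sampled noise.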
Sensitivity analyses by population characteristics, objective function, SBP lowering effect, the extension of the drug switching period, the use of AE rates and the random treatment scenario for CVD and DM showed that the optimal solution in primary hypertension is likely to start with ACEIs/ARBs or Ds and then add the other drug: Ds are added where ACEIs/ARBs were used initially, and ACEIs/ARBs are added where Ds were used initially. For the third-line treatment, Ds+CCBs+ACEIs/ARBs were optimal in most scenarios, whereas the optimal fourth-line drug was sensitive to the previously used drug and to the assumptions used in the sensitivity analyses. The optimal second-line drug was sensitive to changes in the patients’ initial SBP.
The optimal solution(s) identified by enumeration did not agree with the NICE clinical guidelines, which currently recommend starting treatment with ACEIs/ARBs for people aged under 55 years and CCBs for patients over 55 years, and then using a combination of ACEIs/ARBs and CCBs as the step 2 treatment[63]. This may be related to the data used for the SBP lowering effect and to the assumption that the SBP lowering effect reduces gradually where a drug is used continuously over time. As the SBP lowering effect in three months is higher for CCBs than for Ds or ACEIs/ARBs in the hypertension SDDP model, patients who use CCBs as the initial drug are more likely to stay on the same drug in the next period than those who start with Ds or ACEIs/ARBs. As the SBP lowering effect is assumed to reduce gradually where a drug is used continuously, the subsequent SBP lowering effect is, in most cases, smaller for patients who continue the same drug than for those who switch to the next drug. Furthermore, the subsequent SBP lowering effect of CCBs is smaller than that of other single antihypertensive drugs. Together, these factors lead to a relatively smaller treatment benefit in subsequent states where CCBs are used initially than where Ds or ACEIs/ARBs are used initially. The optimal third-line drug was the same as NICE’s recommendation. As this model did not include four-drug combinations, which are recommended as the step 4 treatment in the NICE hypertension model, the optimal fourth-line treatments were not compared with the NICE hypertension model.
The results from the cluster analyses imply that the cost-effectiveness of antihypertensive drugs can be affected by subsequent drug use, particularly the choice of second-line drug. Where the 4,128 sequential treatment policies were grouped by initial drug, the cluster analysis showed no significant difference in total net benefit between initial drugs. However, a significant difference in total net benefit was found where the cluster analysis was undertaken in 39 groups defined by the combination of the initial and second-line drugs. Regardless of the initial drug, using CCBs+ACEIs/ARBs as the second-line treatment provided a higher total net benefit than the other second-line treatment options. The policies using a single drug, BBs+CCBs or BBs+ACEIs/ARBs as the second-line drug provided a significantly lower total net benefit, whereas the policies using Ds+CCBs or Ds+ACEIs/ARBs as the second-line drug were not significantly different from the policies using CCBs+ACEIs/ARBs as the second-line treatment. In the cluster analyses of the top 10% of policies, most policies had CCBs+ACEIs/ARBs, Ds+ACEIs/ARBs or Ds+CCBs as the second-line drug, while the first and fourth-line drugs were evenly distributed.
Table 7. compares the base-case results of enumeration, SA, GA and RL in terms of the size of the decision space, the optimal solution, the maximum total net benefit, the number of iterations, the search rate, the computation time, the probability of finding the optimum and the average penalty rate. The results of the hypertension SDDP model showed that, despite the computational complexity of the underlying evaluation model, SA and GA are capable of identifying good solutions in reasonable computational times. While enumeration took 12.20 hours to identify the optimal solution, SA and GA achieved the same or an equivalent solution in only 4.1-4.56 hours. The probability of finding the optimum was 100% for both methods after tuning the models. The point estimate of the maximum total net benefit is higher in GA than in enumeration or SA because of noise from the random drug allocation after the use of the fourth-line drug and for patients who have CVD or DM. SA searched 32.85% of the search space with 2,220 repetitions, whereas GA searched 14.85% of the search space with 1,380 repetitions. The search efficiency can be improved by adjusting the key tuning parameters of each algorithm.
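The way SA searches only part of the policy space can be illustrated with a minimal sketch. It assumes that policies are indexed 0 to 4,127, that a neighbour is reached by shifting the index a few positions, and that the search rate is the share of distinct policies evaluated. Only the cooling rate of 0.9 comes from the table footnote; the objective `toy_net_benefit`, the starting temperature and the number of steps per temperature are hypothetical and do not stand in for the hypertension SDDP evaluation model.

```python
import math
import random


def simulated_annealing(evaluate, n_policies, t0=1000.0, cooling=0.9,
                        steps_per_temp=20, t_min=1e-3, seed=0):
    """Maximise evaluate(i) over policy indices 0..n_policies-1."""
    rng = random.Random(seed)
    cache = {}  # policy index -> net benefit, so repeated visits cost nothing

    def score(i):
        if i not in cache:
            cache[i] = evaluate(i)
        return cache[i]

    current = rng.randrange(n_policies)
    best = current
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            # Neighbour move (assumed): jump a few indices left or right.
            step = rng.randint(-50, 50)
            candidate = min(max(current + step, 0), n_policies - 1)
            delta = score(candidate) - score(current)
            # Always accept improvements; accept losses with probability exp(delta / t).
            if delta >= 0 or rng.random() < math.exp(delta / t):
                current = candidate
                if score(current) > score(best):
                    best = current
        t *= cooling  # geometric cooling schedule
    search_rate = len(cache) / n_policies  # share of distinct policies evaluated
    return best, score(best), search_rate


def toy_net_benefit(i):
    """Hypothetical objective with a single peak at index 1000."""
    return -abs(i - 1000)


best, value, search_rate = simulated_annealing(toy_net_benefit, n_policies=4128)
```

Caching evaluated policies matters here because each evaluation of the real model is expensive; the cache also makes it straightforward to report the share of the search space that was visited.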
The quality of the solution identified by RL was less favourable despite the more complex coding. The maximum net benefit identified by RL was the lowest, even with 1,000,000 cases, which took longer than enumeration. Although the quality of the solution improved where more cases were observed or where two or three-step future rewards were considered, it was still not good enough compared with SA and GA. One potential reason lies in the structure of the hypertension SDDP model. In this model, the costs and effectiveness from the long-term CVD model after the drug switching period are very large relative to those accrued during the drug switching period. Therefore, the mechanism of updating Q-values based on the immediate reward or the reward from future transitions may not fully capture the potentially large impact after the drug switching period. Despite the convergence of the Q-values where 100,000 or 1,000,000 cases were used, the solutions from RL were sensitive to slight fluctuations in the Q-values. This may be because the difference in total net benefit between sequential treatment policies was small. No benefit from allowing freedom in drug choice, like the simple hypothetical case, was observed.
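The update mechanism referred to above can be sketched with the standard tabular Q-learning rule, which combines the immediate reward with the best one-step future value. This is an assumption about the form of the update, not the thesis code; the state and action labels, the learning rate `alpha` and the discount factor `gamma` are hypothetical.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=1.0):
    """One-step Q-learning update for the pair (state, action)."""
    # Best value reachable from the next state; 0.0 if there is no further decision.
    future = max((q[(next_state, a)] for a in next_actions), default=0.0)
    target = reward + gamma * future
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


q = defaultdict(float)
q[("s1", "x")] = 10.0  # hypothetical value already learned for a later period

# Non-terminal step: target = 2.0 + 10.0 = 12.0, so Q moves halfway from 0.0 to 12.0.
first = q_update(q, "s0", "a", 2.0, "s1", ["x", "y"], alpha=0.5)

# Terminal step: no future actions, so the target is the immediate reward only.
second = q_update(q, "s1", "x", 4.0, None, [], alpha=0.5)
```

Because each update looks only one step ahead, a large reward that arrives after the last decision period reaches earlier periods only through repeated updates, which is consistent with the sensitivity to small Q-value fluctuations described above.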
A direct comparison of the number of iterations between RL and SA (or GA) was not possible because the time period evaluated in each iteration was different. For enumeration, SA and GA, each iteration runs the whole underlying evaluation model to calculate the lifetime total net benefit, whereas each iteration in RL calculates the transitions over one or two steps and the related rewards from the decomposed decision tree. Although RL started with a smaller decision space, its computational time can be much longer than that of SA and GA, depending on how many cases are required for the Q-values to converge.
Compared with the simple hypothetical SDDP in section 3.5.4, the size of the decision space was considerably larger in the hypertension SDDP model. This increase was mainly due to the increase in the size of the search space, especially where the search space was not decomposed (i.e., where enumeration, SA and GA were used). Computational time also increased substantially in the hypertension SDDP model. The increase in computational time was around three times greater for enumeration than for SA, GA and RL. This increase would have been much greater without Iceberg and parallel computing. The advantage of SA and GA in computational efficiency was clearly evident in the hypertension SDDP, whereas it was unclear in the simple hypothetical SDDP. With a smaller search rate and fewer iterations, SA and GA found the same optimal solution as enumeration.
Table 7.. Comparison of the optimisation results in the case study of primary hypertension
| | Enumeration | Simulated annealing1) | Genetic algorithm2) | Reinforcement learning3) |
|---|---|---|---|---|
| The size of the health state transition space Z(HS) | Z(HS)=31 | Z(HS)=31 | Z(HS)=31 | Z(HS1)=3^0; Z(HS2)=3^1; Z(HS3)=3^2; Z(HS4)=3^3 |
| The size of the search space Z(SS) | Z(SS)=4,128 | Z(SS)=4,128 | Z(SS)=4,128 | Z(SS1)=4; Z(SS2)=10; Z(SS3)=14; Z(SS4)=14 |
| The size of the decision space Z(DS) | Z(DS)=31x4,128x8=1,023,744 | Z(DS)=31x4,128x8=1,023,744 | Z(DS)=31x4,128x8=1,023,744 | Z(DS1)=1x4=4; Z(DS2)=3^1x10=30; Z(DS3)=3^2x14=126; Z(DS4)=3^3x14=378 |
| Optimal solution number | 3,720 | 3,720 | 3,721 | See Table 7. - Table 7.. |
| Maximum total net benefit (£) | 330,080 | 330,060 | 330,120 | 320,158 |
| The number of iterations | 4,128 | 2,220 | 1,380 | 2,690,000 |
| Search rate (%) | 100 | 32.85 | 14.85 | N/A |
| Computation time (h) | 12.20 | 4.56 | 4.1 | 4.51 |
| Probability to find the optima (%) | N/A | 100 | 100 | N/A |
| Average penalty rate (%) | N/A | 0 | 0 | N/A |
1) A cooling rate of 0.9 and a maximum tree of 30 were applied.
2) A generation number of 50, a population size of 30, a crossover rate of 0.7 and a mutation rate of 0.1 were applied.
3) 5,000 iterations per health state and drug in each period were allowed. The feedback is based on the immediate reward and the one-step future reward.
1) Z(HS) represents the size of the health state transition space; Z(SS) represents the size of the search space; Z(DS) represents the size of the decision space.
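The decision space sizes and the RL iteration count in the table can be checked with a few lines of arithmetic. The sketch below assumes that the decision space in each period is the product of the health state transition space and the search space, and that footnote 3 means 5,000 iterations for every health state and drug pair; under that reading the figures reproduce the tabulated values.

```python
# Per-period sizes for the decomposed (RL) problem, taken from the table.
health_state_sizes = [3**0, 3**1, 3**2, 3**3]  # Z(HS1)..Z(HS4)
search_space_sizes = [4, 10, 14, 14]           # Z(SS1)..Z(SS4)

# Z(DSk) = Z(HSk) x Z(SSk) for each period k.
decision_space_sizes = [h * s for h, s in zip(health_state_sizes, search_space_sizes)]

# 5,000 iterations per health state and drug pair, summed over the four periods.
iterations_per_pair = 5_000
rl_iterations = iterations_per_pair * sum(decision_space_sizes)

# Non-decomposed decision space used by enumeration, SA and GA.
full_decision_space = 31 * 4_128 * 8

print(decision_space_sizes)  # [4, 30, 126, 378]
print(rl_iterations)         # 2690000
print(full_decision_space)   # 1023744
```

The decomposed problem therefore has 538 state and drug pairs in total, compared with 1,023,744 elements in the non-decomposed decision space.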
Figure 7.. Comparison of the size of decision space between the simple hypothetical model and the hypertension SDDP model
Figure 7.. Comparison of the computational time, search rate and iteration number between the simple hypothetical model and the hypertension SDDP model