Sequential drug decision problems in long-term medical conditions: a case study of primary hypertension. Eunju Kim, BA, MA, MSc



3.6.6 Comparison of the optimal solution from the different optimisation approaches


Table 3. compares the different optimisation approaches in terms of the size of the decision space, the objective function, the number of searches and the optimal solution. According to the results from enumeration, the global optimal solution in the simple hypothetical SDDP was π6 = (drug3, drug2, drug1), with a total net benefit of £85,716. SA, tested on the same simple SDDP, found exactly the same optimal solution as enumeration. However, the number of iterations in SA was slightly higher than in enumeration, which demonstrates that the potential computational advantage of SA may not be realised in a decision problem of this small size.
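As a concrete illustration of how the two "complete solution" approaches search the space of pre-set sequences, the following Python sketch enumerates all 3P3 = 6 fixed three-drug sequences and runs a basic simulated annealing loop over the same space. It is not the thesis implementation: the drug names match the hypothetical example, but the net-benefit function `net_benefit_of_sequence` and all numerical values are invented placeholders standing in for the Markov-model evaluation.

```python
# Illustrative sketch only, not the thesis implementation. Enumerates the six
# fixed three-drug sequences and searches the same space with a basic
# simulated annealing loop. All numbers are invented placeholders.
import math
import random
from itertools import permutations

DRUGS = ("drug1", "drug2", "drug3")            # assumed drug set

def net_benefit_of_sequence(seq):
    """Placeholder evaluation of a pre-set sequence over the 9-month horizon."""
    toy_value = {"drug1": 28_000.0, "drug2": 28_500.0, "drug3": 29_000.0}
    # Purely illustrative: later positions contribute slightly less.
    return sum(toy_value[d] * (0.95 ** t) for t, d in enumerate(seq))

def enumerate_best():
    """Exhaustive search over all 3P3 = 6 pre-set treatment sequences."""
    return max(permutations(DRUGS, 3), key=net_benefit_of_sequence)

def simulated_annealing(n_iter=50, temp=1_000.0, cooling=0.95):
    """Neighbourhood move: swap two positions in the current sequence."""
    current = list(DRUGS)
    random.shuffle(current)
    best = current[:]
    for _ in range(n_iter):
        neighbour = current[:]
        i, j = random.sample(range(len(neighbour)), 2)
        neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
        delta = net_benefit_of_sequence(neighbour) - net_benefit_of_sequence(current)
        if delta > 0 or random.random() < math.exp(delta / temp):
            current = neighbour                 # accept uphill or, sometimes, downhill moves
        if net_benefit_of_sequence(current) > net_benefit_of_sequence(best):
            best = current[:]
        temp *= cooling                         # cool the temperature each iteration
    return tuple(best)

if __name__ == "__main__":
    print("enumeration:", enumerate_best())
    print("simulated annealing:", simulated_annealing())
```

With only six candidate sequences, the annealing loop inevitably evaluates at least as many solutions as exhaustive enumeration, which mirrors the observation above about the iteration counts.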

Classic DP gave an optimal solution with a higher total net benefit of £86,004. There are two possible reasons for this. Firstly, the higher net benefit from DP may be attributable to infeasible solutions, because classic DP under backward induction was limited in its ability to take the medical history into account. Table 3. shows that the optimal treatment pathway identified by DP included solutions that were infeasible under our assumption that a drug cannot be used again once it has proved ineffective. For example, drug3 was the optimal drug in the second period even when the initial optimal drug, which was also drug3, had failed to control Hu. This is an important finding in this simple example of an SDDP, as it suggests that it may be better to continue the current drug for one or two more cycles rather than switching to another drug straight away, once the cost-effectiveness of the subsequent treatments is taken into account. It also raises the question of whether the assumptions made in the model are wrong, or whether the model is missing some negative impact of infeasible solutions on the total net benefit. Although this thesis did not discuss this issue further for the simple hypothetical SDDP, the feasibility assumptions need to be fully justified and discussed if this occurs in the real SDDP.
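The role of the feasibility assumption can be made concrete with a minimal backward-induction sketch. This is not the thesis model: the state is reduced to the set of drugs that have already failed, and the success probabilities and one-period net benefits (`P_SUCCESS`, `NB_SUCCESS`, `NB_FAILURE`) are invented. Toggling `ENFORCE_FEASIBILITY` shows how forbidding the re-use of a failed drug changes the recommended pathway, which is the constraint the unconstrained DP run violated.

```python
# Minimal backward-induction sketch (illustrative only, not the thesis model).
# The state is the set of drugs that have already failed; probabilities and
# one-period net benefits are invented placeholders.
from functools import lru_cache

DRUGS = ("drug1", "drug2", "drug3")
P_SUCCESS = {"drug1": 0.5, "drug2": 0.6, "drug3": 0.7}                  # assumed
NB_SUCCESS = {"drug1": 30_000.0, "drug2": 29_500.0, "drug3": 29_000.0}  # assumed
NB_FAILURE = {"drug1": 24_000.0, "drug2": 23_500.0, "drug3": 23_000.0}  # assumed
HORIZON = 3
ENFORCE_FEASIBILITY = True    # forbid prescribing a drug that has already failed

@lru_cache(maxsize=None)
def value(t, failed):
    """Best expected net benefit from period t onward, and the drug achieving it."""
    if t == HORIZON:
        return 0.0, None
    best_nb, best_drug = float("-inf"), None
    for d in DRUGS:
        if ENFORCE_FEASIBILITY and d in failed:
            continue                            # skip drugs that have already failed
        p = P_SUCCESS[d]
        nb = (p * (NB_SUCCESS[d] + value(t + 1, failed)[0])
              + (1 - p) * (NB_FAILURE[d] + value(t + 1, failed | frozenset([d]))[0]))
        if nb > best_nb:
            best_nb, best_drug = nb, d
    return best_nb, best_drug

if __name__ == "__main__":
    print(value(0, frozenset()))                # expected net benefit and first-line drug
```

Running the sketch with `ENFORCE_FEASIBILITY = False` allows the recursion to recommend repeating a failed drug whenever its expected reward dominates, which is the kind of pathway the unconstrained backward induction produced above.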

The improvement of DP over enumeration is also partly due to the scheme for choosing the next action conditional on the realised health state once the problem is decomposed. Whereas enumeration and SA worked with complete solutions, in which the next drug is already determined for both Hu and He by the pre-set sequential treatment policy, DP and Q-learning are free of this pre-set policy, so the drug with the highest net benefit can be selected separately for Hu and He. In Table 3., for example, the optimal drugs for Hu and He at t=2 were drug3 and drug1, respectively, when DP was used, whereas enumeration was forced to allocate the same drug to both. Even after the feasibility of solutions was taken into account (i.e., a drug is switched to another drug in case of treatment failure and the same drug is continued in case of treatment success), Q-learning provided a solution with a total net benefit of £85,804, higher than that of enumeration but lower than that of DP. This implies that the total net benefit could be improved by assigning a tailored treatment depending on the patient's health state rather than recommending a fixed treatment sequence.
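A minimal tabular Q-learning sketch, again with invented transition probabilities and rewards, illustrates how a decomposed method can learn a different drug for the uncontrolled (Hu) and controlled (He) states at the same decision point; nothing in it should be read as the thesis's actual implementation.

```python
# Illustrative tabular Q-learning sketch (not the thesis implementation).
# States distinguish uncontrolled (Hu) from controlled (He) blood pressure;
# dynamics and rewards are invented placeholders.
import random
from collections import defaultdict

DRUGS = ("drug1", "drug2", "drug3")
P_SUCCESS = {"drug1": 0.5, "drug2": 0.6, "drug3": 0.7}   # assumed
REWARD = {"Hu": 23_000.0, "He": 30_000.0}                # assumed one-period net benefits
HORIZON = 3
ALPHA, EPSILON = 0.1, 0.2                                # learning and exploration rates

Q = defaultdict(float)        # Q[(t, state, drug)] = estimated net benefit to go

def step(state, drug):
    """Sample the next health state and the one-period reward (toy dynamics)."""
    success = random.random() < P_SUCCESS[drug]
    next_state = "He" if success else "Hu"
    return next_state, REWARD[next_state]

def choose(t, state):
    if random.random() < EPSILON:                                  # explore
        return random.choice(DRUGS)
    return max(DRUGS, key=lambda d: Q[(t, state, d)])              # exploit

def run_episode():
    state = "Hu"                                                   # start uncontrolled
    for t in range(HORIZON):
        drug = choose(t, state)
        next_state, reward = step(state, drug)
        future = 0.0 if t + 1 == HORIZON else max(Q[(t + 1, next_state, d)] for d in DRUGS)
        Q[(t, state, drug)] += ALPHA * (reward + future - Q[(t, state, drug)])
        state = next_state

if __name__ == "__main__":
    for _ in range(5_000):
        run_episode()
    # The learned policy at t = 1 can differ between Hu and He.
    for s in ("Hu", "He"):
        print(s, max(DRUGS, key=lambda d: Q[(1, s, d)]))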

The number of iterations should be compared carefully, as the time periods considered in each iteration differ. For enumeration and SA, each iteration traverses the whole successive decision tree to calculate the total net benefit over the nine-month horizon, whereas each iteration in DP and Q-learning calculates only one- or two-step rewards from the decomposed decision tree. Computational intensity was higher in DP and Q-learning than in enumeration and SA. In particular, the applied Q-learning required a large number of cases, more than 10 times the number of all possible combinations of health states and drugs (i.e., 3H3*3H3=729), to achieve convergence to the optimum. Thus, Q-learning is only recommended where the size and complexity of the given problem are large enough to justify the computational time and effort required to implement it.
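The convergence requirement can be made concrete with a simple stopping rule, sketched below under the assumption that a `run_episode` function and a Q-table like those in the previous sketch are available; the thesis's own convergence criterion may differ.

```python
# Sketch of a simple stopping rule for tabular Q-learning (illustrative only).
# Training continues in batches of episodes until no Q-value changes by more
# than `tol` over a batch. With a fixed learning rate the estimates keep
# fluctuating, so in practice the learning rate is usually decayed as well.
def train_until_stable(run_episode, q_table, batch=100, tol=1.0, max_batches=500):
    """Return the number of episodes run before the Q-table stabilises."""
    for b in range(1, max_batches + 1):
        before = dict(q_table)
        for _ in range(batch):
            run_episode()
        change = max(abs(q_table[k] - before.get(k, 0.0)) for k in q_table)
        if change < tol:
            return b * batch
    return max_batches * batch
```

A rule of this kind makes the cost visible: even in the toy example the number of episodes needed easily exceeds the number of distinct state-action entries in the table, consistent with the observation above.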


Table 3. Comparison of different optimisation approaches proposed in the classification
(Columns: Enumeration; Simulated annealing; Dynamic programming; Q-learning. Only the first entries of the table are preserved in this extract.)

Construction of decision space: a set of possible treatment sequences / a set of possible drugs / the size of the health state transition space Z(HS)
