FOR an infinite number of times

Randomly simulate the initial state st.

FOR each step of the episode

  • Choose a from st using the policy derived from the Q-values (ε-greedy method).

  • Take action a, observe st+1 and r.

  • st ← st+1, until st is terminal.

END

END

Figure 3. Q-learning algorithm [200]
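To make the steps in the figure concrete, a minimal MATLAB sketch of tabular Q-learning with an ε-greedy policy is given below. The function simulateTransition and the parameter values (alpha, gamma, epsilon, nEpisodes) are illustrative assumptions, not the settings used in the thesis model.

  % Minimal tabular Q-learning sketch (illustrative only).
  % simulateTransition(s, a) is an assumed placeholder returning the next state,
  % the reward and a terminal flag for the chosen state-action pair.
  nStates   = 27;       % e.g. number of health-state histories (assumed)
  nActions  = 3;        % e.g. drug1, drug2, drug3
  alpha     = 0.1;      % learning rate (assumed)
  gamma     = 1.0;      % discount factor (assumed)
  epsilon   = 0.1;      % exploration probability for the epsilon-greedy rule (assumed)
  nEpisodes = 10000;    % finite stand-in for "an infinite number of times"
  Q = zeros(nStates, nActions);

  for episode = 1:nEpisodes
      s = randi(nStates);                    % randomly simulate the initial state
      terminal = false;
      while ~terminal
          if rand < epsilon                  % epsilon-greedy choice of a from s
              a = randi(nActions);
          else
              [~, a] = max(Q(s, :));
          end
          [sNext, r, terminal] = simulateTransition(s, a);   % observe st+1 and r
          % Standard Q-learning update: Q(s,a) <- Q(s,a) + alpha*(r + gamma*max Q(s',:) - Q(s,a))
          Q(s, a) = Q(s, a) + alpha * (r + gamma * max(Q(sNext, :)) - Q(s, a));
          s = sNext;                         % move to the next state until terminal
      end
  end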

3.6 Application of the key optimisation methods to a hypothetical case

3.6.1 Description of the simple hypothetical sequential drug decision problem


A simple hypothetical SDDP was used to allow full enumeration and to illustrate how some of the proposed optimisation methods can be applied. As this simple example was of low complexity, a successive semi-Markov decision-tree model was used to calculate the value of the objective function (see Figure 3.). Four optimisation methods – enumeration, DP, SA and RL (i.e., Q-learning) – were tested and compared in terms of computational efficiency, computational intensity and the quality of the solutions obtained for SDDPs. DP was applicable because of the low complexity of the hypothetical case, whereas GA was not included because SA was sufficient to represent the behaviour of meta-heuristics in such a small problem. Matlab version 7.6.0 (R2008a; The MathWorks, Natick, Massachusetts, USA) was used to develop both the evaluation and optimisation models.

Figure 3. Decision-tree of the hypothetical simple SDDP



The problem was mathematically defined as follows:

  • The total follow-up period is divided into three 3-monthly decision-making periods, T=(t1,t2,t3). A maximum of two drug switches is allowed.




  • There are three possible health states H={Hu,He,Hn}, where Hu represents the undesirable health condition, He represents the occurrence of adverse events (AEs) and Hn represents the health condition under control. There are 27 possible disease pathways, where the initial health state s1 is Hu and the subsequent health states s2, s3 and s4 ∈ H.


HS={ θ1=(Hu,Hu,Hu,Hu), θ2=(Hu,Hu,Hu,He), θ3=(Hu,Hu,Hu,Hn),

θ4=(Hu,Hu,He,Hu), θ5=(Hu,Hu,He,He), θ6=(Hu,Hu,He,Hn),

θ7=(Hu,Hu,Hn,Hu), θ8=(Hu,Hu,Hn,He), θ9=(Hu,Hu,Hn,Hn),

θ10=(Hu,He,Hu,Hu), θ11=(Hu,He,Hu,He), θ12=(Hu,He,Hu,Hn),

θ13=(Hu,He,He,Hu), θ14=(Hu,He,He,He), θ15=(Hu,He,He,Hn),

θ16=(Hu,He,Hn,Hu), θ17=(Hu,He,Hn,He), θ18=(Hu,He,Hn,Hn),

θ19=(Hu,Hn,Hu,Hu), θ20=(Hu,Hn,Hu,He), θ21=(Hu,Hn,Hu,Hn),

θ22=(Hu,Hn,He,Hu), θ23=(Hu,Hn,He,He), θ24=(Hu,Hn,He,Hn),

θ25=(Hu,Hn,Hn,Hu), θ26=(Hu,Hn,Hn,He), θ27=(Hu,Hn,Hn,Hn) }.


  • Where a decomposition method is used (i.e., where DP and Q-learning are used), the health state space HSt in each period is constructed as follows, so that the model can apply different transition probabilities depending on the disease history (a sketch of this construction is given after the listing below):


HS1={Hu}

HS2={ θ1=(Hu,Hu), θ2=(Hu,He), θ3=(Hu,Hn) }

HS3={ θ1=(Hu,Hu,Hu), θ2=(Hu,Hu,He), θ3=(Hu,Hu,Hn),

θ4=(Hu,He,Hu), θ5=(Hu,He,He), θ6=(Hu,He,Hn),

θ7=(Hu,Hn,Hu), θ8=(Hu,Hn,He), θ9=(Hu,Hn,Hn) }

HS4={ θ1=(Hu,Hu,Hu,Hu), θ2=(Hu,Hu,Hu,He), θ3=(Hu,Hu,Hu,Hn),

θ4=(Hu,Hu,He,Hu), θ5=(Hu,Hu,He,He), θ6=(Hu,Hu,He,Hn),

θ7=(Hu,Hu,Hn,Hu), θ8=(Hu,Hu,Hn,He), θ9=(Hu,Hu,Hn,Hn),

θ10=(Hu,He,Hu,Hu), θ11=(Hu,He,Hu,He), θ12=(Hu,He,Hu,Hn),

θ13=(Hu,He,He,Hu), θ14=(Hu,He,He,He), θ15=(Hu,He,He,Hn),

θ16=(Hu,He,Hn,Hu), θ17=(Hu,He,Hn,He), θ18=(Hu,He,Hn,Hn),

θ19=(Hu,Hn,Hu,Hu), θ20=(Hu,Hn,Hu,He), θ21=(Hu,Hn,Hu,Hn),

θ22=(Hu,Hn,He,Hu), θ23=(Hu,Hn,He,He), θ24=(Hu,Hn,He,Hn),

θ25=(Hu,Hn,Hn,Hu), θ26=(Hu,Hn,Hn,He), θ27=(Hu,Hn,Hn,Hn) }.

The health states at t4 are the terminal states from sequential drug use. No decisions are made at this stage.
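As noted above, a minimal MATLAB sketch of this construction is given here: each HSt simply collects every health-state history of length t that starts in Hu. The variable names are illustrative assumptions.

  % Illustrative construction of the decomposed state spaces HS{t}:
  % every history of length t that starts in the initial state Hu.
  H  = {'Hu','He','Hn'};          % single-period health states
  HS = cell(1, 4);                % HS{t} holds the histories available at period t
  HS{1} = {{'Hu'}};               % the initial health state is always Hu
  for t = 2:4
      HS{t} = {};
      for i = 1:numel(HS{t-1})
          for j = 1:numel(H)
              HS{t}{end+1} = [HS{t-1}{i}, H(j)];   % extend each history by one state
          end
      end
  end
  % numel(HS{2}) = 3, numel(HS{3}) = 9 and numel(HS{4}) = 27, matching the listing above.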




  • There are three possible drug treatment options A={drug1,drug2,drug3}, where drug1 has a smaller treatment effect but a lower risk of AEs; drug2 has a moderate treatment effect and a moderate risk of AEs; and drug3 has a larger treatment effect but a higher risk of AEs. Where the problem is not decomposed, there are six possible sequential treatment policies, as follows (a sketch of the corresponding full enumeration is given after the listing):


SS={ π1=(drug1,drug2,drug3), π2=(drug1,drug3,drug2), π3=(drug2,drug1,drug3), π4=(drug2,drug3,drug1), π5=(drug3,drug1,drug2), π6 =(drug3,drug2,drug1) }.
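A minimal MATLAB sketch of full enumeration over these six policies is shown below; evaluatePolicy, standing in for the decision-tree evaluation of a complete drug sequence, is an assumed placeholder.

  % Full enumeration over the six fixed treatment sequences (illustrative sketch).
  % evaluatePolicy is an assumed wrapper around the decision-tree evaluation model.
  drugs      = {'drug1','drug2','drug3'};
  policies   = perms(1:3);                 % 6 x 3 matrix: every ordering of the three drugs
  bestNB     = -Inf;
  bestPolicy = [];
  for k = 1:size(policies, 1)
      nb = evaluatePolicy(drugs(policies(k, :)));   % expected net benefit of this sequence
      if nb > bestNB
          bestNB     = nb;
          bestPolicy = policies(k, :);
      end
  end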


  • Where the decomposition method is applied (i.e., where DP and Q-learning are used), the search space SS in each period is equivalent to A. The feasibility of each drug in each state is handled by a penalty function, which forces the net benefit of an infeasible drug to 0 (see the sketch below):


SS1 = SS2 = SS3 = A = {drug1,drug2,drug3}.
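A minimal sketch of how such a penalty could be applied is given below; isFeasible and rawNetBenefit are assumed placeholders for the feasibility rule and the unpenalised evaluation.

  % Illustrative penalty function (isFeasible and rawNetBenefit are assumed placeholders).
  function nb = penalisedNetBenefit(state, drug)
      if isFeasible(state, drug)
          nb = rawNetBenefit(state, drug);   % usual evaluation for a feasible drug
      else
          nb = 0;                            % infeasible drug: net benefit forced to 0
      end
  end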


  • It is assumed that the current drug is switched to another drug in the case of Hu or He (i.e., treatment failure), and that the same drug is continued in the case of Hn (i.e., treatment success), as in the sketch below.
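A minimal sketch of this switching rule, with nextDrugInSequence as an assumed helper returning the next drug specified by the current policy, is:

  % Illustrative switching rule (nextDrugInSequence is an assumed helper).
  function drugNext = applySwitchingRule(state, drugCurrent, policy)
      if strcmp(state, 'Hn')                           % treatment success: keep the same drug
          drugNext = drugCurrent;
      else                                             % Hu or He: treatment failure, switch
          drugNext = nextDrugInSequence(policy, drugCurrent);
      end
  end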




  • To consider the impact of disease history, it was assumed that the baseline risk of Hu increases by 5% after two successive treatment failures (i.e., for patients who went through Hu-Hu-Hu in the previous periods) and by 10% after a relapse (i.e., for patients who went through Hu-Hn-Hu in the previous periods). For patients who had a relapse, treatment effectiveness was also assumed to decrease by 20% for drug1, 10% for drug2 and 5% for drug3. The data used to populate the model are presented in Appendix 4. A sketch of these history-based adjustments follows.
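The sketch below illustrates one way these history-based adjustments could be coded. The multiplicative interpretation of the stated percentages, and all function and variable names, are assumptions for illustration only.

  % Illustrative history-based adjustment of the baseline risk of Hu and of the
  % treatment effect (a multiplicative reading of the stated percentages is assumed).
  function [riskHu, effect] = adjustForHistory(history, riskHu, effect, drugIdx)
      % history: cell array of past health states, e.g. {'Hu','Hn','Hu'};
      % effect:  treatment effect of the current drug; drugIdx: 1, 2 or 3.
      effectLoss = [0.20, 0.10, 0.05];     % relapse-related loss in effectiveness per drug
      if numel(history) < 3
          return;                          % no adjustment before three observed states
      end
      lastThree = history(end-2:end);
      if isequal(lastThree, {'Hu','Hu','Hu'})          % two successive treatment failures
          riskHu = riskHu * 1.05;                      % +5% baseline risk of Hu
      elseif isequal(lastThree, {'Hu','Hn','Hu'})      % relapse
          riskHu = riskHu * 1.10;                      % +10% baseline risk of Hu
          effect = effect * (1 - effectLoss(drugIdx)); % reduced treatment effectiveness
      end
  end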




  • The objective function to maximise the treatment net benefit was:


Equation 3.11.
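A plausible general form of this objective, assuming a standard expected net monetary benefit summed over the three decision periods with willingness-to-pay threshold λ, per-period QALYs et and costs ct (these symbols are assumptions, not the thesis notation), would be:

max π∈SS E[ Σt=1..3 ( λ·et(st, πt(st)) − ct(st, πt(st)) ) ]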

