Sequential drug decision problems in long-term medical conditions: a case study of primary hypertension
Eunju Kim, BA, MA, MSc



3.2.8 Identification of the candidate methods


In previous chapters, it was stated that classic DP has a limited capacity for most large, complex and uncertain SDDPs, which have a non-linear objective function with a large number of probabilistic elements. Sutton and Barto identified the key assumptions required to solve the Bellman optimality equation: (1) the dynamics of the environment are known exactly; (2) computational resources are sufficient to complete the computation of the solution; and (3) the Markov property holds[200]. For general SDDPs, various combinations of these assumptions are likely to be violated. Firstly, the dynamics of the health states in SDDPs involve random events, which results from the probabilistic nature of stochastic models. The Markov assumption would also be restrictive for most real SDDPs, because previous drug uses or health states may have a substantial impact on the subsequent drug decision and health state transitions. In addition, the number of possible health states and treatment options over a long-term follow-up period significantly increases the number of transition probabilities and transition rewards to be computed, causing the curses of modelling and dimensionality.
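Where these assumptions do hold, the Bellman optimality equation can be solved exactly by backward induction. The following minimal sketch uses a hypothetical two-state, two-drug, three-period problem with invented transition probabilities and rewards (not taken from any model in this thesis); it illustrates the exact computation whose cost explodes once the state space, drug set, history-dependence or time horizon grows.

```python
# Toy SDDP: 2 health states, 2 drugs, 3 decision periods.
# All probabilities and rewards are illustrative assumptions.
states = ["controlled", "uncontrolled"]
drugs = ["A", "B"]
T = 3

# p[(s, a)] -> distribution over next states (assumed values)
p = {
    ("controlled", "A"): {"controlled": 0.9, "uncontrolled": 0.1},
    ("controlled", "B"): {"controlled": 0.8, "uncontrolled": 0.2},
    ("uncontrolled", "A"): {"controlled": 0.6, "uncontrolled": 0.4},
    ("uncontrolled", "B"): {"controlled": 0.7, "uncontrolled": 0.3},
}
# One-period net benefit of arriving in each state (assumed values)
r = {"controlled": 1.0, "uncontrolled": 0.4}

# Backward induction: V[t][s] = max_a sum_s' p(s'|s,a) * (r(s') + V[t+1][s'])
V = {T: {s: 0.0 for s in states}}
policy = {}
for t in range(T - 1, -1, -1):
    V[t] = {}
    for s in states:
        best_a, best_v = None, float("-inf")
        for a in drugs:
            v = sum(q * (r[s2] + V[t + 1][s2]) for s2, q in p[(s, a)].items())
            if v > best_v:
                best_a, best_v = a, v
        V[t][s] = best_v
        policy[(t, s)] = best_a

print(policy)  # optimal drug for every (period, state) pair
```

Even in this toy case the algorithm must evaluate every state-drug pair in every period; with history-dependent states the table of values to be stored grows exponentially, which is the curse of dimensionality referred to above.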

As the alternative, this thesis focused on approximate optimisation methods, which usually use a simulation model to approximate the value function (see Figure 3., where the methods tested in this thesis are shaded in grey). The underlying evaluation model could be a cohort model or an IBM, depending on the size of the health state space and the assumptions about the transitions between health states. If the number of potential disease pathways is manageable, the SDDP can be depicted as a successive decision tree; otherwise, more efficient modelling methods that can handle a large number of disease pathways, such as a Markov model or an IBM, need to be considered. If a cohort model is capable of handling the complexity of the SDDP, or an efficient and flexible programming language is available, a Markov model can be used. Under the memoryless assumption, however, a Markov model is restricted in its ability to consider the dependency of drug effectiveness on the timing of drug use and on the current, and potentially previous, health states. In this case, a semi-Markov model would be appropriate for the SDDP. DES can be a better option where a cohort model is inefficient or insufficient to describe the dynamic relationship between the disease pathway and time-dependent patient variables.
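The cohort Markov model mentioned above can be sketched in a few lines: a cohort distribution over health states is multiplied by a fixed transition matrix each cycle. The two states and all probabilities below are invented for illustration; the memoryless update shown here is precisely what a semi-Markov model or an IBM relaxes.

```python
# Minimal cohort Markov model sketch (illustrative probabilities only).
# Row i of P gives the transition probabilities out of state i.
P = [[0.9, 0.1],   # from "controlled"
     [0.6, 0.4]]   # from "uncontrolled"

cohort = [0.5, 0.5]          # initial cohort distribution over the two states
for cycle in range(10):      # e.g. ten one-year cycles
    cohort = [sum(cohort[i] * P[i][j] for i in range(2)) for j in range(2)]

print(cohort)  # approaches the stationary distribution of P
```

Because the next distribution depends only on the current one, the same matrix applies regardless of how long a patient has been in a state or which drugs were used before; accounting for that history is what enlarges the state space in the semi-Markov case.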






Figure 3. A decision algorithm to select the optimisation method for SDDPs. The methods tested in this thesis are in grey.




Once the underlying evaluation model is decided, a search method should be selected based on the time available to make a decision. If the time available is sufficient, enumeration will guarantee the optimal solution; otherwise, various heuristics (or meta-heuristics) can be applied to search for near-optimal solutions in a feasible time. All the heuristic methods in Table 3. are potentially applicable to SDDPs; however, SA and GA were selected as representatives of meta-heuristics because they work differently from each other but are both theoretically well established, with plenty of evidence on their performance. Because of their generality, SA and GA have been widely applied to various optimisation problems and have been shown to provide good solutions for large and complex problems where enumeration is inefficient or impractical.

The basic idea is that a random choice is made among the available moves from the current solution to a neighbouring solution. The neighbourhood is a set of candidate solutions that can be generated by a small perturbation of the current solution. Searching within a neighbourhood of the current solution is a useful compromise because the current solution imposes a bias on the next search area, retaining the information obtained earlier in the search process[153]. SA and GA can also be flexibly combined with a simulation model and with other heuristic concepts, such as the decomposition method, if necessary.
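The neighbourhood-based search described above can be sketched as a simulated annealing loop. The encoding (a sequence of drugs over five periods), the toy objective function and the cooling schedule below are all invented for illustration; in a real SDDP the objective would be the net benefit returned by the evaluation model.

```python
import math
import random

random.seed(0)

# SA sketch over hypothetical drug sequences (illustrative problem only).
drugs = ["A", "B", "C"]
T = 5

def objective(seq):
    # Toy stand-in for the evaluation model's net benefit:
    # reward drug variety, penalise repeating the same drug consecutively.
    return len(set(seq)) - sum(seq[i] == seq[i + 1] for i in range(len(seq) - 1))

def neighbour(seq):
    # Small perturbation: change the drug used in one random period.
    s = list(seq)
    s[random.randrange(T)] = random.choice(drugs)
    return s

current = [random.choice(drugs) for _ in range(T)]
best = current
temp = 2.0
for it in range(2000):
    cand = neighbour(current)
    delta = objective(cand) - objective(current)
    # Always accept improvements; accept worse moves with prob e^(delta/temp)
    if delta >= 0 or random.random() < math.exp(delta / temp):
        current = cand
    if objective(current) > objective(best):
        best = current
    temp *= 0.998  # geometric cooling schedule

print(best, objective(best))
```

The acceptance of occasional worsening moves at high temperature is what lets SA escape local optima; as the temperature falls, the search becomes increasingly greedy around the best region found.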

The key difference between SA and GA is the number of candidate solutions handled at the same time. SA, a single-solution heuristic method, explores a trajectory of the objective value during the search process. In contrast, GA, a population-based heuristic method, deals with a set of solutions in every iteration and describes the evolution of a set of points in the search space. The performance of SA depends on the problem representation and the neighbourhood structure, whereas the performance of GA depends on the way the population is manipulated.
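A GA works on the same kind of encoding but manipulates a whole population through selection, crossover and mutation. The fitness function, operators and parameter values below are illustrative assumptions, not the configurations used later in the thesis.

```python
import random

random.seed(1)

# GA sketch over hypothetical drug sequences (illustrative problem only).
drugs = ["A", "B", "C"]
T = 5
POP, GENS = 20, 60

def fitness(seq):
    # Same toy objective as a stand-in for the evaluation model's net benefit.
    return len(set(seq)) - sum(seq[i] == seq[i + 1] for i in range(T - 1))

def crossover(p1, p2):
    cut = random.randrange(1, T)          # one-point crossover
    return p1[:cut] + p2[cut:]

def mutate(seq, rate=0.1):
    return [random.choice(drugs) if random.random() < rate else g for g in seq]

pop = [[random.choice(drugs) for _ in range(T)] for _ in range(POP)]
best = max(pop, key=fitness)
for gen in range(GENS):
    def select():
        # Tournament selection: keep the fitter of two random individuals.
        a, b = random.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b
    pop = [mutate(crossover(select(), select())) for _ in range(POP)]
    gen_best = max(pop, key=fitness)
    if fitness(gen_best) > fitness(best):
        best = gen_best

print(best, fitness(best))
```

Whereas the SA sketch follows one trajectory, here the information discovered by good individuals is recombined across the population, which is the manipulation the paragraph above refers to.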

RL, which is a kind of ADP, is another promising method to solve SDDPs. In a broad sense, RL is also included in the category of simulation-based optimisation, as it incorporates simulation into the DP procedure. The way that RL works is considerably different from local search methods such as SA and GA. Where the decision space is decomposed into a set of sub-problems DS1, DS2, …, DSn defined by time periods, RL solves the sub-problems sequentially[201, 202]. Each decomposed problem involves a decomposed health state space HSt, which equals the set of l possible health states H={h1, h2, …, hl} where the Markovian assumption is used, and a decomposed search space SSt, which equals the set of m possible drug alternatives A={a1, a2, …, am} if there are no constraints on the use of drugs depending on the medical history. Where a semi-Markov assumption is used, the size of HSt increases, as seen in Equation 3.1, to allow for different transition probabilities according to the disease history. The number of potential drugs, m, can be further reduced by reference to the decision-making rules for each health state; e.g., where the contraindications for a specific health condition are considered, or where a drug cannot be re-used if it has previously been used for treatment.
|HSt| = l, where the Markovian assumption is used;

|HSt| = l^t, where a semi-Markovian assumption is used,

where l is the number of potential health states; m is the number of potential drugs; and n is the number of time periods.

Equation 3.1.
The decomposition method is useful for speeding up the search process because the size of the decision space can be significantly reduced compared with the decision space defined in Equation 2.6. However, additional computational complexity arises in combining the decomposed states and solutions.
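The scale of this reduction is easy to see with a back-of-the-envelope calculation. The values of l, m and n below are arbitrary small examples: enumerating whole treatment sequences grows exponentially in the number of periods, while the decomposed problem only visits each state-drug pair once per period.

```python
# Illustrative decision-space sizes (l, m, n chosen arbitrarily).
l, m, n = 4, 5, 10   # health states, drugs, time periods

full_sequences = m ** n        # open-loop drug sequences to enumerate
decomposed_work = n * l * m    # state-drug pairs across the n sub-problems

print(full_sequences)   # 9765625
print(decomposed_work)  # 200
```

The gap widens rapidly as n grows, which is why decomposition is attractive despite the extra bookkeeping needed to stitch the sub-problem solutions back together.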

Where the decision space is decomposed, the problem-solving procedure can work either forward or backward. Where the algorithm works forward, the optimal solution for s at t maximises the expected reward r from a one-step transition, or from multiple transitions, depending on how the value function is defined (see Equation 3.2). This approach is expected to facilitate consideration of the impact of medical history on the total net benefit and of the contraindications for a specific health condition. However, in many decision-making situations, short-sighted approaches, which take the optimal action at each separate step based on the largest immediate reward, may not be good enough in the long term, because the action selected at present affects the subsequent events of the problem.


at*(s) = arg max a∈SSt Σ s′∈HSt+1 p(s′|s, a) r(s, a, s′)

Equation 3.2.
In contrast, a backward approach tries to find the global optimum by balancing the immediate reward and the future reward. As the algorithm works backward, the estimates of the value function at i are conditional on using the optimal drug at i-1 (see further discussion in section 3.5.2). However, there is a concern about how to implement the backward approach within an economic evaluation modelling framework, in which cost and effectiveness depend on the patient's medical history.
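The gap between the two approaches can be shown with a deliberately contrived two-period example (all numbers invented): one drug pays more immediately but leads to a state with a poor continuation value, so the myopic forward rule and backward induction disagree.

```python
# Hypothetical two-period example: drug "A" has the larger immediate
# reward in the starting state, but leads to a state with a poor
# continuation value; drug "B" is the reverse. All values are invented.
immediate = {"A": 1.0, "B": 0.6}      # expected one-step reward
continuation = {"A": 0.2, "B": 1.0}   # value of the state each drug leads to

# Myopic forward rule: pick the largest immediate reward only.
myopic = max(immediate, key=immediate.get)

# Backward-style rule: balance immediate and future reward.
backward = max(immediate, key=lambda a: immediate[a] + continuation[a])

print(myopic, backward)  # A B
```

Here the myopic rule chooses "A" (1.0 now, 1.2 in total) while balancing immediate and future reward chooses "B" (1.6 in total), which is the short-sightedness problem described above.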

In the following sections, the theoretical background and practical application of the three key methods are discussed in detail. The application of the three key methods to a simple hypothetical SDDP provides further discussion of the feasibility and applicability of each method to a real SDDP.


