2.3 Comparison between sequential drug decision problems and representative combinatorial optimisation problems
Most sequential decision problems involving discrete, finite state/action spaces and probabilistic transitions have been formulated as MDPs[76-79]. Representative scenarios include traveling salesman/shortest path/vehicle routing[80-82], job scheduling[83-85], timetabling[86], inventory control[87-89], facility location[90-93], resource allocation[94-97] and others. If an SDDP of interest aims to control the disease as quickly as possible (i.e., the objective is to minimise the time to disease control) or to minimise total treatment costs, the SDDP is related to the traveling salesman problem (or shortest path problem) in that both require finding a permutation that achieves the minimum distance or cost[62] (see Table 2). Considering the interaction between health states and sequential treatment over time, the job-shop scheduling problem, which processes a set of jobs J = (j1, j2, …, jn) on a set of machines M = (m1, m2, …, mm), is also similar to SDDPs[98, 99]. Many variations of the job-shop scheduling problem exist. In stochastic job-shop scheduling problems, each operation of job j on machine m has a processing time tjm that is assumed to be unknown[85, 100]. In dynamic job-shop scheduling problems, job release times are random variables described by a known probability distribution[101, 102]. Jobs may have constraints, for example, job i must finish before job j can start, or a single job cannot be processed in parallel. There may also be mutual constraints between jobs and machines, for example, certain jobs can be scheduled on some machines only[103-105].
Table 2. Comparison between SDDPs and the representative combinatorial optimisation problems: traveling salesman problems and job-shop scheduling problems

| | Sequential drug decision problems | Traveling salesman problems | Job-shop scheduling problems |
| --- | --- | --- | --- |
| Problem | To identify a sequence of drugs along the disease pathway of a health condition with the objective of maximising the net benefits of treatment. | To identify the shortest travel route given a list of cities and the distances between each pair of cities. | To allocate jobs to the capable machines to minimise the total (or average) completion time. |
| Time parameter | Discrete or continuous. | Discrete or continuous. | Discrete or continuous. |
| States | A set of possible health state combinations within the time dimension. | - | A set of jobs, each consisting of a chain of operations. |
| Actions | A set of possible drug sequences. | The set of permutations of n cities. | A set of machines, each handling one operation at a time. |
| Random variables | Transition probabilities between health states depending on the action (e.g., treatment success rate). | Travel time and clients' needs. | The processing time, release time, demand and deadline of jobs. |
| Constraints | History-dependent decision-making and any contraindication of drugs for a specific health state. | The rules for visiting each city (e.g., visiting each city once and returning to the starting city). | Deadlines and mutual constraints between jobs and machines. |
The potential differences between SDDPs and the conventional job-shop scheduling problem are that 1) the transitions between states are stochastic in an SDDP, whereas the chain of operations is deterministic in most job-shop scheduling problems; and 2) the drugs in an SDDP are related to each other, whereas the machines are assumed to be unrelated parallel machines in most job-shop scheduling problems[106-108]. Therefore, the size of the decision space in an SDDP tends to grow more rapidly than in a job-shop scheduling problem because it must account for the potential health state combinations that can arise stochastically. Furthermore, unlike most job-shop scheduling problems, an SDDP model needs to embed a certain level of memory to capture the potential impact of treatment history on the transitions between health states. The no-forgetting assumption, which uses all informational predecessors for later decisions[109, 110], causes exponential growth in memory usage, computational complexity and runtime, regardless of the modelling method. As an alternative, the limited-memory assumption relaxes the no-forgetting assumption to consider only a limited number of previous observations[60, 111, 112]. However, this involves a trade-off between solution quality and computational complexity.
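The contrast between the two memory assumptions can be made concrete with a small counting sketch. The Python snippet below is illustrative only: the numbers of health states, drugs and decision points are assumptions, not values taken from the text, and the counting is a simplified view of how many distinct histories a model would need to distinguish.

```python
# A minimal sketch of how the decision space grows under the no-forgetting
# assumption versus a limited-memory assumption. The numbers of health states,
# drugs and decision points are illustrative assumptions.

n_states = 4      # possible health states per decision point (assumed)
n_drugs = 3       # drugs available at each decision point (assumed)
horizon = 6       # number of decision points (assumed)

def no_forgetting_histories(n_states, n_drugs, horizon):
    """Count distinct (state, drug) histories the model must distinguish
    when every informational predecessor is remembered."""
    return sum((n_states * n_drugs) ** t for t in range(1, horizon + 1))

def limited_memory_histories(n_states, n_drugs, horizon, window=1):
    """Count histories when only the last `window` (state, drug) pairs are
    retained, so the memory per decision point stops growing with t."""
    return sum((n_states * n_drugs) ** min(t, window)
               for t in range(1, horizon + 1))

print(no_forgetting_histories(n_states, n_drugs, horizon))      # 3257436
print(limited_memory_histories(n_states, n_drugs, horizon, 1))  # 72
```

Even in this toy setting, the no-forgetting count explodes exponentially with the horizon, whereas the limited-memory count grows only linearly, which is the trade-off between fidelity and tractability described above.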
The set of decision times T can be classified as either discrete (a discrete-time MDP)[113, 114] or continuous (a continuous-time MDP)[115], depending on whether events are assumed to occur at discrete time intervals or at any point on a continuum. It can be further classified as finite horizon[116-118] or infinite horizon[119, 120] according to whether T is finite or infinite. The problem formulations in these four cases are almost identical, but the computational methods differ considerably. Where the total future expectation is always finite, for example, DP can use the expected total reward over the decision-making horizon. For a problem assuming an infinite horizon, however, the average expected reward per unit time is used because the policy and expected reward can be assumed to be independent of time t over the infinitely long trajectory.
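As a sketch of the finite-horizon case, the snippet below applies backward-induction DP to a toy discrete-time MDP and maximises the expected total reward over the horizon, as described above. The health states, drugs, transition probabilities and reward values are illustrative assumptions only.

```python
# Backward induction on a toy finite-horizon, discrete-time MDP.
# All numerical values are illustrative assumptions.

states = ["controlled", "uncontrolled"]
actions = ["drug_A", "drug_B"]
horizon = 5

# P[a][s] maps next state -> probability (assumed values).
P = {
    "drug_A": {"controlled":   {"controlled": 0.9, "uncontrolled": 0.1},
               "uncontrolled": {"controlled": 0.6, "uncontrolled": 0.4}},
    "drug_B": {"controlled":   {"controlled": 0.8, "uncontrolled": 0.2},
               "uncontrolled": {"controlled": 0.7, "uncontrolled": 0.3}},
}
# R[a][s]: immediate net benefit of taking action a in state s (assumed).
R = {"drug_A": {"controlled": 1.0, "uncontrolled": 0.3},
     "drug_B": {"controlled": 0.9, "uncontrolled": 0.5}}

# V[s] is the expected total reward obtainable from state s; the terminal
# value is zero. Working backwards from the horizon yields the optimal policy.
V = {s: 0.0 for s in states}
policy = []
for t in reversed(range(horizon)):
    Q = {s: {a: R[a][s] + sum(p * V[s2] for s2, p in P[a][s].items())
             for a in actions} for s in states}
    policy.insert(0, {s: max(Q[s], key=Q[s].get) for s in states})
    V = {s: max(Q[s].values()) for s in states}

print(V)          # expected total reward per starting state
print(policy[0])  # optimal first-line decision rule
```

In an infinite-horizon variant, the same recursion would not terminate, which is why the average-reward (or discounted) criterion is used instead.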
SDDPs can also be specified in different ways depending on the decision-maker’s perspective. Although decision-makers generally pursue maximising total treatment net benefits in SDDPs, one may be interested in developing an evidence-based stepped-care guideline from a population perspective (i.e., what is the optimal initial drug, and then what is the optimal second- or third-line drug after the previous drug fails to control the disease), whereas another may be interested in dynamic treatment assignment from an individual patient’s perspective, providing a treatment sequence tailored to the patient’s need for treatment.
For the former, the search space can be constructed as a set of complete solutions, which in the context of SDDPs is the set of all possible treatment sequences. In this case, each complete treatment sequence is referred to as a deterministic policy because the mapping from the current drug to the next drug is fully determined by the sequence. As the algorithm works with complete solutions, it can provide at least a candidate answer based on its search experience whenever it stops. For the underlying evaluation model, cohort models such as decision trees and Markov models would be more relevant.
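A minimal sketch of this population-perspective search is given below: each complete treatment sequence (a permutation of the available drugs, i.e., a deterministic policy) is evaluated with a simple Markov-type cohort model. The drug names, per-cycle success probabilities, benefit value and switching rule are illustrative assumptions, not values from the text.

```python
# Enumerate complete treatment sequences and score each with a toy cohort model.
# All parameter values and the switching rule are illustrative assumptions.
from itertools import permutations

drugs = ["A", "B", "C"]
p_success = {"A": 0.6, "B": 0.5, "C": 0.4}  # per-cycle control probability (assumed)
benefit_controlled = 1.0                     # net benefit per cycle once controlled (assumed)
cycles_per_drug = 4                          # cycles before switching to the next-line drug (assumed)

def evaluate_sequence(seq):
    """Expected net benefit for a cohort following one complete drug sequence:
    patients stay on a drug until controlled or until its cycles run out,
    then move to the next-line drug."""
    uncontrolled = 1.0   # cohort fraction not yet controlled
    total = 0.0
    for drug in seq:
        for _ in range(cycles_per_drug):
            newly_controlled = uncontrolled * p_success[drug]
            uncontrolled -= newly_controlled
            total += (1.0 - uncontrolled) * benefit_controlled
    return total

best = max(permutations(drugs), key=evaluate_sequence)
print(best, round(evaluate_sequence(best), 3))
```

Because every candidate examined is a complete sequence, stopping the search at any point still leaves a best-so-far guideline, which is the property noted above.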
The latter requires a more flexible approach to generate an individual patient’s disease pathway and to assign a drug to each health state. Instead of optimising a fixed patient case, the overall problem can be divided into a set of sub-problems, each with its own set of possible health states and treatment options, by decision-making point. A simulation model randomly generates the individual patient’s health state in each period and allows the drug to be selected once the actual state is revealed. This approach then combines the individual or partial solutions to obtain a complete answer to the overall problem. This makes it possible to operate the sequential decision-making process while considering inter-patient variability in risk factors and response to treatment over time. The allowable set of incomplete or partial treatment sequences resides in a subset of the original decision space for each health state, and then varies as the patient’s health state progresses with a given probability. Knowledge from the previous search can be used to advance the search after a change. The decision rule is said to be stochastic (or randomised) because the mapping from the current state to the next state is stochastic.
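The individual-patient approach can be sketched as a simple simulation in which the drug is chosen only after each period’s health state is revealed, and many simulated trajectories are then combined. The states, drugs, transition probabilities and the decision rule below are illustrative assumptions.

```python
# Toy per-patient simulation: the drug is selected after the realised health
# state is revealed, so the resulting decision process is stochastic.
# All parameter values and the decision rule are illustrative assumptions.
import random

drugs = ["A", "B", "C"]
p_success = {"A": 0.6, "B": 0.5, "C": 0.4}  # probability of reaching "controlled" (assumed)

def decision_rule(state, history):
    """Partial solution for one decision point: given the revealed state and
    the treatment history, pick the next drug (here, the untried drug with
    the highest assumed success rate; illustrative only)."""
    untried = [d for d in drugs if d not in history]
    return max(untried or drugs, key=lambda d: p_success[d])

def simulate_patient(horizon=6, seed=None):
    rng = random.Random(seed)
    state, history = "uncontrolled", []
    for _ in range(horizon):
        if state == "controlled":
            break
        drug = decision_rule(state, history)   # drug chosen after the state is revealed
        history.append(drug)
        if rng.random() < p_success[drug]:      # stochastic transition
            state = "controlled"
    return history, state

# Combining many individual trajectories gives a population-level summary.
results = [simulate_patient(seed=i) for i in range(1000)]
controlled_share = sum(s == "controlled" for _, s in results) / len(results)
print(round(controlled_share, 3))
```

Each simulated decision point is a sub-problem solved with only the information revealed so far, and the complete answer emerges from combining the partial solutions across trajectories, mirroring the description above.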
On the premise that SDDPs share the properties of stochastic sequential decision problems, the following section mathematically defines SDDPs using the MDP framework, assuming that the given SDDP is a discrete-time Markov problem with finite state/action spaces over a finite set of decision times. The decision-maker is assumed to pursue maximising total treatment net benefits from a population perspective. These assumptions are used throughout the thesis, but could be extended to DES with continuous time from a patient’s perspective in future research.