Optimisation is the process of finding the conditions that give the maximum or minimum value of an objective function under given circumstances[46]. A classic optimisation technique is mathematical programming, which finds the global optimal solution where the objective function and constraints are described as functions of certain decision variables. Linear programming (LP) is applicable to a problem whose objective function and constraints appear as linear functions of the decision variables. The key assumptions of LP are: 1) proportionality of the value of the objective function and constraints to the level of the input parameters; 2) additivity of the individual components of the objective function and constraints; 3) divisibility of the decision variables into non-integer values; and 4) certainty of all model parameter values[47]. Any problem that satisfies these four assumptions can be solved using the simplex method[48]. However, in real applications it is common for some or all of the four assumptions not to hold, in which case a number of useful alternative models are available. If the proportionality assumption is violated (e.g., due to an increasing marginal return) or the additivity assumption is violated (e.g., due to interaction between the individual components of the objective function and constraints), nonlinear programming can be used[49]. Integer programming can be used if the divisibility assumption does not hold[50]. If the parameter values used involve some degree of uncertainty, a simulation model can be incorporated into the optimisation procedure[51].
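As a minimal illustration of the simplex method applied to an LP that satisfies all four assumptions, the following sketch implements a basic tableau simplex in Python. The two-variable objective and constraints are invented for illustration, and the routine assumes a bounded maximisation problem in standard form with non-negative right-hand sides.

```python
def simplex_max(c, A, b):
    """Minimal tableau simplex for: maximise c.x subject to A x <= b, x >= 0.
    Assumes b >= 0 (slack variables give an initial feasible basis) and a
    bounded problem. Returns (optimal x, optimal objective value)."""
    m, n = len(A), len(c)
    # Tableau: each constraint row gets a slack column; last row is -c (reduced costs).
    T = [row[:] + [1 if i == j else 0 for j in range(m)] + [b[i]]
         for i, row in enumerate(A)]
    T.append([-ci for ci in c] + [0] * m + [0])
    basis = list(range(n, n + m))  # slacks start in the basis
    while True:
        # Entering variable: most negative reduced cost.
        col = min(range(n + m), key=lambda j: T[-1][j])
        if T[-1][col] >= -1e-9:
            break  # no negative reduced cost left: current basis is optimal
        # Leaving variable: minimum ratio test over rows with a positive pivot column.
        _, row = min((T[i][-1] / T[i][col], i) for i in range(m) if T[i][col] > 1e-9)
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        for i in range(m + 1):
            if i != row and T[i][col]:
                f = T[i][col]
                T[i] = [vi - f * vr for vi, vr in zip(T[i], T[row])]
        basis[row] = col
    x = [0.0] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return x, T[-1][-1]

# Invented example: maximise 3x + 2y subject to x + y <= 4, x + 3y <= 6, x, y >= 0.
x, val = simplex_max([3, 2], [[1, 1], [1, 3]], [4, 6])  # optimum at x = [4, 0], value 12
```

For anything beyond toy problems, a production solver (e.g., SciPy's `linprog`) would be used instead; the sketch only shows the pivoting mechanics.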
Sequential decision problems extend conventional optimisation problems by determining the best order of the decision variables given a set of circumstances. In operational research, such a problem belongs to the class of combinatorial optimisation problems, which find a set, or a sequence, of discrete decision variables to achieve the stated objective[52]. Most early attempts to solve sequential decision problems (i.e., combinatorial optimisation problems) used variants of LP methods, generally restricting the variables to the values 0 or 1 in order to produce an LP formulation[53]. However, such LP formulations often involve very large numbers of variables and constraints for large and complex sequential decision problems. As an alternative, dynamic programming (DP) was introduced for sequential or multi-stage decision problems[54]. It reduces a multi-stage problem to a set of smaller problems defined by the decision period and solves them by backward induction, taking the expected future reward into account. DP is still widely used as an efficient optimisation technique for solving many problems involving a sequence of interrelated decisions and has been the theoretical basis of many approximate optimisation methods. This will be further discussed in section 3.5.
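The backward induction at the heart of DP can be sketched as follows: a toy deterministic multi-stage problem (all stages, states, rewards and transitions are invented for illustration) is decomposed into per-stage subproblems and solved from the final stage backwards.

```python
# Backward-induction sketch for a toy deterministic multi-stage problem.
# At stage t the decision maker is in state s, chooses action a, receives
# reward R[t][s][a] and moves to state T[t][s][a]. All values are invented.
STAGES = 3
STATES = [0, 1]
ACTIONS = [0, 1]

R = {t: {s: {a: (s + 1) * (a + 1) for a in ACTIONS} for s in STATES}
     for t in range(STAGES)}                       # invented rewards
T = {t: {s: {a: (s + a) % 2 for a in ACTIONS} for s in STATES}
     for t in range(STAGES)}                       # invented transitions

# V[t][s]: best total reward obtainable from stage t onwards, starting in s.
V = {STAGES: {s: 0 for s in STATES}}               # terminal values
policy = {}
for t in reversed(range(STAGES)):                  # backward induction
    V[t] = {}
    for s in STATES:
        best_a = max(ACTIONS, key=lambda a: R[t][s][a] + V[t + 1][T[t][s][a]])
        V[t][s] = R[t][s][best_a] + V[t + 1][T[t][s][best_a]]
        policy[(t, s)] = best_a
```

Each stage is solved only once, using the already-computed values of the following stage, which is what makes DP efficient relative to enumerating every action sequence.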
Stochastic sequential decision problems are often formulated as a Markov decision process (MDP), which is a controlled Markov chain allowing actions in each state to maximise the decision maker's goal[55]. The elements of an MDP P(T, S, A, p, r) include: the set of decision times T; the set of possible discrete states S; the set of allowable actions A; the set of transition matrices p, where action a∊A is used at time t; and the set of immediate rewards r, where a∊A is used for state s∊S at t[55-57]. In an MDP, the decision maker selects a particular action x∊A for state h∊S at time t; this yields an immediate reward and specifies a transition probability given x∊A and h∊S. In most cases the transition probabilities satisfy the Markov assumption: the probability of moving to a subsequent state depends only on the current state and the action selected. This Markov property can be relaxed, to a limited extent, by varying the transition probabilities or the timing of events based on the given information; the result is called a semi-Markov decision process (SMDP). The global reward for a sequence of decisions (i.e., the value of the objective function) is estimated by DP; therefore an MDP is also called stochastic DP[51]. The difference between stochastic DP and deterministic DP is that the next state h(t+1) is not completely determined by the current state h(t) and action x(t). Rather, stochastic DP draws the next state from a probability distribution conditional on the previous state(s) and action(s).
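As a sketch of stochastic DP, the following value iteration solves a toy two-state, two-action MDP; the transition probabilities, rewards and discount factor are all invented for illustration.

```python
# Value-iteration sketch for a toy two-state MDP (all numbers invented).
# P[s][a] maps next state -> probability; r[s][a] is the immediate reward.
P = {
    0: {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}},
    1: {0: {0: 0.5, 1: 0.5}, 1: {0: 0.0, 1: 1.0}},
}
r = {0: {0: 1.0, 1: 0.0}, 1: {0: 2.0, 1: 0.5}}
gamma = 0.9  # discount factor

V = {0: 0.0, 1: 0.0}
for _ in range(1000):
    # Bellman update: best expected immediate-plus-discounted-future reward.
    V_new = {
        s: max(r[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items())
               for a in P[s])
        for s in P
    }
    if max(abs(V_new[s] - V[s]) for s in P) < 1e-10:
        V = V_new
        break  # values have converged
    V = V_new

# Greedy policy with respect to the converged value function.
policy = {
    s: max(P[s], key=lambda a: r[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a].items()))
    for s in P
}
```

The update uses only the current state's transition distribution, which is exactly the Markov assumption described above; an SMDP would require conditioning on additional history.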
As an MDP has the form of a successive decision tree, it may suffer from the curse of dimensionality in large and complex sequential decision problems. Where an SMDP is used, in particular, conditioning on previous states and actions inevitably increases the computational complexity; some studies have therefore tried to alleviate this complexity using state aggregation[58], partial observability[59] or limited memory[60]. State aggregation reduces the number of potential states by combining several states that are similar with respect to transition probabilities or rewards into a single aggregated state. Partial-observability and limited-memory assumptions reduce the amount of memory used for each decision by relaxing the no-forgetting rule, under which all information known at previous decisions is used.
An alternative stochastic optimisation method is simulation-based optimisation, which uses a simulation model to evaluate the objective function[61] (see Figure 1). The problem formulation is identical to that of an MDP; however, the problem-solving procedures can differ. Whereas mathematical programming and MDPs inherently integrate the evaluation process and the optimisation process, simulation-based optimisation is constructed from two key components: an optimisation model and an evaluation model. The optimisation model covers the overall process of guiding the search procedure, generating candidate solutions and determining the optimality of the candidate solutions based on the expected value provided by the evaluation model. The underlying evaluation model is not restricted to individual-based models (IBMs) such as DES, but may be any decision-analytic model, such as a decision tree or Markov model. The problem-solving procedure can work forwards or backwards depending on the optimisation algorithm used.
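The two-component structure described above can be sketched as follows; both the evaluation model (a toy stochastic simulation) and the optimisation model (a simple random search over candidate policies) are invented for illustration.

```python
import random

random.seed(1)

def evaluate(policy, n_runs=200):
    """Evaluation model: a toy stochastic simulation (invented) that returns
    the estimated expected reward of a candidate policy of three thresholds."""
    total = 0.0
    for _ in range(n_runs):
        state = 0
        for threshold in policy:            # one decision per stage
            noise = random.gauss(0, 1)
            action = 1 if state + noise > threshold else 0
            total += action * (2 - state) - 0.5 * action  # invented reward
            state = (state + action) % 3
    return total / n_runs

def optimise(n_iter=50):
    """Optimisation model: generates candidate solutions and ranks them
    using the expected values supplied by the evaluation model."""
    best_policy, best_value = None, float("-inf")
    for _ in range(n_iter):
        candidate = [random.uniform(-1, 1) for _ in range(3)]
        value = evaluate(candidate)
        if value > best_value:
            best_policy, best_value = candidate, value
    return best_policy, best_value

best_policy, best_value = optimise()
```

The key design point is the separation of concerns: `evaluate` could be swapped for a DES or Markov model without touching the search logic in `optimise`, and vice versa.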
Figure 1. Simulation-based optimisation model
Many search algorithms have been developed. The simplest search method is enumeration, which examines all possible candidate solutions and ranks them based on the results obtained from the simulation model[51]. If the solution space is small, enumeration will guarantee the global optimal solution in a reasonable amount of time. However, sequential decision problems often have an enormous number of possible solutions, in which case enumeration is neither practical nor efficient. The alternative is a local search method, one of the fundamental elements of heuristic methods. Rather than examining all possible solutions to the problem, local search methods focus on searching a neighbourhood of particular solutions by introducing certain pre-set subjective factors (more details are given in section 3.2.8). They do not guarantee an optimal solution, but can be attractive because they provide 'near-optimal' solutions in a reasonable amount of time, reducing the risk of spending too much time solving the problem[62]. For this reason, the term 'heuristic' is often used interchangeably with approximate optimisation methods, in contrast to mathematical programming (or exact algorithms).
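A local search over a sequence of decisions can be sketched as follows; the objective function and swap neighbourhood are invented for illustration, and the method stops at a local optimum rather than guaranteeing the global one.

```python
def objective(seq):
    """Toy objective over a sequence of decisions (invented): penalise large
    jumps between adjacent elements, so smoother orderings score higher."""
    return -sum(abs(a - b) for a, b in zip(seq, seq[1:]))

def local_search(seq, max_sweeps=1000):
    """Hill-climbing local search: examine the swap neighbourhood of the
    current sequence and move whenever a neighbour improves the objective."""
    current = list(seq)
    best_val = objective(current)
    for _ in range(max_sweeps):
        improved = False
        for i in range(len(current)):
            for j in range(i + 1, len(current)):
                neighbour = current[:]
                neighbour[i], neighbour[j] = neighbour[j], neighbour[i]
                val = objective(neighbour)
                if val > best_val:
                    current, best_val = neighbour, val
                    improved = True
        if not improved:
            break  # local optimum: no single swap improves the objective
    return current, best_val

ordering, score = local_search([3, 0, 2, 4, 1])
```

Only a neighbourhood (all pairwise swaps) is examined at each step rather than every possible ordering, which is what keeps the running time manageable at the cost of possibly missing the global optimum.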
The stochastic optimisation methods introduced above have been widely used to solve sequential decision problems in computer science, applied mathematics, artificial intelligence, engineering and operational research. Some of them have also been applied to healthcare-related sequential decision problems. For example:
Hauskrecht and Fraser introduced a framework using a partially observable MDP (i.e., an MDP assuming partial observability) for the management of patients with ischaemic heart disease[59]. The set of actions included no treatment, pharmacological treatment, surgical procedures such as angioplasty (PTCA) and coronary artery bypass surgery (CABG), and investigative actions such as coronary angiogram and stress test. The set of patient states included coronary artery disease, ischaemia level, acute myocardial infarction (MI), decreased ventricular function, history of CABG and PTCA, chest pain, resting electrocardiogram ischaemia, catheter coronary artery result and stress test result. Some of these were assumed to be observable, whereas the others were assumed to be hidden. The objective was to minimise the expected cumulative cost of treatment.
Van Gerven et al. used a dynamic limited-memory influence diagram (i.e., a successive influence diagram assuming limited memory) as the underlying evaluation model to identify the optimal permutation of different levels of chemotherapy for patients with high-grade tumours[60]. The decision problem was whether or not to administer chemotherapy at each decision moment in a cancer population. The set of patient health states included normal, mild complaints, ambulatory, nursing care, intensive care and death, which were assumed to depend on the patient's age, gender, tumour mass and the treatment strategy. Effectiveness was modelled as tumour progression depending on tumour response. The objective was to maximise the global utility, defined as quality of life (QoL) minus the cost of chemotherapy. Experimental results from single policy updating, single rule updating and simulated annealing (SA) showed that the framework could find reasonable treatment strategies for complex dynamic decision problems in medicine.
Although many optimisation algorithms have been developed, no single method is generally known to be consistently better than the others in the context of sequential decision problems. Performance is problem-specific and varies depending on how the problem is constructed and which problem-solving procedure is applied. This thesis therefore tries to identify the key approximate optimisation methods applicable to SDDPs and to discuss the feasibility and performance of the proposed methods using a case study of primary hypertension. The key criterion for judging performance is the quality of the solution: how close the final objective function values are to the real global optimum, where the real global optimum is known; or how good the objective function values of the finally generated solution are, where the real global optimum is unknown. Computational efficiency, meaning the ability to find a good solution in a reasonable amount of computation time, is also an important criterion for comparing approximate optimisation methods.