Sequential drug decision problems in long-term medical conditions: a case study of primary hypertension. Eunju Kim, BA, MA, MSc





INPUT transition-probability matrices and reward matrices.
FOR i=1:T
    FOR all states s∊Hi
        Initialize Vi(s)=0 (or select an arbitrary number).
        WHILE Vi(s) is not changed
            Vi(s) ← max over a∊Ai(s) of { ri(s,a) + Σs′ pi(s′|s,a)Vi(s′) }
        LOOP
    LOOP
LOOP

Figure 3. Value iteration algorithm [200]
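
To make the Bellman update concrete, the following minimal Python sketch runs value iteration on a small stationary MDP. Everything in it (three states, two actions, randomly generated transition and reward matrices, a discount factor of 0.9 and the convergence tolerance) is an illustrative placeholder rather than a parameter of the hypertension model.

import numpy as np

# Illustrative sketch only: the problem data below are random placeholders.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)

# p[a][s, s2]: probability of moving from state s to s2 under action a
p = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
# r[s, a]: immediate reward for taking action a in state s
r = rng.random((n_states, n_actions))

V = np.zeros(n_states)  # initialise V(s)=0; any starting values work
while True:
    # Bellman update: V(s) <- max_a { r(s,a) + gamma * sum_s' p(s'|s,a) V(s') }
    Q = r + gamma * np.stack([p[a] @ V for a in range(n_actions)], axis=1)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-9:  # stop once V is (numerically) unchanged
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=1)  # greedy policy with respect to the converged values
print("optimal values:", V, "optimal actions:", policy)

The loop mirrors the WHILE clause in the figure: the values are updated in place until a full sweep leaves them unchanged, after which the greedy policy is read off the converged action values.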
Policy iteration starts with an arbitrary policy and repeatedly replaces the current policy with a better one until an optimal policy is reached. Since there are a finite number of states and actions, the algorithm eventually terminates with a policy that cannot be further improved (see Figure 3.).


INPUT transition-probability matrices and reward matrices.
FOR i=1:T
    FOR all states s∊Hi
        WHILE ai(s) is not changed
            Select a random policy a∊Ai(s).
            Compute the value of the current policy for the given state:
            IF i=1, use the immediate reward ri(s,a) alone
            ELSE, add the expected value carried over from period i−1
            END
            Update the optimal policy for the given state, keeping the action with the best value found so far.
        LOOP
    LOOP
LOOP

Figure 3. Policy iteration algorithm [200]
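
The same toy problem can be solved by policy iteration. In the illustrative Python sketch below, the policy-evaluation step is done exactly by solving the linear system (I − γP_π)V = r_π; the problem data and the discount factor are again arbitrary placeholders rather than the thesis model.

import numpy as np

# Illustrative sketch only: the problem data below are random placeholders.
n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(1)
p = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # transitions
r = rng.random((n_states, n_actions))                             # rewards

policy = rng.integers(n_actions, size=n_states)  # start from an arbitrary policy
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = r_pi for the current policy
    P_pi = np.array([p[policy[s], s] for s in range(n_states)])
    r_pi = r[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

    # Policy improvement: act greedily with respect to the evaluated values
    Q = r + gamma * np.stack([p[a] @ V for a in range(n_actions)], axis=1)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):  # stop once the policy is unchanged
        break
    policy = new_policy

print("optimal policy:", policy, "values:", V)

Because each improvement step can only produce a strictly better policy, and there are finitely many deterministic policies, the loop is guaranteed to terminate, as noted above.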
The backward approach involved in DP reduces the computational burden by pruning branches that are unlikely to yield a better future value of the objective function. The idea of separating past and future costs enables DP to find the global optimal solution by balancing the immediate reward against the future reward. Proofs of convergence to an optimal policy in DP can be found in most optimisation textbooks [56, 57, 219].
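
As a concrete illustration of this separation of past and future, the short sketch below performs backward induction on a small finite-horizon problem: at each period the best action is chosen by combining the immediate reward with the already-optimised value of the following period. All numbers are, once more, illustrative placeholders.

import numpy as np

# Illustrative sketch only: the problem data below are random placeholders.
T, n_states, n_actions = 5, 3, 2
rng = np.random.default_rng(2)
p = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # transitions
r = rng.random((n_states, n_actions))                             # rewards

V = np.zeros(n_states)                 # no reward accrues after the last period
policy = np.zeros((T, n_states), dtype=int)
for i in reversed(range(T)):           # sweep backwards from period T to 1
    # immediate reward plus the expected, already-optimised future value
    Q = r + np.stack([p[a] @ V for a in range(n_actions)], axis=1)
    policy[i] = Q.argmax(axis=1)       # best action for each state at period i
    V = Q.max(axis=1)                  # optimal value-to-go from period i

print("optimal actions by period and state:\n", policy)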

In contrast, DP is known to be limited for many large and complex real-world problems because of its computational requirements (e.g., computing and storing the transition-probability and reward matrices for every decision epoch). In particular, where the sub-problems depend on each other (i.e., where past states or decisions affect the current decision), obtaining or storing these data can be very challenging and computationally expensive; this limitation motivated the development of various ADP techniques.



