INPUT transition-probability matrices and reward matrices.
Initialize Vi(s) = 0 (or select an arbitrary number) for all states s∊H.
WHILE the value function Vi(s) has not converged
    FOR i = 1:T
        FOR all states s∊Hi
            Update Vi(s): take the best action a∊Ai(s), balancing the immediate reward and the expected future value.
        LOOP
    LOOP
LOOP
Figure 3. Value iteration algorithm [200]
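For illustration, a minimal Python sketch of the value-iteration update is given below for a simplified, stationary discounted MDP rather than the stage-indexed formulation of the figure; the arrays P and R, the discount factor gamma, the tolerance tol and the toy numbers are assumptions introduced for the example only.

import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """Value iteration for a stationary discounted MDP (illustrative sketch).

    P : (n_actions, n_states, n_states) array, P[a, s, s2] = Pr(s -> s2 | a).
    R : (n_actions, n_states) array, R[a, s] = expected immediate reward.
    """
    n_states = P.shape[1]
    V = np.zeros(n_states)                      # initialise V(s) = 0 for all states
    while True:
        Q = R + gamma * (P @ V)                 # immediate reward + discounted expected future value
        V_new = Q.max(axis=0)                   # best achievable value in each state
        if np.max(np.abs(V_new - V)) < tol:     # stop once V is (numerically) unchanged
            return V_new, Q.argmax(axis=0)      # value function and greedy policy
        V = V_new

# Toy 2-state, 2-action example (numbers are purely illustrative).
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.4, 0.6]]])
R = np.array([[1.0, 0.0],
              [0.5, 2.0]])
V, policy = value_iteration(P, R)

Stopping when the value function is numerically unchanged plays the role of the WHILE condition in the figure.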
Policy iteration starts with an arbitrary policy and repeatedly replaces the current policy with a better one until an optimal policy is reached. Since there are a finite number of states and actions, the algorithm eventually terminates with a policy that cannot be improved further (see Figure 3).
INPUT transition-probability matrices and reward matrices.
Select a random initial policy a∊Ai(s) for all states.
WHILE the policy has not converged
    FOR i = 1:T
        FOR all states s∊Hi
            Compute the value of the current policy for the given state:
                IF i = 1, compute it from the immediate reward;
                ELSE, compute it recursively using the value of the preceding stage.
                END
            Update the optimal policy for the given state.
        LOOP
    LOOP
LOOP
Figure 3. Policy iteration algorithm [200]
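Similarly, a minimal Python sketch of policy iteration is given below under the same simplified stationary-MDP assumptions (arrays P and R and discount factor gamma as in the previous sketch); exact policy evaluation is performed by solving a linear system, and the loop terminates once greedy improvement leaves the policy unchanged, mirroring the finite-termination argument above.

import numpy as np

def policy_iteration(P, R, gamma=0.95, seed=0):
    """Policy iteration for a stationary discounted MDP (illustrative sketch).

    P and R follow the same layout as in the value-iteration sketch above.
    """
    n_actions, n_states, _ = P.shape
    rng = np.random.default_rng(seed)
    policy = rng.integers(n_actions, size=n_states)     # arbitrary initial policy
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, np.arange(n_states)]            # transition rows selected by the policy
        R_pi = R[policy, np.arange(n_states)]            # rewards selected by the policy
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        new_policy = (R + gamma * (P @ V)).argmax(axis=0)
        if np.array_equal(new_policy, policy):           # terminate when the policy stops changing
            return V, policy
        policy = new_policy

# Reuses the toy P and R defined in the value-iteration sketch, e.g.:
# V, policy = policy_iteration(P, R)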
The backward approach involved in DP reduces the computational burden by pruning branches that are unlikely to yield a better future value of the objective function. The idea of separating past and future costs enables DP to provide the globally optimal solution by balancing the immediate reward against the future reward. Proofs of convergence to an optimal policy in DP can be found in most optimisation textbooks [56, 57, 219].
In contrast, DP is known to be of limited use for many large and complex real-world problems because of its computational requirements (e.g., computing the transition-probability matrices and the rewards for the transition times for every decision). In particular, where the sub-problems depend on each other (i.e., where past states or decisions affect the current decision), obtaining and storing these data can be very challenging and computationally expensive; this limitation motivated the development of various ADP techniques.