Background: Sequential drug decision problems (SDDPs) arise when drugs are assigned sequentially in long-term medical conditions. SDDPs are important for both clinical decision-making and resource allocation. They can be large and complex because of the considerable number of possible drug sequences and disease pathways and the interdependence between them over time. Whereas classic mathematical programming has a limited capacity for dealing with the complexities of a sequential decision problem, approximate optimisation methods have been widely used to solve such problems more efficiently using simulation.