Dynamic Stochastic Control (DSC)

This technique might be translated into non-statistical parlance as 'the calculus of making a sequence of optimal decisions', in particular in situations involving quantifiable uncertainty. It applies to decision processes that are modelled as evolving randomly over time (e.g. share dealing and drug trials). It requires that the problem be formulated according to a specific framework. I concentrated on using the finite-horizon framework to model the problem of game playing.
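
By way of illustration (not taken from the thesis), the sketch below shows what solving a finite-horizon problem involves: backward induction over the remaining stages. The states, actions, transition probabilities and rewards are invented placeholders.

# A minimal finite-horizon backward-induction sketch; all numbers are illustrative.
states = ["s0", "s1"]
actions = ["a", "b"]
horizon = 3

# P[(s, a)] -> list of (next_state, probability); R[(s, a)] is the immediate payoff.
P = {
    ("s0", "a"): [("s0", 0.7), ("s1", 0.3)],
    ("s0", "b"): [("s1", 1.0)],
    ("s1", "a"): [("s0", 0.4), ("s1", 0.6)],
    ("s1", "b"): [("s1", 1.0)],
}
R = {("s0", "a"): 1.0, ("s0", "b"): 0.0, ("s1", "a"): 0.5, ("s1", "b"): 2.0}

# V[t][s] is the maximal expected payoff from state s with t stages to go.
V = {0: {s: 0.0 for s in states}}
policy = {}  # policy[(t, s)] -> optimal action with t stages remaining

for t in range(1, horizon + 1):
    V[t] = {}
    for s in states:
        # Q-value of each action: immediate reward plus expected value of the successor.
        q = {a: R[(s, a)] + sum(p * V[t - 1][s2] for s2, p in P[(s, a)])
             for a in actions}
        best = max(q, key=q.get)
        V[t][s] = q[best]
        policy[(t, s)] = best

print(V[horizon], policy)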

DSC Framework

The action space, A, is the set of actions available. In the games I modelled, A consists of the set of game-tree searches and possible game moves. The state space, S, corresponds to the set of all possible game states. The action taken influences the next state arrived at in a random fashion, according to the transition matrix. A mapping S->A is termed a policy, since it describes the action(s) to take in each state. The thesis introduces some search games and deduces policies which are optimal in the sense of maximising the expected payoff from the game.
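
To make the terminology concrete, here is a small hypothetical sketch: a policy is just a mapping from states to actions, and its expected payoff over a finite horizon follows directly from the transition matrices. The two actions, the matrices and the payoffs are assumptions of my own, not examples from the thesis.

import numpy as np

states = [0, 1, 2]
# T[a] is the transition matrix under action a: T[a][s, s2] = P(next = s2 | s, a).
T = {
    "search": np.array([[0.6, 0.3, 0.1],
                        [0.2, 0.6, 0.2],
                        [0.0, 0.3, 0.7]]),
    "move":   np.array([[0.1, 0.8, 0.1],
                        [0.1, 0.1, 0.8],
                        [0.0, 0.0, 1.0]]),
}
reward = {"search": np.array([0.0, 0.0, 0.0]),
          "move":   np.array([1.0, 2.0, 5.0])}

# A policy is a mapping S -> A.
policy = {0: "search", 1: "move", 2: "move"}

def expected_payoff(policy, start, horizon):
    """Expected total payoff of following `policy` from `start` for `horizon` steps."""
    dist = np.zeros(len(states))
    dist[start] = 1.0
    total = 0.0
    for _ in range(horizon):
        # Immediate reward and transition row chosen by the policy in each state.
        r = np.array([reward[policy[s]][s] for s in states])
        total += dist @ r
        step = np.vstack([T[policy[s]][s] for s in states])
        dist = dist @ step
    return total

print(expected_payoff(policy, start=0, horizon=4))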

Models Developed

The thesis develops one- and two-player search game models. The state space is typically a vector (board position, search status, time left), allowing rational decisions both about how to allocate time units between moves in a game and about which nodes to expand. The most powerful new model developed is an extension of the branching bandits model of Weiss, proved in a different fashion which permits the extra generality. Although optimal policies are derived for the majority of the models developed, these models are necessarily limited in scope by the need to prove the optimality of the derived policies, so a large gap remains between them and the ad-hoc search methods studied in other parts of this thesis.
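
As a toy sketch of the kind of trade-off such a state vector captures (my own simplification, not a model from the thesis), suppose search status records whether further search has yet identified the better of two candidate moves, and time left counts remaining time units; the board position is collapsed to a single position for brevity. The optimal policy then chooses at each step between spending a unit on search, which reveals the better move with some probability, and committing to a move now. The probabilities and payoffs are invented.

from functools import lru_cache

# Hypothetical parameters: each unit of search identifies the better move with
# probability P_REVEAL; committing blind yields the average payoff of the moves.
P_REVEAL = 0.4
GOOD, BAD = 3.0, 1.0  # payoffs of the better and worse move

@lru_cache(maxsize=None)
def value(search_status, time_left):
    """Optimal expected payoff from state (search_status, time_left)."""
    if search_status == "known":
        return GOOD                      # commit to the identified better move
    commit_now = (GOOD + BAD) / 2.0      # guess uniformly between the two moves
    if time_left == 0:
        return commit_now
    # Spend one unit searching: reveal with probability P_REVEAL, else try again.
    search = (P_REVEAL * value("known", time_left - 1)
              + (1 - P_REVEAL) * value("unknown", time_left - 1))
    return max(commit_now, search)

for t in range(5):
    print(t, round(value("unknown", t), 3))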
