Conference paper

ABATE Alessandro, ČEŠKA Milan and KWIATKOWSKA Marta. Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations. In: Proceedings of 14th International Symposium on Automated Technology for Verification and Analysis. Heidelberg: Springer Verlag, 2016, pp. 116. ISBN 9783319465197. Available from: http://link.springer.com/chapter/10.1007%2F9783319465203_2

Publication language:  english 

Original title:  Approximate Policy Iteration for Markov Decision Processes via Quantitative Adaptive Aggregations 

Title (cs):  Kvantitativní adaptivních agregace pro MDP 

Pages:  116 

Proceedings:  Proceedings of 14th International Symposium on Automated Technology for Verification and Analysis 

Conference:  14th International Symposium on Automated Technology for Verification and Analysis  ATVA 2016 

Series:  LNCS 9938 

Place:  Heidelberg, DE 

Year:  2016 

URL:  http://link.springer.com/chapter/10.1007%2F9783319465203_2 

ISBN:  9783319465197 

DOI:  10.1007/9783319465203_2 

Publisher:  Springer Verlag 

Keywords 

Markov Decision Process, Policy Iteration, Approximation, Adaptive aggregation 

Annotation 

We consider the problem of finding an optimal policy in a Markov decision process that maximises the expected discounted sum of rewards over an infinite time horizon. Since the explicit iterative dynamic programming scheme does not scale as the dimension of the state space increases, a number of approximate methods have been developed. These are typically based on value or policy iteration, enabling further speedups through lumped and distributed updates, or by employing succinct representations of the value functions. However, none of the existing approximate techniques provides general, explicit and tunable bounds on the approximation error, a problem particularly relevant when the level of accuracy affects the optimality of the policy. In this paper we propose a new approximate policy iteration scheme that mitigates the state-space explosion problem by adaptive state-space aggregation, while providing rigorous and explicit error bounds that can be used to control the optimality level of the obtained policy. We evaluate the new approach on a case study, showing that the state-space reduction considerably accelerates the policy iteration scheme while still meeting the required level of precision. 
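
For orientation, the baseline that the annotation refers to can be sketched as follows. This is a minimal illustration of plain, exact policy iteration for a finite discounted MDP, not the adaptive aggregation scheme proposed in the paper. The function name `policy_iteration`, the array layout of `P` and `R`, the discount factor and the two-state example are all assumptions made for this sketch.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.9, max_iter=1000):
    """Exact policy iteration for a finite discounted MDP (illustrative only).

    P: array of shape (A, S, S); P[a, s, t] is the probability of moving
       from state s to state t under action a (assumed layout).
    R: array of shape (S, A); R[s, a] is the immediate reward (assumed layout).
    """
    n_actions, n_states, _ = P.shape
    states = np.arange(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(max_iter):
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = P[policy, states, :]
        R_pi = R[states, policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        new_policy = np.argmax(Q, axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return policy, V

# Hypothetical two-state example: action 0 stays, action 1 switches state;
# only staying in state 1 yields a reward.
P = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 1.0], [1.0, 0.0]]])
R = np.array([[0.0, 0.0],
              [1.0, 0.0]])
policy, V = policy_iteration(P, R, gamma=0.9)
```

The evaluation step solves a linear system whose size equals the number of states, which is the cost that grows with the state space and that aggregation-based schemes aim to reduce.
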
BibTeX: 

@INPROCEEDINGS{
author = {Alessandro Abate and Milan {\v{C}}e{\v{s}}ka and Marta
Kwiatkowska},
title = {Approximate Policy Iteration for Markov Decision Processes
via Quantitative Adaptive Aggregations},
pages = {116},
booktitle = {Proceedings of 14th International Symposium on Automated
Technology for Verification and Analysis},
series = {LNCS 9938},
year = {2016},
location = {Heidelberg, DE},
publisher = {Springer Verlag},
ISBN = {9783319465197},
doi = {10.1007/9783319465203_2},
language = {english},
url = {http://www.fit.vutbr.cz/research/view_pub.php.en.iso88592?id=11211}
} 
