Tuesday, February 7, 2012

Departmental Seminar @ Memorial University: Symbiosis, Complexification and Generalization: A case study in temporal sequence learning

Another upcoming seminar in the Department of Computer Science at Memorial University of Newfoundland.

Dr. Malcolm Heywood
Department of Computer Science
Dalhousie University

Symbiosis, Complexification and Generalization: A case study in temporal sequence learning
Department of Computer Science
Thursday, February 9, 2012, 1:00 p.m., Room EN-2022


 Hierarchical reinforcement learning traditionally represents a framework in which a machine learning algorithm is applied to build solutions to temporal sequence style problems under the guidance of a priori identified subtasks. Once learning relative to one set of subtasks is complete, these can then be reused to build more complex behaviours. The principal caveat is that appropriate subtasks must be identifiable, preferably without requiring a priori knowledge. This work proposes a generic architecture for evolving hierarchical policies through symbiosis. Specifically, symbionts define an action and an evolved context, whereas each host identifies a subset of symbionts. Symbionts effectively coevolve within a host. Natural selection operates on the hosts, with symbiont existence a function of host performance. It is now possible to support hierarchical policies as a symbiotic process by letting hosts evolved in an earlier population become the symbiont actions of the next population.
 Two benchmarking studies are performed to illustrate the approach. An initial tutorial is conducted using a truck reversal domain in which the benefits of evolving a hierarchical solution over non-hierarchical solutions are clearly demonstrated. A second benchmarking study is then performed using the Acrobot handstand task. Solutions to date from reinforcement learning have not been able to approach those established 13 years ago using an “A*” search and a priori knowledge regarding the Acrobot energy equations. The proposed symbiotic approach is able to match and, for the first time, better these results. Moreover, unlike previous works, solutions are tested under a broad range of Acrobot initial conditions, with hierarchical solutions providing significantly better generalization performance.
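
For readers who want a concrete picture of the host/symbiont structure described in the abstract, here is a minimal Python sketch. It is not the speaker's implementation. The class names, the hand-written lambda contexts, and the "highest bid wins" selection rule are all assumptions made for illustration; in the actual work the contexts are evolved, and the evolutionary loop is omitted here entirely. The sketch only shows how a host from an earlier level can serve as a symbiont action at the next level.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

State = Sequence[float]


@dataclass
class Symbiont:
    # Stand-in for the evolved context: maps a state to a bid (assumption).
    context: Callable[[State], float]
    # Either a primitive action label or a host from an earlier level.
    action: Union[str, "Host"]


@dataclass
class Host:
    # A host simply identifies a subset of symbionts.
    symbionts: List[Symbiont]

    def act(self, state: State) -> str:
        # Assumed selection rule: the symbiont with the highest bid wins.
        winner = max(self.symbionts, key=lambda s: s.context(state))
        if isinstance(winner.action, Host):
            # Hierarchy: defer to the lower-level host chosen as the action.
            return winner.action.act(state)
        return winner.action


# Level 0: hosts whose symbionts carry primitive actions (hypothetical labels).
steer_left = Host([
    Symbiont(lambda s: -s[0], "left"),
    Symbiont(lambda s: 0.1, "hold"),
])
steer_right = Host([
    Symbiont(lambda s: s[0], "right"),
    Symbiont(lambda s: 0.1, "hold"),
])

# Level 1: a host whose symbiont actions are the level-0 hosts.
top = Host([
    Symbiont(lambda s: -s[0], steer_left),
    Symbiont(lambda s: s[0], steer_right),
])

print(top.act([-1.0]), top.act([1.0]), top.act([0.05]))
```

Calling `top.act(state)` first picks a level-0 host and then lets that host pick the primitive action, so the top-level host only has to learn when to hand control to which previously evolved behaviour.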
