Langevin, 29 rue d'Ulm
Many types of intelligent behavior can be framed as a search problem, in which an individual must explore a vast set of possible actions while carefully balancing the exploration-exploitation dilemma in order to gain rewards. Under finite search horizons, optimal solutions are normally unobtainable, yet humans manage to solve these problems satisfactorily on a daily basis. How do humans navigate vast state spaces, where the key question is not "when" but "where" to explore? One key ingredient of human intelligence seems to be the ability to generalize from observed to unobserved outcomes, in order to form intuitions about where exploration seems promising. Using a variety of structured bandit tasks, we study how humans search for rewards under limited search horizons, where an underlying dependence between arms provides traction for generalization. We find evidence that Gaussian Process function learning, combined with an optimistic Upper Confidence Bound sampling strategy, provides a robust model of how humans use generalization to guide search. Our modelling results and parameter estimates are recoverable and can be used to simulate human-like performance, while also suggesting a systematic, yet sometimes beneficial, tendency towards undergeneralization.
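The model named in the abstract, Gaussian Process function learning paired with Upper Confidence Bound sampling, can be illustrated with a minimal sketch: a GP with a squared-exponential kernel generalizes observed rewards across correlated arms, and the agent picks the arm with the highest mean-plus-uncertainty score. All function names, kernel choices, and parameter values here are illustrative assumptions, not taken from the study itself:

```python
import numpy as np

def rbf_kernel(x1, x2, length_scale=0.1):
    # Squared-exponential kernel: nearby arms have correlated rewards,
    # which is what lets the learner generalize from observed outcomes.
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_obs, y_obs, x_all, length_scale=0.1, noise=0.1):
    # GP posterior mean and standard deviation over all arms,
    # given the rewards observed so far (prior variance assumed 1).
    K = rbf_kernel(x_obs, x_obs, length_scale) + noise * np.eye(len(x_obs))
    K_s = rbf_kernel(x_obs, x_all, length_scale)
    K_inv = np.linalg.inv(K)
    mu = K_s.T @ K_inv @ y_obs
    var = 1.0 - np.sum((K_s.T @ K_inv) * K_s.T, axis=1)
    return mu, np.sqrt(np.maximum(var, 0.0))

def ucb_choice(mu, sigma, beta=0.5):
    # Optimistic UCB: value each arm by its predicted mean plus an
    # uncertainty bonus, so unexplored-but-promising regions get tried.
    return int(np.argmax(mu + beta * sigma))

# Toy structured bandit: 30 arms whose expected rewards vary smoothly,
# peaking near x = 0.7 (an illustrative reward landscape, not the task's).
arms = np.linspace(0.0, 1.0, 30)
rewards = np.exp(-((arms - 0.7) ** 2) / 0.02)
rng = np.random.default_rng(0)

obs_x, obs_y = [arms[0]], [rewards[0]]
for _ in range(25):
    mu, sigma = gp_posterior(np.array(obs_x), np.array(obs_y), arms)
    i = ucb_choice(mu, sigma)
    obs_x.append(arms[i])
    obs_y.append(rewards[i] + 0.01 * rng.normal())
```

Run over a short horizon, the agent first spreads its choices across uncertain regions, then concentrates near the reward peak once generalization identifies it; shrinking the kernel's length scale toward zero mimics the undergeneralization tendency the abstract mentions.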
Event organized by Stéfano Palminteri.