🔌ICML 2018 | Machine Theory of Mind
Task
Simulate an observer, who gets access to a set of behavioral traces of a novel agent in each episode, and must make predictions about the agent’s future behavior.
Method
Training involves a series of encounters with individual agents. The observer sees a set of full or partial “past episodes”, wherein a single, unlabelled agent, $\mathcal{A}_i$, produces trajectories, $\{\tau_{ij}\}_j$, as it executes its policy within the respective POMDPs, $\mathcal{M}_j$.
Prerequisite: Dec-POMDP
Architecture
- The goal of the character net is to characterize the presented agent, by parsing observed past episode trajectories, $\{\tau_{ij}^{(obs)}\}_{j=1}^{N_{past}}$, into a character embedding, $e_{char,i}$.
- The goal of the mental state net is to mentalize about the presented agent during the current episode, by parsing the current episode trajectory into a mental state embedding, $e_{mental,i}$.
- The goal of the prediction net is to leverage the character and mental state embeddings to predict subsequent behavior of the agent.
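
The data flow between the three nets can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the paper's implementation: the class name `ToMnetSketch`, the single dense layers with random untrained weights, the mean-pooling over time steps, and the summing of per-episode embeddings are all assumptions made for brevity. Each trajectory is assumed to be a non-empty list of fixed-length feature vectors (one per observed step), and only next-action probabilities are predicted.

```python
import math
import random


def init_weights(n_in, n_out, rng):
    """Random (n_out x n_in) matrix standing in for a trained layer."""
    return [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights, x, activation=math.tanh):
    """Compute activation(W x); pass activation=None for raw logits."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return [activation(v) for v in out] if activation else out


def mean_pool(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ToMnetSketch:
    """Hypothetical toy observer: character net, mental state net, prediction net."""

    def __init__(self, step_dim, embed_dim, n_actions, seed=0):
        rng = random.Random(seed)
        self.embed_dim = embed_dim
        self.w_char = init_weights(step_dim, embed_dim, rng)
        self.w_mental = init_weights(step_dim + embed_dim, embed_dim, rng)
        self.w_pred = init_weights(step_dim + 2 * embed_dim, n_actions, rng)

    def character(self, past_trajectories):
        """Past episodes -> character embedding (zeros if no past episodes)."""
        if not past_trajectories:
            return [0.0] * self.embed_dim
        # Assumption: embed each step, average over time, sum over episodes.
        per_episode = [
            mean_pool([dense(self.w_char, step) for step in traj])
            for traj in past_trajectories
        ]
        return [sum(col) for col in zip(*per_episode)]

    def mental(self, current_trajectory, e_char):
        """Current episode so far (+ character embedding) -> mental state embedding."""
        if not current_trajectory:
            return [0.0] * self.embed_dim
        return mean_pool(
            [dense(self.w_mental, list(step) + e_char) for step in current_trajectory]
        )

    def predict(self, past_trajectories, current_trajectory, query_state):
        """Both embeddings + a query state -> probabilities over next actions."""
        e_char = self.character(past_trajectories)
        e_mental = self.mental(current_trajectory, e_char)
        features = list(query_state) + e_char + e_mental
        return softmax(dense(self.w_pred, features, activation=None))
```

A call such as `ToMnetSketch(step_dim=4, embed_dim=3, n_actions=5).predict(past, current, query_state)` returns a probability vector over the five actions. In a real setup the three sets of weights would be trained jointly on the prediction loss, so that the embeddings are shaped only by what helps predict the agent's behavior.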