Inner Monologue Through Self-attention
Andrzej Banburski, Fernanda De La Torre, Tomaso Poggio (Brain and Cognitive Sciences, MIT, Cambridge, MA)
Is self-awareness an accident of evolution, or does it confer an actual computational advantage? One such advantage could come from the apparent freedom we have in assigning meaning and value to outcomes of actions that a priori have no value. Previous experiments in reinforcement learning have shown the great computational advantage that a hand-designed reward signal can bring. Could this be achieved by modeling a self-aware agent? Here we consider one tractable aspect of self-awareness, the internal monologue, and model it as self-attention to inner computations, producing an executive summary that should simplify decision-making in a collaborative task in which agents can communicate their goals. We build this following recent work on modeling theory of mind, applied here to the agent itself. The agent is a collection of interlinked neural network modules: one module models the environment and the other agents in it, while a similar module is applied to a simplified model of the agent's own functionality. Decisions are then made based on the executive summary of the agent's beliefs about the state of the world, as well as about itself. Apart from modeling an aspect of consciousness, externalizing this inner speech can also be seen as a useful tool for making neural networks self-explainable. An interesting side result is that in this model there is a limit to how much conscious processing there can be relative to unconscious computation: beyond some threshold, the more attention the agent pays to itself through self-attention, the smaller the computational gain in decision-making. Another deeply connected aspect we model is introspection: assigning custom inner rewards that dynamically adapt to the task.
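The executive-summary mechanism described above can be sketched as scaled dot-product self-attention over the outputs of the agent's internal modules. The module names, dimensions, and weight matrices below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def executive_summary(module_outputs, Wq, Wk, Wv):
    """Self-attention over internal module outputs.

    module_outputs: (n_modules, d) array, e.g. feature vectors from a
    world model, a model of other agents, and a self-model.
    Returns an (n_modules, d_v) summary the policy can condition on.
    """
    Q = module_outputs @ Wq
    K = module_outputs @ Wk
    V = module_outputs @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])  # scaled dot-product
    weights = softmax(scores, axis=-1)       # how much each module attends to the others
    return weights @ V

rng = np.random.default_rng(0)
d, d_k, d_v, n = 8, 4, 4, 3   # 3 hypothetical modules: world, others, self
x = rng.normal(size=(n, d))
Wq = rng.normal(size=(d, d_k))
Wk = rng.normal(size=(d, d_k))
Wv = rng.normal(size=(d, d_v))
summary = executive_summary(x, Wq, Wk, Wv)
print(summary.shape)  # (3, 4)
```

Restricting the size of the summary relative to the full module outputs is one way to make concrete the trade-off noted above between attention to self and computational gain.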
Inspired by curiosity-driven learning, we assign intrinsic value to the agent's actions whenever the executive-summary model correctly predicts their outcomes, with potential computational advantages stemming from this self-awareness. This adds a layer of recurrence to the action-reward cycle: the rewards depend on the actions and the actions depend on the rewards, in a way that goes beyond the standard reinforcement learning setting given by the Bellman equation. The reinforcement learning task for such an agent has to be a partial-information game in which it is crucial to model the decisions other players will make based on one's models of their decisions. Examples are the tabletop games One Night Ultimate Werewolf, Coup, and Hanabi. We present preliminary results on whether a model of self is helpful by pitting a self-aware agent against agents capable only of modeling their opponents, on several simplified versions of these games.
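The action-reward recurrence described above can be illustrated with a toy loop in which an intrinsic bonus for correctly predicted outcomes is folded into a standard Bellman backup. The tabular setting, bonus scheme, and hyperparameters are assumptions for illustration only:

```python
import numpy as np

def intrinsic_reward(predicted_next, actual_next, bonus=1.0):
    """Reward the agent when its summary model predicted the outcome correctly."""
    return bonus if predicted_next == actual_next else 0.0

def q_update(Q, s, a, r_ext, r_int, s_next, alpha=0.1, gamma=0.99):
    """Standard Q-learning backup with the intrinsic term added to the reward.

    This is where the recurrence enters: r_int depends on the agent's own
    prediction, which depends on its policy, which depends on Q.
    """
    target = (r_ext + r_int) + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
    return Q

# Toy run: 2 states, 2 actions, deterministic transition 0 -> 1.
Q = np.zeros((2, 2))
predicted, actual = 1, 1          # self-model predicted the transition correctly
r_int = intrinsic_reward(predicted, actual)
Q = q_update(Q, s=0, a=0, r_ext=0.0, r_int=r_int, s_next=1)
print(Q[0, 0])  # 0.1
```

Because the intrinsic term changes as the self-model improves, the effective reward function is non-stationary, which is what takes this beyond the fixed-reward Bellman setting.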