Training Transformers as a Universal Computer

Key Takeaways

  • This research explores whether a standard transformer model can be trained to function as a universal computer.
  • We demonstrate that a small transformer can learn to execute programs in MicroPy, a simplified yet computationally universal programming language.
  • Given procedure definitions together with an expression to evaluate, the transformer predicts small-step execution using PENCIL scaffolding for space-efficient execution within a bounded context window.
  • The trained model achieves out-of-distribution generalization; i.e., it can evaluate novel programs drawn from a distribution different from that of its training programs.
  • Since MicroPy can express any computation, our results provide empirical evidence that a standard transformer can be trained to act as a universal computer.
Paper Abstract

We demonstrate that a small transformer can learn to execute programs in MicroPy, a simplified yet computationally universal programming language. Given procedure definitions together with an expression to evaluate, the transformer predicts small-step execution using PENCIL scaffolding for space-efficient execution within a bounded context window. After training on randomly generated, meaningless MicroPy programs, the learned transformer generalizes to various human-written programs including bit copying and flipping, binary addition and multiplication, and SAT verification and solving. We note that the trained model can achieve out-of-distribution generalization, i.e., evaluate novel programs drawn from a distribution different from that of the training programs. Since MicroPy can express any computation, our results provide empirical evidence that a standard transformer can be trained to act as a universal computer.

Training Transformers as a Universal Computer

This research explores whether a standard transformer model can be trained to function as a universal computer. By teaching a small transformer to execute programs in a simplified, computationally universal language called MicroPy, the authors demonstrate that these models can learn to perform complex logical tasks and generalize their execution capabilities to novel, human-written programs.

The MicroPy Approach

The researchers utilized MicroPy, a programming language designed to be both simple and capable of expressing any computation. To enable the transformer to process these programs within the constraints of a bounded context window, the team employed "PENCIL scaffolding." This technique allows the model to perform space-efficient, small-step execution of code, effectively breaking down the evaluation of procedure definitions and expressions into manageable steps that the transformer can predict.
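
To make the idea concrete, here is a minimal sketch, in plain Python rather than MicroPy, of what small-step execution with a PENCIL-style reduction might look like: the current program state is rewritten one step at a time, and earlier intermediate states are discarded once a step completes, so the working context stays bounded. The expression format, the step/run helpers, and the exact reduction rule below are illustrative assumptions, not the paper's actual definitions.

```python
# Minimal sketch (illustrative, not the authors' implementation) of small-step
# execution with a PENCIL-style reduction. Expressions are nested tuples such
# as ("add", ("mul", 2, 3), 4); each call to step() rewrites exactly one redex.

def step(expr):
    """Rewrite `expr` by one small step (leftmost-innermost)."""
    if isinstance(expr, int):
        return expr                          # already a value
    op, left, right = expr
    if not isinstance(left, int):
        return (op, step(left), right)       # reduce the left operand first
    if not isinstance(right, int):
        return (op, left, step(right))       # then the right operand
    return left + right if op == "add" else left * right

def run(expr, max_steps=1000):
    """Iterate small steps, keeping only a short window of recent states --
    a stand-in for how a PENCIL-style reduction erases completed intermediate
    scaffolding so the context the model attends to stays bounded."""
    trace = [expr]
    for _ in range(max_steps):
        if isinstance(expr, int):
            break
        expr = step(expr)
        trace = trace[-1:] + [expr]          # drop older states
    return expr, trace

# ("add", ("mul", 2, 3), ("add", 1, 4))  ->  ("add", 6, 5)  ->  11
print(run(("add", ("mul", 2, 3), ("add", 1, 4))))
```

In the actual setup the transformer emits each rewritten state as tokens rather than calling a Python interpreter; the relevant point of the reduction is that the prompt it conditions on never has to hold the full execution history.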

Training and Generalization

The model was trained exclusively on randomly generated, meaningless MicroPy programs. Despite never seeing purposeful, human-written code during training, the transformer successfully generalized its learned execution logic to a variety of practical tasks. These include bit manipulation (copying and flipping), binary arithmetic (addition and multiplication), and logical tasks such as SAT verification and solving.
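
For a sense of what these target tasks involve, the snippet below gives plain-Python stand-ins for two of them: binary addition over bit lists and SAT clause verification. The actual MicroPy syntax and the exact programs evaluated in the paper are not reproduced here, so treat these as hypothetical analogues.

```python
# Illustrative stand-ins (plain Python, not actual MicroPy) for two of the
# human-written task families reported in the paper.

def add_bits(a, b, carry=0):
    """Add two little-endian bit lists, e.g. [1, 1] + [1] -> [0, 0, 1]."""
    if not a and not b:
        return [carry] if carry else []
    x = a[0] if a else 0
    y = b[0] if b else 0
    s = x + y + carry
    return [s % 2] + add_bits(a[1:], b[1:], s // 2)

def verify_sat(clauses, assignment):
    """Check a CNF formula: clauses are lists of signed variable indices,
    where -2 means 'variable 2 is false'; assignment maps index -> bool."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

print(add_bits([1, 1], [1]))                            # 3 + 1 -> [0, 0, 1]
print(verify_sat([[1, -2], [2]], {1: True, 2: True}))   # -> True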

Out-of-Distribution Performance

A significant finding of this study is the model's ability to achieve out-of-distribution generalization. This means the transformer was not merely memorizing the training data; it was able to evaluate novel programs that differed from the structure and content of the random programs used during the training phase.

Implications for AI

By demonstrating that a small transformer can learn to execute arbitrary programs, this work provides empirical evidence that these models can act as universal computers. This suggests that the underlying architecture of a transformer is capable of learning the fundamental mechanics of computation, rather than just pattern matching, which has broad implications for how we understand the potential and limitations of large language models.
