Analyzing Transformer Dynamics as Movement through Embedding Space

by Sumeet S. Singh, et al.

Transformer language models exhibit intelligent behaviors such as understanding natural language, recognizing patterns, acquiring knowledge, reasoning, planning, reflecting and using tools. This paper explores how their underlying mechanics give rise to these behaviors. We adopt a systems approach to analyze Transformers in detail and develop a mathematical framework that frames their dynamics as movement through embedding space. This novel perspective provides a principled way of thinking about the problem and reveals important insights related to the emergence of intelligence:

1. At its core the Transformer is an Embedding Space walker, mapping intelligent behavior to trajectories in this vector space.
2. At each step of the walk, it composes context into a single composite vector whose location in Embedding Space defines the next step.
3. No learning actually occurs during decoding; in-context learning and generalization are simply the result of different contexts composing into different vectors.
4. Ultimately the knowledge, intelligence and skills exhibited by the model are embodied in the organization of vectors in Embedding Space rather than in specific neurons or layers. These abilities are properties of this organization.
5. Attention's contribution boils down to the association-bias it lends to vector composition, which in turn influences the aforementioned organization. However, more investigation is needed to ascertain its significance.
6. The entire model is composed from two principal operations: data-independent filtering and data-dependent aggregation. This generalization unifies Transformers with other sequence models and across modalities.

Building upon this foundation, we formalize and test a semantic space theory which posits that embedding vectors represent semantic concepts, and find some evidence of its validity.
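The walker picture in the abstract can be illustrated with a small toy, offered only as a sketch under stated assumptions and not as the paper's model. The vocabulary, the 2-D embedding values, the fixed `FILTER` matrix, and the helper names (`aggregate`, `apply_filter`, `compose`, `next_token`, `walk`) are all hypothetical. The sketch composes a context into one vector using a data-dependent aggregation (softmax weights computed from the vectors themselves) followed by a data-independent filter (a fixed linear map), then steps to the token whose embedding best matches the composite vector.

```python
import math

# Hypothetical toy vocabulary with 2-D embeddings (illustrative values only).
EMBED = {
    "a": [1.0, 0.0],
    "b": [0.0, 1.0],
    "c": [1.0, 1.0],
}

# Hypothetical data-independent filter: a fixed linear map (swaps coordinates).
FILTER = [[0.0, 1.0],
          [1.0, 0.0]]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(vectors):
    """Data-dependent aggregation: the weights depend on the inputs."""
    query = vectors[-1]  # assumption: the last position acts as the query
    weights = softmax([dot(query, v) for v in vectors])
    return [sum(w * v[i] for w, v in zip(weights, vectors))
            for i in range(len(query))]


def apply_filter(vec):
    """Data-independent filtering: the same fixed map for every input."""
    return [dot(row, vec) for row in FILTER]


def compose(context):
    """Compose a whole context into a single composite vector."""
    return apply_filter(aggregate([EMBED[t] for t in context]))


def next_token(context):
    """Step to the token whose embedding best matches the composite vector."""
    composite = compose(context)
    return max(EMBED, key=lambda t: dot(EMBED[t], composite))


def walk(context, steps):
    """Extend the context one token at a time; no parameter is updated."""
    context = list(context)
    for _ in range(steps):
        context.append(next_token(context))
    return context
```

In this toy, `EMBED` and `FILTER` stay fixed throughout `walk`; different contexts lead to different next steps only because they compose into different vectors, which mirrors point 3 of the abstract.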




