Visualize the attention weight matrices of each attention block within the GPT-2 (small) model as it processes a given prompt. Attention heads are stacked along the y-axis, while token-to-token interactions are displayed on the x- and z-axes.
Drag and zoom in to explore different parts of each block. Hover over specific points to see the actual attention weight values and the query-key pairs they represent.
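If you want to reproduce the tensors being rendered, here is a minimal sketch of how the per-block attention weights can be pulled out of GPT-2 small. Note this uses the Hugging Face Transformers API for illustration; this tool itself runs an ONNX export of the model in the browser instead.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

# GPT-2 small: 12 blocks, 12 heads per block.
tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_attentions=True)

inputs = tokenizer("Hello, attention!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple of 12 tensors (one per block), each of
# shape (batch=1, heads=12, query_tokens, key_tokens) -- the same
# head-by-query-by-key structure shown on the y-, x-, and z-axes.
for block_idx, attn in enumerate(outputs.attentions):
    print(f"block {block_idx}: {tuple(attn.shape)}")
```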
Acknowledgements
This project was inspired by Cho et al.'s Transformer Explainer (my tool actually uses the ONNX file from that project!), Brendan Bycroft's 3D LLM, and other great ML web visualizations.
I want to dedicate this project to my mother, who has always been my biggest supporter and was the first person to get me interested in AI/ML. As I write this, the date is April 18th, 2025... which is her birthday! Happy birthday, Mama — I hope you enjoy this little project! ❤️