Explore how language models represent words as points in a high-dimensional space. Use PCA, t-SNE, or UMAP to project them and discover semantic structure.
PCA, t-SNE, and UMAP projections
Semantic nearest-neighbor search
Upload your own embeddings
An embedding is a dense numerical representation of a word, phrase, or token. A model converts text into vectors of hundreds or thousands of dimensions, where distances between vectors reflect semantic relationships: words used in similar contexts end up close together.
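A minimal sketch of the idea, using made-up 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions) and cosine similarity, the most common distance measure for embedding spaces:

```python
import numpy as np

# Toy embeddings, hand-made for illustration only; a real model
# would produce these vectors from text.
embeddings = {
    "cat": np.array([0.9, 0.8, 0.1, 0.0]),
    "dog": np.array([0.8, 0.9, 0.2, 0.1]),
    "car": np.array([0.1, 0.0, 0.9, 0.8]),
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Semantically related words score higher than unrelated ones.
print(cosine_similarity(embeddings["cat"], embeddings["dog"]))
print(cosine_similarity(embeddings["cat"], embeddings["car"]))
```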
PCA reduces dimensionality by projecting onto the directions of maximum variance. It is fast and deterministic, making it ideal for viewing global structure: the first three components capture the axes of greatest separation between concepts.
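A short example of this projection with scikit-learn, using random vectors as stand-ins for a real embedding matrix:

```python
import numpy as np
from sklearn.decomposition import PCA

# Simulated embeddings: 100 "words" in 300 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 300))

# Keep only the 3 components with the greatest variance.
pca = PCA(n_components=3)
coords = pca.fit_transform(X)

print(coords.shape)                   # (100, 3)
print(pca.explained_variance_ratio_)  # fraction of variance per axis
```

Because PCA is deterministic, rerunning it on the same data always yields the same projection, unlike t-SNE or UMAP.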
t-SNE preserves local structure: points that are close in the original space stay close in the projection, which reveals clusters of similar words. It does not preserve global distances, so the gap between two clusters says nothing about how conceptually distant they are.
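A sketch with scikit-learn's `TSNE` on synthetic data containing two clear clusters; the perplexity value here is an illustrative choice, not a universal setting:

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
# Two well-separated clusters of 30 points each in 50 dimensions.
cluster_a = rng.normal(loc=0.0, size=(30, 50))
cluster_b = rng.normal(loc=10.0, size=(30, 50))
X = np.vstack([cluster_a, cluster_b])

# perplexity must be smaller than the number of samples; smaller
# values emphasize very local neighborhoods.
tsne = TSNE(n_components=2, perplexity=15, random_state=0)
coords = tsne.fit_transform(X)
print(coords.shape)  # (60, 2)
```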
UMAP preserves global structure better than t-SNE and runs faster. It maintains both local proximity and inter-group relationships, which has made it a modern default for exploring embedding spaces.
In a well-trained space, king − man + woman ≈ queen. This vector arithmetic emerges from the statistical structure of the corpus. Select words in the Projector and observe their nearest neighbors.
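The analogy can be sketched with toy 2-dimensional vectors deliberately constructed so that "royalty" and "gender" are independent directions; in a real trained space the relation only holds approximately:

```python
import numpy as np

# Hand-built directions: the analogy holds exactly by construction.
royal  = np.array([1.0,  0.0])
male   = np.array([0.0,  1.0])
female = np.array([0.0, -1.0])

vocab = {
    "king":  royal + male,
    "queen": royal + female,
    "man":   male,
    "woman": female,
}

def nearest(vec, exclude=()):
    """Word whose vector lies closest (Euclidean) to vec."""
    return min((w for w in vocab if w not in exclude),
               key=lambda w: np.linalg.norm(vocab[w] - vec))

result = vocab["king"] - vocab["man"] + vocab["woman"]
print(nearest(result, exclude={"king", "man", "woman"}))  # queen
```

Excluding the query words themselves mirrors standard practice in analogy evaluation, since the input words are otherwise often their own nearest neighbors.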