Embeddings

7680-Dimensional Embeddings: More Dimensions, Better Retrieval

Embedding dimensions have standardized around a handful of conventional sizes: 768, 1536, occasionally 4096. We asked a simple question: what happens if we go bigger? The answer surprised us.

Background: Why Dimensions Matter

Text embeddings map variable-length sequences to fixed-dimensional vectors. These vectors enable semantic similarity search, clustering, and retrieval. The dimension count determines the vector space's capacity.

Lower dimensions mean:

- Smaller storage requirements
- Faster similarity computations
- Potential information loss

Higher dimensions mean:...

December 5, 2022 · 3 min · 507 words · Zach Kelling
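The storage and compute tradeoffs listed in the excerpt above scale linearly with dimension. A back-of-envelope sketch, assuming a hypothetical corpus of one million float32 vectors (the corpus size and precision are our assumptions, not figures from the post):

```python
# Cost model for the dimension tradeoff: storage and brute-force
# query cost both grow linearly with embedding dimension d.
# N_DOCS and float32 precision are illustrative assumptions.

N_DOCS = 1_000_000  # hypothetical corpus size

def index_costs(d: int) -> tuple[float, float]:
    """Return (storage in GB, GFLOPs per brute-force query) for float32 vectors."""
    storage_gb = N_DOCS * d * 4 / 1e9    # 4 bytes per float32 component
    gflops = N_DOCS * (2 * d - 1) / 1e9  # d multiplies + (d - 1) adds per dot product
    return storage_gb, gflops

for d in (768, 1536, 4096, 7680):
    gb, gf = index_costs(d)
    print(f"d={d:>4}: {gb:5.2f} GB storage, {gf:5.2f} GFLOPs per query")
```

At this corpus size, going from 768 to 7680 dimensions multiplies both the index size and the brute-force query cost by ten, which is the core of the tradeoff the post examines.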

Embedding Spaces at 7680 Dimensions

The Dimension Question

How many dimensions does a text embedding need? The field has settled on conventions: 768 for BERT-scale models, 1536 for OpenAI's ada-002, 4096 for some recent models. But these choices reflect architectural constraints, not fundamental requirements. We investigate what happens when we scale embedding dimensions to 7680, ten times the BERT baseline.

Why Higher Dimensions?

Capacity Arguments

A $d$-dimensional embedding space can represent $\mathcal{O}(e^d)$ nearly-orthogonal vectors. For semantic search, we want documents with different meanings to map to different regions....

December 5, 2022 · 4 min · 647 words · Zach Kelling
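The capacity argument in the excerpt above can be sanity-checked empirically: random unit vectors in high dimensions are nearly orthogonal, and they concentrate more tightly as $d$ grows. A minimal sketch (the sample count and seed are arbitrary choices of ours):

```python
# Empirical check of the near-orthogonality claim: cosine similarity
# between random unit vectors in R^d concentrates around 0 as d grows.
import numpy as np

rng = np.random.default_rng(0)

def max_abs_cosine(d: int, n: int = 200) -> float:
    """Largest |cosine| between any pair of n random unit vectors in R^d."""
    v = rng.standard_normal((n, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)  # project onto unit sphere
    cos = v @ v.T                                  # all pairwise cosines
    np.fill_diagonal(cos, 0.0)                     # ignore self-similarity
    return float(np.abs(cos).max())

for d in (768, 7680):
    print(f"d={d}: max |cos| among 200 random vectors = {max_abs_cosine(d):.3f}")
```

The worst-case overlap among random vectors shrinks as the dimension rises, which is the intuition behind fitting exponentially many nearly-orthogonal directions into a $d$-dimensional space.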