2024-12-22
“Compute costs scale with the square of the input size” RAG is a workaround, but scaling compute at O(n^2) is not great... https://arstechnica.com/...
Ars Technica
Exploring the scaling challenges of transformer-based LLMs in efficiently processing large amounts of text, as well as potential solutions, such as RAG systems
Large language models represent text using tokens, each of which is a few characters. Short words are represented by a single token …
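The quadratic cost the note refers to comes from self-attention: every token attends to every other token, so the score matrix has n × n entries. A minimal NumPy sketch (illustrative only, not code from the article; shapes and dimensions are assumptions) makes the scaling concrete — doubling the token count quadruples the size of the score matrix:

```python
# Sketch: naive self-attention builds an n x n score matrix,
# so compute and memory grow as O(n^2) in sequence length n.
import numpy as np

def attention_scores(n, d=64, seed=0):
    """Return the (n, n) scaled dot-product score matrix for random Q, K."""
    rng = np.random.default_rng(seed)
    Q = rng.standard_normal((n, d))  # queries: one d-dim vector per token
    K = rng.standard_normal((n, d))  # keys:    one d-dim vector per token
    return Q @ K.T / np.sqrt(d)      # shape (n, n): quadratic in n

print(attention_scores(128).size)  # 16384 = 128^2
print(attention_scores(256).size)  # 65536 = 256^2 -> 4x the work for 2x the tokens
```

This is why retrieval (RAG) helps: instead of stuffing all text into one long context, it selects a small relevant subset, keeping n small.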