
Chronicles

The story behind the story


Allen Institute for AI launches Bolmo 7B and Bolmo 1B, claiming they are “the first fully open byte-level language models”, built on its Olmo 3 models
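The "byte-level" framing can be sketched concretely: a byte-level LM consumes raw UTF-8 bytes rather than opaque subword pieces, so character-level structure is directly present in its input. A minimal illustration (the subword split shown in the comments is hypothetical, not any model's actual tokenizer):

```python
# A subword tokenizer might split "strawberry" into opaque pieces such as
# ["straw", "berry"], hiding individual characters from the model.
# A byte-level model instead sees the raw UTF-8 bytes, so character-level
# structure is directly observable in its input sequence.
text = "strawberry"
byte_ids = list(text.encode("utf-8"))

print(len(byte_ids))                         # 10 input positions, one per byte
print(sum(b == ord("r") for b in byte_ids))  # 3 — the r's appear as identical byte ids
```

For ASCII text, one byte is one character, which is why tasks like counting letters become trivial reads of the input rather than something the model must reverse-engineer from a learned vocabulary.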

…and every token gets the same compute, regardless of complexity.

Benjamin Minixhofer / @bminixhofer: There are also some things Bolmo lets us do which we just can't do using subword-level LMs. For example, we can increase the compression in bytes per patch to achieve an arbitrary speedup ⚡ In contrast, subword-level LMs eventually get yote back by the Softmax Bottleneck.

@allen_ai: Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3—and to our knowledge, the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks. 🧵

Edoardo Ponti / @pontiedoardo: Finally, you can count the r's in strawberry and check if 3.11 is higher than 3.9 without tokenisation interfering. Here's Bolmo, a fully open byte-level LLM with latent tokenisation, derived from a SOTA LLM (Olmo 3). Promising on coding and char-level understanding!

Benjamin Minixhofer / @bminixhofer: We are releasing Bolmo today! Bolmo is the best byte-level model so far. It comes close to and sometimes surpasses Olmo 3. Bolmo also performs competitively in terms of speed and is fully open. I was skeptical of byte-level models for a long time, but I finally switched camps 🧵

@allen_ai: On our eval suite and character-focused benchmarks like CUTE and EXECUTE, Bolmo matches or surpasses subword models while excelling at character-level reasoning. Once you byteify a base model, you can import capabilities from post-trained checkpoints via weight arithmetic.

Luca Soldaini / @soldni: I've been grumbling about tokenizers, but it took a @bminixhofer to do something about it! Really neat approach: rather than committing to tokenizer-free methods from the get-go, we show how to switch from BPE at any point during the training run 👾

@teortaxestex: Incredibly based. Now that we have fast long contexts and an abundance of compute, it's about damn time to explore byte-level models again. Meta disappointed me with MegaByte, but Meta generally is bad at execution. This path is not yet closed…
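The thread does not specify how the "weight arithmetic" used to import post-training capabilities works; a minimal sketch of the general task-vector idea, assuming the byteified model keeps parameter tensors of matching shape (all names and values here are hypothetical):

```python
import numpy as np

# Hypothetical parameter dicts; real models hold many named tensors.
base = {"w": np.array([1.0, 2.0, 3.0])}            # subword base (e.g. Olmo 3 base)
post_trained = {"w": np.array([1.5, 2.0, 2.0])}    # post-trained subword checkpoint
byteified_base = {"w": np.array([1.1, 2.1, 3.1])}  # byteified base (e.g. Bolmo)

# Task vector: the parameter delta that post-training added to the base.
delta = {k: post_trained[k] - base[k] for k in base}

# Import the capability by adding that delta to the byteified model.
byteified_post = {k: byteified_base[k] + delta[k] for k in byteified_base}
print(byteified_post["w"])  # [1.6 2.1 2.1]
```

The appeal of this approach is that it reuses existing post-trained checkpoints instead of repeating post-training from scratch on the byteified model; it only applies where the two models' parameter tensors line up shape-for-shape.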

VentureBeat