Bits-per-Byte (BPB): a tokenizer-agnostic way to measure LLMs

Karpathy recently released the nanochat repo, which contains code for training the best ChatGPT that $100 can buy. While skimming the high-level code, I noticed it reports bits per byte instead of the typical cross-entropy loss. I found this interesting, so I decided to dig in. TL;DR: Bits per byte (BPB) is just cross-entropy measured per byte of text. We divide the cross-entropy (in nats) by log(2) to convert it to bits. Because it is measured per byte, BPB is tokenizer-agnostic and lets you compare models fairly even when they use different vocabularies and tokenization rules.

October 15, 2025 | Estimated Reading Time: 3 min |  Author: Dipkumar Patel
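To make the conversion concrete, here is a minimal sketch of the computation. The function name `bits_per_byte` and the toy inputs (`logits`, `targets`, `n_bytes`) are my own illustrative choices, not code from nanochat: it sums the model's cross-entropy in nats over the token sequence, converts nats to bits by dividing by ln(2), and then normalizes by the number of UTF-8 bytes in the original text rather than the number of tokens.

```python
import math

import torch
import torch.nn.functional as F


def bits_per_byte(logits: torch.Tensor, targets: torch.Tensor, n_bytes: int) -> float:
    """Convert summed cross-entropy (in nats) over a token sequence into bits per byte.

    logits:  (seq_len, vocab_size) model outputs, one row per token position
    targets: (seq_len,) ground-truth token ids
    n_bytes: number of UTF-8 bytes in the original text that the tokens encode
    """
    # Cross-entropy in nats, summed over all token positions.
    total_nats = F.cross_entropy(logits, targets, reduction="sum").item()
    # Divide by ln(2) to convert nats -> bits, then by the byte count to get per-byte.
    return total_nats / (math.log(2) * n_bytes)


# Hypothetical usage with random "model" outputs over a short piece of text.
text = "hello world"
n_bytes = len(text.encode("utf-8"))
vocab_size, seq_len = 50257, 3  # pretend the text tokenizes into 3 tokens
logits = torch.randn(seq_len, vocab_size)
targets = torch.randint(0, vocab_size, (seq_len,))
print(f"BPB: {bits_per_byte(logits, targets, n_bytes):.3f}")
```

The key design point is the denominator: normalizing by bytes instead of tokens means a model with a large vocabulary (fewer, longer tokens) and a model with a small vocabulary (more, shorter tokens) are scored on the same footing, since both are ultimately compressing the same underlying bytes.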