Bits-per-Byte (BPB): a tokenizer-agnostic way to measure LLMs

Karpathy recently released the nanochat repo, which contains code for training the best ChatGPT you can get for under $100. While skimming the high-level code, I noticed it reports bits per byte instead of the typical cross-entropy loss. I found it interesting, so I decided to dig in. TL;DR: Bits per byte (BPB) is just cross-entropy measured per byte. We divide the cross-entropy (in nats) by the byte count and by log(2) to convert to bits. Because it’s per byte, BPB is tokenizer-agnostic and lets you compare models fairly even when they use different vocabularies and tokenization rules....
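A minimal sketch of that conversion (the function name and the toy numbers are mine, not from nanochat):

```python
import math

def bits_per_byte(total_nats: float, total_utf8_bytes: int) -> float:
    """Convert summed cross-entropy (in nats, as most frameworks report it)
    into bits per byte of the underlying UTF-8 text."""
    return total_nats / (total_utf8_bytes * math.log(2))

# Toy numbers: 1,000 tokens at 2.0 nats of loss each, with tokens
# averaging 4 UTF-8 bytes, gives roughly 0.72 BPB.
print(bits_per_byte(total_nats=1000 * 2.0, total_utf8_bytes=1000 * 4))
```

Because the denominator counts bytes of raw text rather than tokens, a model with a coarse vocabulary and one with a fine-grained vocabulary end up on the same scale.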

October 15, 2025 | Estimated Reading Time: 4 min |  Author: Dipkumar Patel

Creativity Is a Luxury

Creativity is a luxury. It demands time, energy and space: things that feel scarce when rent, groceries, and the next shift loom larger than any poem or prototype. Most of us are caught in a slow-spinning loop of laundry, commutes, and alarms that reset before the dream has even ended. It is also a luxury that needs literal room: a quiet corner, a desk that isn’t the dinner table, a door that closes....

August 17, 2025 | Estimated Reading Time: 2 min |  Author: Dipkumar Patel

GPT-5 Router - Inevitable Future of Chat Interfaces

“OpenAI GPT-5 Router is like Apple removing headphone jack. It sucks but everyone will follow it.” — immortal (@immortal_0698), August 14, 2025. What is GPT-5 Router? The GPT-5 router picks the right model for each request in real time. In plain English: easy stuff goes to the small model; complex stuff goes to the big brain. The goal is simple: better answers per dollar and per millisecond by mixing models instead of forcing a single static choice....
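The routing idea fits in a few lines of Python. The heuristics and model names below are hypothetical; OpenAI has not published the actual routing policy:

```python
def route(prompt: str) -> str:
    """Toy difficulty heuristic: long or reasoning-flavored prompts go to
    the expensive model, everything else to the cheap one."""
    hard_markers = ("prove", "derive", "step by step", "debug")
    looks_hard = len(prompt) > 500 or any(m in prompt.lower() for m in hard_markers)
    return "big-reasoning-model" if looks_hard else "small-fast-model"

print(route("What is the capital of France?"))             # small-fast-model
print(route("Derive the softmax gradient step by step."))  # big-reasoning-model
```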

August 13, 2025 | Estimated Reading Time: 4 min |  Author: Dipkumar Patel

Instruction Aware Embeddings

Why Your Retriever is Failing and How Context Can Save It. Imagine asking “I want to buy apple” – do you mean Apple Inc. stock, the latest iPhone, or simply the fruit? Without context, your retriever may serve you the wrong results. 1. What Is the Problem in Your Retriever & Embedding? Modern retrievers map queries and documents into high-dimensional vectors (embeddings) and rank by cosine similarity. But when a query is ambiguous, plain embeddings struggle:...
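A runnable sketch of that pipeline (the hash-based toy_embed is a stand-in I made up so the code is self-contained; in practice you would call a real embedding model, and instruction-aware models such as E5 or Instructor expect a task description prepended to the query):

```python
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Bag-of-words hash embedding: a stand-in for a real embedding model,
    just so the ranking below runs end to end."""
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

# Instruction-aware querying: prepend the task description so the
# embedding of "apple" is pulled toward the intended sense.
instruction = "Retrieve documents about financial markets"
query = f"Instruct: {instruction}\nQuery: I want to buy apple"

docs = ["Apple Inc. shares rose 3% today", "How to grow an apple tree at home"]
q = toy_embed(query)
for score, doc in sorted(((cosine(q, toy_embed(d)), d) for d in docs), reverse=True):
    print(f"{score:.2f}  {doc}")
```

With the toy embedding the scores themselves are meaningless; the point is the shape of the pipeline: contextualize the query, embed everything, rank by cosine similarity.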

July 8, 2025 | Estimated Reading Time: 5 min |  Author: Dipkumar Patel

Improving Retrieval in RAG (via Recall, Precision, and NDCG)

Retrieval-Augmented Generation (RAG) is the superhero sidekick that grounds your Large Language Model (LLM) in cold, hard facts. But here’s the dirty secret: if your retrieval sucks, your RAG system is just a fancy chatbot with a broken brain. Weak retrieval = missed documents, irrelevant results, and rankings that make no sense. This guide cuts through the noise. You’ll learn how to turbocharge your RAG retrieval with a no-fluff, step-by-step approach to maximize recall, sharpen precision, and nail NDCG....
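For concreteness, here is how the three metrics are typically computed (a self-contained sketch with made-up document IDs; DCG uses the standard log2 rank discount):

```python
import math

def precision_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    return sum(1 for d in retrieved[:k] if d in relevant) / k

def recall_at_k(retrieved: list, relevant: set, k: int) -> float:
    """Fraction of all relevant documents found in the top k."""
    return sum(1 for d in retrieved[:k] if d in relevant) / len(relevant)

def ndcg_at_k(retrieved: list, relevance: dict, k: int) -> float:
    """relevance maps doc -> graded relevance (0 = irrelevant)."""
    dcg = sum(relevance.get(d, 0) / math.log2(i + 2)
              for i, d in enumerate(retrieved[:k]))
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(precision_at_k(retrieved, relevant, 3))       # 1 hit in top 3 -> 0.33
print(recall_at_k(retrieved, relevant, 3))          # 1 of 2 relevant -> 0.5
print(ndcg_at_k(retrieved, {"d1": 3, "d2": 2}, 4))  # penalizes d1, d2 ranked low
```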

March 8, 2025 | Estimated Reading Time: 8 min |  Author: Dipkumar Patel