Optimizing Recommendation Systems with JDK's Vector API (9 minute read)
Netflix reduced CPU utilization for its Ranker service's serendipity scoring feature from 7.5% to ~1% per node by re-architecting its scoring logic. Key optimizations included transitioning from O(M×N) scalar dot products to batched, cache-friendly matrix multiplies with flat buffers, leveraging the JDK Vector API for SIMD performance gains in pure Java, and eliminating unnecessary allocations. These changes yielded a 7% CPU drop, 12% latency reduction, and 10% improvement in CPU/RPS.
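The scalar-to-batched restructuring can be illustrated in a few lines. Netflix's implementation is in Java (using the JDK Vector API); the NumPy sketch below only demonstrates the same idea — replacing O(M×N) independent dot products with one matrix multiply over flat, contiguous buffers that the backend can vectorize. The sizes are made up for illustration.

```python
import numpy as np

# Hypothetical sizes: M candidate items, N user vectors, D embedding dimensions.
M, N, D = 512, 64, 128
rng = np.random.default_rng(0)
items = rng.standard_normal((M, D)).astype(np.float32)  # flat, contiguous buffer
users = rng.standard_normal((N, D)).astype(np.float32)

# Scalar-style approach: O(M*N) independent dot products.
scores_scalar = np.empty((M, N), dtype=np.float32)
for i in range(M):
    for j in range(N):
        scores_scalar[i, j] = items[i] @ users[j]

# Batched approach: one cache-friendly matrix multiply over the same buffers,
# which the BLAS backend vectorizes (analogous to SIMD via the JDK Vector API).
scores_batched = items @ users.T

assert np.allclose(scores_scalar, scores_batched, atol=1e-4)
```

The batched form does the same arithmetic but with sequential memory access and SIMD-friendly inner loops, which is where the CPU savings come from.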
|
Zero-Waste Agentic RAG: Designing Caching Architectures to Minimize Latency and LLM Costs at Scale (19 minute read)
A validation-aware, two-tier caching strategy for production-grade RAG systems reduces LLM token costs by over 30% and slashes response times from ~36 seconds to milliseconds for semantically similar queries. Combining semantic caching (embedding-based, ~95% similarity) and retrieval caching (context/topic-level, >70%), the architecture addresses redundancy, data staleness, and cache invalidation via timestamp checks, SHA-256 fingerprinting, and predicate caching.
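A minimal sketch of the two-tier idea, assuming a cosine-similarity answer cache with a ~0.95 threshold, a TTL-based staleness check, and SHA-256 fingerprints for retrieved context. Class and method names are hypothetical, not from the article:

```python
import hashlib
import time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

class TwoTierCache:
    """Illustrative two-tier RAG cache; thresholds are assumptions from the article."""
    def __init__(self, answer_threshold=0.95, ttl_seconds=3600):
        self.answer_threshold = answer_threshold
        self.ttl = ttl_seconds
        self.semantic = []  # tier 1: (query_embedding, answer, created_at)

    def lookup_answer(self, query_embedding):
        now = time.time()
        for emb, answer, created in self.semantic:
            # Timestamp check guards against staleness; similarity check
            # lets semantically similar queries skip the LLM call entirely.
            if now - created < self.ttl and \
                    cosine(query_embedding, emb) >= self.answer_threshold:
                return answer
        return None

    def store_answer(self, query_embedding, answer):
        self.semantic.append((query_embedding, answer, time.time()))

    def fingerprint(self, context: str) -> str:
        # Tier 2 key: SHA-256 fingerprint detects when retrieved context changed.
        return hashlib.sha256(context.encode()).hexdigest()
```

A hit in the semantic tier returns in milliseconds because no retrieval or generation runs; the fingerprint tier invalidates cached answers when the underlying documents change.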
|
|
SQL Is Solved. Here's Where Chat-BI Still Breaks (7 minute read)
Empirical testing of agentic chat-BI systems using the BIRD and DABStep benchmarks revealed high SQL generation accuracy (over 70% correct on BIRD) but exposed critical failure modes: ambiguous metric definitions, out-of-scope questions, and common-sense gaps. Context and rule files (e.g., RULES.md) help, but they induce compounding errors and overfitting as complexity grows. Iterative human-in-the-loop evaluation, structured error classification, deterministic metric definitions, and reproducible CI testing are essential for reliability.
|
The Reckoning Is Already Here (3 minute read)
AI tools are already replacing much routine data engineering and analytics work, not at some future date, so prioritize deep business understanding, irreplaceable domain expertise, strong community ties, and mastery of the newest AI models.
|
Layer by Layer, We Built Data Systems No One Understands (6 minute read)
The modern data stack has layered itself into incomprehensible "fractal" complexity through the endless stacking of tools. Promises of "ease" enable rapid prototyping but foster departmental silos, decision avoidance, unchecked AI/LLM code generation, over-modeled business logic, and disconnection from real business value.
|
How Long Until We Call AI Agents Data Products (7 minute read)
AI agents in production must be managed as full-fledged data products, requiring rigorous observability, security, and iterative product analytics beyond standard logging. Treating agent interactions as actionable feedback loops drives roadmap decisions, while layered security and conversational discoverability are essential for user trust and adoption.
|
|
Stop Calling Tools, Start Writing Code (Mode) (8 minute read)
The code mode pattern improves MCP tool usage by having the LLM write and execute a script that composes multiple tools in a sandbox, instead of calling tools sequentially. This reduces context window bloat and round-trip overhead, making large tool catalogs far more scalable and efficient for LLMs to use.
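The contrast can be sketched concretely. In this hypothetical example (tool names, the script, and the sandbox are all invented for illustration, and a real system would use proper isolation rather than `exec`), the model emits one script that composes two tools, so only the final value re-enters the context window instead of two intermediate tool results:

```python
# Hypothetical tool registry standing in for an MCP server's tools.
def list_orders(customer_id):
    return [{"id": 1, "total": 40.0}, {"id": 2, "total": 60.0}]

def get_refund_rate(customer_id):
    return 0.1

TOOLS = {"list_orders": list_orders, "get_refund_rate": get_refund_rate}

# "Code mode": the model writes one script composing several tools;
# intermediate results stay inside the sandbox, not in the LLM's context.
GENERATED_SCRIPT = """
orders = tools["list_orders"]("c42")
rate = tools["get_refund_rate"]("c42")
result = sum(o["total"] for o in orders) * (1 - rate)
"""

def run_in_sandbox(script, tools):
    # A real deployment would use an isolated sandbox; exec() is only illustrative.
    env = {"tools": tools}
    exec(script, env)
    return env["result"]

print(run_in_sandbox(GENERATED_SCRIPT, TOOLS))  # one round trip instead of three
```

With sequential tool calling, each intermediate payload would travel back through the model; here the model sees only the script it wrote and the single final result.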
|
PostgreSQL Blink-tree Implementation (7 minute read)
PostgreSQL implements a high-concurrency version of B-tree indexes called Blink-Tree, adding a simple "link" pointer between sibling nodes and a "high-key" boundary marker in each node. This lets searches move quickly to the right sibling if needed without holding locks across multiple levels (no lock-coupling during reads), while structure changes like page splits use brief bottom-up lock-coupling on just a few nodes at a time, reducing lock contention dramatically.
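The "move right" behavior can be sketched in a few lines. This is a simplified single-threaded model (class and field names invented for illustration, no locking shown), showing how the high key and right link let a search recover when a concurrent split has moved its key to a sibling:

```python
class Node:
    """Simplified Blink-tree leaf: sorted keys plus a high key and right link."""
    def __init__(self, keys, high_key, right=None):
        self.keys = keys
        self.high_key = high_key  # upper bound on keys this node may hold
        self.right = right        # link pointer to the right sibling

def search_leaf(node, key):
    # If the key exceeds the high key, a split moved it rightward:
    # follow link pointers instead of restarting from the root or
    # holding locks across multiple tree levels.
    while node.high_key is not None and key >= node.high_key:
        node = node.right
    return key in node.keys

# A split has moved keys >= 30 into a new right sibling:
right = Node(keys=[30, 42], high_key=None)
left = Node(keys=[10, 20], high_key=30, right=right)
```

Searching `left` for 42 overshoots the high key (30), follows the link, and finds the key in the sibling; a search for 20 never leaves `left`. This is what lets readers proceed without lock-coupling.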
|
PgJitter (GitHub Repo)
PgJitter is a lightweight PostgreSQL extension that replaces the default LLVM JIT compiler with faster alternatives (sljit, AsmJIT, and MIR), enabling native code generation in microseconds instead of milliseconds. This dramatically reduces compilation overhead and makes JIT practical for a wider range of queries, especially OLTP workloads.
|
|
Something is afoot in the land of Qwen (5 minute read)
The Qwen 3.5 open-weight model family from Alibaba is gaining attention for delivering strong performance across a wide range of model sizes, including very small models that run locally while still supporting reasoning and multimodal tasks. However, the project's future is uncertain after the sudden resignation of its lead researcher and several core team members following an internal Alibaba reorganization.
|