Optimizing Recommendation Systems with JDK's Vector API (9 minute read)
Netflix reduced CPU utilization for its Ranker service's serendipity scoring feature from 7.5% to ~1% per node by re-architecting its scoring logic. Key optimizations included transitioning from O(M×N) scalar dot products to batched, cache-friendly matrix multiplies with flat buffers, leveraging the JDK Vector API for SIMD performance gains in pure Java, and eliminating unnecessary allocations. These changes yielded a 7% CPU drop, 12% latency reduction, and 10% improvement in CPU/RPS.
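The scalar-to-batched restructuring can be illustrated in a few lines. Netflix's implementation is in Java (using the JDK Vector API); the NumPy sketch below only demonstrates the same idea — replacing O(M×N) independent dot products with one matrix multiply over flat, contiguous buffers that the backend can vectorize. The sizes are made up for illustration.

```python
import numpy as np

# Hypothetical sizes: M candidate items, N user vectors, D embedding dimensions.
M, N, D = 512, 64, 128
rng = np.random.default_rng(0)
items = rng.standard_normal((M, D)).astype(np.float32)  # flat, contiguous buffer
users = rng.standard_normal((N, D)).astype(np.float32)

# Scalar-style approach: O(M*N) independent dot products.
scores_scalar = np.empty((M, N), dtype=np.float32)
for i in range(M):
    for j in range(N):
        scores_scalar[i, j] = items[i] @ users[j]

# Batched approach: one cache-friendly matrix multiply over the same buffers,
# which the BLAS backend vectorizes (analogous to SIMD via the JDK Vector API).
scores_batched = items @ users.T

assert np.allclose(scores_scalar, scores_batched, atol=1e-4)
```

The batched form does the same arithmetic but with sequential memory access and SIMD-friendly inner loops, which is where the CPU savings come from.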
|
Zero-Waste Agentic RAG: Designing Caching Architectures to Minimize Latency and LLM Costs at Scale (19 minute read)
A validation-aware, two-tier caching strategy for production-grade RAG systems reduces LLM token costs by over 30% and slashes response times from ~36 seconds to milliseconds for semantically similar queries. Combining semantic caching (embedding-based, ~95% similarity) and retrieval caching (context/topic-level, >70%), the architecture addresses redundancy, data staleness, and cache invalidation via timestamp checks, SHA-256 fingerprinting, and predicate caching.
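A minimal sketch of the two-tier idea, assuming a cosine-similarity answer cache with a ~0.95 threshold, a TTL-based staleness check, and SHA-256 fingerprints for retrieved context. Class and method names are hypothetical, not from the article:

```python
import hashlib
import time

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

class TwoTierCache:
    """Illustrative two-tier RAG cache; thresholds are assumptions from the article."""
    def __init__(self, answer_threshold=0.95, ttl_seconds=3600):
        self.answer_threshold = answer_threshold
        self.ttl = ttl_seconds
        self.semantic = []  # tier 1: (query_embedding, answer, created_at)

    def lookup_answer(self, query_embedding):
        now = time.time()
        for emb, answer, created in self.semantic:
            # Timestamp check guards against staleness; similarity check
            # lets semantically similar queries skip the LLM call entirely.
            if now - created < self.ttl and \
                    cosine(query_embedding, emb) >= self.answer_threshold:
                return answer
        return None

    def store_answer(self, query_embedding, answer):
        self.semantic.append((query_embedding, answer, time.time()))

    def fingerprint(self, context: str) -> str:
        # Tier 2 key: SHA-256 fingerprint detects when retrieved context changed.
        return hashlib.sha256(context.encode()).hexdigest()
```

A hit in the semantic tier returns in milliseconds because no retrieval or generation runs; the fingerprint tier invalidates cached answers when the underlying documents change.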
|
|
SQL Is Solved. Here's Where Chat-BI Still Breaks (7 minute read)
Empirical testing of agentic chat-BI systems using the BIRD and DABStep benchmarks revealed high SQL generation accuracy (over 70% correct on BIRD) but exposed critical failure modes: ambiguous metric definitions, out-of-scope questions, and common-sense gaps. Context and rule files (e.g., RULES.md) help, but they induce compounding errors and overfitting as complexity grows. Iterative human-in-the-loop evaluation, structured error classification, deterministic metric definitions, and reproducible CI testing are essential for reliability.
|
The Reckoning Is Already Here (3 minute read)
AI tools are already replacing much routine data engineering and analytics work, not at some future date, so prioritize deep business understanding, irreplaceable domain expertise, strong community ties, and mastery of the newest AI models.
|
Layer by Layer, We Built Data Systems No One Understands (6 minute read)
The modern data stack has layered itself into incomprehensible "fractal" complexity through the endless stacking of tools. Promises of "ease" enable rapid prototyping but foster departmental silos, decision avoidance, unchecked AI/LLM code generation, over-modeled business logic, and disconnection from real business value.
|
How Long Until We Call AI Agents Data Products (7 minute read)
AI agents in production must be managed as full-fledged data products, requiring rigorous observability, security, and iterative product analytics beyond standard logging. Treating agent interactions as actionable feedback loops drives roadmap decisions, while layered security and conversational discoverability are essential for user trust and adoption.
|
|
Stop Calling Tools, Start Writing Code (Mode) (8 minute read)
The code mode pattern improves MCP tool usage by having the LLM write and execute a script that composes multiple tools in a sandbox, instead of calling tools sequentially. This reduces context window bloat and round-trip overhead, making large tool catalogs far more scalable and efficient for LLMs to use.
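The contrast can be sketched concretely. In this hypothetical example (tool names, the script, and the sandbox are all invented for illustration, and a real system would use proper isolation rather than `exec`), the model emits one script that composes two tools, so only the final value re-enters the context window instead of two intermediate tool results:

```python
# Hypothetical tool registry standing in for an MCP server's tools.
def list_orders(customer_id):
    return [{"id": 1, "total": 40.0}, {"id": 2, "total": 60.0}]

def get_refund_rate(customer_id):
    return 0.1

TOOLS = {"list_orders": list_orders, "get_refund_rate": get_refund_rate}

# "Code mode": the model writes one script composing several tools;
# intermediate results stay inside the sandbox, not in the LLM's context.
GENERATED_SCRIPT = """
orders = tools["list_orders"]("c42")
rate = tools["get_refund_rate"]("c42")
result = sum(o["total"] for o in orders) * (1 - rate)
"""

def run_in_sandbox(script, tools):
    # A real deployment would use an isolated sandbox; exec() is only illustrative.
    env = {"tools": tools}
    exec(script, env)
    return env["result"]

print(run_in_sandbox(GENERATED_SCRIPT, TOOLS))  # one round trip instead of three
```

With sequential tool calling, each intermediate payload would travel back through the model; here the model sees only the script it wrote and the single final result.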
|
PostgreSQL Blink-tree Implementation (7 minute read)
PostgreSQL implements a high-concurrency version of B-tree indexes called Blink-Tree, adding a simple "link" pointer between sibling nodes and a "high-key" boundary marker in each node. This lets searches move quickly to the right sibling if needed without holding locks across multiple levels (no lock-coupling during reads), while structure changes like page splits use brief bottom-up lock-coupling on just a few nodes at a time, reducing lock contention dramatically.
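The "move right" behavior can be sketched in a few lines. This is a simplified single-threaded model (class and field names invented for illustration, no locking shown), showing how the high key and right link let a search recover when a concurrent split has moved its key to a sibling:

```python
class Node:
    """Simplified Blink-tree leaf: sorted keys plus a high key and right link."""
    def __init__(self, keys, high_key, right=None):
        self.keys = keys
        self.high_key = high_key  # upper bound on keys this node may hold
        self.right = right        # link pointer to the right sibling

def search_leaf(node, key):
    # If the key exceeds the high key, a split moved it rightward:
    # follow link pointers instead of restarting from the root or
    # holding locks across multiple tree levels.
    while node.high_key is not None and key >= node.high_key:
        node = node.right
    return key in node.keys

# A split has moved keys >= 30 into a new right sibling:
right = Node(keys=[30, 42], high_key=None)
left = Node(keys=[10, 20], high_key=30, right=right)
```

Searching `left` for 42 overshoots the high key (30), follows the link, and finds the key in the sibling; a search for 20 never leaves `left`. This is what lets readers proceed without lock-coupling.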
|
PgJitter (GitHub Repo)
PgJitter is a lightweight PostgreSQL extension that replaces the default LLVM JIT compiler with faster alternatives (sljit, AsmJIT, and MIR), enabling native code generation in microseconds instead of milliseconds. This dramatically reduces compilation overhead and makes JIT practical for a wider range of queries, especially OLTP workloads.
|
|
Something is afoot in the land of Qwen (5 minute read)
The Qwen 3.5 open-weight model family from Alibaba is gaining attention for delivering strong performance across a wide range of model sizes, including very small models that run locally while still supporting reasoning and multimodal tasks. However, the project's future is uncertain after the sudden resignation of its lead researcher and several core team members following an internal Alibaba reorganization.
|