Unified Context-Intent Embeddings for Scalable Text-to-SQL (17 minute read)
Pinterest's Analytics Agent transforms data discovery and Text-to-SQL by consolidating the query history of 2,500+ analysts, governance metadata, and AI-generated documentation into a centralized, semantically searchable knowledge base. By embedding both SQL intent and structural patterns, the system delivers context-aware, asset-first analytics, achieving 40% user adoption, cutting documentation effort by 70%, and markedly improving query reliability and data trust.
|
The Practical Limits of DuckDB on Commodity Hardware (9 minute read)
DuckDB delivers warehouse-style, columnar analytics with sub-second performance on datasets up to 5 million rows and remains comfortably interactive for GROUP BY and percentile queries up to 10 million rows, even on $500 laptops (16GB RAM). On this setup, window functions become noticeably slower beyond 10M rows (1.7s at 5M, 6s at 10M, ~1 minute at 50M rows). Memory usage remains modest (<1.2GB for 50M rows), making the 1M-20M zone the sweet spot for local interactive analytics with DuckDB on consumer laptops.
|
|
Claude Code + Dives = Any data UI (11 minute read)
With Claude Code connected to MotherDuck's MCP server for live data access, anyone can quickly build custom, interactive, refreshable data apps and visualizations as shareable "Dives" (React files with embedded live SQL queries), iterate rapidly using diff previews, and publish them directly in MotherDuck for team use.
|
It's about the strategy, stupid (15 minute read)
Most analytics work focuses on tactics (tools, audits, and tracking improvements) without asking how the data will actually support business strategy. Effective analytics starts by understanding the company's goals and decisions first, then designing data and metrics that directly support those strategic priorities.
|
|
Feldera (GitHub Repo)
Feldera is a query engine for incremental computation written in Rust. It continuously updates materialized views from inserts, updates, and deletes instead of recomputing everything. It supports full SQL, handles larger-than-memory data, connects to sources like Kafka, S3, CDC, and warehouses, and provides strong consistency so results match equivalent batch execution, with low-latency, high-throughput processing for real-time analytics and ETL workloads at scale.
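Feldera itself is a Rust engine driven by full SQL, but the core idea of incremental view maintenance can be sketched in a few lines of toy Python (not Feldera's API): treat each insert or delete as a weighted delta applied to the materialized aggregate, rather than rescanning the base table.

```python
class IncrementalSumView:
    """Toy materialized view for SELECT key, COUNT(*), SUM(value) GROUP BY key,
    maintained from a stream of weighted changes (+1 = insert, -1 = delete).
    An update is modeled as a delete followed by an insert."""

    def __init__(self):
        self.groups = {}  # key -> (count, sum)

    def apply(self, key, value, weight):
        cnt, total = self.groups.get(key, (0, 0.0))
        cnt += weight              # delta to COUNT(*)
        total += weight * value    # delta to SUM(value)
        if cnt == 0:
            self.groups.pop(key, None)  # all rows deleted: group vanishes
        else:
            self.groups[key] = (cnt, total)

view = IncrementalSumView()
view.apply("a", 10, +1)
view.apply("a", 5, +1)
view.apply("a", 10, -1)   # retraction of the first insert
print(view.groups)        # {'a': (1, 5.0)}
```

Each change costs O(1) work regardless of table size, which is the property that lets an incremental engine match the results of equivalent batch execution at streaming latencies.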
|
Skore Is Live: Track Your Data Science (8 minute read)
Skore is an open-source layer around scikit-learn focused on model evaluation, comparison, and experiment/report persistence for teams. It reduces evaluation boilerplate, adds methodological guardrails, and makes collaboration more reproducible. The payoff is cleaner handoffs from notebooks to production through structured artifacts, metrics, and project-level tracking.
|
Inside the flight path of real-time ingestion in Apache Pinot (13 minute read)
To guarantee a single consistent "winning" segment across the multiple Pinot server replicas consuming the same Kafka partition, Pinot uses a lightweight, controller-orchestrated blocking commit protocol: the controller elects a committer based on the maximum consumed offset, and the non-committers discard their local segments and download the official committed version.
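The election step above can be sketched in toy Python (illustrative only, not Pinot's actual implementation; replica names and offsets are invented): the controller picks the replica that has consumed the furthest Kafka offset, and that offset becomes the committed segment's end offset that the other replicas must converge to.

```python
def elect_committer(replica_offsets):
    """Pick the replica with the max consumed offset as committer.
    Toy sketch of the controller's decision; the losers will discard
    their local segment builds and download the committed copy."""
    committer = max(replica_offsets, key=replica_offsets.get)
    return committer, replica_offsets[committer]

# Hypothetical replicas of one Kafka partition, at different offsets.
offsets = {"server-1": 4_210, "server-2": 4_350, "server-3": 4_200}
committer, end_offset = elect_committer(offsets)
print(committer, end_offset)  # server-2 4350
```

Fixing a single end offset is what makes the segment deterministic: every replica ends up serving byte-identical data regardless of how far its own consumption had progressed.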
|
|
Scaling to 120+ AI Agents Without Losing Control (23 minute read)
Switch to a multi-agent system with a conductor-specialists pattern when scaling beyond 15 tools or 3 conflicting domains. Multi-agent design maximizes per-task quality but adds orchestration complexity; hybrid retrieval outperforms any single approach; and tightly scoped tool profiles reduce token waste. Example architecture: VoltAgent for orchestration, SurrealDB for unified vector/graph/relational storage, hybrid retrieval (0.6 vector, 0.2 graph, 0.2 keyword), and cost control via dynamic task routing (with Haiku for classification).
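The hybrid retrieval weighting quoted above can be sketched as a simple score fusion (toy Python; it assumes each signal is already normalized to [0, 1], which is our assumption, and the document names/scores are invented):

```python
def hybrid_score(vector_s, graph_s, keyword_s,
                 w_vector=0.6, w_graph=0.2, w_keyword=0.2):
    """Weighted fusion of three retrieval signals, using the
    0.6/0.2/0.2 split from the article. Each input score is
    assumed pre-normalized to the [0, 1] range."""
    return w_vector * vector_s + w_graph * graph_s + w_keyword * keyword_s

# Hypothetical candidates: (vector, graph, keyword) scores per document.
docs = {
    "doc-a": (0.9, 0.1, 0.3),  # strong semantic match only
    "doc-b": (0.4, 0.8, 0.9),  # strong graph + keyword match
}
ranked = sorted(docs, key=lambda d: hybrid_score(*docs[d]), reverse=True)
print(ranked)  # ['doc-a', 'doc-b']
```

With vector similarity weighted at 0.6, a strong semantic-only match still outranks a document that wins on the two lighter signals, which is the trade-off the weighting encodes.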
|
Anthropic's Compute Advantage: Why Silicon Strategy is Becoming an AI Moat (11 minute read)
Anthropic has established a structurally superior, diversified compute stack (leveraging AWS Trainium2 and Google TPUv7) delivering 30-60% lower per-token costs than Nvidia-only configurations and enabling 2+ gigawatts of dedicated capacity. This architecture, secured through $52 billion in long-term commitments with Broadcom, AWS, and Google, grants Anthropic unmatched negotiating leverage, cost-efficiency, and iteration speed. In contrast, OpenAI and Microsoft remain largely dependent on Nvidia, facing significantly higher inference costs and delayed internal silicon programs.
|