Senior Data Infrastructure Engineer

Judgment Labs · San Francisco, CA · Onsite

About The Position

Judgment Labs builds infrastructure for Agent Behavior Monitoring (ABM). While traditional observability focuses on logging exceptions and latency, ABM surfaces behavioral anomalies such as instruction drift and context-retrieval loss in scaled production environments. Hundreds of teams building autonomous agents rely on Judgment to understand how their systems behave post-deployment. Instead of reactive incident triage, they cluster patterns across conversations and workflows, correlate regressions to specific interaction types, and pinpoint where reliability breaks down in their usage context. We've raised $30M+ across two rounds in the past five months. Our investors include Lightspeed, SV Angel, Valor Equity Partners, Nova Global, Chris Manning, Michael Ovitz, Michael Abbott, Cory Levy, Kevin Hartz, and others.

The Role: We are looking for a Senior Data Infrastructure Engineer to build and scale the real-time data pipelines that power agent behavior analysis at production scale. This role is central to processing hundreds of thousands of traces per second, running LLM-based scoring and clustering in near-real time, and delivering the low-latency query performance that lets teams understand agent behavior as it happens. We need someone who has built petabyte-scale data systems, knows how to squeeze performance out of OLAP databases, and can own the data infrastructure from ingestion through analytics.

Requirements

  • Experience building and tuning high-throughput, petabyte-scale data pipelines
  • Deep knowledge of data infrastructure (Apache Spark, Ray, dbt, Airflow/Dagster)
  • Experience with OLAP database engineering
  • Comfortable with cloud infrastructure and batch + streaming pipelines
  • Senior-level ownership: you will own the infrastructure roadmap and architecture design, set engineering practices, identify bottlenecks, and ship fixes.

Nice To Haves

  • Experience working with LLM inference and serving optimization techniques such as:
      • Speculative decoding
      • Continuous batching and dynamic batching strategies
      • KV cache optimization and management
      • Quantization techniques (INT8, INT4) for reduced memory footprint
      • Multi-GPU serving and tensor parallelism

Responsibilities

  • Design the streaming pipeline that scores and clusters 100k+ traces per second using LLM APIs in near-real time (Kafka + Spark/Ray); a pipeline sketch follows this list.
  • Identify LLM API serving bottlenecks through flamegraph profiling and raise RPS with smart batching/streaming, adaptive concurrency, and connection pooling (see the batching sketch below).
  • Speed up ClickHouse queries and reduce p95/p99 latencies with better schemas and partitioning, projections/materialized views, and tiered storage (see the schema sketch below).
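
To make the first responsibility concrete, here is a minimal sketch of a Spark Structured Streaming job reading agent traces from Kafka and scoring them in micro-batches. The broker address, topic name, schema, and score_batch() stub are illustrative assumptions, not details from this posting.

# Sketch: Kafka -> Spark Structured Streaming -> LLM scoring (assumed names throughout).
# Requires the spark-sql-kafka connector package on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("trace-scoring").getOrCreate()

trace_schema = StructType([
    StructField("trace_id", StringType()),
    StructField("agent_id", StringType()),
    StructField("ts", TimestampType()),
    StructField("payload", StringType()),
])

# Read raw traces from Kafka and parse the JSON value column.
traces = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")   # assumption: broker address
    .option("subscribe", "agent-traces")                # assumption: topic name
    .option("maxOffsetsPerTrigger", 50000)              # cap micro-batch size
    .load()
    .select(from_json(col("value").cast("string"), trace_schema).alias("t"))
    .select("t.*")
)

def score_batch(batch_df, batch_id):
    """Score one micro-batch; the LLM call itself is stubbed out in this sketch."""
    rows = batch_df.collect()  # fine for a sketch; a real job would use mapPartitions
    # ... call the LLM scoring API on `rows`, then write scores to the OLAP store ...

query = (
    traces.writeStream
    .foreachBatch(score_batch)
    .option("checkpointLocation", "/tmp/checkpoints/trace-scoring")
    .start()
)
query.awaitTermination()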
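
For the batching responsibility, here is a rough sketch of raising LLM-API throughput with request batching, a bounded concurrency window, and a pooled connection, using asyncio and aiohttp. The endpoint URL, batch size, and concurrency cap are assumptions to be tuned against profiling, not values from the posting.

# Sketch: micro-batched LLM scoring calls with adaptive-style concurrency limits.
import asyncio
import aiohttp

SCORING_URL = "https://llm-scorer.internal/v1/score"  # hypothetical endpoint
MAX_IN_FLIGHT = 32                                    # concurrency cap to tune
BATCH_SIZE = 16                                       # traces per request

async def score_batch(session, semaphore, batch):
    # The semaphore bounds in-flight requests; the shared session pools connections.
    async with semaphore:
        async with session.post(SCORING_URL, json={"traces": batch}) as resp:
            resp.raise_for_status()
            return await resp.json()

async def score_all(traces):
    semaphore = asyncio.Semaphore(MAX_IN_FLIGHT)
    connector = aiohttp.TCPConnector(limit=MAX_IN_FLIGHT)  # connection pooling
    async with aiohttp.ClientSession(connector=connector) as session:
        batches = [traces[i:i + BATCH_SIZE] for i in range(0, len(traces), BATCH_SIZE)]
        return await asyncio.gather(
            *(score_batch(session, semaphore, batch) for batch in batches)
        )

# Usage: scores = asyncio.run(score_all(list_of_trace_dicts))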
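
And for the ClickHouse responsibility, a hedged sketch of the kind of schema work described above: daily partitions, an ORDER BY matched to per-agent time-range scans, and an hourly materialized view so dashboards avoid scanning raw traces. Table, column, and view names are invented for illustration; the client here is clickhouse_connect.

# Sketch: ClickHouse schema + materialized view for trace scores (assumed names).
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")  # assumption: local ClickHouse

client.command("""
CREATE TABLE IF NOT EXISTS traces (
    trace_id      String,
    agent_id      LowCardinality(String),
    ts            DateTime,
    anomaly_score Float32
)
ENGINE = MergeTree
PARTITION BY toYYYYMMDD(ts)   -- daily partitions keep merges and TTL moves cheap
ORDER BY (agent_id, ts)       -- matches the dominant per-agent time-range queries
""")

client.command("""
CREATE MATERIALIZED VIEW IF NOT EXISTS trace_scores_hourly
ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (agent_id, hour)
AS SELECT
    agent_id,
    toStartOfHour(ts)       AS hour,
    avgState(anomaly_score) AS avg_score
FROM traces
GROUP BY agent_id, hour
""")

# Dashboards then read the pre-aggregated view instead of scanning raw traces.
result = client.query(
    "SELECT agent_id, hour, avgMerge(avg_score) "
    "FROM trace_scores_hourly GROUP BY agent_id, hour"
)
print(result.result_rows)

Reading avgMerge() from the hourly view instead of aggregating the raw table is the sort of materialized-view rewrite that pulls down p95/p99 on repeated dashboard queries.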

Benefits

  • Full benefits
  • Equinox
  • Private chef