Staff Engineer, Engineering Productivity & AI Quality

Harper•San Francisco, CA

$253,000 - $308,000•Onsite

About The Position

Harper is growing rapidly, adding approximately 1,000 customers per month and growing 100x year-over-year as it scales toward a Series B funding round. The increasing volume of AI-generated code creates surface area, review burden, and architectural drift on the scale of a much larger organization. This role is crucial for building the infrastructure ('rails') needed to manage that code effectively, keeping services from becoming rework traps and realizing the CTO's vision for an efficient engineering organization.

The core thesis is that successful AI companies build an invisible machine of harnesses, tests, instructions, and review loops that lets small teams operate with immense leverage. This position is the founding seat for that machine. It is responsible for translating the CTO's preferences into systems such as PR preflight, integration tests, architecture rules, agent instructions, evaluation gates, and feedback loops that improve the daily experience of every engineer. The ultimate goal is to make correct engineering practices the easiest ones to follow, so that Harper's engineering output compounds with each deployment.

Requirements

8+ years of software engineering experience, with at least 3 years at a Senior+ level in a high-growth company.
Proven track record of building developer productivity, platform, CI/CD, build systems, test infrastructure, or internal tooling that has been adopted by other engineers.
Experience with production AI/ML systems, including agent harnesses, evaluation frameworks, LLM-as-judge graders, and prompt/context engineering.
Strong written communication skills, demonstrated through RFCs, architecture-rule documentation, lint-rule rationales, and internal playbooks.
Based in San Francisco or willing to relocate.

Nice To Haves

Experience building or contributing to evaluation framework infrastructure (open-source or internal).
Experience building developer platforms at an AI-native or high-growth company.
Experience with custom lint-rule / structural-test authoring at scale.
Experience building or operating agent harnesses (sandboxing, isolation, agent execution environments).
Experience working alongside a CTO whose architectural taste needed to be encoded into mechanical rules.

Responsibilities

Define and enforce CI/CD quality gates across Harper's critical services, setting the minimum standard that code must meet before it merges.
Develop integration test harnesses that address real failure modes, turning every recurring operational failure into a regression test, validation, or architecture rule.
Build and maintain the agent harness substrate, including sandbox lifecycle, tool routing, prompt/context layers, model-provider abstraction, and multi-agent coordination.
Manage repo-level agent instructions and context hygiene, including AGENTS.md files, canonical data model documentation, and banned-pattern lists, to shape the information environment for coding agents.
Implement automated PR preflight checks that summarize service impact, tests run, missing tests, model/migration changes, and critical-path warnings before human review.
Enforce architecture rules through custom linters and structural tests that mechanically encode the CTO's architectural preferences.
Develop and maintain the eval framework infrastructure, including pre-merge eval gating, experiment runs against curated datasets, and production trajectory monitoring.
Track and report on key engineering metrics such as rework rate, escaped defects, flaky test count, deploy rollbacks, time-to-confident-ship, and AI-generated PR quality, focusing on impact rather than vanity metrics.