Senior Testing Engineer

Robots and Pencils

About The Position

We’re looking for a Senior Testing Engineer to join our team and own quality across a cloud-native AI/ML platform built on AWS. This is not a traditional QA position: it is a hands-on engineering role for someone who writes test code across a production Python and AWS stack. In this role, you will evaluate the platform, create a comprehensive test coverage plan, and drive best practices across the team. You’ll propose and implement improvements to our testing infrastructure, modify production code to improve testability when necessary, and work with the broader engineering team to establish patterns that other developers can adopt and carry forward. You will be the authoritative voice on test automation and test engineering on this engagement, working with minimal supervision.

At Robots & Pencils, we design AI systems for a human world. Our name says it all: robots and pencils means engineering paired with creativity, because every agent we ship has to work for real people in real workflows. That balance is baked into how we operate.

This platform delivers AI-powered learning and automation to real users, and its reliability depends on the quality infrastructure you build. You won’t be auditing a test suite; you’ll be building one from the ground up, shaping what “done” means across every layer of the stack: Lambdas, DynamoDB, SQS, event-driven flows, and agentic AI pipelines. When this platform works, people learn better and move faster. That’s what’s on the line.

Requirements

  • 5+ years of professional software engineering experience with a strong focus on testing—unit, integration, E2E, and/or AI/ML system testing
  • Strong Python programming skills; this role writes test code, not just test plans
  • Hands-on experience with AWS services including Lambda, DynamoDB, SQS, S3, and EventBridge
  • Deep expertise with PyTest and Python-native testing frameworks, with a track record of designing and scaling test automation infrastructure
  • Experience writing and maintaining E2E and integration tests for event-driven, serverless, or microservices architectures
  • Familiarity with DynamoDB single-table design and the specific challenges of testing against it
  • Experience building or validating agentic or LLM-based systems; comfort with evals, output consistency testing, and hallucination/accuracy validation
  • Strong CI/CD expertise, with experience owning quality gates in delivery pipelines (e.g. GitHub Actions)
  • Working knowledge of AI safety and responsible AI principles as they apply to validating LLM behavior, prompt injection defenses, and PII handling in test data
  • Demonstrated ability to work independently, drive architectural recommendations, and deliver with minimal supervision
  • Demonstrated fluency with AI-forward tools such as Claude Code and Cursor
  • Strong problem-solving skills and sound judgment in ambiguous technical territory
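To make the bar concrete, here is a small illustrative sketch of the kind of Python-native test code this role involves: PyTest-style checks against key-construction helpers of the sort found in a DynamoDB single-table design. The helper names and key formats below are assumptions for illustration, not the platform’s actual schema.

```python
# Illustrative sketch only: hypothetical single-table key helpers and the
# PyTest-style checks a testing engineer might write against them.
# Names and key formats are assumptions, not the platform's real schema.

def learner_pk(learner_id: str) -> str:
    """Partition key for a learner item in a single-table layout."""
    return f"LEARNER#{learner_id}"

def course_sk(course_id: str) -> str:
    """Sort key for a course enrollment under a learner partition."""
    return f"COURSE#{course_id}"

def test_keys_compose_without_collisions():
    # Distinct entities must never yield the same composite key; the "#"
    # delimiter keeps ids like "1"/"23" and "12"/"3" from colliding.
    a = (learner_pk("1"), course_sk("23"))
    b = (learner_pk("12"), course_sk("3"))
    assert a != b

def test_key_prefix_supports_begins_with_queries():
    # Single-table queries often rely on begins_with(SK, "COURSE#").
    assert course_sk("intro-python").startswith("COURSE#")
```

Tests like these catch the key-collision and query-prefix bugs that are notoriously hard to spot once data is interleaved in a single table.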

Nice To Haves

  • CDK experience a strong plus

Responsibilities

  • Evaluate the platform, produce a thorough test coverage plan, and design a scalable testing architecture for the Python/AWS stack (Lambda, DynamoDB single-table design, SQS, S3, EventBridge, CDK) across unit, integration, E2E, agentic eval, and synthetic learner layers
  • Write production-grade test code using PyTest and Python-native frameworks, build and maintain agentic evals and synthetic learner pipelines that validate AI-driven workflows end-to-end, and own quality gates in CI/CD pipelines (e.g. GitHub Actions)—modifying production code to improve testability when warranted
  • Bring an AI-forward mindset to your daily work, using tools like Claude Code and Cursor to ship higher-quality work at pace
  • Partner with engineering and product leadership to align test strategy with delivery goals and platform architecture decisions
  • Translate test coverage status, quality risks, and recommended investments into terms that both technical and non-technical stakeholders can act on
  • Lead test planning sessions and release readiness assessments, driving clear go/no-go signals across the team
  • Establish the testing standards, frameworks, and patterns the broader engineering team adopts and extends, mentoring junior and mid-level engineers on testing practices to raise quality ownership across the team rather than centralizing it on yourself
  • Take ownership of quality end-to-end, including the unglamorous work of stabilizing flaky suites and paying down test debt
  • Evaluate and introduce emerging tools and methodologies, continuously improving testing quality and velocity without chasing novelty for its own sake
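As one flavor of what “agentic evals” can mean in practice, here is a minimal, hedged sketch of an output-consistency gate: sample a model several times and require the majority answer to dominate. The `model` callable, the 0.8 threshold, and the function names are illustrative assumptions, not the platform’s actual eval harness.

```python
# Illustrative sketch of an output-consistency eval for an LLM-backed step.
# The `model` callable is a stub standing in for a real LLM client; the
# threshold and names are assumptions for illustration.
from collections import Counter
from typing import Callable

def consistency_rate(model: Callable[[str], str], prompt: str, n: int = 5) -> float:
    """Fraction of n samples that agree with the most common answer."""
    answers = [model(prompt) for _ in range(n)]
    top_count = Counter(answers).most_common(1)[0][1]
    return top_count / n

# Stub model: deterministic here, so its consistency rate is 1.0.
def stub_model(prompt: str) -> str:
    return "Paris"

def test_consistency_gate():
    # Example quality gate: at least 80% of samples must agree.
    assert consistency_rate(stub_model, "Capital of France?", n=5) >= 0.8
```

Against a real model client, the same gate turns nondeterministic LLM behavior into a pass/fail signal a CI pipeline can act on.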