QA Engineer

Titan AI

Remote

About The Position

Titan is scaling rapidly and needs to establish a formal QA function to ensure the quality and reliability of its AI software for banks. Today there is no formal QA process, evaluation framework, regression baseline, or CI/CD quality gate, and these gaps must be closed before customer growth accelerates.

This is a hands-on individual-contributor role. The successful candidate will build the QA function from the ground up: writing test cases, developing an evaluation framework, implementing CI/CD quality gates, and triaging bugs. Once the QA practice is stable and documented, the role will expand to bringing in additional quality engineers to scale the function.

Requirements

Seven or more years of experience in software QA engineering.
At least two years of hands-on experience testing AI or ML systems.
Experience writing test cases against LLM outputs.
Experience building evaluation pipelines from scratch.
Fluent in Python.
Experience building automated test suites using pytest, Playwright, or Selenium.
Hands-on experience with RAGAS, DeepEval, LangSmith, or comparable evaluation tooling.
Ability to trace failures from the application layer down to the infrastructure layer.
Understanding of Azure, async systems, and REST APIs.
Experience integrating QA gates into CI/CD pipelines and owning the process end to end.
Motivated to build and test rather than to manage.

Nice To Haves

Experience in fintech, banking, or another regulated environment.
Familiarity with document processing pipelines.
Familiarity with multi-agent architectures.
Familiarity with RAG validation.
Familiarity with observability tooling such as Arize or Langfuse.

Responsibilities

Design, build, and run the evaluation framework for LLM and agentic AI outputs across Foundry, Agent Builder, and client-deployed instances.
Write assertions and define behavioral contracts for AI outputs.
Own regression baselines for model behavior that account for output distributions and confidence intervals.
Build tooling to support AI evaluation.
Write and maintain automated test suites, including end-to-end, integration, and regression tests for backend APIs, document ingestion pipelines, AI inference workflows, and frontend surfaces.
Own performance and load testing for latency-sensitive inference paths.
Set up and enforce quality gates in CI/CD pipelines.
Triage production bugs, write reproduction cases, and own regression tests.
Produce test artifacts, audit logs, and process documentation that satisfy SOC 2 Type II requirements.
Work with Forward Deployed Engineering on client-side validation and production issue reproduction.