Senior AI Platform Engineer

Infios
$170,000 - $190,000

About The Position

Infios is seeking a Senior AI Platform Engineer with deep expertise in spec-driven AI SDLC and strong hands-on experience with AWS AI infrastructure (Bedrock, Bedrock Agents, Agent Core). The role involves championing a specification-first approach to AI development, translating product requirements into AI specs, building LLM-powered and agentic applications using Spring AI, and managing the full lifecycle from prototype to production on AWS. The company values excellent problem-solving and clear communication, and looks for engineers who bring discipline and craft to AI product delivery. Infios is a leader in supply chain software solutions, developing future technologies to improve supply chains.

Requirements

  • Spec-Driven AI SDLC: Deep expertise in the AI software development lifecycle with a specification-first mindset.
  • Experience authoring AI feature specs (acceptance criteria, evaluation metrics, prompt contracts) and driving the full lifecycle from prototyping through evaluation frameworks, A/B testing, deployment of non-deterministic systems, and production monitoring (drift detection, quality scoring, feedback loops).
  • Track record of shipping AI-powered features through multiple product cycles with engineering rigor.
  • AWS AI Infrastructure: Strong hands-on experience with Amazon Bedrock, Bedrock Agents, Agent Core, SageMaker, and Amazon Q.
  • Solid knowledge of core AWS infrastructure including compute (ECS/EKS, Lambda), databases (RDS, DynamoDB, ElastiCache), networking (VPC, ALB, CloudFront), and security (IAM, KMS, Secrets Manager).
  • Experience architecting AI infrastructure pipelines with cost optimization and high availability.
  • LLM Frameworks & Agentic AI: Hands-on experience building production applications with Spring AI.
  • Solid understanding of LLM application patterns (prompt management, RAG, context orchestration, vector stores, evaluation) and agentic workflows (multi-step agents, tool-use orchestration, planning loops).
  • Java, TypeScript & Python: 5+ years of professional software engineering with strong proficiency across all three languages — Java (Spring Boot, Spring Cloud), TypeScript (Node.js, modern frameworks), and Python (AI tooling, evaluation frameworks).
  • Comfortable choosing the right language for each task.
  • Enterprise & Large-Scale Systems: Experience designing and operating distributed systems at scale.
  • Familiarity with event-driven architectures, message brokers (Kafka, SQS/SNS), caching (Redis, ElastiCache), and relational/NoSQL database design.
  • DevOps & Infrastructure: Proficiency in CI/CD pipelines, Infrastructure as Code (Terraform, CloudFormation), containerization (Docker, Kubernetes/EKS), and GitOps workflows.
  • Problem Solving & Communication: Excellent analytical skills and the ability to tackle complex, ambiguous challenges independently.
  • Outstanding written and verbal communication — able to articulate technical concepts to diverse audiences and collaborate effectively across teams.
  • Education: Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field (or equivalent practical experience).

Responsibilities

  • Define AI feature specifications upfront — including acceptance criteria, evaluation metrics, prompt contracts, and expected behaviors — and champion this spec-driven approach across the team.
  • Own end-to-end AI feature delivery across the full AI SDLC: spec definition, prototyping, development, evaluation, deployment, and production monitoring.
  • Build production-grade LLM and agentic AI applications using Spring AI — including RAG pipelines, agent orchestration, tool-use patterns, guardrails, and human-in-the-loop workflows.
  • Architect and operate AWS AI infrastructure (Bedrock, Bedrock Agents, Agent Core, SageMaker) alongside core AWS services (ECS/EKS, Lambda, S3, DynamoDB, RDS, API Gateway).
  • Design and implement scalable microservices and distributed systems in Java, TypeScript, and Python that power the Archer AI platform.
  • Build CI/CD pipelines for AI workloads — including LLM evaluation pipelines and automated regression testing for AI outputs — using Terraform, CloudFormation, Docker, Kubernetes, and GitHub Actions.
  • Drive AI-specific operational practices: observability, drift detection, quality scoring, feedback loops, and incident response for non-deterministic systems.
  • Communicate technical concepts clearly to both technical and non-technical stakeholders; author AI specs, design documents, and architectural decision records.
  • Mentor engineers, conduct thorough code reviews, and champion engineering excellence.

Benefits

  • Competitive Medical, Dental, and Vision insurance
  • 401(k) matching program
  • Flexible Time Off
  • 11 paid holidays
  • 1 volunteer day per year