Senior AI Platform Engineer

Infios

$170,000 - $190,000

About the Position

Infios is seeking a Senior AI Platform Engineer with deep expertise in spec-driven AI SDLC and strong hands-on experience with AWS AI infrastructure (Bedrock, Bedrock Agents, AgentCore). The role involves championing a specification-first approach to AI development, translating product requirements into AI specs, building LLM-powered and agentic applications with Spring AI, and managing the full lifecycle from prototype to production on AWS. The company values excellent problem-solving and clear communication, and looks for engineers who bring discipline and craft to AI product delivery. Infios is a leader in supply chain software, developing next-generation technology to improve supply chain operations.

Requirements

Spec-Driven AI SDLC: Deep expertise in the AI software development lifecycle with a specification-first mindset.
Experience authoring AI feature specs (acceptance criteria, evaluation metrics, prompt contracts) and driving the full lifecycle from prototyping through evaluation frameworks, A/B testing, deployment of non-deterministic systems, and production monitoring (drift detection, quality scoring, feedback loops).
Track record of shipping AI-powered features through multiple product cycles with engineering rigor.
AWS AI Infrastructure: Strong hands-on experience with Amazon Bedrock, Bedrock Agents, Bedrock AgentCore, SageMaker, and Amazon Q.
Solid knowledge of core AWS infrastructure including compute (ECS/EKS, Lambda), databases (RDS, DynamoDB, ElastiCache), networking (VPC, ALB, CloudFront), and security (IAM, KMS, Secrets Manager).
Experience architecting cost-optimized, highly available AI infrastructure pipelines.
LLM Frameworks & Agentic AI: Hands-on experience building production applications with Spring AI.
Solid understanding of LLM application patterns (prompt management, RAG, context orchestration, vector stores, evaluation) and agentic workflows (multi-step agents, tool-use orchestration, planning loops).
Java, TypeScript & Python: 5+ years of professional software engineering experience, with strong proficiency in all three languages: Java (Spring Boot, Spring Cloud), TypeScript (Node.js, modern frameworks), and Python (AI tooling, evaluation frameworks).
Comfortable choosing the right language for each task.
Enterprise & Large-Scale Systems: Experience designing and operating distributed systems at scale.
Familiarity with event-driven architectures, message brokers (Kafka, SQS/SNS), caching (Redis, ElastiCache), and relational/NoSQL database design.
DevOps & Infrastructure: Proficiency in CI/CD pipelines, Infrastructure as Code (Terraform, CloudFormation), containerization (Docker, Kubernetes/EKS), and GitOps workflows.
Problem Solving & Communication: Excellent analytical skills and the ability to tackle complex, ambiguous challenges independently.
Outstanding written and verbal communication skills, with the ability to explain technical concepts to diverse audiences and collaborate effectively across teams.
Education: Bachelor’s or Master’s degree in Computer Science, Software Engineering, or a related field (or equivalent practical experience).

Responsibilities

Define AI feature specifications upfront (acceptance criteria, evaluation metrics, prompt contracts, and expected behaviors) and champion this spec-driven approach across the team.
Own end-to-end AI feature delivery across the full AI SDLC: spec definition, prototyping, development, evaluation, deployment, and production monitoring.
Build production-grade LLM and agentic AI applications with Spring AI, including RAG pipelines, agent orchestration, tool-use patterns, guardrails, and human-in-the-loop workflows.
Architect and operate AWS AI infrastructure (Bedrock, Bedrock Agents, AgentCore, SageMaker) alongside core AWS services (ECS/EKS, Lambda, S3, DynamoDB, RDS, API Gateway).
Design and implement scalable microservices and distributed systems in Java, TypeScript, and Python that power the Archer AI platform.
Build CI/CD pipelines for AI workloads, including LLM evaluation pipelines and automated regression testing of AI outputs, using Terraform, CloudFormation, Docker, Kubernetes, and GitHub Actions.
Drive AI-specific operational practices: observability, drift detection, quality scoring, feedback loops, and incident response for non-deterministic systems.
Communicate technical concepts clearly to both technical and non-technical stakeholders; author AI specs, design documents, and architectural decision records.
Mentor engineers, conduct thorough code reviews, and champion engineering excellence.