About The Position

AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on.

The Data Center Gen AI team builds generative AI solutions for AWS data centers. Our systems orchestrate physical work processes across AWS's worldwide data centers, directly impacting millions of customers who rely on AWS services.

You'll contribute to transforming data center operations through AI/ML innovations, helping build the platform primitives that dozens of teams across the Data Center Community rely on. You'll work on developing intelligent systems that optimize technician workflows, automate decision-making processes, and enhance operational efficiency across AWS's global infrastructure while supporting AI-powered capabilities for a 30K+ globally distributed user base. This role offers the opportunity to contribute to the future of AWS data center operations through innovative AI/ML solutions while working with advanced technologies at unprecedented scale.

Requirements

  • 3+ years of non-internship professional software development experience
  • Experience programming with at least one modern language such as Java, C++, or C#, including object-oriented design
  • 1+ years of experience designing and developing large-scale, multi-tiered, multi-threaded, embedded, or distributed software applications, tools, systems, and services using C#, C++, Java, or Perl
  • Bachelor's degree or foreign equivalent in Computer Science, Engineering, Mathematics, or a related field

Nice To Haves

  • Generative AI / Agentic Systems — LLM integration, prompt engineering, RAG architectures, tool-calling patterns, agent frameworks (Strands, LangChain)
  • Full-Stack Serverless Engineering — AWS Lambda, API Gateway, CloudFront, DynamoDB/RDS, EventBridge, SQS, CDK Infrastructure-as-Code
  • Frontend & SDK Development — React, TypeScript, Cloudscape Design System, component library development, streaming interfaces (SSE)
  • Search & Knowledge Systems — OpenSearch/Elasticsearch, vector embeddings, hybrid retrieval, document processing pipelines, semantic chunking
  • ML & Data Engineering — SageMaker, time-series analysis, anomaly detection, classification models, feature engineering, ETL pipelines
  • Platform & DevOps — CI/CD pipeline development, progressive deployment, synthetic monitoring, observability (CloudWatch, X-Ray, OpenTelemetry)
  • Thrives in ambiguous environments and is eager to learn and adapt quickly
  • Demonstrates a bias for action with the ability to deliver results in fast-paced settings
  • Builds and maintains solid technical depth while staying customer-focused
  • Shows genuine curiosity and enthusiasm for AI/ML advancements
  • Has working knowledge of AI/ML technology application (LLMs, agents, RAG, Skills, ML models)
  • Takes ownership of features and components end-to-end, driving them to completion with guidance from senior engineers
  • Balances pragmatic execution with creative problem-solving
  • 3+ years of experience with the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations
  • Experience with Machine Learning and Large Language Model fundamentals, including architecture, training/inference lifecycles, and optimization of model execution

Responsibilities

  • Design and develop AI/ML platform features and solutions, contributing to systems that serve both data center operations and engineering teams
  • Own end-to-end delivery of features and components, including AI integrations, deployment pipelines, and user-facing interfaces for non-ML experts
  • Collaborate with senior engineers and cross-functional partners to integrate ML solutions into existing DC workflows, ensuring system quality and scalability
  • Write clean, well-tested, and maintainable code while actively contributing to improvements in development processes, particularly for GenAI development and deployment
  • Participate in technical design discussions and code reviews, bringing a customer-focused perspective to architectural decisions
  • Design and implement reusable components and tools that enhance team productivity and system reliability
  • Stay current with AI/ML advancements and proactively identify opportunities to apply new techniques to data center challenges

Benefits

  • Health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance, optional Supplemental Life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • Paid time off
  • Parental leave