About The Position

The AWS Managed Operations (MO) organization was founded in April 2023 with the objective of reducing operational load and toil through long-term engineering projects. MO is building a best-in-class engineering and operations team that will own the day-to-day operations of AWS Regions, improving the availability, reliability, latency, performance and efficiency with which AWS Regions are operated.

Amazon is looking for highly motivated Systems Engineers who can balance the day-to-day operations of AWS’ software systems with long-term software engineering that reduces operational toil. We need engineers who enjoy constantly learning and diving deep into the wide range of systems and technologies that make up one of the world’s largest cloud providers.

The AWS Operations Management (AWSOM) team’s mission is to launch a new offering that will drive security, availability, performance and efficiency improvements in operating AWS Regions globally. We relentlessly remove operational toil through automation so that day-to-day operations run at scale. We increase collaboration and bridge development and operations while prioritizing the needs of our customers.

The AWSOM team’s vision is to own the operational responsibility for all Utility Compute (UC) services in AWS commercial and sovereign Regions, freeing up service teams’ time so they can continue innovating quickly for our customers. AWSOM will be responsible for service availability, latency, performance, efficiency, change management and monitoring. AWSOM will also directly influence our customers’ experience by recommending that service teams build resilience and reliability into their products from the outset.

Requirements

  • 5+ years of Linux experience
  • 4+ years of site reliability engineering (SRE), systems engineering, systems administration, DevOps, security administration, or network administration experience
  • Experience in at least one of the following: Python, Java, Perl, PHP, Ruby, Bash, Shell, or an equivalent language
  • Experience developing, deploying and managing AI products at scale

Nice To Haves

  • Demonstrated ability to leverage Generative AI tools and AI-assisted development environments to accelerate prototyping, development, validation, and testing workflows
  • Experience using AI-powered code generation and review tools to rapidly iterate on infrastructure automation scripts, configuration templates, and systems tooling and to automate investigative, diagnostic, or operational tasks
  • Ability to apply GenAI capabilities to accelerate test case generation, integration testing, and validation of complex infrastructure components — reducing cycle times without sacrificing quality or security rigor
  • Comfort using GenAI tools to rapidly synthesize technical documentation, compliance frameworks, and architectural patterns — accelerating research and decision-making in ambiguous problem spaces
  • Awareness of the responsible use of GenAI in security-sensitive environments, including understanding of data handling boundaries, model limitations, and appropriate human-in-the-loop validation practices
  • 5+ years of site reliability engineering (SRE), systems engineering, systems administration, DevOps, security administration, or network administration experience
  • Knowledge of TCP/IP and networking protocols such as HTTP and DNS
  • Experience designing and developing scripts to automate operational burdens and reviewing scripting changes to ensure they meet the standards for maintainability, scalability and security
  • Experience working in a 24/7 production environment

Responsibilities

  • Collaborate across diverse teams, projects, and environments to have a firsthand impact on our global customer base.
  • Solve challenging technical problems, often ones not solved before, at every layer of the stack.
  • Leverage Generative AI (GenAI) tools and AI-assisted development environments to accelerate prototyping, development, validation, and testing workflows.

Benefits

  • health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance with an option for Supplemental Life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, and Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave
  • sign-on payments
  • restricted stock units (RSUs)
© 2024 Teal Labs, Inc