Staff AI Platform Engineer

CarParts.com · Long Beach, CA
$166,000 - $232,000 · Onsite

About The Position

This role is for an exceptionally capable, AI-native engineer who will own the entire platform engineering and SRE function, using autonomous agents, LLM-powered pipelines, and MCP-based tooling to run the platform. The position is on-site and requires close partnership with engineering leadership. The engineer will inherit a mature, fully containerized AWS estate that includes EKS clusters, an Akamai CDN layer, GitHub Actions and Jenkins CI/CD pipelines, and an operational AI agent platform called OpsWhisperer. The primary responsibilities are to extend these systems, automate manual processes, and make deployments, incident response, and infrastructure changes fast, precise, and intelligent.

Requirements

  • 10+ years of hands-on DevOps, SRE, or platform engineering experience in production AWS cloud environments.
  • Deep AWS expertise: EKS, EC2, SQS, CloudWatch, IAM, Organizations, and multi-account architectures.
  • Strong Kubernetes skills: cluster operations, node group management, workload isolation, taints/tolerations, auto-scaling.
  • Experience with Akamai or an equivalent enterprise CDN: configuration, purge operations, and traffic routing rules.
  • CI/CD ownership: GitHub Actions and/or Jenkins pipeline design, monorepo build systems, release gating.
  • Production experience building or operating AI agents: LLM integration, autonomous workflow design, and prompt engineering.
  • Proficiency in Node.js and/or Python for automation, tooling, and MCP server development.
  • Observability stack ownership: Elastic/Kibana, log analysis, alerting design, SLO/SLI instrumentation.
  • Comfortable owning on-call responsibility for a production e-commerce platform with significant revenue exposure.
  • Strong written and verbal communication skills.
  • Based in or willing to relocate to the Los Angeles / Long Beach area for on-site work.
  • AI fluency is a non-negotiable expectation for this role.

Responsibilities

  • Own the entire platform engineering and SRE function using autonomous agents, LLM-powered pipelines, and MCP-based tooling.
  • Extend the existing AWS estate, including EKS clusters, EC2 worker nodes, SQS pipelines, and AWS Bedrock for AI workloads.
  • Manage Kubernetes and containerization, including EKS clusters, node group management, Kops clusters, and environment isolation.
  • Oversee CI/CD and release management, including GitHub Actions workflows, Jenkins pipeline management, Turbo build system, and canary release gating/rollback automation.
  • Manage the Akamai CDN layer, including Property Manager configuration, Phased Release Cloudlet, security, throttling, monitoring, and cache invalidation.
  • Own observability and incident response, including Elastic/Kibana, CloudWatch, business performance monitoring, SQS backlog alerting, and AI-assisted triage.
  • Build, operate, and improve autonomous agents for monitoring, alerting, triage, and routine operational work.
  • Extend OpsWhisperer, contribute to the Axle platform, build MCP servers, and apply LLM-powered reasoning to infrastructure problems.
  • Handle on-call responsibility for a production e-commerce platform.
  • Interface with engineering leadership and present findings to executives.

Benefits

  • Opportunities for growth and advancement (Promote from Within)