Platform Engineer — Cloud Infrastructure (SMTS)

Salesforce
Redwood City, CA
$148,500 - $246,000
Hybrid

About The Position

The Senior Member of Technical Staff (SMTS) role is part of our Platform Engineering team within the Cloud Infrastructure organization. Platform Engineering is made up of platform engineers, SREs, and DevOps specialists who design, build, and operate the internal developer platform powering hundreds of Kubernetes clusters across AWS, Azure, GCP, and OCI. Whether we are automating cluster lifecycle management, hardening our GitOps delivery pipelines, or architecting autonomous agents to manage production systems, we strive to give every product team a fast, secure, and reliable path to production.

We are looking for an SMTS with strong AI/ML software engineering expertise to build the next generation of intelligent, self-healing platform tools. Instead of managing GPU hardware, you will focus on applying AI solutions directly to infrastructure and operations problems. You will write core platform services in Go and Python, design multi-agent workflows to automate complex operational tasks, build RAG systems over engineering documentation, and act as the core AI amplifier, architecting the intelligent systems that multiply the entire engineering organization's output.

Requirements

  • 5+ years of professional experience in software engineering, platform engineering, or DevOps, with a recent, heavy focus on building and implementing AI solutions.
  • Strong understanding of core AI and ML concepts applied practically to software engineering, including LLM context window optimization, embedding models, semantic search, vector databases, and prompt engineering/tuning.
  • Experience building with agentic frameworks and LLM orchestration tooling to execute multi-step, autonomous tasks.
  • Good programming skills in Go and Python, with the ability to build production-grade backend services, APIs, and microservices.
  • Solid fundamental knowledge of cloud-native infrastructure, with hands-on experience in Kubernetes and multi-cloud environments (AWS, Azure, GCP, or OCI).
  • Familiarity with continuous deployment and infrastructure-as-code concepts (GitOps with Flux/Argo CD, Pulumi, or Terraform).
  • Demonstrated agentic and automation mindset: you have a proven track record of using AI to automate complex workflows and can speak in depth about how you design AI systems to handle edge cases, tool-calling errors, and non-deterministic outputs.
  • Strong communication and collaboration skills, with a passion for teaching, raising the team’s AI literacy, and evangelizing AI solutions across engineering boundaries.

Nice To Haves

  • Hands-on experience building custom extensions, plugins, or Model Context Protocol (MCP) servers for agentic developer tools like Claude Code or GitHub Copilot.
  • Experience applying AI specifically to observability data (parsing logs, analyzing metrics, or correlating distributed traces) for predictive scaling or automated alerting.
  • Deep experience working with vector databases (e.g., Pinecone, Qdrant, Milvus, pgvector) inside platform applications.
  • Experience operating AI-driven tools within compliance-driven environments (FedRAMP, SOC 2), ensuring strong data privacy boundaries, LLM guardrails, and secure handling of sensitive cloud credentials.
  • Experience with internal developer platforms (IDPs), platform APIs, or building developer experience (DevEx) tooling.
  • Contributions to open-source projects are a plus.

Responsibilities

  • Design, build, and operate platform services and infrastructure automation in Go and Python, embedding AI capabilities directly into the core platform software.
  • Architect and implement intelligent, closed-loop automation systems (AIOps) that leverage LLMs and autonomous agents to detect anomalies, perform root-cause analysis, and execute self-healing remediation playbooks.
  • Build and maintain Retrieval-Augmented Generation (RAG) applications over internal platform documentation, runbooks, and historical incident data to drastically reduce engineering MTTR.
  • Develop custom tools, CLI plugins, and Model Context Protocol (MCP) integrations that connect our cloud infrastructure APIs to agentic coding tools (like Claude Code), turning standard automation into autonomous workflows.
  • Partner with SRE, security, and platform specialists to identify highly repetitive operational work and build agentic solutions that delegate that toil to AI.
  • Maintain and improve standard continuous deployment pipelines using GitOps tooling (Flux, Argo CD) and infrastructure-as-code frameworks (Pulumi, Terraform) to ensure safe, repeatable delivery of both traditional platform code and AI-driven solutions.
  • Participate in design reviews, write clear technical documentation and RFCs, and mentor traditional platform engineers on AI/ML concepts, prompt engineering, and agentic design patterns.
  • Contribute to on-call rotations and continuously bring an AI-first perspective to improving incident management and platform post-mortems.

Benefits

  • time off programs
  • medical
  • dental
  • vision
  • mental health support
  • paid parental leave
  • life and disability insurance
  • 401(k)
  • employee stock purchasing program