Production Support Engineering LMTS

Salesforce•Seattle, WA

60d•Hybrid

About The Position

Salesforce is seeking a Production Support Engineer (LMTS) to join its embedded reliability team. This role is for a senior technical lead focused on ensuring the reliability and scalability of the Agentforce for Supply Chain platform. The engineer will work on production excellence, performance tuning, and infrastructure automation, and will take part in design reviews to ensure new features are built to scale. The team operates as a high-velocity startup within Salesforce, focusing on scaling architecture, hardening systems, and integrating with the Agentforce ecosystem.

Requirements

5+ years of experience in SRE, Production Engineering, or Backend Engineering with a heavy focus on operations and scale.
Proven Scaling Experience: Previously helped take a product through a high-growth phase, dealing with technical debt and architectural shifts.
Technical Breadth: Strong proficiency in Kubernetes, Terraform/OpenTofu, and AWS/GCP/Azure.
Coding Mastery: Ability to write and review production-level code in Golang, TypeScript, or Python.
Systems Expert: Deep understanding of distributed systems, including debugging complex interactions between microservices, databases, and AI agents.
Low-Ego Collaboration: Experience working within a senior team of Principal engineers, capable of leading initiatives and supporting the broader group’s technical vision.
Demonstrated, genuine AI-first approach to engineering.
Experience using AI tools (e.g., Claude Code, GitHub Copilot, Codex, Cursor) in development workflows.
Advanced prompt engineering skills, including the ability to write precise, structured prompts and cultivate system context.

Nice To Haves

M.S. in Computer Science or equivalent practical experience.
Strong experience with PostgreSQL at scale (partitioning, indexing, query tuning).
Advanced knowledge of microservice orchestration and durability patterns, including hands-on experience with Temporal for workflow reliability and with service meshes.
Experience with the unique data constraints and reliability requirements of manufacturing or global logistics.
Familiarity with Salesforce infrastructure, Hyperforce, or Data Cloud.
Deep knowledge of networking, security, and identity management within major cloud providers.

Responsibilities

Own the reliability roadmap for major product areas, transitioning systems from startup-speed architectures to highly available, global-scale enterprise solutions.
Partner with PMTS-level engineers to refine infrastructure strategy, contributing senior-level perspectives on system design, capacity planning, and bottleneck identification.
Maintain and evolve automated environments, focusing on making the "infrastructure-as-plugins" model more robust and developer-friendly.
Support the scaling of AI/ML infrastructure, ensuring models have the necessary GPU resources and data pipelines.
Lead the hardening of the observability stack, building tooling to prevent incidents and telemetry to explain them.
Deep-dive into SQL optimization, API latency, and cross-service communication to ensure the platform remains performant under heavy load.
Use AI tools (e.g., Claude Code) to automate routine operational tasks and accelerate infrastructure delivery.
Contribute to building and maintaining the shared system context for AI operations.
Critically evaluate code (human-written or AI-generated) for correctness, quality, security, and performance.