Principal Operations Engineer

Salesforce
New York, NY
$197,300 - $344,700

About The Position

Salesforce's Digital Enterprise Technology (DET) organization is establishing a new, engineering-first operations function. This function aims to transition the entire organization from reactive, manual processes to automated, intelligent, and proactive operations at scale. As a Principal Operations Engineer focused on Operational Excellence, you will be a foundational technical leader in this team. Your role will involve defining how DET detects, responds to, and prevents issues, while simultaneously eliminating toil and enhancing the reliability of critical, customer-facing systems. This is a high-visibility, high-impact position for an individual eager to influence not only what is built but also how an organization operates.

Requirements

  • 12+ years of experience in engineering, operations engineering, SRE, or related roles.
  • Proven track record of automating complex operational workflows and improving reliability and operational maturity at scale.
  • Deep expertise in incident management systems, observability (metrics, logging, tracing), and distributed systems and microservices.
  • Strong experience with automation frameworks, scripting, Infrastructure as Code, and modern DevOps practices.
  • Experience operating high-availability, customer-facing systems in enterprise environments.
  • Strong written and verbal communication skills with the ability to influence senior engineering leaders and drive outcomes across teams without formal authority.
  • A related technical degree is required.

Nice To Haves

  • Experience building self-service or platform-based operational tooling.
  • Background in automation-driven operations or platform engineering.
  • Experience leading large-scale incident management transformations.
  • Familiarity with AI/ML-driven operations (AIOps).
  • Experience in SaaS/PaaS enterprise environments.
  • Salesforce ecosystem experience (Apex, LWC, APIs, etc.).

Responsibilities

  • Lead the design and implementation of automation-first operations, eliminating manual workflows across incident management, alerting, escalation, runbooks, and day-to-day operational processes.
  • Build and scale alert-to-incident automation pipelines to accelerate detection and response times.
  • Identify and prioritize high-impact toil reduction opportunities across the ecosystem.
  • Drive adoption of self-healing systems and automated remediation patterns.
  • Provide Tier 2+ advanced application support for complex production issues and lead deep-dive investigations into system failures.
  • Drive a culture of automation-first thinking, ownership, accountability, and continuous improvement.
  • Lead the evolution from reactive incident response to proactive reliability engineering, improving MTTD, MTTR, and the percentage of incidents detected automatically.
  • Serve as a key technical leader in incident management, escalation strategy, and post-incident analysis.
  • Establish and enforce SLI, SLA, and SLO frameworks across critical Tier-1 services.
  • Drive deep understanding of system dependencies and failure modes.
  • Architect operational strategies with a focus on customer intent, experience, and outcomes.
  • Identify and prioritize critical user journeys, ensuring they are observable, reliable, and performant.
  • Align operational priorities with business impact.
  • Partner with stakeholders to define and execute quarterly and annual operational roadmaps (OKRs).
  • Translate business needs into scalable operational capabilities, balancing reliability, speed, and cost efficiency.

Benefits

  • time off programs
  • medical
  • dental
  • vision
  • mental health support
  • paid parental leave
  • life and disability insurance
  • 401(k)
  • employee stock purchase program