Sr. Reliability Engineer, Digital Commerce

Skechers
Manhattan Beach, CA
Hybrid

About The Position

The Sr. Reliability Engineer, Digital Commerce is responsible for ensuring the stability, performance, and operational readiness of the global digital commerce ecosystem. This role owns end-to-end reliability of the customer shopping journey, from storefront experience and product discovery through checkout, order lifecycle, and commerce integrations. It has a specific focus on the Salesforce Commerce Cloud (SFCC) ecosystem, including B2C Commerce storefronts, integrations, and commerce services. Working at the intersection of engineering, product, and operations, this engineer drives proactive reliability practices, observability standards, incident management discipline, and automation initiatives that reduce operational risk and strengthen the resilience of digital commerce at global scale.

Requirements

  • Hands-on experience supporting Salesforce Commerce Cloud (SFCC) production environments, including composable commerce ecosystems integrating SFCC with CMS, search, personalization, and middleware platforms.
  • Experience supporting high-traffic global eCommerce environments with modern commerce architectures including headless, composable, and microservices-based platforms.
  • Strong background in incident management, observability, and operational excellence practices, including hands-on experience with observability platforms such as Datadog.
  • Familiarity with order management systems, payment platforms (such as Cybersource or Adyen), or commerce SaaS ecosystems; exposure to Manhattan Active Order Management (MAO) is a strong plus.
  • Experience with CI/CD pipelines, deployment strategies, release governance, APIs, event-driven systems, and commerce integrations.
  • Strong understanding of distributed systems, cloud-native infrastructure, and performance optimization for web applications and backend services.
  • Experience leveraging AI-assisted engineering tools to improve operational efficiency and automation.
  • Strong analytical mindset with the ability to connect technical reliability to business outcomes and communicate effectively with both technical and non-technical stakeholders.
  • Bachelor's degree in Computer Science, Engineering, or a related field, or equivalent experience.
  • 7+ years in Site Reliability Engineering, Production Engineering, or Digital Commerce Platform Operations.

Responsibilities

  • Own end-to-end operational reliability across the digital commerce stack, including storefront availability, product catalog and pricing services, search and discovery, checkout and payment processing, order lifecycle, and fulfillment integrations (OMS, WMS, payment gateways, tax, fraud, and shipping).
  • Ensure stability and performance of the Salesforce Commerce Cloud (SFCC) ecosystem, including Business Manager configurations, WebDAV operations, replication processes, cartridge-based customization layers, and headless/microservice components integrated with SFCC.
  • Establish operational standards and reliability guardrails for commerce services and all dependent systems across varying traffic conditions, including peak demand periods.
  • Partner with order management teams to ensure reliability across Manhattan Active Order Management (MAO) order routing, fulfillment execution integrations, and downstream fulfillment event integrity, including buy online, pick up in store (BOPIS) flows.
  • Design and implement monitoring frameworks across digital commerce services that detect conversion-impacting issues before they affect customers.
  • Define and manage SLIs, SLOs, and alerting strategies tied to business impact, including conversion degradation, checkout failure rates, order placement success, and site performance and latency.
  • Build operational dashboards that translate technical signals into revenue and customer experience insights.
  • Implement monitoring across SFCC-specific signals including pipeline performance, OCAPI health, SCAPI latency, cache effectiveness, replication health, third-party integration response times, and MAO order orchestration signals such as routing latency, fulfillment status synchronization, and exception queue health.
  • Lead coordination of high-severity commerce incidents, including triage, root cause analysis, and systemic remediation planning, and reduce MTTR through automation, tooling, and process optimization.
  • Establish and maintain incident runbooks, operational playbooks, and continuous operational readiness standards across commerce platforms.
  • Own operational readiness and release planning for major commerce launches, campaigns, and seasonal peak events, including SFCC traffic scaling strategy validation.
  • Partner with Salesforce Commerce Cloud support during platform incidents, managing severity escalation processes and coordinating internal response during platform-level disruptions.
  • Identify and remediate performance bottlenecks impacting site speed, checkout latency, and service responsiveness, including SFCC-specific optimization across page caching, CDN configuration, search indexing, and cartridge execution efficiency.
  • Partner with engineering teams to drive performance optimization initiatives, support load testing, and own capacity planning and peak readiness validation.
  • Ensure commerce systems scale reliably to support business growth and global expansion.
  • Develop automation to reduce manual operational effort and recurring incident classes, including SFCC deployment validation, replication monitoring, integration failure detection, and release risk scoring.
  • Implement reliability engineering patterns such as automated recovery workflows, self-healing service orchestration, reliability validation pipelines, and operational health scoring.
  • Drive adoption of reliability engineering best practices across delivery teams.
  • Partner with product, engineering, merchandising, marketing, and operations teams to align reliability priorities with business objectives, serving as a reliability advocate during architecture design and solution reviews.
  • Act as the reliability liaison between internal commerce engineering teams and Salesforce Commerce Cloud platform teams, coordinating with external vendors and SaaS providers during incident resolution and performance optimization.
  • Translate technical reliability risks into clear business impact narratives for both technical and non-technical stakeholders.

Benefits

  • Comfort innovation is at the core of everything we do, driving the development of stylish, high-quality products at a great value.