Manager - Performance Engineering and Quality Assurance

Costco Wholesale Corporation, Issaquah, WA
Posted 29 days ago | $137,000 - $200,000

About The Position

Costco IT is responsible for the technical future of Costco Wholesale, the third-largest retailer in the world, with wholesale operations in fourteen countries. Despite our size and explosive international expansion, we continue to provide a family-oriented, employee-centric atmosphere in which our employees thrive and succeed. This is an environment unlike anything in the high-tech world, and the secret of Costco's success is its culture. The value Costco puts on its employees is well documented in articles from a variety of publishers, including Bloomberg and Forbes. Our employees and our members come FIRST. Costco is well known for its generosity and community service and has won many awards for its philanthropy. The company joins with its employees to take an active role in volunteering by sponsoring many opportunities to help others.

Come join the Costco Wholesale IT family. Costco IT is a dynamic, fast-paced environment, working through exciting transformation efforts. We are building the next-generation retail environment, where you will be surrounded by dedicated and highly professional employees.

We are looking for an experienced engineering leader to build and scale the next generation of our Application Performance and Reliability Engineering organization. In this role, you will shape the strategy, platforms, and practices that ensure Costco's global technology ecosystem is fast, resilient, and able to support the company's rapid growth. You will lead a team of engineers focused on performance engineering, reliability engineering, quality automation, and telemetry platforms. Your mission is to create a world-class, proactive approach to ensuring that every Costco application, from member-facing digital experiences to internal systems, operates reliably at global scale.

This position offers the opportunity to influence engineering across the organization, solve complex distributed systems challenges, and build platforms that empower thousands of developers. If you are passionate about performance, scalability, reliability, and leading teams that build foundational engineering platforms, let's talk.

Requirements

  • Leadership experience managing engineering teams in performance engineering, reliability or SRE, observability, platform engineering, or similar disciplines.
  • Strong knowledge of distributed systems, large-scale application architectures, and cloud or hybrid environments.
  • Expertise with performance testing frameworks (k6, LoadRunner, Gatling, etc.), telemetry systems (metrics, logs, tracing), and reliability tooling.
  • Ability to drive technical strategy and influence engineering leaders across an organization.
  • Experience building automated platforms and tooling that empower engineering teams.
  • Proven success delivering measurable improvements in performance, stability, and engineering efficiency.
  • Excellent communication and collaboration skills, with comfort interacting with executive leadership.

Nice To Haves

  • Background in large-scale digital ecosystems or high-traffic consumer applications.
  • Familiarity with SLO and SLA frameworks, error budgets, and reliability-driven development.
  • Experience with chaos engineering, resilience testing, or capacity modeling.
  • Experience with AIOps or automated anomaly detection platforms.
  • Strong understanding of modern CI/CD, DevOps practices, and cloud-native ecosystems.

Responsibilities

  • Lead and scale a high-performing engineering team focused on performance, reliability, and observability.
  • Set the vision and roadmap for Costco's performance and reliability strategy across all applications and services.
  • Build scalable, automated platforms for performance testing, resilience validation, telemetry, and continuous measurement.
  • Partner with engineering and architecture teams to ensure performance and reliability are built into product designs from the beginning.
  • Establish engineering standards for instrumentation, SLOs, distributed tracing, and system health metrics.
  • Drive proactive detection practices including load testing, anomaly detection, resilience testing, and capacity forecasting.
  • Improve system predictability and reduce incident impact through automation, deep telemetry, and modern AIOps concepts.
  • Coach, mentor, and grow engineering talent while fostering a culture of excellence, ownership, and continuous improvement.

Benefits

  • paid time off
  • health benefits - medical/dental/vision/hearing aid/pharmacy/behavioral health/employee assistance
  • health care reimbursement account
  • dependent care assistance plan
  • short-term disability and long-term disability insurance
  • AD&D insurance
  • life insurance
  • 401(k)
  • stock purchase plan

What This Job Offers

  • Job Type: Full-time
  • Career Level: Manager
  • Industry: General Merchandise Retailers
  • Education Level: No Education Listed
  • Number of Employees: 5,001-10,000 employees
