Sr. DevOps Engineer (Reliability Focus)

Quantum Electronic Payments•Anaheim, CA

$85,000 - $115,000•Onsite

About The Position

Unlock your potential with Quantum ePay®. We're a full-service financial technology provider that helps businesses lower costs, earn more, and operate with confidence. We power mission-critical payment processing platforms used by merchants and partners across the U.S.

We're seeking a Senior DevOps Engineer to take ownership of day-to-day DevOps operations and production reliability for our core systems. This hands-on, execution-focused role is responsible for ensuring system availability, leading incident response, improving observability, and driving operational maturity across our infrastructure. You'll work closely with Engineering, Product, and Support teams to keep our platforms stable, performant, and resilient as we scale.

This role is ideal for a DevOps engineer looking to take full ownership of production systems and reliability in a growing fintech environment.

Requirements

3+ years of experience in DevOps, production engineering, or related roles.
Prior experience leading or acting as a senior technical owner for production systems.
Strong hands-on experience with AWS and production monitoring/alerting.
Proven experience supporting high-availability, customer-facing platforms.
Strong written and verbal communication skills.

Nice To Haves

Experience in fintech, payments, or regulated environments.
Familiarity with event-driven architectures (e.g. Kafka).
Experience with CI/CD, automation, and infrastructure-as-code.
Experience owning on-call rotations, SLAs, and reliability metrics.

Responsibilities

Lead day-to-day DevOps and production support, ensuring system availability, performance, and reliability.
Drive incident resolution, root cause analysis, and long-term remediation.
Maintain and improve runbooks, SOPs, and escalation paths.
Continuously reduce MTTR (mean time to resolution) through tooling, automation, and process improvements.
Design and maintain monitoring, logging, and alerting across infrastructure and applications.
Optimize observability using tools such as AWS, Sentry, Grafana, Airflow, and Kafka.
Ensure alerts are actionable and dashboards provide real-time operational visibility.
Lead incident response and on-call coordination, including severity classification and real-time resolution.
Own post-incident reviews and corrective action tracking.
Monitor and report on MTTR, incident trends, and system availability.
Partner with engineering teams to improve resilience and fault tolerance.
Serve as the primary operational liaison between DevOps, Engineering, Product, and Support.
Provide clear, concise incident and operational summaries to leadership.
Improve DevOps workflows, incident tracking, and documentation using Jira and Confluence.
Ensure client-impacting issues are prioritized, resolved, and communicated effectively.

Benefits

This role includes biannual profit-sharing bonuses as part of a total compensation package, in addition to a full range of medical, dental, retirement planning, and other benefits.
Flex PTO!
New state-of-the-art, open-concept facility with stand-up desks, balance boards, stationary bikes, and more!
Work hard, play hard culture!
Monthly Beer Socials and BBQs!
Proven "promote from within" mentality!
Medical, dental, vision, acupuncture, and chiropractic
401(k) Safe Harbor; 100% employer match up to 4%, processed semi-monthly
Profit Sharing; paid on a biannual basis
