Staff Site Reliability Engineer

Prove, Chicago, IL

About The Position

As the world moves to a mobile-first economy, businesses need to modernize how they acquire, engage with, and enable consumers. Prove’s phone-centric identity tokenization and passive cryptographic authentication solutions reduce friction, enhance security and privacy across all digital channels, and accelerate revenues while reducing operating expenses and fraud losses. Over 1,000 enterprise customers use Prove’s platform to process 20 billion customer requests annually across industries, including banking, lending, healthcare, gaming, crypto, e-commerce, marketplaces, and payments. For the latest updates from Prove, follow us on LinkedIn.

Prove is driving the future of digital identity. We are looking for Provers who know how to make an impact. We’re talking self-starting professionals who thrive in a fast-paced environment, process information quickly, and make intelligent decisions. The work is challenging and requires not only intelligence but also natural curiosity and tenacity. Teamwork is also important to us: we work together and play together. Prove has big plans, and we’re excited about the future. If this sounds like the place for you, come join our team!

We are seeking a Staff Site Reliability Engineer to join our Platform Engineering team. In this role, you will be instrumental in designing, deploying, and maintaining high-availability infrastructure, leveraging automation, infrastructure-as-code, and advanced monitoring. You’ll partner closely with our application engineering teams to ensure our services meet the highest standards of reliability, performance, and security.

Staff SREs at Prove are responsible for driving 99.999% uptime for existing and developing products. Implementing, automating, and developing infrastructure are methods for achieving these outcomes. Qualified candidates will understand the difference between applying methods and owning outcomes, and will be able to demonstrate and document their previous relevant experience.

Requirements

  • 8+ years of experience in Site Reliability Engineering, Platform Engineering, or an equivalent role
  • Software Engineering roles with a strong infrastructure, operations, and/or production engineering aspect also qualify.
  • 4+ years of experience in technical project leadership
  • Deep understanding of cloud platforms, preferably AWS
  • Expert knowledge of observability platforms and practices (OpenTelemetry, Prometheus, Grafana, Jaeger, ELK stack)
  • Strong experience with Kubernetes and container orchestration
  • Experience with infrastructure-as-code tools (Terraform, Spacelift, OpenTofu)
  • Proficiency in at least one programming language (Java, Go, Python)
  • Bachelor's degree in Computer Science, Engineering, or equivalent practical experience

Nice To Haves

  • Experience with distributed systems and microservice architectures
  • Experience working in a high-compliance environment
  • Hands-on experience instrumenting code with OpenTelemetry
  • Deep familiarity with service mesh technologies
  • Contributions to open-source projects
  • Experience in the identity verification or financial technology industry
  • Application development experience

Responsibilities

  • Guide infrastructure system design and application architecture within Platform Engineering and across engineering teams
  • Develop and implement infrastructure-as-code standards using tools like Terraform and/or OpenTofu
  • Act as a strong technical partner with our engineering teams
  • Champion observability with product owners and engineering teams
  • Improve new and existing systems by increasing reliability, performance, and scalability
  • Automate routine operational tasks to reduce toil and improve efficiency
  • Ensure infrastructure security compliance and implement least-privilege access controls
  • Enhance existing CI/CD pipelines and feedback loops for maximum reliability
  • Enable auto-scaling infrastructure based on custom metrics for applications and critical observability infrastructure
  • Participate in a 24/7 on-call rotation, respond to our most critical alerts, and resolve challenging events
  • Conduct thorough post-incident reviews and implement preventative measures
  • Use observability data to perform root cause analysis and identify system improvements

Benefits

  • Competitive salary, Bonus Plan (for eligible roles), and Equity Plan
  • Modern Health for financial, mental, and physical wellness
  • 401(k) Retirement Plan & Match (US Offices) and Local Country Pension (International Offices)
  • Unlimited Vacation and Flexible hours
  • Comprehensive medical benefits for you and your family ❤️
  • Emotional & Physical Wellness – Access to wellness services (EAP & Prove Well-Being Reimbursement)
  • Bottomless snacks & beverages for certain office locations
  • Daily GrubHub stipend for lunch if coming into the office (US Offices)
  • A great place to work and connect with other talented Provers like yourself!