Senior Site Reliability Engineer

CaptivateIQ · Menlo Park, CA
3d · Hybrid

About The Position

The Site Reliability Engineering team at CaptivateIQ operates across the engineering organization, supporting our development teams by providing them with the tools and processes they need to do their jobs well. We ensure that the service provided by our product is great for our paying customers, and when it isn’t, we ensure that the business is well informed. We do this by providing infrastructure, platform, reliability, and observability support to our internal customers to help them achieve their goals. Our team members are thoughtful and pragmatic engineers who balance doing things right versus doing things right now. We invest in iterative efforts to refine or pivot our work, deliver real-world results, and reflect on the process in order to improve it incrementally. We are fully remote and invest in written communication for long-term institutional memory, while valuing the synchronous time we have together to build and strengthen our relationships.

Requirements

  • 5+ years of experience in Software Engineering, SRE, or DevOps roles
  • Strong written and verbal communication skills (We use Slack, Notion, and GitHub)
  • Experience with Infrastructure as Code (We use Terraform and AWS)
  • Experience with containers and container orchestration tools (We use ECS)
  • Experience with authoring and maintaining code (We use Bash, Python, and Golang)
  • Experience with using, and helping others use, observability tools and techniques (We use Datadog)
  • Love for the Oxford comma (We use, love, and respect it)

Nice To Haves

  • Experience with cloud cost management and FinOps
  • Experience in building, maintaining, and operating SaaS or web-based applications
  • Experience with distributed systems principles and their application
  • Experience building and operating multi-region or cell-based applications
  • Experience with managing cloud vendor relationships
  • Experience with compliance and regulated environments (We work within SOC 2 and HIPAA)

Responsibilities

  • Learn by reading and writing designs, documentation, runbooks, and industry literature
  • Partner with development teams to design and implement reliable and resilient services
  • Build infrastructure automation that’s easy to use by other teams
  • Develop observability processes, reports, and tooling to diagnose performance and stability issues
  • Eliminate toil by automating manual processes
  • Ensure we exceed our compliance and security commitments
  • Act in an ethical and professional manner
  • Participate in an on-call rotation to provide after-hours support, ensuring timely resolution of critical issues and maintaining system uptime

Benefits

  • (US-ONLY) 100% of medical, dental, and vision covered, including 75% for dependents
  • Flexible vacation days and quarterly mental health days so you can recharge
  • (US-ONLY) 401(k) plan to help you save for the future
  • Newest Apple products to help you do your best work
  • Employee Resource Groups (ERGs) to support and celebrate the shared identities and life experiences of communities within CaptivateIQ. ERGs directly support our company-wide DEI goals as a space for developing and retaining diverse talent