Staff Software Engineer - DevOps

WellSky, Overland Park, KS
Remote

About The Position

The Staff Software Engineer - DevOps is responsible for all stages of the software development lifecycle, using a variety of technologies and tools to build impactful software solutions. The scope of this job includes building and optimizing comprehensive solutions that prioritize end-user efficiency and experience.

Requirements

  • Bachelor’s degree or relevant work experience
  • 8-12 years of relevant work experience
  • Proven experience in a Site Reliability Engineer role
  • Strong expertise in Kubernetes and container management
  • Experience with cloud platforms, preferably Google Cloud Platform (GCP)
  • Familiarity with observability and APM tools (e.g., New Relic, OpenTelemetry)
  • Proficiency in infrastructure as code (e.g., Terraform)
  • Solid understanding of CI/CD pipelines and deployment automation
  • Willing to work additional or irregular hours as needed
  • Must work in accordance with applicable security policies and procedures to safeguard company and client information
  • Must be able to sit and view a computer screen for extended periods of time

Nice To Haves

  • Experience with Azure DevOps Pipelines and Argo CD
  • Strong networking fundamentals, including experience with Istio and service mesh technologies
  • Healthcare industry experience

Responsibilities

  • Lead the design and architecture of major systems and services, and ensure software solutions are scalable, reliable, maintainable, and aligned with business needs.
  • Collaborate with solution managers, engineers, data scientists, and other stakeholders to define and prioritize technical requirements that meet client needs and business objectives.
  • Collaborate with teams to ensure sustained quality and reliability of our software solutions, and act as a go-to expert by identifying and resolving complex, high-priority issues in both development and production environments.
  • Actively contribute to code reviews, provide constructive feedback on design and implementation, and provide technical guidance to other engineers to elevate skills, productivity, and overall effectiveness.
  • Drive innovation by evaluating and implementing new technologies, methodologies, and AI capabilities that improve team efficiency, software performance, and development processes.
  • Ensure code meets functional and performance requirements, advocate for high-quality software, and ensure rigorous testing processes, including automated unit tests, integration tests, and other testing frameworks.
  • Leverage AI tools and platforms as an integral part of daily responsibilities to enhance decision-making, streamline workflows, and drive data-informed outcomes.
  • Perform other job duties as assigned.
  • Ensure the reliability, availability, and performance of our systems and services.
  • Work closely with various teams to build and maintain scalable, efficient, and resilient infrastructure.
  • Incident management: lead the response to system outages and incidents, ensuring quick resolution and minimal impact on end users.
  • Conduct post-incident reviews and implement improvements to prevent recurrence.
  • Monitoring and alerting: design, implement, and maintain monitoring and alerting systems using tools like New Relic, Grafana, and the ELK stack to ensure system health and performance.

Benefits

  • Excellent medical with Rx, dental, and vision benefits
  • Mental Health support through EAP
  • Generous paid time off, plus 13 paid holidays
  • 100% vested 401(k) retirement plans
  • Educational assistance up to $2,500 per year