McKesson
Full-time • Mid Level
Remote • Overland Park, KS
1,001-5,000 employees

McKesson is an impact-driven, Fortune 10 company that touches virtually every aspect of healthcare. We are known for delivering insights, products, and services that make quality care more accessible and affordable. Here, we focus on the health, happiness, and well-being of you and those we serve – we care. What you do at McKesson matters. We foster a culture where you can grow, make an impact, and feel empowered to bring new ideas. Together, we thrive as we shape the future of health for patients, our communities, and our people. If you want to be part of tomorrow’s health today, we want to hear from you.

Rx Savings Solutions (RxSS), part of McKesson’s CoverMyMeds business segment, is seeking a talented Site Reliability Engineer (SRE) to join our team! In this role, you will be instrumental in ensuring the reliability, scalability, and performance of our critical healthcare technology systems. You will apply software engineering principles to operations, focusing on automation, monitoring, and proactive problem-solving to maintain high availability and deliver exceptional user experiences.

Our preferred candidate will reside in Columbus, OH, or in one of our other hub locations: Overland Park, KS; Irving, TX; or Atlanta, GA. This position is primarily work-from-home, with occasional in-office time. We may consider a well-qualified candidate who is not located in one of these hub areas.

At this time, we are not able to offer sponsorship for employment visas. We're unable to consider individuals currently on H-1B, F-1 OPT, STEM OPT, or any other visa status that would require future sponsorship. Candidates must be authorized to work in the United States on a permanent basis without the need for current or future sponsorship.

Responsibilities:
  • System Reliability & Performance: Design, implement, and maintain robust and scalable infrastructure and applications to ensure high availability, performance, and disaster recovery capabilities
  • Automation & Tooling: Develop and implement automation scripts, tools, and processes to streamline operational tasks, reduce manual effort, and improve efficiency across the software development lifecycle
  • Monitoring & Alerting: Establish and maintain comprehensive monitoring, alerting, and logging systems to proactively identify and diagnose issues, understand system behavior, and track key performance indicators
  • Incident Response & Post-Mortem: Participate in on-call rotations, respond to and resolve critical incidents, and conduct thorough post-mortems to identify root causes and implement preventative measures
  • Capacity Planning & Optimization: Collaborate with development teams to analyze system capacity, forecast future needs, and optimize resource utilization to support business growth
  • Collaboration & Mentorship: Work closely with software engineers, product managers, and other SREs to promote a culture of reliability, share best practices, and contribute to continuous improvement
  • Documentation: Create and maintain clear and concise documentation for systems, processes, and incident runbooks
  • Security: Contribute to the implementation and enforcement of security best practices within our infrastructure and applications

Qualifications:
  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent practical experience), and 2+ years of experience in a Site Reliability Engineering, DevOps, or closely related software engineering role
  • Strong proficiency in at least one scripting or programming language (e.g., Python, Go, Ruby, Bash) for automation and tool development
  • Hands-on experience with cloud computing platforms (e.g., AWS, Azure, GCP). AWS experience is highly preferred
  • Experience with container technologies (e.g., Docker) and container orchestration platforms (e.g., Kubernetes)
  • Familiarity with Continuous Integration and Continuous Delivery (CI/CD) pipelines and tools
  • Experience with monitoring and observability tools (e.g., Datadog, Prometheus, Grafana, Splunk)
  • Strong understanding of Linux/Unix operating systems
  • Fundamental understanding of networking concepts (TCP/IP, DNS, HTTP, load balancing)
  • Excellent analytical and problem-solving skills with a proactive approach to identifying and resolving complex technical issues
  • Strong verbal and written communication skills, with the ability to articulate complex technical concepts to both technical and non-technical audiences
© 2024 Teal Labs, Inc