AWS Cloud Site Reliability Engineer

UnitedHealth Group, Basking Ridge, NJ
Posted 1d ago, Remote

About The Position

Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by diversity and inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health equity on a global scale. Join us to start Caring. Connecting. Growing together.

You will be part of a world-class identity matching solution, building a state-of-the-art application that is at the center of identity management for Optum Technology. You will have a true opportunity to change the healthcare landscape for the better.

The role requires providing 24×7 operational support for all production practices, including on holidays and weekends. You will coordinate with various teams and raise support tickets for all issues, analyze root causes, and assist in the efficient resolution of all production processes. You will maintain logs of all issues and ensure resolutions conform to quality assurance tests for all production processes. You need a good understanding of the business processes within the various systems used by the application.

You will need to be ambitious and willing to work outside your comfort zone. You’ll enjoy the flexibility to work remotely from anywhere within the U.S. as you take on some tough challenges. For all hires in the Minneapolis or Washington, D.C. area, you will be required to work in the office for a minimum of four days per week.

Requirements

  • Bachelor’s degree in CS or an IT-related field
  • 3+ years of experience with AWS cloud SDKs using Java (Spring Boot microservices), Scala, and Python
  • 3+ years of experience with Distributed Data services (DynamoDB/Athena or similar)
  • 3+ years of experience with AWS Cloud: S3, CloudWatch, ECS, Lambda, RDS, EMR
  • 3+ years of experience with CI/CD using GitHub Actions or similar

Nice To Haves

  • Experience in Unix, Hadoop, HBase and Hive
  • Experience working with offshore and onsite teams as part of the job requirements
  • Proven good communication skills
  • 3+ years of experience in Elastic APM
  • 3 years of experience with Scala
  • 3 years of experience with Kubernetes clusters

Responsibilities

  • Lead and mentor a team of SREs to ensure high-quality delivery and professional growth
  • Design, build, and maintain scalable and reliable systems using cloud-native technologies
  • Develop and implement monitoring, alerting, and observability strategies to ensure optimal system performance and user experience
  • Automate operational tasks and drive infrastructure-as-code (IaC) adoption
  • Proactively identify and resolve reliability risks, bottlenecks, and performance issues, leveraging AI
  • Collaborate with engineering and product teams on architecture, code reviews, and incident response
  • Lead post-incident reviews (blameless retrospectives), root cause analysis, and continuous improvement initiatives
  • Streamline migration processes, ensure consistency and enhance efficiency through automation, AI and innovative solutions
  • Define SLOs/SLIs, track error budgets, and report on system health to stakeholders
  • Ensure compliance and security standards are integrated into system operations
  • Stay current with emerging technologies and SRE best practices
  • Leverage enterprise-approved AI tools to streamline workflows, automate tasks, and drive continuous improvement

Benefits

  • Comprehensive benefits package
  • Incentive and recognition programs
  • Equity stock purchase
  • 401k contribution