General Motors • Posted 4 months ago
$195,000 - $298,800/Yr
Full-time • Senior
Hybrid • Austin, TX
5,001-10,000 employees
Transportation Equipment Manufacturing

The Software Engineering Site Reliability Engineer (SRE) is responsible for ensuring the reliability, scalability, and performance of software systems. The role includes:

  • System Monitoring and Troubleshooting: Monitoring the performance and availability of software systems, identifying and resolving issues, and implementing proactive measures to prevent future incidents.
  • Automation and Infrastructure: Developing and maintaining automation tools and infrastructure to streamline software deployment, configuration management, and system monitoring.
  • Performance Optimization: Analyzing system performance, identifying bottlenecks, and implementing optimizations to improve the efficiency and scalability of software systems.
  • Incident Response and Root Cause Analysis: Responding to incidents, conducting root cause analysis, and implementing corrective actions to prevent similar incidents in the future.
  • Collaboration with Development Teams: Collaborating with software development teams to ensure that reliability and scalability considerations are incorporated into the software design and implementation.
  • Continuous Improvement: Identifying opportunities for process improvement, implementing best practices, and driving initiatives to enhance the reliability and performance of software systems.
  • Proficiency in at least one programming language (e.g., Python, Go, Java) and familiarity with multiple language ecosystems.
  • Solid understanding of operating systems, networking, distributed systems, databases, and storage architectures.
  • Deep understanding of how code runs on underlying hardware, including operating systems, algorithms, and data structures.
  • Experience handling production incidents, including root cause analysis, mitigation, and working through complex system failures.
  • Strong communication skills, with an ability to explain technical concepts to both engineering and business stakeholders.
  • Proven experience in automating manual processes, building deployment pipelines, or managing configuration systems.
  • Bachelor's degree in computer science or related field, or equivalent work experience.
  • Experience with cloud platforms (AWS, GCP, Azure).
  • Familiarity with container orchestration systems like Kubernetes.
  • A track record of managing or developing distributed systems.
  • Prior experience with Java in production.
  • 8+ years of experience.
  • Medical, dental, vision, Health Savings Account, Flexible Spending Accounts.
  • Retirement savings plan.
  • Sickness and accident benefits.
  • Life insurance.
  • Paid vacation & holidays.
  • Tuition assistance programs.
  • Employee assistance program.
  • GM vehicle discounts.