MTS Site Reliability Engineer

Aviatrix•Champaign, IL

Remote

About The Position

The Aviatrix SRE team is a small but highly skilled global group of Systems Engineers/SREs dedicated to ensuring the reliability, availability, and performance of Aviatrix’s critical systems and services. Our mission is to build and maintain a robust, resilient infrastructure that enables Aviatrix to deliver high-quality services with agility through automation, best practices, and a culture of operational excellence. As a Member of Technical Staff (MTS) Site Reliability Engineer, you’ll be developing your foundational SRE skills while contributing to the reliability and performance of our systems. You’ll work under supervision to implement solutions, learn our infrastructure, and gain hands-on experience with production systems.

Requirements

Experience: 0-3+ years in software engineering, system administration, or related technical roles
Education: BS in Computer Science, Engineering, or related field
Programming: Basic proficiency in at least one programming language (Golang, Python preferred)
Cloud Platforms: Foundational knowledge of cloud platforms (AWS, Azure, GCP)
Linux: Basic Linux system administration skills
Version Control: Experience with Git and code review processes
Communication: Strong communication skills and eagerness to learn
Problem-Solving: Demonstrated ability to solve technical problems independently

Responsibilities

Kubernetes: Learn to manage basic application deployments, assist with troubleshooting, and support monitoring tasks
Infrastructure as Code: Implement IaC for straightforward provisioning tasks and configuration changes
Automation & Development: Contribute to existing automation tools and frameworks in Golang and Python
Basic System Maintenance: Contribute to system reliability through routine maintenance tasks and monitoring
Implementation Support: Implement well-defined solutions for moderate-complexity technical problems
Reliability Engineering: Learn fundamentals of system reliability; contribute to maintaining uptime for well-defined services under guidance
Automation Excellence: Execute basic automation scripts; contribute to existing automation frameworks with supervision
Observability: Implement basic monitoring configurations; learn to read dashboards and interpret common metrics
Incident Management: Participate in incident response with escalation support; document findings and learnings
Performance Engineering: Monitor system performance using established metrics; escalate performance issues appropriately
Collaboration: Participate actively in team meetings; communicate status and blockers effectively

Benefits

US: We cover 100% of employee premiums and 88% of dependent premiums for medical, dental, and vision coverage. We also offer a 401(k) match, short- and long-term disability, life/AD&D insurance, $1,000/year education reimbursement, and a flexible vacation policy.
Outside the US: We offer a comprehensive benefits package that, subject to regional variations, could include pension, private medical for you and dependents, generous holiday allowance, life assurance, long-term disability, and an annual wellbeing stipend.
