Analog Devices • Posted 4 months ago
$144,000 - $198,000/Yr
Full-time • Senior
Wilmington, MA
5,001-10,000 employees

We are seeking an experienced Site Reliability Engineering Manager with strong leadership skills to maintain and improve the reliability, availability, and performance of our cloud and on-premises applications and infrastructure. You will be responsible for designing, automating, and managing systems in a hybrid environment, ensuring that both on-prem and cloud services run seamlessly. This role requires strong collaboration with a globally distributed team to implement scalable, automated solutions and foster continuous improvement in operational processes.

  • Take an active, hands-on role in managing and improving the performance, scalability, and reliability of both cloud and on-prem systems.
  • Lead by example in resolving complex issues and troubleshooting system failures.
  • Design, implement, and manage infrastructure across on-premises, cloud-based, and hybrid environments to ensure high availability, performance, and cost-effectiveness.
  • Lead automation initiatives to streamline the management of on-prem and cloud environments, including infrastructure provisioning, deployment pipelines, monitoring, and incident resolution.
  • Ensure systems are highly available, resilient, and scalable, with proactive monitoring and incident response mechanisms across on-prem, hybrid, and cloud infrastructures.
  • Manage the challenges associated with globally distributed systems and teams, ensuring that systems are synchronized and can perform consistently across multiple regions.
  • Leverage Infrastructure as Code (IaC) tools like Terraform, Ansible, and CloudFormation to manage both cloud and on-prem infrastructure efficiently and consistently.
  • Collaborate with development teams to refine and improve CI/CD pipelines, ensuring seamless integration and delivery across a hybrid infrastructure environment.
  • Oversee incident management, ensuring that any issues across both cloud and on-prem infrastructure are resolved quickly.
  • Lead root cause analysis and postmortem activities to improve future reliability.
  • Lead and guide a team of App Hosting engineers across different geographies, ensuring effective communication, coordination, and collaboration with globally distributed teams.
  • Work closely with product, development, and infrastructure teams across different locations to design and implement solutions that meet the needs of our hybrid and global environments.
  • Provide mentorship to engineers, promoting a culture of continuous learning and fostering best practices in both hybrid and on-prem infrastructure management.
  • Oversee and prioritize initiatives that span multiple regions, ensuring that global infrastructure needs and timelines are met.
  • Define and implement processes to ensure the stability, scalability, and efficiency of systems across both cloud and on-prem environments.
  • Bachelor’s degree in Computer Science, Engineering, or related field (or equivalent work experience).
  • 10+ years of experience in SRE, DevOps, or Systems Engineering with expertise in managing hybrid and on-prem infrastructure in addition to cloud-based systems.
  • Strong leadership skills and at least 5 years of experience managing globally distributed teams.
  • Expertise in cloud platforms (AWS, GCP, Azure) and in containers and container orchestration (Docker, Kubernetes).
  • Proficiency in programming/scripting languages such as Python, Go, Ruby, or Shell.
  • 5+ years of experience hosting enterprise-level applications on platforms such as Windows and Linux, in both cloud and on-prem environments.
  • Extensive experience with Infrastructure as Code (IaC) tools and automation frameworks (Terraform, Ansible, etc.).
  • Solid knowledge of monitoring and logging tools (Prometheus, Grafana, ELK Stack, Datadog, etc.).
  • Familiarity with CI/CD tools and processes (Jenkins, GitLab CI, etc.).
  • Strong experience with service-oriented architecture (SOA), microservices, and managing distributed systems across geographically diverse regions.
  • Excellent communication, interpersonal, and leadership skills with the ability to work effectively with remote and distributed teams.
  • Familiarity with VMs, containers, networking, VPNs, and load balancing in hybrid environments.
  • Experience in the manufacturing industry or a similar domain.
  • Experience with SAP Datasphere and/or Databricks platform support and management.
  • Certifications in cloud platforms.
  • Medical, vision and dental coverage
  • 401k
  • Paid vacation
  • Holidays
  • Sick time
  • Discretionary performance-based bonus