Senior DevOps Engineer - Edwin AI

LogicMonitor, San Francisco, CA
$120,230 - $165,330

About The Position

We love going to work and think you should too. Our team is dedicated to trust, customer obsession, agility, and striving to be better every day. These values are the foundation of our culture, guiding our actions and driving us toward excellence. We foster a culture of performance and recognition, which drives our growth and enables our employees to do the best work of their careers.

We are seeking a highly skilled Senior DevOps Engineer with 4+ years of experience to drive innovation, reliability, and security across our cloud infrastructure on the Edwin AI team at LogicMonitor. The ideal candidate has hands-on experience managing multi-cloud environments, automating infrastructure, and implementing modern DevOps practices that improve system performance, scalability, and cost efficiency.

Requirements

  • 4+ years of experience in DevOps or similar roles.
  • Proven experience with AWS, Azure, and GCP in production environments.
  • Strong expertise in Infrastructure as Code practices.
  • Solid knowledge of Kubernetes (EKS), container orchestration, and cluster security.
  • Hands-on experience with Grafana, Prometheus, and alerting/monitoring systems.
  • Understanding of network connectivity, including PrivateLink endpoints, VPCs, and cross-account VPC connectivity, and of how to make services accessible internally and externally.
  • Experience deploying automated canary and integration testing pipelines and CI/CD pipelines.
  • Experience exposing internal self-hosted services such as LangFuse through a web UI for internal users, using Traefik, an Ingress controller, or a similar tool.
  • Experience deploying LLM-related solutions that require MCP, LangFuse, Airflow, GraphDB, VectorDB, Redis, etc.
  • Experience working with developers to provide on-demand, just-in-time (JIT) access to production clusters for troubleshooting and debugging, using tools such as Teleport.
  • Strong background in cloud security, access management, and encryption.
  • Proficiency in Python and Bash scripting for automation.

Responsibilities

  • Expand and manage application hosting across AWS, Azure, and Google Cloud, ensuring performance, flexibility, and resilience.
  • Develop and maintain Terraform or similar installers for Azure and GCP to fully automate infrastructure deployments.
  • Design and implement AWS cost optimization strategies, including reserved instances, right-sizing, and resource efficiency initiatives.
  • Strengthen infrastructure security with robust access controls, encryption, monitoring, and alerting frameworks.
  • Build and enhance monitoring platforms with Grafana dashboards and Prometheus alerts for real-time performance insights and proactive issue resolution.
  • Implement Role-Based Access Control (RBAC) and optimize Ingress controllers (Traefik or similar) for enhanced security and delivery resilience.

Benefits

  • Comprehensive health, dental and vision coverage
  • Generous parental leave policies
  • Access to our Employee Assistance Program and various Wellness programs
  • 401(k) with company matching
  • Learning and development stipend
  • Unlimited vacation policy