Senior DevOps Engineer - Edwin AI

LogicMonitor
San Francisco, CA (Hybrid)

About The Position

We are seeking a highly skilled Senior DevOps Engineer with 4+ years of experience to join the Edwin AI team at LogicMonitor and drive innovation, reliability, and security across our cloud infrastructure. The ideal candidate has hands-on experience managing multi-cloud environments, automating infrastructure, and implementing modern DevOps practices that improve system performance, scalability, and cost efficiency.

Requirements

  • 4+ years of experience in DevOps or similar roles.
  • Proven experience with AWS, Azure, and GCP in production environments.
  • Strong expertise in Infrastructure as Code practices.
  • Solid knowledge of Kubernetes (EKS), container orchestration, and cluster security.
  • Hands-on experience with Grafana, Prometheus, and alerting/monitoring systems.
  • Understanding of network connectivity, including private link endpoints, VPCs, and cross-account VPC connectivity, and of how to make services accessible internally and externally.
  • Experience deploying automated canary and integration testing pipelines, CI/CD pipelines, and related automation.
  • Experience exposing internal self-hosted services such as LangFuse through a web UI for internal users, using Traefik, another Ingress controller, or a similar tool.
  • Experience deploying LLM-related solutions that depend on components such as MCP, LangFuse, Airflow, GraphDB, VectorDB, and Redis.
  • Experience providing developers with on-demand, just-in-time (JIT) access to production clusters for troubleshooting and debugging, using Teleport or a similar tool.
  • Strong background in cloud security, access management, and encryption.
  • Proficiency in Python and Bash scripting for automation.

Responsibilities

  • Expand and manage application hosting across AWS, Azure, and Google Cloud, ensuring performance, flexibility, and resilience.
  • Develop and maintain Terraform (or similar) installers for Azure and GCP to fully automate infrastructure deployment.
  • Design and implement AWS cost optimization strategies, including reserved instances, right-sizing, and resource efficiency initiatives.
  • Strengthen infrastructure security with robust access controls, encryption, monitoring, and alerting frameworks.
  • Build and enhance monitoring platforms with Grafana dashboards and Prometheus alerts for real-time performance insights and proactive issue resolution.
  • Implement Role-Based Access Control (RBAC) and optimize Ingress controllers (Traefik or similar) for enhanced security and delivery resilience.

What This Job Offers

Career Level: Senior

Industry: Professional, Scientific, and Technical Services

Number of Employees: 1,001-5,000

© 2024 Teal Labs, Inc