Cloud Platform Lead Engineer (ML DevOps)

Allstate
McCullom Lake, IL
1d
$110,000 - $160,000

About The Position

At Allstate, great things happen when our people work together to protect families and their belongings from life’s uncertainties. For more than 90 years, our innovative drive has kept us a step ahead of our customers’ evolving needs, from advocating for seat belts, air bags, and graduated driving laws to leading the industry in pricing sophistication, telematics, and, more recently, device and identity protection.

Job Description

The Platform Engineer is a full-stack engineer who builds the cloud application development and hosting platforms for Allstate. The role has primary accountability for owning, developing, implementing, and operating Allstate’s cloud platforms. It also encompasses developing, building, administering, and deploying self-service tools that enable Allstate developers to build, deploy, and operate cloud-native applications.

The Platform Engineer Lead Consultant is a team anchor for engineering teams that work primarily through pair programming, collaborating with different team members. They anchor the team’s shared knowledge of the systems it owns and manages. They will lead a team in engineering and building out new platform products and services, and ensure that SLAs are met when executing operational tasks to maintain the platform and service customer requests. They will lead the engineering practices that improve the efficiency of the system itself and of the team running it. They will also establish practices for pair programming, test-driven development, infrastructure engineering, and continuous delivery on the team.

Requirements

  • Proven experience leading cloud platform or infrastructure initiatives.
  • Strong hands-on experience with cloud platforms (Azure, AWS, and/or GCP).
  • Deep knowledge of infrastructure as code, automation, CI/CD, and reliability engineering.
  • Experience designing highly available and resilient distributed systems.
  • Experience with ML platforms or MLOps tooling (e.g., MLflow, Kubeflow, Azure ML, SageMaker, Vertex AI).
  • Familiarity with observability tools (e.g., Datadog, ELK, New Relic, Prometheus).
  • Ability to influence technical direction and collaborate across teams.
  • Strong communication skills and a leadership mindset.
  • 6 or more years of experience (preferred).

Responsibilities

  • Lead the design, build, and operation of cloud infrastructure supporting ML experimentation, training, and production deployments.
  • Define technical direction and best practices for ML platforms, MLOps, reliability, and cloud infrastructure.
  • Architect ML platforms for high availability, fault tolerance, and resiliency across supported environments.
  • Establish and own observability standards, including metrics, logging, tracing, alerting, and SLOs for ML platforms.
  • Build and oversee CI/CD pipelines and automation for infrastructure and ML workflows.
  • Drive infrastructure-as-code, automation, and reliability standards across the platform.
  • Proactively monitor, troubleshoot, and improve platform availability, performance, scalability, and recovery.
  • Champion MLOps best practices including model versioning, validation, promotion, monitoring, and rollback strategies.
  • Ensure platform security, compliance, and cost optimization.
  • Partner with data scientists, ML engineers, and product teams to deliver reliable, self-service ML capabilities.
  • Mentor engineers through design reviews, code reviews, and hands-on technical leadership.
  • Contribute to roadmap planning, prioritization, and execution aligned with business and customer needs.
  • Participate in agile ceremonies and drive continuous improvement across the team.