Sr. Software Engineer II (DevOps)

Hi Marley
Boston, MA
$119,000 - $221,000
Hybrid

About The Position

Hi Marley is transforming how the property and casualty (P&C) insurance industry communicates, making challenging moments faster, easier, and more empathetic for carriers and their customers. We build AI-powered software that keeps everyone in the claims conversation informed and connected. We are looking for a Sr. Software Engineer II (DevOps) to help us build and scale the infrastructure that powers both our core platform and our rapidly growing agentic AI services. The role sits at the intersection of cloud infrastructure, AI operations, and platform engineering: you will build the foundation for reliable enterprise-scale operations and for deploying autonomous AI agents in regulated insurance workflows. You will also set infrastructure standards, drive technical decisions, and mentor less experienced engineers. This is a hybrid role, with 2-3 days each week in our Boston office.

Requirements

  • 6+ years of DevOps/SRE/Platform Engineering experience
  • 2+ years of experience building or operating AI/ML infrastructure (model serving, inference, LLM orchestration, or agentic systems)
  • Bachelor’s degree in Computer Science, Engineering, or equivalent experience
  • Experience building and operating infrastructure for both traditional and AI/ML workloads at a SaaS company
  • Proven ability to lead technical conversations and guide infrastructure decisions
  • Deep experience with AWS cloud services (ECS, Lambda, SageMaker, Bedrock, S3, DynamoDB, Redshift, or equivalent)
  • Strong infrastructure-as-code skills with Terraform and understanding of state management, modules, and multi-environment configurations
  • Understanding of data infrastructure: pipelines, warehousing, ETL/ELT, and supporting analytics at scale
  • Focus on observability including data integrity, SLOs, error budgets, and silent failure detection
  • Experience with compliance-sensitive environments and understanding of audit trails, access governance, and change management
  • Comfort operating in a fast-moving environment with evolving AI capabilities and regulatory implications
  • Effective communication with both engineering and non-technical stakeholders
  • Track record of leading cross-team technical initiatives and mentoring engineers on infrastructure and operational best practices
  • Strong proficiency in at least one programming language (Python, Go, TypeScript, or similar)
  • Experience with container orchestration (ECS, EKS)
  • Experience with monitoring and observability platforms (Datadog, CloudWatch)

Nice To Haves

  • Experience with data infrastructure (Redshift or similar data warehousing; Airflow, dbt, Dagster, or similar pipeline tools)
  • Experience in regulated industries (insurance, financial services, healthcare)
  • Genuine curiosity about AI and emerging technologies, paired with the judgment to apply them thoughtfully and responsibly

Responsibilities

  • Design and operate cloud infrastructure on AWS that supports both our core SaaS platform and our agentic AI services, ensuring reliability, scalability, and cost efficiency
  • Build and maintain AI/ML infrastructure and monitoring for LLM-powered agentic services
  • Establish and enforce infrastructure-as-code standards using Terraform, defining the patterns other engineers follow for environment parity, drift detection, and automated compliance validation
  • Implement observability beyond availability — data integrity monitoring, SLO frameworks with error budgets, and automated regression detection for both platform and AI services
  • Build deployment automation including pre-deployment verification, migration script validation, and codified rollback procedures to eliminate human-memory dependencies
  • Support big data infrastructure: data pipelines, warehousing (Redshift), and analytics tooling that enables reporting, BI, and AI training workflows
  • Implement security and compliance controls for AI workloads operating in regulated carrier environments — including audit logging, access governance, and configuration management
  • Drive environment parity across all infrastructure with automated drift detection and remediation
  • Improve disaster recovery capabilities: documented and rehearsed DR procedures, defined RTO/RPO by service tier, and tested recovery runbooks
  • Lead architecture reviews for new services, integrations, and AI agent deployments — partnering with engineering, product, and security to ensure infrastructure decisions are sound before they ship
  • Innovate on developer experience: reduce friction in testing environments, CI/CD pipelines, and local development workflows
  • Act as a technical anchor for infrastructure decisions across teams — providing clarity when requirements are ambiguous and helping the organization converge on consistent, scalable approaches

Benefits

  • Equity grants for all employees
  • A 4% matching 401(k) program
  • Medical, dental, vision, disability, and life insurance coverage for employees working 30+ hours per week
  • Monthly wellness stipend
  • Paid parental leave
  • A flexible vacation policy