Senior DevOps & Infrastructure Lead

RV LIFE, Dallas, TX
Remote

About The Position

RV LIFE is looking for a Senior DevOps & Infrastructure Lead to help us stabilize, document, and modernize the infrastructure behind our products. This is a hands-on senior role for someone comfortable inheriting real production systems, reducing operational risk, improving reliability, and moving us toward a documented, secure, automated, infrastructure-as-code operating model. We run production across DigitalOcean, AWS, Cloudflare, and other hosting providers, and are consolidating onto managed, infrastructure-as-code platforms. We need deep, hands-on expertise across these environments.

RV LIFE is an AI-first engineering organization. We expect this person to use AI to accelerate discovery, documentation, runbooks, log review, scripting, and infrastructure-as-code drafting, while applying strict human judgment around security, secrets, production access, destructive commands, rollback, and correctness.

This role focuses on the infrastructure path to reliability; application-level architecture changes are handled in partnership with our engineering team. It is not just about keeping servers alive. It is about building durable practices that reduce single-person dependency, improve visibility, and make our systems safer to operate.

This is not a standard 9-to-5 role. Production issues do not keep business hours, so it carries real on-call responsibility: you need to be reachable and able to respond when unforeseen incidents arise.

Requirements

  • Senior-level experience operating production infrastructure.
  • Deep, hands-on Linux server administration of the traditional, "old-school" kind: operating, securing, and troubleshooting manually managed production servers (LAMP/LEMP, system services, cron, networking, SSH) directly at the command line, not only through a cloud console.
  • Experience with DigitalOcean, Linode, AWS EC2, bare VPS hosting, or comparable environments.
  • Senior-level database operations: migrating self-managed MySQL to a managed service, replication, backup validation, restore testing, and I/O isolation.
  • Strong Cloudflare expertise across DNS, WAF, CDN and caching behavior, page rules, Workers, Pages, and Zero Trust/Access, including traffic routing and origin protection.
  • Experience with PHP/Laravel application environments, including a managed Laravel runtime (Laravel Cloud and/or DigitalOcean App Platform).
  • Experience with Datadog or a comparable observability platform for monitoring, alerting, dashboards, logs, and incident investigation.
  • Experience with infrastructure-as-code tools such as Terraform, Pulumi, AWS CDK, Serverless Framework, or CloudFormation.
  • Experience building CI/CD pipelines and deployment automation.
  • Practical AWS experience (Lambda, IAM, VPC, CloudWatch, S3, SSM/Secrets Manager, queues).
  • Good judgment around production safety, access control, secrets, backups, and incident response.
  • Willingness to carry real on-call responsibility and respond to production incidents outside normal business hours; this is not a strict 9-to-5 role.
  • A habit of documenting what you learn and creating runbooks others can follow.
  • Practical experience using AI tools (ChatGPT, Claude, Cursor, GitHub Copilot, or similar), with strong judgment about where human verification is required.
  • Ability to work independently in a small, remote engineering organization where practical ownership matters more than bureaucracy.

Nice To Haves

  • Experience migrating manually managed services onto managed platforms or IaC.
  • Experience moving static frontends onto Cloudflare Pages.
  • Experience migrating MongoDB, OpenSearch, or Valkey/Redis onto managed services.
  • Experience supporting Node.js, React, and React Native alongside PHP.
  • Experience helping organizations reduce infrastructure bus-factor risk.
  • Experience working with external DevOps/security partners or auditors.

Responsibilities

  • Administer and improve existing DigitalOcean infrastructure.
  • Support and improve Linux-based production server environments.
  • Migrate self-managed databases onto managed database services, with validated failover, backups, and recovery.
  • Move applications onto managed runtimes (including Laravel Cloud where it fits), replacing manual deploy processes with automated, repeatable pipelines.
  • Expand and harden our use of Cloudflare for edge, static hosting, caching, and security.
  • Build a clear inventory of servers, services, databases, domains, access paths, backups, monitoring, and operational risks.
  • Create and maintain practical runbooks for common and emergency infrastructure workflows.
  • Improve incident response, escalation paths, monitoring, logging, and alerting.
  • Review and improve backup, restore, and disaster-recovery procedures.
  • Identify recurring manual work and convert it into safer procedures, scripts, automation, or infrastructure-as-code.
  • Help define infrastructure-as-code standards and move appropriate infrastructure into repeatable, version-controlled workflows.
  • Work with AWS services where needed (Lambda, VPC, IAM, CloudWatch, S3, SSM/Secrets Manager, queues).
  • Use AI tools to accelerate discovery, documentation, scripting, troubleshooting, and automation, with strong production-safety judgment.
  • Partner with engineering leadership to prioritize infrastructure risk and modernization; track work clearly in Jira/GitHub and communicate proactively about risks, tradeoffs, and blockers.