About The Position

The Senior Engineer, DevOps/Platform Reliability is responsible for building and operating the infrastructure, pipelines, and platform standards that keep PayCargo's global payments platform reliable, observable, and supportable. This is a hands-on individual contributor role focused on modernizing how PayCargo builds, deploys, and runs software.

The role spans the full platform: EC2-based services, scheduled jobs, and file processing alongside containerized (ECS/Fargate) and serverless (Lambda) workloads. The environment is a multi-account AWS setup managed with Terraform, with a GitHub and ZenHub workflow that ships through GitHub Actions authenticated via GitHub OIDC. For example, PayCargo's SFTP service runs on AWS Transfer Family with a Lambda identity provider.

The Senior Engineer, DevOps/Platform Reliability modernizes legacy scheduled jobs and file processes into containerized, observable services, codifies infrastructure as repeatable Terraform patterns, and creates standards that other developers can follow without depending on a single person for every implementation. The role requires strong judgment and follow-through, with a focus on reducing reactive fire drills and single points of failure.

Working within PayCargo's DevSecOps model, the Senior Engineer partners closely with Security, Engineering, Architecture, Product, Support, and executive stakeholders to deliver scalable, secure, and repeatable platform execution. This position has no direct reports. The role leads indirectly by defining infrastructure and deployment standards and by guiding engineers toward repeatable patterns across the platform.

Requirements

  • 5+ years of hands-on DevOps, platform, or infrastructure engineering experience preferred
  • Strong experience with AWS (ECS/Fargate, Lambda, VPC, IAM) and working knowledge of Azure or Entra ID
  • Hands-on experience with infrastructure-as-code using Terraform, including reusable modules, remote state, and plan/apply in CI
  • Strong experience with Docker and container orchestration such as ECS/Fargate and ECR
  • Experience building and maintaining CI/CD pipelines, preferably with GitHub Actions, including OIDC-based cloud authentication
  • Experience with monitoring and observability tooling such as CloudWatch, SNS, Sentry, and Athena/Glue
  • Strong understanding of secrets management (Secrets Manager, SSM Parameter Store), environment configuration, and secure deployment
  • Strong troubleshooting, incident response, and root cause analysis skills
  • Ability to create repeatable standards and documentation that reduce single points of failure
  • Bachelor's degree in Computer Science, Information Technology, Engineering, or a related field, or equivalent practical experience
  • Demonstrated experience operating production infrastructure and CI/CD in cloud environments
  • Experience with containerization, infrastructure-as-code, and observability tooling

Nice To Haves

  • Experience modernizing legacy scheduled jobs and file-processing workloads into containerized services
  • Experience operating both EC2-based services and containerized or serverless workloads
  • Experience with disaster recovery, multi-region (us-east-1 / ap-east-1) redundancy, and failover design
  • Familiarity with secure AI/LLM platform patterns, whitelisted egress, and bounded environments
  • Experience with on-call workflows and tooling such as PagerDuty
  • Familiarity with zero-trust network access (Tailscale) and SSM Session Manager in place of bastion hosts
  • Experience in payments, fintech, SaaS, or other high-volume transactional environments
  • Familiarity with SOC and PCI control requirements as they relate to infrastructure

Responsibilities

  • Modernize legacy scheduled jobs, cron scripts, and file processes into containerized (ECS/Fargate), observable, supportable services
  • Build and maintain infrastructure patterns in Terraform, with reusable modules, remote state, and plan/apply through CI
  • Standardize environment configuration, secrets management (Secrets Manager and SSM Parameter Store), and repeatable deployment paths across environments and accounts
  • Create platform standards that other developers can follow without depending on DevOps for every implementation
  • Build, maintain, and harden CI/CD pipelines integrated with GitHub and ZenHub, with deployments authenticated through GitHub OIDC to eliminate static cloud credentials
  • Improve build, test, and deployment automation to make releases faster, safer, and more repeatable
  • Establish rollback and environment-promotion practices that reduce release risk
  • Embed security and quality gates into pipelines in partnership with Security and Engineering
  • Implement and maintain monitoring, logging, and alerting using CloudWatch, SNS, and Sentry, with log analytics through Athena and Glue
  • Improve telemetry, dashboards, and on-call workflows (PagerDuty) so issues are detected and resolved quickly
  • Support disaster recovery, backup, and failover patterns across regions and accounts
  • Lead incident response and root cause analysis with clear, durable follow-up
  • Support the infrastructure for a contained AI platform, including whitelisted egress and approved deployment paths
  • Help operationalize controls such as stateless model access and bounded environments in partnership with Security and Architecture
  • Build deployment and monitoring patterns for AI-assisted applications so they are observable and supportable
  • Partner with Security to embed controls into pipelines, environments, and infrastructure-as-code, including OIDC roles, least privilege, mTLS, and Tailscale-based access
  • Work with Engineering and Architecture to translate designs into runnable, supportable infrastructure
  • Advise Product and Support on operational realities, trade-offs, and delivery risk
  • Implement and operate the infrastructure, pipelines, and environments according to the standards and architecture owned by the VP, Infrastructure & Security
  • Provide clear status, escalate risks early, and document infrastructure, pipelines, and runbooks

Benefits

  • Competitive salary and bonus plan
  • Vacation, sick, and personal time off policies
  • A generous 401(k) match
  • Strong healthcare benefits