Senior DevOps/Platform Engineer

Jitsu

$120,000 - $147,000 • Remote

About The Position

Own key parts of the DevOps effort for a global engineering team and a platform that handles millions of transactions per day, with volume more than doubling annually. Drive improvement across our development process and tooling: source control, build, test, packaging, release, and deployment. Drive improvement in our infrastructure configuration, management, and cost efficiency. Strengthen monitoring and observability across all infrastructure and services, for both availability and performance. Partner with other technical leaders to improve our architecture’s maintainability, scalability, and resilience, and to strengthen our information security posture. Serve as a technical lead to software and DevOps engineers on development and operations practices. Contribute to the team’s development and operations standards and processes.

Requirements

CI/CD & release automation: Deep experience designing end-to-end automated pipelines, including Git-based source control and branching strategies, build and test automation (Jenkins, GitHub Actions, or similar), artifact and dependency management, and progressive, low-risk deployment patterns.
Infrastructure as Code: Strong, hands-on experience with Terraform (or similar) — reusable modules, state management, and managing multi-environment infrastructure (staging, beta, production) as code.
GitOps: Experience operating a GitOps workflow with Argo CD (or similar), using Git as the source of truth for declarative, auditable deployments.
Cloud infrastructure (GCP preferred): Strong experience building and operating cloud infrastructure at scale — compute, VPC networking, storage, message queues, serverless, DNS, load balancing, IAM, and logging.
Kubernetes operations: Production experience running and operating Kubernetes at scale — cluster lifecycle and upgrades, workload scheduling and resource management, autoscaling (HPA/cluster/event-driven), networking and ingress, and diagnosing complex cluster issues.
Helm: Authoring, configuring, and maintaining Helm charts for templated, repeatable application deployments across environments.
Databases at scale: Experience deploying, operating, and tuning a mix of data stores (relational: PostgreSQL / Cloud SQL; NoSQL document: MongoDB; wide-column: Cassandra; cache: Redis), including replication, backups, scaling, and performance troubleshooting.
Reliability & monitoring: Demonstrated track record deploying and monitoring large-scale, mission-critical services — defining SLIs/SLOs, building actionable alerting, and driving incident response and blameless post-mortems.
Security: Solid grounding in cloud and infrastructure security — IAM, secrets management, network policy, and supply-chain hygiene.
Great communication and documentation skills
An obsession with automation and a desire to leave things better than you found them
A customer-first mindset and strong attention to detail
5+ years as a DevOps or Site Reliability Engineer

Nice To Haves

Java application release engineering.
A sense of humor
AI-driven SRE & DevOps mindset — familiarity with AIOps (anomaly detection, intelligent alerting, predictive scaling, automated remediation) and comfort applying AI/LLMs to simplify how we operate infrastructure, pipelines, and Kubernetes.
Building AI agents and automation that reduce operational toil: incident triage, runbook automation, log/RCA summarization, and AI-assisted CI/CD that speeds up delivery and moves us toward self-healing infrastructure.
