Senior DevOps / Platform Engineer

Adobe • San Jose, CA

About The Position

We are seeking a Senior DevOps / Platform Engineer in the Developer Platform organization to help design, build, and operate highly scalable, secure, and resilient cloud platforms that power a global developer ecosystem. This role sits at the intersection of Cloud Engineering, Site Reliability Engineering (SRE), Security, and AI-powered automation. The ideal candidate thrives in complex, distributed environments and is passionate about operational excellence, automation-first engineering, and building modern, intelligent infrastructure platforms. You will play a key role in shaping next-generation cloud architecture, improving reliability and developer experience, and integrating AI capabilities into operational workflows. This position is suited for a forward-thinking engineer who combines deep technical expertise with strong collaboration skills and a commitment to continuous improvement.

Requirements

Bachelor’s degree in Computer Science, Engineering, or equivalent practical experience.
7+ years of experience in DevOps, SRE, or Cloud Infrastructure Engineering.
Strong expertise in AWS and Azure cloud services.
Deep experience with Kubernetes, Docker, and containerized workloads.
Proven experience designing and operating microservices-based architectures.
Strong programming/scripting skills (e.g., Python, Go, or Node.js).
Hands-on experience with Infrastructure as Code (Terraform preferred).
Experience with CI/CD tools such as GitHub Actions, Jenkins, or ArgoCD.
Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Splunk, New Relic).
Strong understanding of cloud security fundamentals and secure architecture principles.

Responsibilities

Cloud Platform Engineering
Architect, build, and operate multi-cloud infrastructure (AWS and Azure).
Design highly available, fault-tolerant, multi-region systems.
Support containerized and serverless platforms using Kubernetes and modern cloud-native technologies.
Improve scalability, performance, and cost efficiency of shared platform services.
Ensure high availability and quality of service aligned with enterprise SLA targets (99.99%+).
Define and track service-level objectives (SLOs), reliability metrics, and key performance indicators.
Strengthen observability through logging, monitoring, alerting, and distributed tracing.
Lead incident response, root cause analysis, and continuous reliability improvements.
Develop and maintain runbooks and standard operating procedures.
Drive the productization of SRE capabilities by building reusable reliability frameworks, self-service tooling, and standardized operational practices across platform services.
Implement Infrastructure as Code (IaC) using Terraform or equivalent tools.
Enforce automated validation, policy controls, and configuration consistency.
Improve CI/CD pipelines and deployment automation.
Champion “automate everything” practices to reduce manual effort and operational risk.
Embed secure-by-design principles into infrastructure architecture and delivery.
Implement secure networking, encryption, secret management, and access controls.
Partner with security teams to maintain compliance with enterprise and regulatory standards.
Proactively reduce risk through architecture reviews and security hardening.
Integrate AI and automation into operational workflows.
Explore intelligent incident triage, log analysis, and predictive monitoring.
Build tools that use modern AI and LLM capabilities to reduce operational toil.
Contribute to the evolution of platform operations using AI-enabled automation and self-healing systems.
Partner closely with software engineering, product, and architecture teams.
Advise teams on cloud-native design patterns and onboarding to platform services.
Contribute to strong documentation practices and share operational knowledge across teams.
Participate in an on-call rotation and support a geographically distributed engineering team.