Principal DevOps Engineer - USA (B)

BMC Software•Santa Clara, CA

27d•Hybrid

About The Position

BMC empowers nearly 80% of the Forbes Global 100 to accelerate business value, faster than humanly possible. Our industry-leading portfolio unlocks human and machine potential to drive business growth, innovation, and sustainable success. BMC does this in a simple and optimized way by connecting people, systems, and data that power the world's largest organizations so they can seize a competitive advantage.

We are looking for a Principal DevOps Engineer to help build and operate our next-generation Agentic-AI Data Management platform from 0-1. This is a hands-on, delivery-focused role for a senior/principal engineer who thrives in early-stage environments, owns reliability and automation end-to-end, and takes pride in running production systems used by external enterprise customers. You will work alongside an established architect and senior product engineers, shape platform and operational architecture, and spend a significant portion of your time designing, building, and evolving the cloud, CI/CD, and runtime foundations of the product.

Here is how, through this exciting role, YOU will contribute to BMC's and your own success:

Platform & DevOps Engineering (Primary Focus) - Design, build, and operate the core cloud and Kubernetes-based platform that underpins a 0-1 data automation and management product, taking infrastructure and operational capabilities from concept through production.
Hands-on Automation - Write production-grade automation in Python, Go, or similar languages to eliminate manual work across provisioning, deployment, scaling, monitoring, and incident response.
Cloud & Kubernetes Architecture - Design and evolve Kubernetes-based platforms using Docker, Helm, and cloud-native services, balancing speed of delivery with long-term operability and cost control.
SRE & Reliability Practices - Establish and enforce SRE best practices including SLIs/SLOs, alerting strategies, error budgets, incident management, and post-incident reviews to ensure enterprise-grade reliability.
CI/CD & Release Engineering - Build and maintain robust CI/CD pipelines (e.g., GitHub Actions, Jenkins) to support frequent, safe, and repeatable deployments across multiple environments.
Security & Compliance Enablement - Manage cloud environments in accordance with company security guidelines, embedding security, compliance, and access controls directly into infrastructure and pipelines.
Operational Tooling - Build and maintain internal tools, services, and automation that support deployment, observability, debugging, and operational excellence while reducing human error.
Integration & Cloud Enablement - Support deployments across AWS including integrations with enterprise systems and geographically redundant, highly available services.
Product & Engineering Collaboration - Work closely with product engineering teams to design operable systems, influence architectural decisions, and ensure production realities inform development choices early.
Founder-Level Ownership - Act with strong ownership: identify operational gaps, propose pragmatic solutions, and move work forward without waiting for perfect requirements or ideal conditions.

Requirements

10+ years of professional engineering experience, including building, deploying, and operating enterprise B2B systems in production.
Strong experience designing and operating cloud-native platforms on one or more major cloud providers (AWS, Azure, GCP).
Deep hands-on experience with Kubernetes, Docker, Helm, and microservice-based architectures in real production environments.
Strong automation skills using Python, Go, or similar languages, with a bias toward eliminating manual operational work.
Extensive experience with CI/CD pipelines, source control (Git/GitHub), and release engineering practices.
Strong Linux/Unix systems knowledge and experience operating distributed systems at scale.
Solid understanding of networking, security, and cloud infrastructure fundamentals.
Experience with observability tooling (metrics, logging, tracing) and production debugging.
Comfort operating in ambiguous, startup-style environments where DevOps engineers are expected to lead, not just support.
Familiarity with configuration management and automation tools (e.g., Puppet, Chef, or modern equivalents).

Nice To Haves

Experience operating large-scale data platforms and data orchestration systems from an SRE/DevOps perspective.
Familiarity with AI/ML-enabled platforms, including how LLM-driven systems impact reliability, cost, and observability.
Strong exposure to open-source technologies and tooling across the DevOps ecosystem.
