Senior Data Engineer - Anywhere Cloud

Cloudera · San Jose, CA
Hybrid

About The Position

At Cloudera, we empower people to transform complex data into clear and actionable insights. With as much data under management as the hyperscalers, we're the preferred data partner for the top companies in almost every industry. Powered by the relentless innovation of the open source community, Cloudera advances digital transformation for the world’s largest enterprises.

The Anywhere Cloud (AWC) team is building Cloudera’s next-generation unified control plane, moving beyond traditional UI-driven workflows to an "AI-First" architecture. AWC enables the deployment of Data Services (such as Spark, Trino, and Cloudera AI) across hybrid and multi-cloud environments, orchestrating complex Kubernetes infrastructure, foundational services (Service Mesh, Auth, Logging), and data engines.

As a Senior Data Engineer, you will write automation and tools to validate Cloudera-certified data pipelines. You will own the test strategy for designing, building, and executing custom data pipelines, also known as Blueprints. You will leverage your deep domain expertise in data ecosystem engines such as Spark, Kafka, Apache Polaris, Trino, and Airflow, and in Lakehouse architectures, to validate end-to-end use cases via custom blueprints. Your work will directly ensure that these data pipelines function correctly for their intended use cases.

Requirements

  • AI-First Mindset: Ability to learn and develop AI-enabled test automation frameworks.
  • Engine SME Expertise: Hands-on understanding of modern compute and streaming engine internals, such as Spark, Kafka, Trino, and Airflow.
  • Kubernetes Expertise: Understanding of Kubernetes internals (CRDs, Controllers, Operators, Namespaces). You must understand how to debug and test complex Helm chart deployments and dependencies.
  • Language Proficiency: Expert-level proficiency in Python/Shell for scripting and automation.
  • Education: Bachelor’s or Master’s degree in Computer Science or equivalent experience.
  • Experience: 8+ years of software engineering experience with a focus on test automation, infrastructure, or backend development.

Responsibilities

  • Design and execute test plans validating the end-to-end cluster creation flow on a Kubernetes platform.
  • Manage complex data modeling and schema drift, and embed automated data quality checks and statistical anomaly detection directly into pipelines to shift away from reactive, manual quality processes.
  • Work with governance layers to ensure policies such as tag-driven Attribute-Based Access Control (ABAC), column-level masking, row-level filters, and zero-code lineage ingestion (e.g., Octopai) are accurately enforced at the data layer.

Benefits

  • Generous PTO Policy
  • Support for work-life balance with Unplugged Days
  • Flexible WFH Policy
  • Mental & Physical Wellness programs
  • Phone and Internet Reimbursement program
  • Access to Continued Career Development
  • Comprehensive Benefits and Competitive Packages
  • Paid Volunteer Time
  • Employee Resource Groups