Data Reliability Engineer/Software Engineer

Red River
Lowell, MA
Onsite

About The Position

Red Hat Core Business Platforms is seeking a Data Reliability Engineer (DRE) to help build and operate a reliable, AI-ready data platform that powers both business operations and advanced AI initiatives. This role is focused on ensuring that data is trustworthy, observable, and resilient by design. You will work across data pipelines, data products, and AI systems to embed reliability as a first-class concern, enabling both human users and AI agents to confidently consume and act on data.

Requirements

  • 2+ years of experience in Data Engineering, Software Engineering, SRE, or DRE.
  • Strong SQL skills and proficiency in Python, Java, or Scala.
  • Experience with modern data platforms (Snowflake, Databricks, BigQuery, S3/MinIO).
  • Solid understanding of data modeling (dimensional, data vault, domain-driven design) and of batch and streaming architectures.
  • Hands-on experience with data quality and observability frameworks, SLIs/SLOs, and production operations.
  • Experience with CI/CD, testing, and version control.
  • Familiarity with Docker, Kubernetes, and cloud-native systems.
  • Strong communication skills and ability to drive a culture of reliability and ownership.

Nice To Haves

  • Experience with AI/ML systems in production (feature stores, model reliability).
  • Familiarity with agentic frameworks or AI agents (e.g., LangChain, MCP integrations).
  • Experience with streaming technologies (Kafka, Spark Streaming).
  • Experience with data governance platforms (catalogs, lineage).
  • Background in SRE/DRE/AIRE operating models.
  • Master’s degree in Computer Science, Engineering, or a related field.

Responsibilities

  • Own Data Reliability End-to-End: Define and operate data SLIs/SLOs (freshness, completeness, accuracy, availability). Build automated data quality, anomaly detection, and observability frameworks. Implement proactive alerting, incident response, and root cause analysis for data issues. Continuously improve system reliability through post-incident reviews and systemic fixes.
  • Build and Evolve Reliable Data Products: Develop and maintain high-quality, production-grade data products (code + data). Transition pipelines toward efficient, scalable ELT architectures, including real-time capabilities. Enforce separation of source-aligned vs. aggregate data products to support domain ownership and governance. Ensure data products are composable, joinable, and reusable across the organization.
  • Design AI-Ready, Agentic Data Systems: Build data products optimized for ML models and AI agents, including feature consistency and reuse, data versioning and lineage, and reproducibility for model training and inference. Enable agentic workflows (ADLC) by defining data contracts and logic in machine-readable specs (MD/DSL) and ensuring data products are self-describing and agent-consumable.
  • Operationalize Data + AI Workloads: Partner with data scientists to productionize models and AI agents with strong reliability guarantees. Deploy and operate data and AI services on MCP (Microservices, Containers, Platforms) using Kubernetes. Ensure systems meet high availability, scalability, and low-latency requirements.
  • Governance, Security, and Trust: Classify and tag data assets to enforce responsible usage and compliance. Apply masking, access controls, and RLS at the data product layer. Maintain rich metadata and catalog entries to support discoverability and reuse.
  • Enable InnerSource & Platform Adoption: Contribute to a high-velocity InnerSource model for data product development. Promote best practices in reliability engineering across teams. Drive adoption through well-documented, discoverable, and trusted data products.

Benefits

  • Comprehensive medical, dental, and vision coverage
  • Flexible Spending Account - healthcare and dependent care
  • Health Savings Account - high deductible medical plan
  • Retirement 401(k) with employer match
  • Paid time off and holidays
  • Paid parental leave plans for all new parents
  • Leave benefits including disability, paid family medical leave, and paid military leave
  • Additional benefits including employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, employee assistance program, and more!