Crane Worldwide Logistics (posted 3 months ago)
Full-time • Mid Level
Houston, TX
1,001-5,000 employees
Professional, Scientific, and Technical Services

The position involves monitoring and operating data pipelines using a stack that includes HVR, Prefect, dbt, Snowflake, Materialize, and Kafka. The role requires responding to alerts, leading incident workflows, diagnosing data quality issues, and ensuring operational readiness. Additionally, the candidate will maintain and enhance runbooks, implement observability, optimize multi-cloud infrastructure, and provide support for BI developers and analysts. The role also includes contributing to SLA dashboards and defining on-call schedules for 24/7 support.

Responsibilities:

  • Monitor and operate pipelines on top of a stack including HVR, Prefect, dbt, Snowflake, Materialize, and Kafka
  • Respond to alerts and lead incident workflows using incident.io and Jira
  • Diagnose and resolve data quality issues, failed connectors/jobs, and SLA degradations
  • Establish clear escalation paths and backup coverage to drive 24/7 operational readiness
  • Maintain and enhance runbooks for Snowflake, Materialize, Prefect, and Kafka workloads
  • Implement observability with Grafana (monitoring, alerting, logs, and metrics)
  • Optimize cost and performance across multi-cloud infrastructure (Azure, AWS, Snowflake)
  • Improve incident detection and recovery through better alerting and automation
  • Provide Tier-2/Tier-3 support for BI developers, analysts, and client-facing integrations
  • Manage user access, role-based controls, and onboarding for shared environments
  • Partner with engineering and analytics teams to prevent recurring issues
  • Contribute to the rollout of SLA dashboards for Snowflake and Kafka
  • Champion use of incident.io for all P1/P2 incident coordination and documentation
  • Help define the on-call schedule and coverage model for 24/7 support
Qualifications:

  • Experience operating and troubleshooting data systems such as Snowflake, Materialize, or Kafka
  • Familiarity with orchestration frameworks (Prefect, Airflow) and CI/CD practices for data (dbt, Terraform, GitHub Actions)
  • Hands-on use of monitoring/observability tools (Grafana, Datadog, Prometheus, or similar)
  • Strong incident management background, including on-call rotations and post-incident reviews
  • Knowledge of cloud infrastructure (AWS and/or Azure)
  • Proficiency in Python or Bash for automation and debugging
  • High School Diploma required
  • 8+ years' data engineering experience
  • Bachelor's degree in Information Technology preferred
  • Professional certification may be required in some areas
Benefits:

  • Quarterly Incentive Plan
  • 136 hours of Paid Time Off (17 days per year), which can be used as Sick Time or for Personal Use
  • Excellent Medical, Dental and Vision benefits
  • Tuition Reimbursement for education related to your job
  • Employee Referral Bonuses
  • Employee Recognition and Rewards Program
  • Paid Volunteer Time to support a cause that is close to your heart and contributes to our communities
  • Employee Discounts
  • Wellness Incentives of up to $100 per year for completing challenges, plus a discount on contribution rates