We are seeking an early-career Business Systems Engineer with a strong foundation in data pipeline management, supported by hands-on experience with dbt Core, SQL, and Databricks on AWS. This role is intended for candidates who already understand the fundamentals of building, deploying, and supporting Data Universe systems and want to apply those skills in a production data platform. You will take hands-on ownership of data engineering workflows end-to-end, from preparing high-quality data to supporting the analytics that enable business decision-making.

Data Pipeline Management
- Perform regular data validation and cleansing to ensure the accuracy, integrity, and reliability of datasets
- Identify and resolve data pipeline failures, debugging data anomalies and issues using SQL and dbt test results
- Build and maintain ETL/ELT processes that move data from various sources into data warehouses and lakes
- Write and optimize SQL transformations that support feature engineering and model training
- Set up the data catalog, and execute and monitor data and ML workloads using Databricks
- Onboard data product owners to the Data Universe platform
- Support AWS-based lakehouse architectures, primarily using Amazon S3
- Set up IAM (Identity and Access Management) roles, permissions, and secure access patterns
- Troubleshoot and optimize cloud-based AI and data workflows
- Support batch and micro-batch processing using Spark
- Manage data governance, access control, and discovery using Databricks Unity Catalog

Enable AI-Ready Data Models
- Design and maintain high-performance Delta Lake pipelines using the Medallion Architecture (Bronze, Silver, and Gold)
- Apply dbt tests and documentation to ensure data quality for AI consumption
- Architect curated datasets that stay strictly aligned with upstream raw sources, ensuring a seamless and transparent flow of information from ingestion to consumption
- Execute code reviews and follow established dbt and SQL standards
- Build and maintain training and inference workflows on Databricks
- Prepare and validate feature datasets used by ML models, ensuring correctness, consistency, and timeliness
- Support LLM-enabled use cases such as embedding generation, semantic search, and retrieval-augmented generation (RAG)
- Monitor model inputs and outputs for data quality issues and unexpected behavior
- Understand how upstream data changes affect model performance, stability, and bias
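To give a flavor of the data-quality work above, here is a minimal sketch, in plain Python, of the kinds of checks dbt tests express (not_null, unique, accepted_values). The table and column names ("order_id", "status") are hypothetical examples, not part of this role's actual schema.

```python
# Hypothetical dbt-style data-quality checks, sketched in pure Python.
# Each function returns the *failures*, mirroring how dbt tests surface rows.

def not_null(rows, column):
    """Rows where `column` is missing a value."""
    return [r for r in rows if r.get(column) is None]

def unique(rows, column):
    """Values in `column` that appear more than once."""
    seen, dupes = set(), set()
    for r in rows:
        v = r.get(column)
        if v in seen:
            dupes.add(v)
        seen.add(v)
    return sorted(dupes)

def accepted_values(rows, column, allowed):
    """Rows whose `column` value falls outside the allowed set."""
    return [r for r in rows if r.get(column) not in allowed]

# Illustrative records with deliberate problems.
orders = [
    {"order_id": 1, "status": "shipped"},
    {"order_id": 1, "status": "pending"},   # duplicate key
    {"order_id": 2, "status": None},        # null status
]

print(not_null(orders, "status"))                          # one failing row
print(unique(orders, "order_id"))                          # [1]
print(accepted_values(orders, "status", {"shipped", "pending"}))
```

In the actual role these checks would live as declarative tests in a dbt project rather than hand-written functions; the sketch only shows the logic they encode.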
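The Medallion Architecture mentioned above can be sketched on plain Python records: raw Bronze data is cleansed and typed into Silver, then aggregated into a curated Gold view. In practice these layers would be Delta Lake tables transformed with Spark or dbt on Databricks; the field names here are illustrative assumptions.

```python
# Hedged sketch of a Bronze -> Silver -> Gold flow on in-memory records.

bronze = [  # raw ingested rows, kept as-landed (strings, mixed casing)
    {"id": "1", "amount": "10.50", "country": "us"},
    {"id": "2", "amount": "bad",   "country": "US"},  # unparseable amount
    {"id": "3", "amount": "4.00",  "country": "us"},
]

def to_silver(records):
    """Cleanse and type-cast; drop rows that fail parsing."""
    out = []
    for r in records:
        try:
            out.append({
                "id": int(r["id"]),
                "amount": float(r["amount"]),
                "country": r["country"].upper(),
            })
        except ValueError:
            continue  # a real pipeline would quarantine or flag these rows
    return out

def to_gold(records):
    """Aggregate curated data for consumption: revenue per country."""
    totals = {}
    for r in records:
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)  # -> {'US': 14.5}
```

The point of the layering is that the Gold view stays traceable back to the raw Bronze rows it came from, which is what "strict alignment with upstream raw sources" refers to.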
Job Type
Full-time
Career Level
Entry Level