Data Engineer

Lakeview Loan Servicing
New York, NY
Remote

About The Position

The Data Engineer on the Nebula team plays a critical role in building and evolving the data foundation that powers analytics, reporting, AI development, and operational decision-making across the organization. This role is responsible for designing, building, and maintaining reliable, scalable, and flexible data systems that support a wide range of internal and external use cases.

Working across data ingestion, transformation, storage, modeling, and delivery, the Data Engineer partners closely with Product, Engineering, AI, Analytics, and domain Subject Matter Experts (SMEs) to translate complex business processes and data needs into production-ready data pipelines and platforms. The role contributes to the development and evolution of core data capabilities, including batch and real-time pipelines, operational and analytical data stores, semantic models, and BI-ready datasets.

Success requires strong technical depth across modern data tooling, sound systems thinking, and the ability to build reliable solutions in a cloud-based, regulated, high-stakes environment. The Data Engineer is expected to operate effectively in a modern engineering environment, using automation, observability, and infrastructure-as-code practices to deploy, manage, and improve data pipelines and platforms, while helping enable downstream analytics, reporting, product capabilities, and AI systems by ensuring that data is trustworthy, accessible, and fit for purpose.

Requirements

  • 2-4+ years of experience building and operating production-grade data pipelines and data systems
  • Strong experience with industry-standard tools and platforms for ETL/ELT, orchestration, data warehousing, streaming, and BI
  • Experience working with both OLTP and OLAP systems, with a strong understanding of the tradeoffs between transactional and analytical workloads
  • Experience building flexible data pipelines that integrate with many different source and destination types, including databases, APIs, files, message queues, SaaS platforms, and event streams
  • Experience supporting both batch and real-time data processing patterns
  • Experience deploying and operating data infrastructure on major cloud platforms such as AWS, GCP, or Azure
  • Strong SQL skills and experience with data modeling, transformation frameworks, and performance optimization
  • Experience building AI-powered capabilities on top of LLMs, including orchestration, evaluation, and data integration patterns
  • Experience with modern programming languages commonly used in data engineering, such as Python, Java, Scala, or Go
  • Comfort working with CI/CD, infrastructure-as-code, observability, and production operations for data systems
  • Strong judgment in ambiguous environments where requirements evolve and systems must balance speed, reliability, and flexibility
  • Clear communication skills with both technical and non-technical teammates

Nice To Haves

  • Experience with modern orchestration and transformation tools such as Airflow, Dagster, dbt, or similar platforms
  • Experience with cloud-native data warehouses or lakehouse platforms such as Snowflake, BigQuery, Redshift, Databricks, or equivalent technologies
  • Experience with streaming and real-time data platforms such as Kafka, Kinesis, SQS, or similar systems
  • Experience enabling BI and self-service analytics through curated datasets, semantic layers, and reporting platforms such as Looker, Power BI, Tableau, or similar tools
  • Experience in fintech, mortgage, lending, payments, insurance, or other regulated domains
  • Experience building data platforms that support AI, machine learning, or decisioning workflows
  • Experience improving data quality, reliability, cost efficiency, and platform scalability as a system grows

Responsibilities

  • Design, build, and maintain robust data pipelines for a wide variety of input and output sources, including internal systems, third-party platforms, files, APIs, event streams, and databases
  • Develop scalable ETL and ELT workflows for both batch and real-time processing
  • Ensure pipelines are reliable, testable, observable, and easy to extend as business needs evolve
  • Build reusable data integration patterns that support growing volumes, new source systems, and downstream consumers across analytics, applications, and AI initiatives
  • Design and manage data architectures that support OLTP, OLAP, and reporting workloads across operational and analytical environments
  • Build and optimize data models, warehouse schemas, and curated datasets for analytics and BI use cases
  • Contribute to the design and operation of modern data platforms, including warehouses, lakehouses, streaming systems, and supporting orchestration frameworks
  • Help define patterns for data storage, partitioning, performance optimization, retention, and lifecycle management
  • Deploy, operate, and improve data pipelines and data stores on major cloud platforms such as AWS, GCP, or Azure
  • Use infrastructure-as-code, CI/CD, and automation practices to improve deployment speed, consistency, and reliability
  • Monitor production data systems using logging, alerting, and observability tooling to proactively identify and resolve issues
  • Support secure, resilient, and cost-conscious operation of cloud-based data infrastructure
  • Implement data quality checks, validation rules, reconciliation processes, and monitoring to ensure trustworthy data across systems
  • Establish and maintain standards for lineage, documentation, metadata, schema evolution, and operational runbooks
  • Partner with stakeholders to improve data accessibility, consistency, and usability while maintaining appropriate controls and governance
  • Contribute to practices that support security, privacy, auditability, and compliance in a regulated environment
  • Partner closely with Product, Engineering, and business stakeholders to understand data needs, workflows, and constraints
  • Translate business and operational requirements into clean, scalable, and maintainable data solutions
  • Support downstream consumers of data, including analysts, researchers, product teams, and operational users
  • Communicate clearly with both technical and non-technical stakeholders about data availability, quality, tradeoffs, and delivery timelines
  • Continuously improve pipeline performance, reliability, scalability, and developer productivity
  • Identify opportunities to simplify architecture, reduce operational toil, and improve data platform leverage across teams
  • Operate with a strong bias toward action and iterative delivery, moving quickly from problem definition to implementation and improvement
  • Help raise the bar on engineering quality through thoughtful design, testing, documentation, and operational discipline

Benefits

  • Medical coverage starting on day one
  • Company-matched 401(k)