Senior Data Engineer (Customer Data Products)

NBME•Philadelphia, PA

1d•$127,337 - $160,000•Hybrid

About The Position

NBME is seeking a Data Engineer to join a skilled team of data engineers and BI developers. This team has successfully launched and continues to enhance a data product on AWS for medical doctors. In this role, you will leverage your data engineering and problem-solving abilities to provide valuable insights to internal staff and external clients. The Data Engineer will be instrumental in modernizing, expanding, and optimizing NBME's data platform by constructing data lakes, intricate data integration pipelines, and scalable data solutions to support analytics, AI/ML, and strategic business decisions. This position will incorporate AI-assisted engineering practices to boost development efficiency and disseminate best practices for AI-assisted engineering throughout the IT department.

While this role can be designated as remote, candidates have the flexibility to choose between working primarily remotely, in a hybrid model, or onsite. We are open to considering candidates located within 50 miles of our Philadelphia, PA office. Please be aware that onsite interviews and onboarding at our Philadelphia office may be required for this position. If this applies, advance notice will be given to facilitate planning.

NBME is committed to continuous innovation and improvement in meeting the evolving needs of the healthcare community. This dedication begins and ends with our people. By attracting and empowering talented individuals from diverse disciplines and backgrounds, including those with varied life experiences, abilities, and perspectives, NBME adopts a well-informed and robust approach to advancing medical education and assessment for the future.

Requirements

Bachelor's Degree
At least 7 years of experience in application development (Internship experience does not apply)
At least 4 years of experience in big data technologies
At least 4 years' experience with cloud computing using AWS
4+ years of experience in application development including Python, SQL, Scala, or Java
4+ years' experience with distributed data/computing tools (MapReduce, Hadoop, Hive, EMR, Kafka, Spark, MySQL, etc.)
4+ years of experience working on real-time data and streaming applications
4+ years of experience with NoSQL implementation (Mongo, Cassandra)
4+ years of data warehousing experience (Redshift)
6+ years of experience with UNIX/Linux including basic commands and shell scripting
7+ years of experience with Agile engineering practices
7+ years of experience with SQL optimization
4+ years of experience with PySpark
3+ years of experience with process orchestration including Airflow, Kubeflow, AWS Step Functions, or Luigi

Nice To Haves

Proven experience implementing Generative AI, LLM data preparation pipelines, and Vector Databases (e.g., Pinecone, Milvus, pgvector).
Strong experience building and maintaining Feature Stores for machine learning models.
Experience building highly scalable, secure, and production-ready APIs and Data-as-a-Service (DaaS) platforms.
AWS Certified Data Engineer or AWS Certified Machine Learning – Specialty certifications.
3+ years of experience with machine learning
Experience with building a Data-as-a-Service platform
Experience with building APIs

Responsibilities

Code, test, deploy, orchestrate, monitor, document, and troubleshoot cloud-based data engineering processes, feature stores, and vector databases in accordance with best practices and security standards throughout the development lifecycle.
Partner closely with data scientists, AI researchers, data and enterprise architects, and business stakeholders to identify, extract, clean, and format structured and unstructured data for AI/ML model training, fine-tuning, and feature extraction.
Lead evaluation, research, and experimentation efforts with batch and streaming data technologies, LLM data preparation frameworks, and MLOps tools to keep pace with industry innovation.
Act as a technical lead to showcase the capabilities of emerging AI and data technologies, enabling the widespread adoption of modern data techniques across the organization.
Contribute significantly to the definition and refinement of processes and procedures for the data engineering practice.
Educate and develop ETL developers on cloud-based data engineering initiatives to enable their transition to data engineering roles and practices.
Ensure the integrity and accuracy of corporate data, with particular attention to data security.
Ensure high data quality for Data Services, Analytics, and Master Data Management.
Help coordinate technical solutions and take responsibility for the design, development, testing, and delivery of those solutions.
Build automated, scalable, test-driven data pipelines.
Utilize software development practices such as version control via Git, CI/CD, and release management to enhance existing CI/CD pipelines in AWS.
Collaborate with data engineers, DevOps engineers, and architects on improvement opportunities for DataOps tools and frameworks.