Senior ML Ops Engineer

Remitly • Philadelphia, CT

About The Position

This team powers Elsevier’s Health platforms: ClinicalKey AI, Sherpath AI, and AI-driven automated clinical and content workflows. You will bridge Data Science and Engineering to turn experimental NLP/IR/GenAI models into secure, reliable, and scalable services. Our systems operate over one of the world’s largest medical and scholarly landscapes. As a Senior Machine Learning Engineer, you’ll work on AI-based features (GenAI, Agentic AI, RAG, etc.), search/ranking quality, and knowledge-graph-aware retrieval while enforcing content rights and editorial confidentiality.

Requirements

Current experience in ML engineering and MLOps platforms, including shipping ML, search, or GenAI systems to production.
Hands-on experience with major cloud vendor solutions (AWS, Azure, and/or Google Cloud).
Experience with search, vector, and graph technologies (e.g., Elasticsearch / OpenSearch / Solr / Neo4j).
Experience evaluating LLMs.
A strong understanding of the Data Science Life Cycle including feature engineering, model training, and evaluation metrics.
Familiarity with ML frameworks, e.g., PyTorch, TensorFlow, PySpark.
Experience with large-scale data processing systems, e.g., Spark.
Experience with statistical analysis, machine learning theory, and natural language processing.

Nice To Haves

Strong Python, Java, and/or Scala experience will be considered a plus.
Background in health technology and/or medical content workflows is preferred.

Responsibilities

Automate and orchestrate machine learning workflows across major cloud and AI platforms (AWS, Azure, Databricks, and foundation model APIs such as OpenAI).
Maintain and version model registries and artifact stores to ensure reproducibility and governance.
Develop and manage CI/CD for ML, including automated data validation, model testing, and deployment.
Implement ML engineering solutions using popular MLOps platforms such as AWS SageMaker, MLflow, and Azure ML.
Scale custom end-to-end SageMaker pipelines.
Design and implement the engineering components of GAR+RAG systems (e.g., query interpretation and reflection, chunking, embeddings, hybrid retrieval, semantic search). Manage prompt libraries, guardrails, and structured output for LLMs hosted on Bedrock/SageMaker or self-hosted.
Design and implement ML pipelines that utilize Elasticsearch/OpenSearch/Solr, vector DBs, and graph DBs.
Build evaluation pipelines: offline IR metrics (NDCG, MAP, MRR), LLM quality metrics (faithfulness, grounding), and A/B testing.
Optimize infrastructure costs through monitoring, scaling strategies, and efficient resource utilization.
Stay current with the latest GenAI, NLP, and RAG research, and apply the state of the art in our experiments and systems.
Partner with Subject-Matter Experts, Product Managers, Data Scientists, and Responsible AI experts to translate business problems into cutting-edge data science solutions.
Collaborate and interface with Operations Engineers who deploy and run production infrastructure.