Senior Software Engineer, Machine Learning

Point72•New York, NY

77d•$175,000 - $250,000

About The Position

On the Knowledge Graph Intelligence team, you’ll work alongside product managers, engineers, and data scientists to build the next generation of intelligent systems through graph technology. We’re a team of experts who experiment and work to discover new ways to harness open-source solutions, modern cloud architectures, and sophisticated Artificial Intelligence (AI) solutions, while embracing enterprise agile methodologies. Our commitment to building and innovating in the AI space provides the framework intended to drive smarter decision-making and enhance how we build and operate our platforms and applications. In this data-heavy role, you will design and build mission-critical infrastructure that powers our machine learning lifecycle, from large-scale data processing and feature engineering to model training, real-time deployment, and monitoring.

Requirements

5+ years of experience in a software, data, or ML engineering role.
Strong proficiency in SQL.
Experience building and orchestrating data pipelines using tools such as Spark, dbt, and Dagster/Airflow, as well as data warehouses like Snowflake, Redshift, and BigQuery.
Understanding of infrastructure as code, including experience with Terraform.
Proficiency with containerization and orchestration including Docker and Kubernetes.
Hands-on experience with CI/CD tools and ML lifecycle tools, including Jenkins, MLflow, Kubeflow, and W&B.
Experience with AWS and its core services, including S3, EC2, Lambda, RDS, and EMR, and practical experience with Boto3 and the AWS ML services SageMaker and Bedrock.
Understanding of modern ML models and ability to discuss the performance characteristics and engineering trade-offs that influence deployment decisions.
Experience with systems incorporating Graph Neural Networks (GNNs), recommendation systems, anomaly detection, and complex time-series models.
Commitment to the highest ethical standards.

Responsibilities

Architect and implement the full lifecycle of ML models, from data ingestion to production inference, and contribute to the design of our next-generation, event-driven architecture using technologies such as gRPC, Kafka, and high-performance API frameworks like FastAPI, Spring WebFlux, and Axum.
Engineer and automate robust, large-scale data processing pipelines (ETL/ELT) using tools like Spark, dbt, and workflow orchestrators, and lead the design and implementation of our Feature Store strategy.
Own the MLOps framework for model training, versioning, and deployment, including CI/CD pipelines, automated workflows, and experiment tracking and evaluation tooling.
Implement sophisticated deployment strategies, including canary, blue-green, shadow, and A/B testing, to ensure safe, zero-downtime releases, and optimize inference performance for LLMs and other large models.
Leverage cutting-edge tools and techniques like quantization and compilation to maximize throughput and minimize latency.
Collaborate with data scientists to develop models and optimize model performance for low-latency serving using techniques like Python performance tuning.
Define, provision, and manage our cloud infrastructure using Terraform, working hands-on with a wide array of cloud services across compute, storage, and machine learning platforms.

Benefits

Fully-paid health care benefits
Generous parental and family leave policies
Volunteer opportunities
Support for employee-led affinity groups representing women, people of color and the LGBT+ community
Mental and physical wellness programs
Tuition assistance
A 401(k) savings program with an employer match and more
