Senior Machine Learning Engineer

Ziphire.hr • San Francisco, CA

About The Position

As a Senior Machine Learning Engineer, you will build, deploy, and optimize the machine learning services that power the client's platform. You will be the primary builder of robust ML subsystems, translating high-level requirements into production-ready code. This is a hands-on role where you will focus on the reliability, speed, and scalability of our AI solutions. You will take ownership of specific model pipelines, ensuring they are efficient, testable, and maintainable.

Requirements

5+ years of professional software development experience, including system design, large-scale services, and production-grade infrastructure.
3+ years of hands-on experience in machine learning engineering or applied AI, with a strong record of deploying and maintaining models in production.
Technical subject matter expertise in 3+ general areas of software development (e.g., server, database, security), including machine learning infrastructure.
Demonstrated ability to deliver significant, measurable real-world impact through applied ML.
Proven ability to design and write modular, performant, and easy-to-read software that solves complex business problems.
Proficiency in Python, TensorFlow/PyTorch, and scikit-learn.
Strong background in MLOps and data infrastructure (e.g., Airflow, Spark, feature stores, MLflow, data versioning).
Proven ability to deploy and maintain ML models in production with CI/CD, monitoring, and alerting.
Familiarity with cloud ML environments (AWS, GCP, or Azure) and containerization (Kubernetes, Docker).
Experience building or fine-tuning Large Language Models (LLMs) or generative models for structured business processes.
Experience with retrieval-augmented pipelines or feedback-driven model retraining.
Excellent technical communication and a product mindset; comfortable driving initiatives from concept to delivery.

Nice To Haves

Background in healthcare software operations or financial automation.
Contributions to open-source ML infrastructure projects.
Published research or conference papers in machine learning, natural language processing, or applied AI.
Experience leading AI reliability and observability initiatives: designing monitoring frameworks, drift detection, and alerting systems for multiple production models.

Responsibilities

Write high-quality, production-grade software for data ingestion, feature extraction, and model inference, specifically focusing on optimizing code for latency, throughput, and resource efficiency.
Implement robust CI/CD pipelines, automated testing, and comprehensive logging/monitoring for the models you deploy to ensure immediate detection of issues.
Construct and maintain specific data pipelines required for training and inference, ensuring data quality and consistency at the component level.
Develop reusable software modules and utilities that streamline the development process for the wider team, while championing clean code and test-driven development.
Translate business requirements into technical specifications and execute them with precision, serving as an expert at breaking down complex tasks into deliverable units.
Monitor the daily performance of production models, debug incidents, and execute routine retraining workflows to address data drift.
Partner with Engineering team members and Product Managers to estimate effort, flag technical risks, and deliver features on schedule.