Data Infrastructure & ML Engineer (Hybrid Role)

Axcelis Technologies•Beverly, MA

$122,133 - $183,200•Hybrid

About The Position

We are seeking a Senior Data Infrastructure & Machine Learning Engineer to design and implement scalable data systems and pipelines that support advanced analytics and machine learning workflows. This is a hybrid role where the primary focus is on data pipeline engineering and Python-based data processing, supported by strong database design and management expertise.

Role Focus (Approximate Split)

Data Pipeline Engineering & Data Flow (Critical): ~50%
Python & Machine Learning Data Processing: ~30%
Database Design & Management: ~20%

Requirements

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field, with 5+ years of experience.
Strong experience in database design and SQL-based systems.
Hands-on experience with distributed systems, partitioning, and sharding.
Proven experience building data pipelines (ETL/ELT).
Strong proficiency in Python for data processing.
Experience working with log-based and semi-structured data (e.g., JSON).
Understanding of data traceability, validation, and governance.

Nice To Haves

Experience with time-series or log analytics systems.
Exposure to real-time/streaming architectures (e.g., Kafka).
Experience with cloud platforms (Azure, AWS, or GCP).
Familiarity with machine learning workflows and lifecycle.
Domain experience in semiconductor or high-throughput systems.

Responsibilities

Design and build end-to-end data pipelines (ETL/ELT) for ingesting, processing, and transforming data.
Handle multiple data sources, including tool-generated logs (e.g., AT log files), JSON, and other semi-structured data.
Ensure full data traceability, so that every data point can be traced back to its source.
Implement validation, monitoring, and error handling to ensure data quality and reliability.
Design and manage scalable database schemas.
Support both single-node and distributed database environments.
Implement tablespaces, partitioning, and sharding strategies to ensure performance and scalability.
Optimize queries and maintain high performance for large-scale datasets.
Develop data processing workflows using Python.
Work extensively with dataframes for transformation and analysis.
Use libraries such as Pandas and NumPy for data manipulation, and Plotly (or similar) for visualization and exploratory analysis.
Automate data workflows and integrate them into pipelines.
Prepare and transform datasets for machine learning models.
Collaborate with data scientists and engineers to support model training and deployment workflows.
Enable scalable data foundations for AI/ML integration into production systems.