Data Scientist I or II (MAD-BS-OR)

Hitachi•Hillsboro, OR

Hybrid

About The Position

Data Scientists are responsible for developing and maintaining Artificial Intelligence (AI) software and systems for Hitachi High-Tech America, Inc. (HTA) products. This role involves hands-on development of algorithms in machine learning, statistical modeling, neural nets, and pattern recognition. The position requires developing, training, and deploying ML models for applications such as time-series forecasting, anomaly detection, classification, regression, and predictive maintenance. The role also includes designing end-to-end ML pipelines, leading Root Cause Analysis (RCA) investigations, building frameworks for Fault Tree Analysis (FTA), and collaborating with domain experts to translate failure patterns into ML features. Additionally, the Data Scientist will design and develop Agentic AI systems capable of autonomous reasoning, tool usage, and multi-step decision making, and will implement LLM-based systems with tool-calling frameworks, Retrieval-Augmented Generation (RAG), and structured outputs. The role requires partnering with cross-functional teams to build scalable, production-ready solutions using Python-based ML frameworks and data processing tools, and deploying models and services using REST APIs and containerization. Experience with modern data platforms such as time-series databases, analytical databases, and vector databases is also expected.

Requirements

Master of Science degree in Data Science, Statistics, Computer Science, or similar quantitative field
At least five (5) years of practical experience writing algorithms in machine learning, statistical modeling, neural nets, and pattern recognition, starting from data exploration
Five (5) years of experience in Data Science / Machine Learning
Strong programming skills in Python
Proven experience with: time-series analysis and anomaly detection; statistical modeling and machine learning algorithms
Hands-on experience with: Root Cause Analysis (RCA); Fault Tree Analysis (FTA) or failure modeling
Experience working with real-world, noisy, and large-scale datasets
Experience with Agentic AI / LLM systems, including: tool-calling architecture; RAG pipelines; prompt engineering and evaluation frameworks
Full software development lifecycle experience; must be comfortable working with Agile as well as iterative methodologies
Experience with test-driven development and with tools to spot performance issues and memory leaks
Ability to investigate and apply new technologies
Effective oral and written communication skills, including the ability to communicate challenging or technical concepts clearly
Excellent relationship-building skills
Strong engineering, analytical, and problem-solving skills
Proactively undertake R&D activities and deliver tangible results under deadlines
Ability to manage multiple tasks and prioritize work accordingly
Willingness to work longer than normal hours as needed during releases and customer escalations
Self-sufficient, self-reliant, and self-disciplined, but also able to operate effectively as part of a team
Ability to comprehend and enforce safety policies

Nice To Haves

Familiarity with: distributed systems and scalable ML infrastructure; MLOps practices (CI/CD, monitoring, model versioning)
Knowledge of: signal processing or physics-based modeling; graph-based reasoning or causal inference
General technical knowledge of semiconductor metrology equipment

Responsibilities

Develop and write algorithms hands-on in machine learning, statistical modeling, neural nets, and pattern recognition, starting from data exploration
Develop, train, and deploy ML models for: time-series forecasting and anomaly detection; classification and regression on tabular and sensor data; predictive maintenance and failure prediction
Design end-to-end ML pipelines, including data ingestion, feature engineering, model training, evaluation, and deployment
Lead and support Root Cause Analysis (RCA) investigations using data-driven approaches
Build frameworks for Fault Tree Analysis (FTA) and failure mode identification
Collaborate with domain experts (engineering, operations) to translate failure patterns into ML features and models
Design and develop Agentic AI systems capable of: autonomous reasoning over structured and unstructured data; tool usage (query engines, APIs, analytics pipelines); multi-step decision making and diagnostics workflows
Implement LLM-based systems with: tool-calling frameworks; Retrieval-Augmented Generation (RAG); structured outputs and validation pipelines
Partner with cross-functional teams (Data Engineers, Software Engineers, Domain Experts)
Build scalable, production-ready solutions using: Python-based ML frameworks (e.g., TensorFlow, PyTorch, Scikit-learn); data processing tools (Pandas, Spark, SQL)
Deploy models and services using: REST APIs (FastAPI, Flask); containerization (Docker, Kubernetes)
Work with modern data platforms: time-series DBs (e.g., Prometheus, InfluxDB); analytical DBs (e.g., ClickHouse, PostgreSQL); vector DBs (e.g., Qdrant, FAISS)
Translate business problems into technical solutions
Create architecture and complex designs independently and document them
Integrate and test software to confirm compliance with specifications
Develop functional specifications
Participate in design reviews, peer code reviews, and test reviews
Perform functional tests
Other duties as assigned