Data Scientist - Raleigh, NC - Hybrid

OMG Technology
Raleigh, NC
Hybrid

About The Position

We are seeking an experienced Data Scientist with strong expertise in data science and machine learning engineering, including hands-on experience designing and deploying ML solutions in production. This role focuses on building scalable ML solutions, productionizing models, and enabling robust ML platforms for enterprise-grade deployments. This position requires four days in the office and one remote day per week, based at our corporate headquarters in Raleigh, North Carolina (North Hills).

Requirements

  • Bachelor’s degree in Computer Science, Information Technology, Data Science, Engineering, or a related field.
  • 5+ years of hands-on experience with Python (pandas, PySpark, scikit-learn), Bash scripting, and Docker; familiarity with TensorFlow and PyTorch preferred.
  • Strong experience designing and implementing predictive and prescriptive models for regression, classification, and optimization problems.
  • Expertise with advanced modeling techniques such as structural time series modeling and boosting algorithms (e.g., XGBoost, LightGBM).
  • 5+ years of experience with SageMaker (training, processing, pipelines, model registry, endpoints) or equivalent platforms such as Kubeflow, MLflow/Feast, Vertex AI, or Databricks ML.
  • 5+ years of experience with Databricks Asset Bundles (DABs), Airflow, Step Functions, and event-driven architectures using EventBridge, SQS, and Kinesis.
  • 3+ years of experience working with AWS, Azure, or GCP services including ECR/ECS, Lambda, API Gateway, S3, Glue, Athena, EMR, RDS/Aurora (PostgreSQL/MySQL), DynamoDB, CloudWatch, IAM, VPC, and WAF.
  • Strong understanding of Snowflake warehouses, databases, schemas, stages, Snowflake SQL, RBAC, UDFs, and Snowpark.
  • 3+ years of hands-on experience with CodeBuild/CodePipeline, GitHub Actions, or GitLab CI/CD; experience with blue/green, canary, and shadow deployments for ML services and applications.
  • Proven experience building and optimizing batch and streaming pipelines, including schema management, partitioning strategies, performance tuning, and Parquet/Iceberg best practices.
  • Experience with unit and integration testing for data and ML models, contract testing for feature pipelines, reproducible training workflows, and model/data drift monitoring.
  • Strong troubleshooting and incident response experience for ML services with exposure to SLOs, dashboards, runbooks, and debugging across data, model, and infrastructure layers.
  • Strong communication skills, collaborative mindset, problem-solving ability, and a proactive approach toward automation and documentation.

Nice To Haves

  • Experience in the retail and/or manufacturing domains.

Responsibilities

  • Design and implement predictive and prescriptive models for regression, classification, and optimization problems.
  • Apply advanced techniques such as structural time series modeling and boosting algorithms (e.g., XGBoost, LightGBM).
  • Develop, train, evaluate, and optimize machine learning models using Python, PySpark, TensorFlow, and PyTorch.
  • Work closely with stakeholders to understand business challenges and translate them into scalable data science solutions.
  • Participate in end-to-end solution design and collaborate with cross-functional teams to ensure successful integration of models into business processes.
  • Rapidly prototype and test hypotheses to validate model approaches.
  • Build automated workflows for model monitoring and performance evaluation.
  • Create dashboards using tools like Databricks and Palantir to visualize key model metrics such as model drift, feature importance, and Shapley values.
  • Build repeatable and scalable paths from experimentation to deployment (batch, streaming, and low-latency endpoints), including feature engineering, training, validation, and evaluation.
  • Develop and maintain core ML platform components including model registry, feature store, experiment tracking, artifact repositories, and standardized CI/CD pipelines for ML workflows.
  • Design and implement robust data and ML pipelines orchestrated using Step Functions, Airflow, or Argo to train, validate, and deploy models based on schedules or event-driven triggers.
  • Implement end-to-end monitoring, data validation, model drift detection, quality checks, and alerting mechanisms aligned with SLA/SLO requirements.
  • Ensure model/version lineage, reproducibility, approvals, rollback strategies, auditability, and cost optimization aligned with enterprise governance policies.
  • Collaborate with onshore and offshore teams, mentor data scientists on packaging, testing, and optimization best practices, and contribute to engineering standards and code reviews.
  • Prototype innovative ML solutions and troubleshoot production issues across data, model, application, and infrastructure layers.

Benefits

  • C2C or W2