Site Reliability Engineer - Machine Learning

Trimble
Westminster, CO
Onsite

About The Position

Architect the Future of AI as our ML Ops / Agent Ops Engineer! Are you ready to redefine the Software Development Life Cycle by moving away from traditional manual execution toward an AI-Native, Agentic PDLC? Trimble is seeking an entrepreneurial ML Ops / Agent Ops Engineer to serve as the operational architect for our AI-Native Product Pods, pioneering the concept of Agent Ops.

What Makes This Role Great: This isn't a traditional "infrastructure-only" role. You will bridge the gap between software reliability and machine learning orchestration. You will architect the automated pipelines and infrastructure required to develop and deploy more than just code; you will orchestrate agentic behaviors and machine learning models at scale. As an entrepreneurial engineer who thrives in ambiguity, you are deeply committed to "the craft of production." Your mission is to ensure our AI agents remain performant, cost-effective, and resilient, all while upholding the operational rigor of a world-class SaaS platform.

Requirements

  • 3+ years of relevant experience with a Bachelor’s degree in Computer Science, Engineering, or a quantitative field (or 1-3 years with a Master’s).
  • In-depth practical knowledge of machine learning and deep learning principles, including model selection, training, and evaluation.
  • Fluency in Python and PyTorch is a must, along with familiarity with NumPy, pandas, and scikit-learn.
  • Proven track record of shipping production-ready ML work, supported by a strong GitHub portfolio or open-source contributions.
  • Solid grounding in data structures, software architecture, Linux, and bash scripting.

Nice To Haves

  • Experience with generative AI models, LLMs, and agentic platforms like Semantic Kernel.
  • Familiarity with MLOps tools such as Azure DevOps, New Relic, and Azure Monitor.
  • Experience with cloud-native ML services (Azure Cognitive Services, AWS SageMaker) and large-scale computing frameworks like Spark or Hadoop.
  • Knowledge of computer vision techniques for image and symbol analysis.

Responsibilities

  • Architect and orchestrate agentic behaviors and machine learning models at scale, overseeing the full lifecycle from conception to production deployment.
  • Pioneer MLOps and Agent Ops best practices, including containerization, model versioning, and robust monitoring to ensure operational rigor.
  • Collaborate with domain experts to translate ambiguous business needs into clear, actionable machine learning tasks and robust technical solutions.
  • Research and adapt the frontier of machine learning, such as foundation models and self-supervised methods, into production code to enhance system capabilities.
  • Conduct thorough testing, debugging, and optimization of ML models to ensure high efficiency, scalability, and reliability in complex environments.

Benefits

  • Trimble offers comprehensive core benefits that include Medical, Dental, Vision, Life, Disability, time-off plans, and retirement plans.
  • Most of our businesses also offer tax savings plans for health, dependent care, and commuter expenses, as well as Paid Parental Leave and an Employee Stock Purchase Plan.