Machine Learning Engineer - LLMs and Agentic

Oversight Systems Inc • Atlanta, GA

About The Position

We are seeking a skilled and forward-looking ML Engineer with experience in Large Language Models (LLMs), generative AI, and agentic architectures to join our growing R&D and Applied AI team. This role is critical in helping Oversight deliver the next generation of agentic AI systems for enterprise spend management and risk controls. The ideal candidate has a strong foundation in machine learning, modern deep learning frameworks, and data pipelines, coupled with hands-on experience experimenting with LLMs, small language models (SLMs), multi-agent frameworks, and retrieval-augmented generation (RAG). You will work closely with AI/ML researchers, data engineers, and product teams to design, implement, and optimize models that power autonomous exception resolution, anomaly detection, and explainable insights. This is a hands-on engineering role where you will not only build and scale ML systems but also actively contribute to cutting-edge applied research in agentic AI.

Requirements

Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, or a related field.
3+ years of experience building and deploying ML systems.
Proficiency in Python and libraries such as PyTorch, TensorFlow, scikit-learn, and Hugging Face Transformers.
Hands-on experience with LLMs/SLMs (fine-tuning, prompt design, inference optimization).
Demonstrated experience with at least two of the following ecosystems: OpenAI GPT models, Anthropic Claude, Google Gemini, Meta LLaMA.
Familiarity with vector databases, embeddings, and RAG pipelines.
Ability to work with structured and unstructured data at scale.
Knowledge of SQL and distributed data frameworks (Spark, Ray).
Strong understanding of the ML lifecycle: data prep, training, evaluation, deployment, and monitoring.

Nice To Haves

Experience with agentic frameworks (LangChain, LangGraph, MCP, AutoGen).
Knowledge of AI safety, guardrails, and explainability techniques.
Hands-on experience deploying ML/LLM solutions in cloud environments (AWS, GCP, Azure).
Experience with CI/CD for ML (MLOps), monitoring, and observability.
Familiarity with anomaly detection, fraud/risk modeling, or behavioral analytics.
Contributions to open-source AI/ML projects or publications in applied ML research.

Responsibilities

Contribute to the design, training, fine-tuning, and deployment of ML/LLM models for production.
Implement RAG pipelines using vector databases.
Work with frameworks like LangChain, LangGraph, and MCP to prototype and optimize multi-agent workflows.
Develop prompt engineering, optimization, and safety techniques for agentic LLM interactions.
Integrate memory, evidence packs, and explainability modules into agentic pipelines.
Work hands-on with multiple LLM ecosystems: OpenAI GPT models, Anthropic Claude, Google Gemini, Meta LLaMA.
Collaborate with Data Engineering to build and maintain real-time and batch data pipelines that serve ML/LLM workloads.
Conduct feature engineering, preprocessing, and embedding generation for structured and unstructured data.
Implement model monitoring, drift detection, and retraining pipelines.
Leverage cloud ML platforms for experimentation and scaling.
Explore and evaluate emerging LLM/SLM architectures and agent orchestration patterns.
Experiment with generative AI and multimodal models to extend capabilities beyond text.
Collaborate with R&D to prototype autonomous resolution agents, anomaly detection models, and reasoning engines.
Translate research prototypes into production-ready components.
Work cross-functionally with R&D, Data Science, Product, and Engineering to deliver business-aligned AI features.
Participate in design reviews, architecture discussions, and model evaluations.
Document processes, experiments, and results effectively for knowledge sharing.
Mentor junior engineers and contribute to ML engineering best practices.

What This Job Offers

Number of Employees

101-250 employees
