AI Foundational Model Engineer

NTT DATA Services • Jersey City, NJ

2d • $139,872 - $209,808 • Onsite

About The Position

Design, build, deploy, and optimize enterprise-grade AI systems powered by foundation models, LLMs, retrieval-augmented generation (RAG), and agentic AI workflows. The role converts AI concepts into secure, scalable, observable, and supportable production systems suitable for a regulated financial-services environment. Primary ownership includes production LLM applications, RAG pipelines, AI services, and model-serving integrations, as well as the end-to-end LLMOps/MLOps lifecycle, from experimentation through deployment, monitoring, evaluation, rollback, and continuous improvement. The role also covers model adaptation, inference optimization, APIs, observability, and operational readiness for GenAI solutions.

Requirements

7+ years of experience in AI/ML engineering, platform engineering, software engineering, or applied machine learning.
Hands-on experience with LLMs, transformers, embeddings, RAG, semantic search, and GenAI application patterns.
Strong Python engineering skills with PyTorch, TensorFlow, Hugging Face, LangChain, LlamaIndex, Semantic Kernel, or equivalent frameworks.
Experience deploying production AI services using APIs, containers, Kubernetes, CI/CD, cloud-native services, and monitoring platforms.
Practical knowledge of model evaluation, fine-tuning, inference optimization, and secure data handling.

Nice To Haves

Background in banking, risk, compliance, financial crime, operations, or enterprise technology.
Experience with Azure OpenAI, AWS Bedrock, Vertex AI, Databricks, vLLM, Triton, MLflow, Kubeflow, or model gateways.
Exposure to model risk, AI governance, audit controls, AI cost governance, and private or open-source LLM deployments.

Responsibilities

Design and implement LLM-powered applications such as knowledge assistants, document intelligence solutions, workflow agents, summarization tools, and decision-support systems.
Build RAG pipelines using embeddings, chunking strategies, vector databases, semantic retrieval, reranking, response grounding, and citation patterns.
Adapt and optimize models using LoRA, PEFT, instruction tuning, distillation, transfer learning, quantization, and domain adaptation techniques.
Develop scalable APIs, microservices, model-serving components, and integration patterns across cloud, hybrid, or containerized environments.
Optimize inference workloads for latency, throughput, token efficiency, cost, reliability, and user experience.
Implement model and application observability, including prompt logs, retrieval quality, hallucination indicators, drift signals, feedback loops, cost telemetry, and service health.
Embed security, privacy, Responsible AI, and model risk controls into AI application design and delivery.
Create production documentation, runbooks, release notes, test evidence, and audit-ready implementation records.