Lead Engineer

INFOSYS NOVA HOLDINGS LLC, Indianapolis, IN
Onsite

About The Position

This role is responsible for the architecture, development, and productionization of an enterprise-scale Generative AI platform designed to host, manage, and operationalize fine-tuned and open-source Large Language Models (LLMs) in highly regulated environments. The platform enables secure, performant, and compliant AI inference across internal enterprise applications, with an initial focus on pharmaceutical and life sciences use cases. The engineer will operate at the intersection of distributed systems engineering, applied machine learning infrastructure, AI security, and MLOps, translating experimental NLP and generative AI workflows into robust, observable, and governable production services.

Requirements

  • Strong experience designing and operating distributed cloud-native systems.
  • Hands-on expertise deploying LLMs in production with performance, scalability, and security constraints.
  • Deep understanding of container orchestration (Kubernetes) and GPU-enabled workloads.
  • Experience implementing real-time inference services and API gateways.
  • Proven ability to design systems meeting compliance, auditability, and governance requirements.

Nice To Haves

  • Experience in pharmaceutical, healthcare, or highly regulated enterprise environments.
  • Exposure to AI security, prompt-risk mitigation, and regulated AI deployment.
  • Experience translating NLP research and generative modeling techniques into production platforms.
  • Strong collaboration skills with data scientists, ML engineers, SREs, and product teams.

Responsibilities

  • Architect and implement a GPU-accelerated, cloud-native LLM serving platform using containerized microservices deployed on Kubernetes.
  • Design systems that support low-latency, high-throughput inference while maintaining fault tolerance, horizontal scalability, and isolation across dev, test, and production clusters.
  • Abstract infrastructure primitives to expose self-service model lifecycle APIs for data scientists and ML engineers.
  • Deploy and manage fine-tuned and parameter-efficient LLMs using techniques such as PEFT and LoRA.
  • Implement end-to-end model versioning, promotion, rollback, and deprecation workflows.
  • Support integration of multiple LLM backends (open-source and commercial) behind standardized inference interfaces.
  • Engineer real-time request/response inspection pipelines to analyze user prompts and model outputs for prompt injection, data exfiltration, hallucination risk, and policy and compliance violations.
  • Implement multi-layer security controls embedded at ingress, orchestration, and model-serving layers.
  • Ensure all model interactions are traceable, auditable, and reproducible.
  • Build and operationalize retrieval-augmented generation (RAG) pipelines integrating LLMs with enterprise document repositories and vector search backends.
  • Standardize prompt engineering frameworks, contextual grounding strategies, and evaluation methodologies.
  • Enable enterprise use cases including contextual Q&A, semantic search, summarization, redaction, and knowledge extraction.
  • Use workflow orchestration frameworks (e.g., Temporal.io) to manage long-running, stateful AI pipelines, including inference orchestration, evaluation, and post-processing.
  • Implement asynchronous, event-driven AI workflows using gRPC-based service communication.
  • Standardize infrastructure provisioning using Infrastructure-as-Code (IaC) principles to ensure deterministic, repeatable deployments.
  • Automate CI/CD pipelines for model artifacts, prompts, and platform services.
  • Enable dynamic resource allocation, GPU scheduling, and zero/low-downtime upgrades.
  • Design and implement observability pipelines collecting model latency and throughput; token usage and cost metrics; security violations and guardrail triggers; and drift, degradation, and anomalous behavior.
  • Establish Service Level Objectives (SLOs) and reliability targets for LLM inference services.
  • Enable proactive debugging, capacity planning, and performance optimization.
  • Integrate the platform with internal policy enforcement systems, IAM, and role-based access controls (RBAC).
  • Ensure generative outputs comply with enterprise governance frameworks, regulatory requirements, and ethical guidelines.
  • Maintain detailed audit logs to support compliance and validation in regulated environments.
  • Develop reusable platform components enabling collaboration across data science, DevOps, and product teams.
  • Provide standardized interfaces and SDKs for downstream applications to consume AI services.
  • Serve as a technical bridge between AI research experimentation and enterprise-grade production systems.

What This Job Offers

Job Type

Full-time

Career Level

Senior

Education Level

No Education Listed

Number of Employees

5,001-10,000 employees
