Senior AI/ML Engineer

Innosoft Corporation
Washington, DC
Hybrid

About The Position

The National Endowment for the Arts (NEA) is investing in artificial intelligence and machine learning capabilities to modernize internal operations, improve grant analysis workflows, and enhance public-facing services. The organization requires production-grade AI/ML systems that meet federal data governance, privacy, and accessibility standards while delivering measurable impact across NEA's mission areas.

The Senior AI/ML Engineer will design, build, and deploy machine learning models and AI systems supporting NEA's business needs across structured and unstructured data domains. This role spans the full AI/ML lifecycle, including data preprocessing, feature engineering, model training, evaluation, deployment, and monitoring.

The engineer will work within an Agile Scrum environment, collaborating with data scientists, software engineers, domain experts, and federal stakeholders to integrate models into production environments hosted on Amazon Web Services and Microsoft Azure cloud infrastructure.

Requirements

  • Bachelor's or Master's degree in Computer Science, Data Science, Engineering, Mathematics, or a related technical field.
  • 5+ years of hands-on experience in AI/ML development and deployment in production environments.
  • Senior-level proficiency in Python, including the ability to develop production-grade backend services, APIs, middleware, and machine learning data pipelines.
  • Senior-level experience with TensorFlow, PyTorch, scikit-learn, and Orange for machine learning model development.
  • Experience working with both Amazon Web Services (AWS) and Microsoft Azure for hosting, deployment, and scalability of AI/ML workloads.
  • Strong understanding of machine learning algorithms, including supervised, unsupervised, and deep learning approaches, along with data structures and software engineering principles.
  • Experience with MLOps practices including CI/CD pipelines, model versioning, experiment tracking, automated retraining, and drift detection.
  • Experience with containerization (Docker) and orchestration (Kubernetes) for AI/ML workloads.
  • Experience integrating machine learning systems with SQL and NoSQL databases.
  • Experience with Agile/Scrum methodologies and project management tools (e.g., Azure DevOps, Jira).
  • Demonstrated ability to deliver high-quality, production-grade machine learning systems on time and within estimates in an Agile Scrum environment.
  • Active participation in mentoring and guiding less experienced engineers through code reviews and best practices.
  • Excellent communication, collaboration, and organizational skills with experience working alongside cross-functional teams and federal stakeholders.

Nice To Haves

  • Experience working with Large Language Models (LLMs), Retrieval-Augmented Generation (RAG), agentic AI workflows, and fine-tuning of small language models or embedding models.
  • Familiarity with vector databases and retrieval frameworks (FAISS, Pinecone, Chroma, pgvector) and graph databases (Neo4j, Amazon Neptune) for AI/ML retrieval workloads.
  • Experience with Agile/Scrum development environments in a federal or regulated setting.
  • Experience working with cross-functional teams across global time zones and cultures.
  • Working knowledge of accessibility standards (Section 508/WCAG) and federal web requirements where AI/ML systems interact with public-facing applications.
  • Familiarity with Azure Monitor, Application Insights, Log Analytics, Prometheus, and Grafana for AI/ML observability and performance monitoring.
  • Working knowledge of infrastructure automation and configuration management tools (Terraform, Ansible).
  • Understanding of data governance, privacy frameworks, and ethical AI practices applicable to federal data environments.

Technical Skills

  • Python (advanced), including async programming and performance optimization
  • Machine learning frameworks: TensorFlow, PyTorch, scikit-learn, Keras, Orange, Hugging Face Transformers
  • Generative AI: LLM APIs (OpenAI, Anthropic, Azure OpenAI), prompt engineering, RAG, fine-tuning
  • Cloud platforms: AWS (SageMaker, Bedrock, Lambda, S3, EC2, EKS) and Microsoft Azure (Azure ML, Azure OpenAI, AKS, Azure Databricks)
  • Backend and APIs: FastAPI, Flask, REST API design, async processing
  • Data pipelines and orchestration: Apache Airflow, Kubeflow, MLflow
  • Databases: PostgreSQL, MySQL, MongoDB, vector databases
  • Containerization and orchestration: Docker, Kubernetes
  • MLOps and CI/CD: GitHub Actions, Azure DevOps, Jenkins, model versioning, experiment tracking
  • Version control with Git
  • Software testing protocols, code review practices, and clean coding principles

Responsibilities

  • Design, develop, and deploy machine learning models and AI systems tailored to NEA's business needs across both structured and unstructured data sources;
  • Conduct data preprocessing, feature engineering, model selection, training, evaluation, and validation across the full ML lifecycle;
  • Build and operate production-grade machine learning pipelines using Python, TensorFlow, PyTorch, scikit-learn, and Orange;
  • Develop and integrate Large Language Model (LLM) capabilities, including Retrieval-Augmented Generation (RAG), embedding models, and prompt engineering for domain-specific tasks;
  • Design and implement model deployment workflows on Amazon Web Services (AWS) and Microsoft Azure, including managed services such as SageMaker, Bedrock, Azure Machine Learning, and Azure OpenAI;
  • Implement MLOps practices including model versioning, experiment tracking, automated retraining, drift detection, and rollback capabilities;
  • Develop CI/CD pipelines for machine learning workloads using GitHub Actions, Azure DevOps, Jenkins, or equivalent tools;
  • Containerize machine learning services using Docker and orchestrate deployments on Kubernetes (AKS, EKS) for scalability and resilience;
  • Build Python-based REST APIs and asynchronous backend services for model inference, batch processing, and real-time prediction using frameworks such as FastAPI and Flask;
  • Integrate machine learning components with relational and non-relational database systems including PostgreSQL, MySQL, and MongoDB;
  • Monitor model performance in production, implement observability through tools such as Azure Monitor, Prometheus, and Grafana, and retrain or update models as data and business needs evolve;
  • Implement responsible AI practices including model explainability (SHAP, LIME), bias detection, fairness audits, and adherence to data governance and privacy standards;
  • Conduct architectural peer reviews for code created by other engineers and contribute to engineering standards across the AI/ML platform;
  • Set up build, test, staging, and production environments and deploy code through structured release processes;
  • Contribute to estimations for all tickets in the backlog and participate in Backlog Grooming, Sprint Planning, and Sprint Review meetings;
  • Adhere to Agile Scrum methodologies and organizational delivery processes;
  • Work closely and collaboratively with federal and contractor personnel to develop solutions that align with NEA mission objectives;
  • Share knowledge and expertise with colleagues, mentoring and guiding less experienced engineers through code reviews, design reviews, and best-practice guidance;
  • Stay current with advancements in AI/ML technologies, including foundation models, agentic AI, fine-tuning techniques, and emerging frameworks, and recommend appropriate solutions for NEA initiatives.

Benefits

  • Standard employee benefits
  • 50% of health insurance paid by Innosoft
  • Paid vacation
  • 401(k) match
  • STD, LTD, and AD&D insurance paid by Innosoft