AI/ML Data Scientist Intern

Command Post Technologies, Inc.
Suffolk, VA
Onsite

About The Position

We are looking for a curious and driven AI/ML Data Scientist Intern to join our team in Suffolk, Virginia. This internship offers a hands-on opportunity for students or early-career professionals with a foundation in Computer Science to gain real-world experience in artificial intelligence, machine learning, and data science. You will work alongside experienced engineers and data professionals to build, fine-tune, and deploy machine learning models, construct retrieval-augmented generation pipelines, and curate high-quality datasets that support organizational objectives.

Requirements

  • Linux Foundations – Basic understanding of Linux operating systems, including file system navigation, user management, permissions, and command-line operations.
  • Python Basics – Foundational proficiency in Python programming, including the ability to write scripts, work with libraries, manipulate data structures, and debug code.
  • Agentic AI – Familiarity with the concepts and architecture behind agentic AI systems, including how autonomous agents plan, reason, and execute multi-step tasks.
  • Hugging Face – Experience navigating the Hugging Face ecosystem, including the ability to load pre-trained models, tokenizers, and datasets from the Hugging Face Hub.
  • Dataset Curation – Understanding of how to source, clean, label, and organize datasets for machine learning training and evaluation purposes.
  • LoRA Fine-Tuning – Knowledge of Low-Rank Adaptation (LoRA) techniques for efficiently fine-tuning large language models with reduced computational overhead.
  • RAG Pipelines – Understanding of retrieval-augmented generation architecture, including how to connect language models with external knowledge sources to improve response accuracy.
  • Document Extraction – Familiarity with techniques and tools for extracting structured data from unstructured documents such as PDFs, scanned images, and web pages.
  • Chunking Strategies – Knowledge of methods for splitting large documents into smaller, semantically coherent segments optimized for embedding and retrieval.
  • Embedding Models – Understanding of how text embedding models work and how they are used to represent documents as vectors for similarity search and retrieval applications.
  • Basic Networking – Understanding of core networking concepts including IP addresses, subnetting, the OSI model, and the functional differences between Layer 2 and Layer 3 protocols.
  • Azure Virtual Desktop Concepts – Familiarity with Azure Virtual Desktop components, including Host Pools, Workspaces, and Application Groups.
  • HTML, JavaScript, React – Foundational knowledge of front-end web technologies, including the ability to read and understand HTML structure, JavaScript logic, and React component architecture.

Nice To Haves

  • Vector Databases – Experience working with vector database platforms such as Pinecone, Weaviate, or ChromaDB for storing and querying high-dimensional embeddings.
  • LangChain or LlamaIndex – Familiarity with orchestration frameworks used to build applications powered by large language models.
  • Prompt Engineering – Knowledge of techniques for crafting effective prompts to guide large language model behavior and improve output quality.
  • MLOps and Model Deployment – Experience with tools and workflows for packaging, deploying, and monitoring machine learning models in production environments.
  • Docker & Containerization – Basic understanding of container concepts and experience running applications in Docker or Kubernetes environments.
  • Transformer Architectures – Understanding of the transformer model architecture, including self-attention mechanisms and how they power modern language models.
  • Data Annotation and Labeling – Experience with data annotation workflows and labeling tools used to prepare supervised learning datasets.
  • Evaluation Metrics for Generative AI – Knowledge of how to assess the quality of generative AI outputs using metrics such as BLEU, ROUGE, perplexity, or human evaluation frameworks.
  • Cloud Platforms for ML Workloads – Exposure to cloud-based machine learning services on AWS, GCP, or Azure for training, hosting, and scaling models.
  • Version Control Systems (Git) – Familiarity with Git workflows for managing code, collaborating with teams, and tracking project history.
  • Microsoft Entra ID – Familiarity with Microsoft’s identity and access management platform for managing user authentication and permissions.
  • API Calls – Experience making and testing API calls using tools such as Postman, cURL, or similar utilities.
  • Azure Services – Broader exposure to Azure services beyond the fundamentals, such as Azure Storage, Azure Networking, or Azure Active Directory (now Microsoft Entra ID).
  • Node.js / .NET API – Experience building or consuming APIs using Node.js or the .NET framework.
  • Azure Serverless Functions – Familiarity with event-driven, serverless computing in Azure for running lightweight backend processes.
  • Visio or Other Drawing Application – Ability to create data flow diagrams, system architecture visuals, or workflow documentation using Microsoft Visio or comparable tools such as draw.io or Lucidchart.

Responsibilities

  • Assist in the development and fine-tuning of large language models using techniques such as LoRA to optimize model performance for specific use cases.
  • Support the design and implementation of retrieval-augmented generation (RAG) pipelines to enhance AI-driven applications with relevant, contextual data.
  • Curate, clean, and prepare datasets for training and evaluation, ensuring data quality and relevance across projects.
  • Work with embedding models to convert text and documents into vector representations for search and retrieval systems.
  • Develop and refine chunking strategies for processing large documents into manageable, semantically meaningful segments.
  • Extract structured information from unstructured documents using automated document extraction techniques.
  • Build and experiment with agentic AI workflows that enable autonomous task execution and decision-making.
  • Contribute to front-end interfaces and internal tools using HTML, JavaScript, and React to support data visualization and model interaction.
  • Document processes, experiments, and findings for internal knowledge sharing and reproducibility.

Benefits

  • Leadership training
  • Career and professional development
  • Work/Life balance
  • Rewards and recognition