About The Position

We are seeking a Data Scientist proficient in Python and Jupyter Notebook to support a specific program line tied to AI/ML. NLP experience is preferred. The Level 3 Data Scientist shall possess capabilities in Foundations (Mathematical, Computational, Statistical), Data Processing (Data management and curation, data description and visualization, workflow and reproducibility), and Modeling, Inference, and Prediction (Data modeling and assessment, domain-specific considerations).

This role requires the ability to make and communicate principal conclusions from data using elements of mathematics, statistics, computer science, and applications-specific knowledge. The Data Scientist will use analytic modeling, statistical analysis, programming, and/or other appropriate scientific methods to develop and implement qualitative and quantitative methods for characterizing, exploring, and assessing large datasets. They will translate mission needs and analytic questions into technical requirements and assist others in drawing conclusions from data analysis. Effective communication of complex technical information to non-technical audiences is essential.

This Data Scientist position will support a Natural Language Processing (NLP) project focused on accurately and automatically tokenizing language data, developing automated solutions for part-of-speech annotation, and improving existing models by scoring performance against human-generated annotations for speech and text.

Requirements

  • Proficiency with Jupyter Notebooks using Python is required.
  • TS/SCI with polygraph is required.
  • Bachelor's Degree with 10 years of relevant experience.
  • Associate's degree with 12 years of experience may be considered for individuals with in-depth experience that is clearly related to the position.
  • Bachelor's Degree must be in Mathematics, Applied Mathematics, Statistics, Applied Statistics, Machine Learning, Data Science, Operations Research, or Computer Science. A degree in a related field (Computer Information Systems, Engineering), a degree in the physical/hard sciences (e.g. physics, chemistry, biology, astronomy), or a degree in another science discipline with a substantial computational component (i.e. behavioral, social, or life) may be considered if it included a concentration of coursework (5 or more courses) in advanced mathematics (typically 300 level or higher, such as linear algebra, probability and statistics, machine learning) and/or computer science (e.g. algorithms, programming, data structures, data mining, artificial intelligence).
  • Relevant experience must be in designing/implementing machine learning, data science, advanced analytical algorithms, programming (skill in at least one high-level language, e.g. Python), statistical analysis (e.g. variability, sampling error, inference, hypothesis testing, EDA, application of linear models), data management (e.g. data cleaning and transformation), data mining, data modeling and assessment, artificial intelligence, and/or software engineering.

Nice To Haves

  • NLP experience is preferred.

Responsibilities

  • Make and communicate principal conclusions from data using elements of mathematics, statistics, computer science, and applications-specific knowledge.
  • Use analytic modeling, statistical analysis, programming, and/or another appropriate scientific method to develop and implement qualitative and quantitative methods for characterizing, exploring, and assessing large datasets in various states of organization, cleanliness, and structure, accounting for the unique features and limitations inherent in Government data holdings.
  • Translate practical mission needs and analytic questions related to large datasets into technical requirements.
  • Assist others in drawing appropriate conclusions from the analysis of large datasets.
  • Effectively communicate complex technical information to non-technical audiences.
  • Accurately and automatically tokenize language data with spoken or written origins.
  • Develop automated solutions for annotating language data with part-of-speech information.
  • Improve existing models by scoring performance against human-generated annotations for speech and text.