We are seeking a Data Scientist proficient in Python and Jupyter Notebook to support a Natural Language Processing (NLP) project that accurately and automatically tokenizes language data of spoken or written origin. You will develop automated solutions for annotating language data with part-of-speech information and improve existing models by scoring their performance against human-generated annotations for speech and text.

The Level 3 Data Scientist shall possess the following capabilities:
Foundations: (Mathematical, Computational, Statistical).
Data Processing: (Data management and curation, data description and visualization, workflow and reproducibility).
Modeling, Inference, and Prediction: (Data modeling and assessment, domain-specific considerations).
Ability to make and communicate principal conclusions from data using elements of mathematics, statistics, computer science, and application-specific knowledge.
Ability to use analytic modeling, statistical analysis, programming, and/or another appropriate scientific method to develop and implement qualitative and quantitative methods for characterizing, exploring, and assessing large datasets in various states of organization, cleanliness, and structure, accounting for the unique features and limitations inherent in Government data holdings.
Translate practical mission needs and analytic questions related to large datasets into technical requirements and, conversely, assist others with drawing appropriate conclusions from the analysis of such data.
Effectively communicate complex technical information to non-technical audiences.
Ability to train and develop NLP/NER for LLM solutions within an agentic AI framework (LangGraph).
Must be able to perform supervised and unsupervised model training and validation for automated knowledge extraction from unstructured natural language data in multiple languages without a predefined ontology.
Familiarity with customer data sources and data retrieval techniques is necessary for producing preprocessed training data, which requires an understanding of techniques that ensure data quality and readiness for integration into the system. An understanding of enterprise data compliance and policy concerns is necessary to ensure solutions are built for end-user access.
Job Type
Full-time
Career Level
Mid Level