Staff Data Engineer/Scientist

CACI • Chantilly, VA

About The Position

We are looking for a Staff Data Engineer/Scientist who is eager to take on new, challenging problems. You will support the development of AI/ML algorithms across a range of disciplines, including large language models, natural language processing, and time-series predictive analytics. You will also join a team of excellent researchers and software developers who are eager to mentor others and teach their craft.

Requirements

B.S. in data science, AI/ML, computer science, or related field
Minimum six (6) years of relevant experience as a Data Engineer/Scientist.
Experience developing data pipelines and normalizing data with canonical Python packages (e.g. NumPy, Pandas, Polars)
Experience contributing to a team using version control (e.g. Git, GitLab, Bitbucket)
Active TS/SCI U.S. Government Security Clearance with a recent Full-Scope Polygraph (FSP)

Nice To Haves

M.S. or PhD in data science, AI/ML, computer science, or related field
Experience with GitLab and DevSecOps practices, including test-driven development, containers (e.g. Docker, Docker Compose), cloud services (e.g. AWS), and distributed computing tools (e.g. Spark, PySpark)
Experience leading an interdisciplinary team of researchers and software developers
Experience with any of the following:
Large Language Models (LLMs), including identifying ways to incorporate them into new domains and applications
Applying Transformer-based architectures to domains outside Natural Language Processing (NLP), such as computer vision
NLP models such as BERT
Reinforcement learning, with familiarity with Gymnasium (formerly OpenAI Gym), OpenEnv, TorchRL, RLlib, and Stable Baselines
Applying clustering algorithms and/or deep neural networks to real-world problems
Implementing tracking and pattern-of-life algorithms
Experience with GenAI Ops techniques (e.g. LLM-as-a-judge) and frameworks (e.g. Langfuse, MLflow, Arize Phoenix)
Experience with machine learning libraries and frameworks such as Hugging Face and LangChain
Experience with Linux
Familiarity with AWS cloud computing resources such as EC2, S3, Lambda, and Bedrock
Experience with any of the following additional languages: Java, C++, Rust, Go, or C#
Experience implementing algorithms on the GPU in Python or C++ using CUDA and related CUDA libraries
Experience in application deployment, virtualization, and containerization (e.g. Podman, Docker, Kubernetes, Rancher)
Experience shaping and writing proposals

Responsibilities

Lead and mentor an interdisciplinary team of developers and researchers. The team's core focus is implementing ETL pipelines that support a variety of AI/ML and LLM solutions, which in turn address a broad range of customer challenges.
Assemble large, complex datasets to support AI/ML algorithm implementation
Build the infrastructure required for optimal extraction, transformation, and loading of data from various data sources
Curate and maintain stored data that supports metrics and evaluation
Implement artificial intelligence/machine learning (AI/ML) algorithms
Identify, design, and implement internal process improvements, including redesigning infrastructure for greater scalability, optimizing data delivery, and automating manual processes
Use Agile methodologies to develop software

Benefits

Our employees value the flexibility at CACI that allows them to balance quality work and their personal lives.
We offer competitive compensation, benefits and learning and development opportunities.
Our broad and competitive mix of benefits options is designed to support and protect employees and their families.
At CACI, you will receive comprehensive benefits such as healthcare, wellness, financial, retirement, family support, continuing education, and time off benefits.