DevOps/Platform Engineer II - Richland, WA

Pacific Northwest National Laboratory | Richland, WA
$109,000 - $163,600 | Onsite

About The Position

We are seeking a DevOps/Platform Engineer to join PNNL's AI engineering team, contributing to innovative systems spanning agentic AI platforms, large-scale data orchestration, and real-time intelligence processing. This is an excellent opportunity for early to mid-career developers to apply their software engineering skills to meaningful national security challenges while growing their expertise in AI/ML systems, cloud infrastructure, and distributed computing.

Who You Are

You're a motivated software engineer with foundational experience in building production systems and a strong desire to grow your expertise in AI/ML and scalable infrastructure. You're comfortable working both independently on defined tasks and collaboratively on larger initiatives. You're eager to learn new technologies, apply software engineering best practices, and contribute to mission-critical systems while building your professional network and technical reputation.

What You'll Build

AI Systems & Platforms

  • Develop components of agentic AI systems and LLM-based applications
  • Implement features using frameworks like LangChain, LlamaIndex, or similar tools
  • Build and maintain ML pipelines, data preprocessing workflows, and model deployment infrastructure
  • Create utilities and tools that support AI/ML development and operations
  • Work with multi-modal data including text, structured data, and sensor information

Data Pipelines & Infrastructure

  • Build data pipelines for large-scale ETL, transformation, and analytics workflows
  • Implement streaming data processors and event-driven components
  • Develop microservices and APIs within distributed architectures handling high-throughput workloads
  • Deploy containerized applications using Docker and Kubernetes
  • Contribute to CI/CD pipelines and automated testing frameworks

Mission-Critical Production Systems

  • Write clean, well-tested code following established best practices
  • Implement monitoring, logging, and observability for applications
  • Build developer tooling and documentation to support team productivity
  • Contribute to system performance optimization and debugging efforts
  • Support deployments in cloud and secure environments

Technical Leadership

  • Work on small tasks and project elements, progressing to independent ownership
  • Collaborate with cross-functional teams including data scientists, researchers, and senior engineers
  • Participate in code reviews, design discussions, and technical planning
  • Mentor junior staff and students when opportunities arise
  • Contribute technical content to proposals and project documentation
  • Present your work at team meetings and technical forums

Requirements

  • Working proficiency in Python with foundational knowledge of at least one additional language (Bash, Go, C#, JavaScript/TypeScript) for scripting and automation tasks
  • Understanding of Infrastructure as Code principles with exposure to tools like Terraform, CloudFormation, or Ansible and ability to write basic infrastructure configurations
  • Familiarity with version control workflows (Git) including branching, commits, pull requests, and collaborative development practices with willingness to learn CI/CD pipeline concepts and contribute to build automation
  • Eagerness to learn and apply AI-assisted development tools (e.g., GitHub Copilot, Claude, ChatGPT) to accelerate learning, generate infrastructure code, troubleshoot issues, and improve automation script quality
  • Foundational knowledge of machine learning concepts including model training, evaluation, and deployment with exposure to frameworks (PyTorch, TensorFlow, scikit-learn)
  • Basic understanding of the ML lifecycle and MLOps principles including experiment tracking, model versioning, and monitoring with willingness to learn tools like MLflow, Weights & Biases, or Kubeflow
  • Exposure to or willingness to learn about ML model serving, inference APIs, and supporting infrastructure for training and deployment pipelines
  • Interest in supporting LLM applications, agent-based frameworks, and ML workloads on cloud platforms or Kubernetes with eagerness to grow expertise through hands-on projects
  • Basic knowledge of cloud computing principles and familiarity with services within AWS, Azure, or GCP (compute, storage, networking, IAM)
  • Exposure to containerization with Docker and foundational understanding of container orchestration concepts (Kubernetes) with willingness to learn pod management, deployments, and services
  • Understanding of basic networking concepts including DNS, load balancing, and firewalls with awareness of RESTful API principles and microservice architecture patterns
  • Familiarity with monitoring and logging tools (CloudWatch, Prometheus, Grafana, ELK Stack) and willingness to learn observability practices
  • Awareness of cloud-native data pipeline concepts and ETL/ELT principles with exposure to services like AWS S3, Lambda, Glue, or equivalent Azure/GCP services
  • Basic knowledge of cloud-based data storage systems (S3, PostgreSQL, MongoDB) and understanding of differences between relational and NoSQL databases
  • Foundational understanding of distributed computing and streaming concepts with exposure to frameworks like Spark, Kafka, or Ray through coursework or personal projects
  • Knowledge of common data formats (JSON, CSV, Parquet, Avro) with basic understanding of schema design, data validation, and data quality considerations
  • Ability to collaborate effectively within DevOps, platform engineering, and cross-functional teams while actively seeking mentorship and learning opportunities
  • Developing communication skills to document infrastructure configurations, write clear runbooks, and articulate technical challenges through team discussions and written documentation
  • Enthusiastic participation in code reviews and infrastructure design discussions with openness to constructive feedback and eagerness to learn best practices
  • Demonstrated ability to incorporate feedback, learn from operational incidents, and continuously improve through peer collaboration, self-study, and hands-on experience
  • PhD, or MS/MA, or BS/BA plus 2 years of relevant experience
  • U.S. Citizenship

Nice To Haves

  • Degree in computer science, software engineering, or related technical field.
  • Exposure to infrastructure automation, deployment pipelines, or cloud platform management through coursework, personal projects, labs, or internship experience.
  • Basic scripting or programming experience with Python, Bash, or similar languages demonstrated through academic projects or personal automation initiatives.
  • Experience with containerization (Docker) through personal projects, coursework, or labs with interest in learning Kubernetes.
  • Strong problem-solving abilities demonstrated through technical challenges, troubleshooting exercises, or course projects.
  • Active engagement in learning cloud technologies, automation, MLOps, or modern infrastructure practices (e.g., coursework, certifications, or technical projects).
  • Participation in relevant communities, online courses (Coursera, Udemy, A Cloud Guru), or technical forums demonstrating commitment to continuous learning.


Benefits

  • medical insurance
  • dental insurance
  • vision insurance
  • robust telehealth care options
  • several mental health benefits
  • free wellness coaching
  • health savings account
  • flexible spending accounts
  • basic life insurance
  • disability insurance
  • employee assistance program
  • business travel insurance
  • tuition assistance
  • relocation
  • backup childcare
  • legal benefits
  • supplemental parental bonding leave
  • surrogacy and adoption assistance
  • fertility support
  • company-funded pension plan
  • 401(k) savings plan with company match
  • 120 vacation hours per year
  • ten paid holidays per year