Senior Software Engineer, ML Inference

Cognitiv | Bellevue, WA
$160,000 - $210,000 | Hybrid

About The Position

We are looking for a Senior Software Engineer focused on ML inference to help build and scale the systems that power Cognitiv’s ML-driven products. In this role, you’ll work on performance-critical inference systems that enable real-time decision-making at scale. You’ll collaborate closely with ML Researchers, Product, and other Engineers to design, implement, and optimize production ML services used by some of the world’s biggest brands. This is a hands-on engineering role with meaningful technical ownership and room to grow in scope and influence. Location: Hybrid, Monday through Wednesday out of our Bellevue, WA office.

Requirements

  • Experienced ML Engineer: 4+ years working with ML systems in production, including hands-on experience with PyTorch or LibTorch.
  • Strong Systems Engineer: 4+ years of professional C++ experience with attention to performance and memory efficiency.
  • Inference-Focused: Experience optimizing models and inference pipelines for real-world constraints like latency and scale.
  • Collaborative Communicator: Comfortable explaining technical tradeoffs and working closely with cross-functional partners.
  • Ownership-Driven: Able to take responsibility for the services you build and improve them over time.
  • Technically Educated: Bachelor’s degree or higher in Computer Science, Engineering, Math, Physics, or a related field.

Nice To Haves

  • Experience with GPU or hardware-accelerated inference (e.g., NVIDIA TensorRT)
  • Experience with Docker and Kubernetes
  • Familiarity with Infrastructure-as-Code tools (Terraform, Ansible)
  • Exposure to advanced ML architectures (e.g., two-tower models, teacher-student learning)
  • Experience with Rust
  • Familiarity with MLOps tooling (monitoring, lifecycle management, automation)
  • Experience using AI-assisted development tools

Responsibilities

  • Build and optimize ML inference systems used in production, leveraging both industry-standard frameworks and in-house technology.
  • Implement performance-critical components in C++ and PyTorch/LibTorch with a focus on latency, throughput, and reliability (a brief illustrative sketch of this kind of work follows this list).
  • Collaborate cross-functionally with ML Research, Product, and Engineering partners to bring models from experimentation into production.
  • Improve existing systems by identifying performance bottlenecks, reliability gaps, and scalability issues.
  • Contribute to design discussions and technical reviews for inference-related services.
  • Write high-quality, production-ready code with strong testing, monitoring, and documentation.
  • Support the full development lifecycle of services you work on, from design through deployment and iteration.
  • Mentor and support teammates through code reviews and knowledge sharing.
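As a rough illustration of the kind of work described above, the sketch below loads a TorchScript-exported model with LibTorch and runs a single forward pass in C++. This is a minimal, hypothetical example rather than Cognitiv code; the model path ("model.pt"), input shape, and build setup are assumptions.

```cpp
// Minimal illustrative sketch only -- not Cognitiv's actual service code.
// Assumes a TorchScript-exported model saved at the hypothetical path "model.pt".
#include <torch/script.h>
#include <iostream>
#include <vector>

int main() {
    // Load the serialized TorchScript module.
    torch::jit::script::Module module;
    try {
        module = torch::jit::load("model.pt");
    } catch (const c10::Error& e) {
        std::cerr << "Failed to load model: " << e.what() << std::endl;
        return 1;
    }

    // Inference-only path: disable autograd bookkeeping to reduce latency and memory use.
    torch::NoGradGuard no_grad;
    module.eval();

    // Build a dummy input batch (shape is illustrative).
    std::vector<torch::jit::IValue> inputs;
    inputs.push_back(torch::ones({1, 128}));

    // Run the forward pass and read back the output tensor.
    at::Tensor output = module.forward(inputs).toTensor();
    std::cout << "Output shape: " << output.sizes() << std::endl;
    return 0;
}
```

In a production service, a loop like this would typically sit behind a request interface with batching, warm-up, and latency/throughput monitoring layered on top.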

Benefits

  • Medical, dental & vision coverage (some plans 100% employer-paid)
  • 12 weeks paid parental leave
  • Unlimited PTO + Work-From-Anywhere August
  • Career development with clear advancement paths
  • Equity for all employees
  • Hybrid work model & daily team lunch
  • Health & wellness stipend + cell phone reimbursement
  • 401(k) with employer match
  • Parking (CA & WA offices) & pre-tax commuter benefits
  • Employee Assistance Program
  • Comprehensive onboarding (Cognitiv University)