CoreWeave • Posted about 23 hours ago
Full-time • Senior
Hybrid • Livingston, NJ
501-1,000 employees

CoreWeave is The Essential Cloud for AI™. Built for pioneers by pioneers, CoreWeave delivers a platform of technology, tools, and teams that enables innovators to build and scale AI with confidence. Trusted by leading AI labs, startups, and global enterprises, CoreWeave combines superior infrastructure performance with deep technical expertise to accelerate breakthroughs and turn compute into capability. Founded in 2017, CoreWeave became a publicly traded company (Nasdaq: CRWV) in March 2025. Learn more at www.coreweave.com.

About the Role:

SysEng HAVOCK (Hardware, Acceleration, Virtualization, Operating Systems, Containerization, Kernel)

CoreWeave is looking for a Senior Systems Engineer who is ready to evolve beyond traditional DevOps. You will start by stabilizing and scaling our Linux OS and kernel build pipelines. Once that foundation is in place, you will lead the transition to AI-native infrastructure, building "smart" workflows that don't just report errors but understand and fix them. You are a Systems Engineer at heart, but you are ready to apply LLMs, RAG, and predictive modeling to infrastructure challenges at scale.

Our Team's Stack:

  • Languages: Python, Go, bash/sh
  • Observability: Prometheus, VictoriaMetrics, Grafana
  • OS & Kernel: Linux kernel (custom build), Ubuntu
  • Hardware: Intel/AMD/ARM CPUs, NVIDIA GPUs, DPUs, InfiniBand and Ethernet NICs
  • Containerization: Docker, Kubernetes (k8s), KubeVirt, containerd, kubelet

Responsibilities:
  • Pipeline Architecture: Design, maintain, and automate reproducible OS image build pipelines for our massive fleet of GPU-accelerated servers.
  • Kernel Distribution: Collaborate with kernel engineers to package, validate, and distribute custom Linux kernel builds across Intel, AMD, and ARM architectures.
  • Dependency Management: Build tooling to manage dependencies, versioning, and release workflows, ensuring hermetic builds.
  • Telemetry & Metrics: Standardize the collection of build metrics to create a baseline for future AI modeling.
  • "Smart" CI/CD & Auto-Remediation: Architect AI agents that ingest and analyze build logs in real time. Develop systems that auto-triage errors, categorize failure patterns, and generate context-aware fix suggestions for engineering teams.
  • Predictive Regression Modeling: Design ML workflows that utilize historical performance data to detect kernel and OS regressions (latency, throughput, stability) in staging environments before they impact production.
  • Dynamic Kernel Tuning: Implement closed-loop feedback systems that analyze real-time system metrics and automatically suggest or apply sysctl parameter optimizations for specific customer workloads.
  • Next-Gen ChatOps: Engineer LLM-driven interfaces for Slack/internal tools, enabling stakeholders to query build statuses, request log summaries, or provision resources using natural language commands.

Qualifications:
  • 4+ years of professional experience in Linux Systems Engineering, Release Engineering, or DevOps.
  • Deep knowledge of Linux internals (boot process, kernel modules, networking stack).
  • Experience with package management (Debian/Ubuntu) and build systems.
  • Strong proficiency in Python (essential for the AI integration aspects of this role).
  • Demonstrable experience integrating API-based AI models (OpenAI, Anthropic, or local open-source models) into software workflows.
  • Understanding of RAG (Retrieval-Augmented Generation) architectures for querying technical documentation or logs.
  • Experience building event-driven automation (e.g., using webhooks to trigger analysis agents).
  • Familiarity with data structures required for vector search or time-series analysis.
  • Experience with Kubeflow or MLflow.
  • Background in High-Performance Computing (HPC).
  • Experience fine-tuning small language models (SLMs) for code or log analysis tasks.

Benefits:
  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short- and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to participate in Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption
© 2024 Teal Labs, Inc