Core Weave-posted 2 months ago
$165,000 - $242,000/Yr
Full-time • Mid Level
Hybrid • Sunnyvale, CA
Professional, Scientific, and Technical Services

CoreWeave is the AI Hyperscaler, delivering a cloud platform of cutting edge services powering the next wave of AI. Our technology provides enterprises and leading AI labs with the most performant, efficient and resilient solutions for accelerated computing. Since 2017, CoreWeave has operated a growing footprint of data centers covering every region of the US and across Europe. CoreWeave was ranked as one of the TIME100 most influential companies of 2024. As the leader in the industry, we thrive in an environment where adaptability and resilience are key. Our culture offers career-defining opportunities for those who excel amid change and challenge. If you're someone who thrives in a dynamic environment, enjoys solving complex problems, and is eager to make a significant impact, CoreWeave is the place for you. Join us, and be part of a team solving some of the most exciting challenges in the industry. CoreWeave powers the creation and delivery of the intelligence that drives innovation.

  • Champion reliability initiatives for Kubernetes application deployments: Advocate for best practices to ensure high availability, scalability, and resilience of applications in Kubernetes, focusing on robust testing, secure pipelines, and efficient resource use.
  • Administer multi-tenant Kubernetes platforms: Manage complex multi-tenant Kubernetes clusters, configuring access, quotas, and security for isolation and optimal resource allocation while upholding SLAs.
  • Perform lifecycle and day 2 operations on clusters: Execute Kubernetes cluster lifecycle, including provisioning, patching, monitoring, backup, disaster recovery, and troubleshooting.
  • Deep dive into reliability issues: Conduct in-depth analysis and root cause identification for complex reliability incidents in Kubernetes, utilizing advanced debugging and monitoring tools to propose preventative measures.
  • Perform on-call duties: Respond to critical alerts and incidents outside business hours, providing timely resolution to minimize disruptions, collaborating with teams, and communicating clearly.
  • Bachelor's in CS, Engineering, or related field, or equivalent experience preferred.
  • CKA or similar certifications is highly desired.
  • 2+ years administering multi-tenant SAAS Kubernetes (EKS, AKS, GKS).
  • Strong Gitops/Devops with Argocd or similar helm chart management.
  • Proven Docker and containerization experience.
  • Strong Linux OS experience.
  • Proficient in Go.
  • Excellent problem-solving, debugging, and analytical skills.
  • Strong communication and collaboration.
  • Master's degree in Computer Science, Engineering, or a related field.
  • Experience with performance profiling and optimization of distributed systems.
  • Knowledge of network protocols and distributed consensus algorithms.
  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to Participate in Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption
© 2024 Teal Labs, Inc
Privacy PolicyTerms of Service