Senior Site Reliability Engineer

Hive
San Francisco, CA
$160,000 - $250,000

About The Position

About Hive

Hive has raised over $120M in capital from leading investors, including General Catalyst, 8VC, Glynn Capital, Bain & Company, Visa Ventures, and others. We have over 250 employees globally in our San Francisco, Seattle, and Delhi offices. Please reach out if you are interested in joining the future of AI!

DevOps and Systems Team

Our unique machine learning needs led us to open our own data centers, with an emphasis on distributed high-performance computing that integrates GPUs. Even with these data centers, we maintain a hybrid infrastructure and use public clouds when they are the right fit. As we continue to commercialize our machine learning models, we also need to grow our DevOps and Site Reliability team to maintain the reliability of our enterprise SaaS offering for our customers.

Our ideal candidate is someone who thrives in an unstructured environment and takes automation seriously. You believe there is no task that can’t be automated and no server scale too large. You take pride in optimizing performance at scale in every part of the stack and in never manually performing the same task twice.

Requirements

  • Minimum of 3 to 5 years of experience in development, operations, IT, or a related field
  • Comfortable working on Linux infrastructure (Debian) via the CLI
  • Able to learn quickly in a fast-paced environment
  • Able to debug, optimize, and automate routine tasks
  • Able to multitask, prioritize, and manage time efficiently and independently
  • Able to physically lift equipment weighing at least 30 pounds
  • Can communicate effectively across teams and management levels

Nice To Haves

  • A degree in computer science or a similar field is a plus!

Responsibilities

  • Automate manual operational processes
  • Improve workflows of developer, data, and machine learning teams
  • Manage secure integration and deployment tooling
  • Create, maintain, monitor, and audit secure infrastructure
  • Manage a diverse array of technology platforms, following best practices and procedures
  • Participate in on-call rotation and root cause analysis
  • Maintain awareness of industry best practices for data maintenance and handling as they relate to your role
  • Adhere to policies, guidelines, and procedures pertaining to the protection of information assets
  • Report actual or suspected security and/or policy violations/breaches to an appropriate authority