Senior Site Reliability Engineer

Hive • San Francisco, CA

61d • $160,000 - $250,000

About The Position

About Hive

Hive has raised over $120M in capital from leading investors, including General Catalyst, 8VC, Glynn Capital, Bain & Company, Visa Ventures, and others. We have over 250 employees globally in our San Francisco, Seattle, and Delhi offices. Please reach out if you are interested in joining the future of AI!

DevOps and Systems Team

Our unique machine learning needs led us to open our own data centers, with an emphasis on distributed high-performance computing integrating GPUs. Even with these data centers, we maintain a hybrid infrastructure and use public clouds when they are the right fit. As we continue to commercialize our machine learning models, we also need to grow our DevOps and Site Reliability team to maintain the reliability of our enterprise SaaS offering for our customers.

Our ideal candidate is someone who thrives in an unstructured environment and takes automation seriously. You believe there is no task that can’t be automated and no server scale too large. You take pride in optimizing performance at scale in every part of the stack and in never manually performing the same task twice.

Requirements

Minimum of 3-5 years of experience in development, operations, IT, or a related field
Comfortable working on Linux infrastructure (Debian) via the CLI
Able to learn quickly in a fast-paced environment
Able to debug, optimize, and automate routine tasks
Able to multitask, prioritize, and manage time efficiently and independently
Able to physically lift equipment weighing at least 30 pounds
Can communicate effectively across teams and management levels

Nice To Haves

A degree in computer science or a similar field is a plus

Responsibilities

Automate manual operational processes
Improve workflows of developer, data, and machine learning teams
Manage secure integration and deployment tooling
Create, maintain, monitor, and audit secure infrastructure
Manage a diverse array of technology platforms, following best practices and procedures
Participate in on-call rotation and root cause analysis
Maintain awareness of industry best practices for data maintenance and handling as they relate to your role
Adhere to policies, guidelines, and procedures pertaining to the protection of information assets
Report actual or suspected security and/or policy violations/breaches to an appropriate authority
