About the position
Ahrefs is seeking a Site Reliability Engineer with expertise in Linux and distributed systems to maintain its distributed crawler and ensure all systems are operational 24/7. The ideal candidate should have experience with bare-metal servers and be able to participate in daily on-call rotations. They should also possess a deep understanding of operating systems and network fundamentals, as well as the ability to investigate infrastructure issues on live production systems. The candidate should be able to develop internal automation, foresee potential problems, and prevent them from happening.
Responsibilities
- Understand the whole technology stack at all levels: from network and user-space code to OS internals and hardware
- Independently investigate and resolve infrastructure issues on live production systems, including handling hardware problems and interacting with datacenters
- Develop internal automation: monitoring, setup, statistics
- Have the ability to foresee potential problems and prevent them from happening. Apply first-aid reaction to infrastructure failures when necessary
- Help developers with deployment and integration
- Make well-reasoned technical choices and take responsibility for them
- Approach problems with a practical mindset and suppress perfectionism when time is a priority
- Set up automatic systems to control infrastructure
- Possess a healthy detestation for complex shell scripts
- Be ready to work in a small team and make responsible independent decisions
Requirements
- Deep understanding of operating system and network fundamentals
- Practical knowledge of Linux userspace and kernel internals
- Working experience with bare-metal servers
- Participation in on-call rotation (6 hours every weekday + one weekend per month)
- Work in a US or SG timezone
Benefits
- Competitive compensation package
- Informal and thriving work atmosphere
- Above-average perks and fringe benefits
- First-class workplace (hardware, software, etc.) in a modern office (for office-based employees)
- Hardware allowance (for remote employees)