Senior Site Reliability Engineer

NinjaOne

79d•$160,000 - $240,000•Hybrid

About The Position

At NinjaOne, we are passionate about building unified IT solutions that simplify the way IT organizations work. We are currently looking for a Senior Site Reliability Engineer to join our SRE team in the Platform Engineering organization and help us scale our products to millions of end users. We are looking for individuals with a passion for automation and observability who will ensure the quality and availability of our services.

Location: We are flexible on remote work if you are located in the USA and reside in one of the following states: CA, CO, CT, FL, GA, IL, KS, MA, MD, ME, NJ, NC, NY, OR, TN, TX, VA, and WA. If you prefer a hybrid option, we have physical offices in Austin, TX and Tampa, FL.

We hire the best software engineers, but experience in our stack can’t hurt: NinjaOne is built on Java, Kotlin, C++, Golang, and Postgres, supports millions of user endpoints, and runs as a scalable cloud service in AWS. Knowledge of large-scale datastore bottlenecks, asynchronous application design, and client-server architecture will help you.

Requirements

10+ years’ experience in DevOps and/or Site Reliability Engineering roles
3+ years’ experience with an object-oriented language (preferably Java, .NET or C++)
Intermediate+ level Linux administration, scripting, and troubleshooting
Demonstrable knowledge of Observability tools (New Relic, Splunk, DataDog)
Comprehensive experience with AWS (Amazon Web Services) and its core capabilities (VPC, EC2, ECS, Route 53, Fargate, ALB/NLB distributions, etc.)
Experience with cloud automation and infrastructure-as-code (IaC) toolsets, primarily CloudFormation but also including Terraform, Helm and Ansible. CDK a plus.
Good understanding of containers, Fargate, Kubernetes, and overall distributed microservice architectures
Passionate about automation, security, and self-service environments/portals
Hands-on experience with CI/CD and SDLC (Software Development Life Cycle) processes
Effective communication skills, both verbal and written

Responsibilities

Diagnose and resolve complex application and infrastructure issues
Participate in our 24x7 on-call rotation, Scrum, and deployment planning
Perform Root Cause Analysis (RCA) and provide recommendations for application teams
Improve availability and reduce customer impact using industry-leading observability tools
Ensure best-practice and security-minded architecture by influencing design decisions
Create and maintain technical documentation and SOPs
Develop software, scripts, or tooling to improve efficiency and reduce delivery time of applications and infrastructure
Other duties as needed

Benefits

We are a collaborative, kind, and curious community.
We honor your need for flexibility with full-time, hybrid-remote work.
We have you covered with our comprehensive benefits package, which includes medical, dental, and vision insurance.
We help you prepare for your financial future with our 401(k) plan.
We prioritize your work-life balance with our unlimited PTO.
We reward your work with opportunities for growth and advancement.
