Site Reliability Engineer, Associate - Data, Cloud & Developer Experience

Blackstone • New York, NY

About The Position

Blackstone is the world’s largest alternative asset manager. We seek to create positive economic impact and long-term value for our investors, the companies we invest in, and the communities in which we work. We do this by using extraordinary people and flexible capital to help companies solve problems. Our $1.1 trillion in assets under management include investment vehicles focused on private equity, real estate, public debt and equity, infrastructure, life sciences, growth equity, opportunistic, non-investment grade credit, real assets and secondary funds, all on a global basis. Further information is available at www.blackstone.com. Follow @blackstone on LinkedIn, X, and Instagram.

Role: Blackstone's Site Reliability Engineering team is responsible for improving the reliability of systems and services to meet the needs of the business. We achieve this by collaborating with development and engineering teams to apply SRE practices and principles. You'll have the opportunity to identify and solve new problems as they arise, deploy and maintain observability systems and pipelines, enhance operations and support for services and platforms, and pursue emerging opportunities for efficiency and business value.

This position involves the selection, implementation, and maintenance of key observability tooling. It requires ongoing evaluation of the firm's needs in observability, monitoring, alerting, resilience, and recovery. We work alongside service owners on the design, implementation, and management of services for continuous improvement. We achieve the requisite reliability of services using clear definitions and measurable targets. We plan for and practice recovery from disaster scenarios and respond in real time to incidents. We guide the postmortem process in order to mitigate risks, prevent future disruptions, and improve the on-call experience. We aim to eliminate manual work, improve operational efficiency, and ensure high-quality outputs in all that we do.

Requirements

2+ years of professional experience in Infrastructure Engineering, Software Engineering, DevOps Engineering, or Platform Engineering
Skilled at writing automation scripts; able to read and troubleshoot code effectively (Python, C#, TypeScript, etc.)
Makes effective use of coding assistants and chat models (Anthropic, OpenAI)
Proficiency with public cloud providers (strong AWS experience required, Azure experience preferred)
Configuration-as-code, infrastructure management, and CI/CD tooling experience (Terraform, Puppet, GitLab CI)
Hands-on experience with Docker and container schedulers including AWS ECS & EKS
Excellent troubleshooting skills for Linux, Windows, and networking, and experience with observability tools (Grafana, Prometheus, Splunk, etc.)
Comfortable under pressure with incident management and collaborating during postmortems
Excellent communication and organizational skills
Curiosity and motivation to improve systems and processes through a sense of shared ownership

Responsibilities

Assist technical leadership in the understanding and adoption of SRE methodologies across the firm
Incorporate observability standards into code and deployment pipelines
Help evolve the SRE standards that are adopted across all teams
Partner with colleagues in various roles and reporting lines to improve service reliability and operational efficiency
Assist developers and engineers directly and through AI assistants
Implement instrumentation and provide comprehensive performance insights to service owners
Ensure monitoring and alerting reflect the reliability of services for users and enable effective on-call operations
Implement strategic observability tools and work to control overhead in maintenance and cost
Participate in on-call rotations and respond to system incidents to ensure service availability and minimize operational impact
Use automation to manage, maintain, and scale SRE systems with minimal human intervention
Foster a blameless team culture while assisting in postmortem discussions and reporting

Benefits

Comprehensive health benefits, including but not limited to medical, dental, vision, and FSA benefits
Paid time off
Life insurance
401(k) plan
Discretionary bonuses
