Responsibilities
Design resilient and scalable solutions for high availability and disaster recovery
Zealously pursue cloud cost optimization, both for platform infrastructure and lab cloud fabrics
Work with the data and billing teams to ensure financial transparency (costs properly allocated to labs) and to identify/prevent fraudulent charges
Build defense in depth around our cloud platform and lab fabrics to ensure all possible security measures are in place for securing our cloud resources
Implement monitoring for availability, performance, and cost
Automate routine tasks (example: orphan object cleanups) and drive the creation of infrastructure as code (IaC)
Troubleshoot and resolve technical issues related to cloud infrastructure
Communicate internally the status of issues, outages, challenges, and risks
Act as the cloud team representative during incidents
Guide and influence decisions with proven data analysis and metrics to ensure clear understanding across all levels of the organization
Support and promote the company values through positive interaction with both internal and external partners and customers on a regular basis
Other strategic business initiatives or cross-functional project involvement as required

Qualifications
Bachelor's degree in Computer Science, Engineering, Mathematics, Software Engineering, or related field preferred, but not required
5+ years of professional experience supporting an always-on global SaaS application, and architecting software solutions using native Azure functionality
2+ years of experience in SRE and DevOps methodologies
Approaches problems with an automation-first mindset and prefers spending the day writing PowerShell versus clicking buttons in the Azure Portal
Demonstrated history of working collaboratively and across organizational boundaries
Demonstrated history of working with senior leaders from other teams and with customer technical stakeholders
Deep technical knowledge of scripting and automation as it applies to public cloud infrastructure deployment and management
Rockstar troubleshooting abilities, especially as they relate to fixing something you didn’t create and for which documentation doesn’t exist
Experience with programming, Docker/Kubernetes/container management, Git/GitHub, a cloud provider platform such as Azure, AWS, or Google, and scripting languages such as PowerShell, Bash, or Python
Experience creating infrastructure as code, both greenfield and brownfield
Assist in writing documentation that includes the necessary information required for publishing to internal or external audiences
Ability to multi-task, stay organized, exercise strong time management, and prioritize work on a daily and weekly basis
Ability to present and convey material both formally and informally to all levels of the organization
Ability to work with others to take sets of abstract requirements and define methods of achieving those requirements using the released capabilities of a platform
Ability to objectively analyze how a platform is used and make recommendations on how to improve the platform through operational and development requests
Excellent written, oral, listening, and communication skills
Interest and ability in mentoring other team members as applicable
Strong MS Office, web conferencing, and internal communication software experience
Familiar with Scrum and Agile processes
Job Type
Full-time
Career Level
Mid Level
Number of Employees
101-250 employees