The Staff Software Engineer - DevOps is responsible for all stages of the software development lifecycle using a variety of technologies and tools to build impactful software solutions. The scope of this job includes building and optimizing comprehensive solutions that prioritize end-user efficiency and experience.
Key Responsibilities:
Lead the design and architecture of major systems and services, and ensure software solutions are scalable, reliable, maintainable, and aligned with business needs.
Collaborate with solution managers, engineers, data scientists, and other stakeholders to define and prioritize technical requirements that meet client needs and business objectives.
Collaborate with teams to ensure sustained quality and reliability of our software solutions, and act as a go-to expert by identifying and resolving complex, high-priority issues in both development and production environments.
Actively contribute to code reviews, provide constructive feedback on design and implementation, and provide technical guidance to other engineers to elevate skills, productivity, and overall effectiveness.
Drive innovation by evaluating and implementing new technologies, methodologies, and AI capabilities that improve team efficiency, software performance, and development processes.
Ensure code meets functional and performance requirements, advocate for high-quality software, and ensure rigorous testing processes, including automated unit tests, integration tests, and other testing frameworks.
Leverage AI tools and platforms as an integral part of daily responsibilities to enhance decision-making, streamline workflows, and drive data-informed outcomes.
Ensure the reliability, availability, and performance of our systems and services.
Work closely with various teams to build and maintain scalable, efficient, and resilient infrastructure.
Incident Management: Lead the response to system outages and incidents, ensuring quick resolution and minimal impact on end-users. Conduct post-incident reviews and implement improvements to prevent recurrence.
Monitoring and Alerting: Design, implement, and maintain monitoring and alerting systems using tools like New Relic, Grafana, and the ELK stack to ensure system health and performance.
Perform other job duties as assigned.
Job Type
Full-time
Career Level
Mid Level