Senior Site Reliability Engineer

Caterpillar
Chicago, IL

About The Position

When you join Caterpillar, you're joining a global team who cares not just about the work we do – but also about each other. We are the makers, problem solvers, and future world builders who are creating stronger, more sustainable communities. We don't just talk about progress and innovation here – we make it happen, with our customers, where we work and live. Together, we are building a better world, so we can all enjoy living in it.

Job Summary: As a Site Reliability Engineer, you will be responsible for ensuring the reliability, availability, and performance of our D365 ERP systems, connectivity, and infrastructure. You will collaborate with cross-functional teams to develop and implement strategies to improve system stability, automate repetitive tasks, and enhance service delivery and performance.

If you have a passion for delivering reliable, high-performance services and thrive in a fast-paced environment, we'd love to hear from you. Apply now to join our team as a Site Reliability Engineer!

Requirements

  • Effective Communications: Strong understanding of communication concepts, tools and techniques; ability to effectively transmit, receive, and accurately interpret ideas, information, and needs through the application of appropriate communication behaviors.
  • Technical Troubleshooting: Extensive knowledge of technical troubleshooting approaches, tools and techniques; ability to anticipate, recognize, and resolve technical issues on hardware, software, application or operation.
  • Performance Measurement and Tuning: Knowledge of system performance, testing and programming; ability to monitor, measure, and optimize system performance and network communication.
  • Software Release Management: Knowledge of strategies, practices and tools for managing versions and distribution of software products and enhancements; ability to evaluate and improve release management practices and tools.
  • Software Reliability Management: Knowledge of software reliability management; ability to develop and use principles, methodologies and metrics that increase software product performance and reliability.
  • Bachelor's degree in Computer Science, Information Technology, a related field, or equivalent experience.
  • 6+ years of experience in site reliability engineering, DevOps, QA, or a related field.
  • Strong troubleshooting and critical thinking skills.
  • 6+ years of experience and proficiency in one or more programming languages, such as Python or JavaScript (both preferred).
  • Solid understanding of networking, load balancing, on-premises hosting solutions, and web application architectures.
  • Excellent problem-solving skills and a strong attention to detail.
  • Strong IT and Business communication skills and ability to collaborate effectively with cross-functional teams.

Nice To Haves

  • Strong experience with Microsoft D365 or general Azure-based services.
  • Experience with AWS infrastructure and services.
  • Experience with IaC solutions like CloudFormation and Terraform.
  • Experience with CI/CD solutions, such as GitHub and Azure DevOps.
  • Experience with containerization technologies, such as Docker and Kubernetes.

Responsibilities

  • Monitor and troubleshoot production and QA systems to identify and resolve performance, scalability, and reliability issues proactively.
  • Participate in the on-call rotation to provide 24/7 critical incident support for eCommerce platform systems.
  • Design, implement, and maintain automated processes and tools to streamline deployment and release processes.
  • Collaborate with cross-functional teams to define, document, and implement operational processes, best practices, and procedures.
  • Implement and maintain system monitoring tools and dashboards to provide real-time insights into system performance and identify potential issues.
  • Work closely with developers to identify and fix bugs and performance bottlenecks in the application code.
  • Ensure that systems and infrastructure comply with security, compliance, and regulatory requirements.
  • Continuously evaluate systems and processes to identify areas for improvement and implement changes as needed.

Benefits

  • Medical, dental, and vision benefits
  • Paid time off plan (Vacation, Holidays, Volunteer, etc.)
  • 401(k) savings plans
  • Health Savings Account (HSA)
  • Flexible Spending Accounts (FSAs)
  • Healthy Lifestyle Programs
  • Employee Assistance Program
  • Voluntary Benefits and Employee Discounts
  • Career Development
  • Incentive bonus
  • Disability benefits
  • Life Insurance
  • Parental leave
  • Adoption benefits
  • Tuition Reimbursement