Cloud Operations Engineer

Lumin Digital
$110,000 - $125,000

About The Position

The Operations Center has two main focuses: removing toil and enhancing platform visibility. The Cloud Operations Engineer monitors cloud infrastructure using observability tools, performs Tier 1 incident triage, and ensures production issues are resolved or escalated promptly. The role also supports CI/CD pipelines, mobile application releases, and SSL/TLS certificate lifecycle management, while maintaining accurate documentation and clear cross-functional communication. Reporting to the Operations Center Manager, the successful candidate will have exceptional communication and cross-functional collaboration skills and a solid understanding of cloud infrastructure.

At Lumin, we thrive on curiosity and innovation. Our culture fosters trust in our expertise and decisions, respect for diverse perspectives and talents, and boldness in pursuing innovative paths. These values guide us, shaping a workplace where collaboration thrives, ideas flourish, and new possibilities are discovered. Focused on continuous improvement and innovation, we encourage our team to explore, experiment, and put new ideas into action, challenging the usual way of doing things.

Requirements

  • Cultural fit. Humility. Strong sense of ownership and integrity. Willing to walk in the mud.
  • Commitment to continually improving yourself.
  • Detail-oriented.
  • Exceptional written and verbal communication skills.
  • Effective collaboration skills, with a proven ability to work cross-functionally to establish and meet shared business goals.
  • Experience with a monitoring platform (CloudWatch, Grafana, etc.)
  • Familiarity with automation/orchestration tools
  • Familiarity with the Atlassian suite or similar tools
  • Experience with AWS preferred
  • Bachelor's degree or 3 years of equivalent experience

Nice To Haves

  • Certifications are nice, but not required

Responsibilities

  • Monitor cloud infrastructure and application health using observability tools; respond to alerts and ensure timely triage and resolution of production issues.
  • Perform Tier 1 incident triage, document findings, and escalate appropriately to Development or SRE teams while maintaining clear communication.
  • Monitor and support CI/CD pipelines to ensure successful builds and deployments; troubleshoot and coordinate resolution of pipeline failures.
  • Support and coordinate mobile application release processes.
  • Manage SSL/TLS certificate lifecycle activities, including renewals and proactive expiration monitoring.
  • Proactively identify patterns in incidents or alerts and implement improvements that reduce operational toil and increase platform stability.
  • Contribute to automation and orchestration efforts that improve efficiency, reliability, and repeatability of operational processes.
  • Maintain accurate documentation, runbooks, and standard operating procedures to improve operational consistency and knowledge sharing.
  • Collaborate cross-functionally with SRE, Development, Security, Product, and Support to ensure platform health, visibility, and alignment to shared business goals.
  • Perform other duties as assigned.