Maintenance Engineer

Luminai

11h•Hybrid

About The Position

As a Maintenance Engineer at Luminai, you will ensure the reliability, performance, and resilience of the systems that power mission-critical AI workflows. You’ll operate at the core of our production infrastructure — maintaining, monitoring, and continuously improving the systems that healthcare, finance, and telecommunications organizations depend on every day. This is a highly ownership-driven role for someone who thrives on operational excellence, proactively prevents issues before they arise, and takes pride in keeping complex systems running smoothly in high-stakes environments. You’ll work closely with Engineering, Product, and Forward Deployed teams to ensure our deployments are stable, secure, and scalable. This is a hybrid position. Our team is in-office 3 days a week (Mon, Tue, Thu) in San Mateo, California.

Requirements

3+ years of experience in support engineering, site reliability engineering, or infrastructure maintenance
Strong proficiency in Python or scripting languages
Experience managing cloud infrastructure (AWS, GCP, or Azure)
Strong problem-solving skills and a proactive, preventative mindset
Clear communication skills and ability to collaborate across engineering and customer-facing teams
High ownership and accountability in high-reliability environments

Responsibilities

Monitor, maintain, and improve the reliability of production AI systems and workflow infrastructure
Proactively identify, diagnose, and resolve system issues across application, integration, and cloud infrastructure layers
Own incident response processes, including root cause analysis and long-term remediation
Implement monitoring, alerting, and observability tooling to ensure system health and uptime
Collaborate with Engineering to harden deployments and improve system architecture for resilience and scalability
Support customer-facing teams by troubleshooting and resolving technical issues in live environments
Document system configurations, operational procedures, and recovery protocols
Continuously improve reliability standards, deployment practices, and operational safeguards

Stand Out From the Crowd

Upload your resume and get instant feedback on how well it matches this job.

Upload and Match Resume