Sequen - Staff Software Engineer – Infrastructure

Fabric•Los Angeles, CA

$250,000 - $350,000

About The Position

Sequen AI is leading the charge in building frontier ranking models for search and recommendations. Sequen AI's technology specializes in designing end-user behavior for large consumer enterprises. We are looking for talented and experienced Infrastructure Engineers to join our team and support the development, scaling, and maintenance of our cutting-edge AI systems.

Core Infrastructure: The systems team is responsible for supporting the clusters used to train, research, and ultimately serve AI models. Your work will be crucial in ensuring that Sequen can continue to reliably train and serve frontier ranking models.

Observability: We build and maintain the infrastructure that monitors the health, performance, and efficiency of our AI systems. You'll work across teams to implement monitoring solutions using tools like Prometheus and Datadog, while developing automated approaches for dashboards and alerts. Your work will create reliable, low-maintenance systems that enable proactive monitoring and operational excellence.

Requirements

Have 10+ years of relevant industry experience, with 3+ years leading large-scale, complex projects or teams as an engineer or tech lead
Possess deep knowledge of modern cloud infrastructure including Kubernetes, Infrastructure as Code, AWS, and GCP
Are obsessed with distributed systems at scale, infrastructure reliability, scalability, security, and continuous improvement
Have strong proficiency in at least one programming language (e.g., Python, Go, Java)
Have strong problem-solving skills and the ability to work independently
Have a passion for supporting internal partners like research to understand their needs
Have excellent communication skills to build consensus with stakeholders, both internally and externally

Nice To Haves

Security and privacy best practice expertise
Hands-on experience with data pipelines and processing large-scale datasets
Experience with machine learning infrastructure such as GPUs
Technical expertise: Quickly understanding systems design tradeoffs, keeping track of rapidly evolving software systems

Responsibilities

Consult with different stakeholders to deeply understand infrastructure, data, and compute needs, and identify potential solutions to support frontier research and product development
Set technical strategy and oversee development of high-scale, reliable infrastructure systems
Design processes (e.g. postmortem review, incident response, on-call rotations) that help the team operate effectively and never fail the same way twice