Principal Site Reliability Engineer

Blue River Technology
$166,000 - $293,000

About The Position

We are looking for a Principal Site Reliability Engineer to join the CVML Platform team at Blue River Technology. You will build a hybrid infrastructure that integrates edge devices, on-premises, and cloud resources into a cohesive CVML & Robotics foundation. You will work on the cost effectiveness, transparency, and security of the platform, focusing on the speed and quality of the solutions and services it provides. You will work with both your peers and stakeholders from other teams to achieve alignment on the platform's vision and technologies. You must show initiative and the ability to organize your own work schedule, and be comfortable supporting the application needs of multiple teams, systems, and products.

Requirements

  • 8+ years of experience building infrastructure with K8S, AWS, and bare metal.
  • 8+ years of experience working with Python and Go (with production experience).
  • 8+ years of experience working with infra automation tools: Terraform / Terragrunt (or Pulumi / CDK).
  • 8+ years of experience with Linux-based systems and networks, and a deep understanding of internal components, networking, and security aspects.
  • Has a track record of building and maintaining scalable systems in production environments.
  • Experience in building CI/CD pipelines using GitHub Actions (or GitLab / Jenkins) for application release and deployment.
  • Experience in using AWS ECS, EKS, IAM, EC2, and RDS at production scale.
  • Deep understanding of Kubernetes and its internals (kubelet, CRDs, etc.) and experience with building and extending clusters from scratch.
  • Strong problem-solving skills and ability to troubleshoot complex infrastructure and networking issues.
  • Excellent communication skills to collaborate effectively with technical and non-technical stakeholders.
  • Attention to detail and commitment to producing high-quality, well-documented code.

Nice To Haves

  • Experience with standard SQL, NoSQL, and MPP databases.
  • Experience with writing production Kubernetes operators.
  • Airflow, Kubeflow, or other orchestration system experience.
  • Can understand some C++ and/or Rust, or talk with people who do.
  • Prior experience in the autonomy and robotics space is a huge plus.

Responsibilities

  • Architect and implement various cloud and on-premises applications, systems, and infrastructure.
  • Integrate extremely diverse systems and ensure stable integration, uptime, and monitoring.
  • Work with edge devices of various formats and integrate them with on-prem and cloud workflows, including networking, low-level OS, and electrical/control integration.
  • Optimize the performance and throughput of the system at the filesystem, networking, and software levels.
  • Optimize cost, operational stability, and supportability of highly diverse platforms and tech stacks.
  • Collaborate with cross-functional teams to design, develop, and maintain robust, scalable, and user-friendly web and mobile data-intensive applications.
  • Build tools that enable users to easily move between different applications and platforms to utilize the strengths of each in a coherent ecosystem.
  • Work closely with cross-functional teams, including data scientists, analysts, software engineers, and product managers, to understand data requirements and deliver data solutions that align with business goals.
  • Create and maintain technical documentation, including data flow diagrams, architecture designs, and standard operating procedures.
  • Stay up-to-date with industry trends and emerging technologies related to data engineering, recommending and implementing new tools and frameworks as appropriate.

Benefits

  • Visa sponsorship is available for this position on a case-by-case basis.
  • The US annual base salary range for this position is $166,000 - $293,000, along with eligibility for Blue River’s bonus and benefit programs.