Lightmatter, posted 3 months ago
$160,000 - $200,000/Yr
Full-time • Senior
Boston, MA
251-500 employees
Professional, Scientific, and Technical Services

Lightmatter is leading the revolution in AI data center infrastructure, enabling the next giant leaps in human progress. The company invented the world's first 3D-stacked photonics engine, Passage, capable of connecting thousands to millions of processors at the speed of light in extreme-scale data centers for the most advanced AI and HPC workloads.

In this role, you will lead the development of a comprehensive High Temperature Operating Life (HTOL) test software system. Your work will involve designing, implementing, and maintaining a scalable multi-chassis testing platform that performs automated stress and performance testing with real-time monitoring and comprehensive data collection capabilities.

  • Design, build, and maintain a scalable architecture for a multi-chassis HTOL testing system.
  • Develop containerized applications for deployment at scale using Python-based services for chassis coordination and management.
  • Create hardware abstraction layers and develop APIs that represent hardware systems, providing essential capabilities for monitoring and management of those systems.
  • Develop data collection pipelines handling sensor data and performance metrics.
  • Create automated deployment and testing pipelines using CI/CD best practices.
  • Work closely with the frontend team to ensure seamless integration of backend APIs with applications.
  • Write automated tests to monitor the reliability and performance of the system; maintain clear and concise documentation for troubleshooting.
  • Continuously monitor and optimize performance to reduce response times and improve system scalability; ensure uptime in production environments; establish capacity planning procedures.
  • BS and 12+ years of experience or MS and 8+ years of experience; degree in Computer Science, Electrical Engineering, or related field.
  • Expert-level Python; knowledge of web frameworks such as FastAPI, Flask, or Django; strong understanding of API design principles and best practices.
  • Experience with containerization and orchestration technologies such as Docker and Docker Compose.
  • Experience with one or more databases such as MongoDB, PostgreSQL, Redis, or time-series databases.
  • Familiarity with testing frameworks such as pytest, and with integration and performance testing tools.
  • Experience with CI/CD tools such as GitHub Actions/Runners and Infrastructure as Code tools such as Ansible.
  • Experience with hardware integration or embedded systems, including interfacing with BMCs, FPGAs, temperature sensors, and thermal and power management systems.
  • Familiarity with real-time data handling and communication protocols such as gRPC, TCP/IP, WebSockets, message brokers, or similar technologies.
  • Experience with high-availability, mission-critical systems.
  • Experience in the semiconductor industry: wafer-level testing, burn-in systems, reliability testing.
  • Professional certifications such as Agile/Scrum.
  • Experience building backend services for web applications built with frameworks like Next.js; proficiency in JavaScript/TypeScript.
  • Comprehensive Health Care Plan (Medical, Dental & Vision)
  • Retirement Savings Matching Program
  • Life Insurance (Basic, Voluntary & AD&D)
  • Generous Time Off (Vacation, Sick & Public Holidays)
  • Paid Family Leave
  • Short Term & Long Term Disability
  • Training & Development
  • Commuter Benefits
  • Flexible, hybrid workplace model
  • Equity grants (applicable to full-time employees)