Backend Software Engineer: Distributed Data

Archetype AI · San Mateo, CA

About The Position

Archetype AI is building Newton, our multi-sensor data fusion AI platform. We are seeking a Staff Software Engineer to own data processing and analysis across our edge devices and platform, from raw sensor ingestion through prepared, validated datasets ready for AI workflows.

This role sits at the intersection of data engineering and device software. You will build high-performance data pipelines in C++ that run on small Linux devices, and apply rigorous analytical techniques in Python to explore, validate, and understand sensor and video data. You will work hands-on with real-world customer data, exploring, cleaning, transforming, and validating it, while also building the device software that makes this processing reliable and performant in constrained environments.

This is a Staff-level individual contributor role reporting to the Head of Solutions Engineering, working closely with Product, Design, Platform, and AI teams as part of the broader Go-To-Market (GTM) organization. You will frequently work directly with customers to support deployments and build production-ready solutions.
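
To give a flavor of the device-side work, here is a minimal sketch of a bounded-memory ingestion loop of the kind this role involves. It is illustrative only: every name in it (Sample, read_sensor, the buffer size and window) is hypothetical and not taken from Newton's actual APIs.

    import math
    import time
    from collections import deque
    from dataclasses import dataclass

    @dataclass
    class Sample:
        ts: float      # epoch seconds
        value: float

    def read_sensor() -> Sample:
        # Stand-in for a real device driver; returns one raw sample.
        return Sample(ts=time.time(), value=42.0)

    def ingest(window_s: float = 1.0, max_buffer: int = 4096) -> list:
        # Bounded-memory ingestion loop: deque(maxlen=...) drops the oldest
        # sample on overflow, keeping a fixed footprint on a small device.
        buf = deque(maxlen=max_buffer)
        deadline = time.time() + window_s
        while time.time() < deadline:
            s = read_sensor()
            # Validate before buffering: reject NaNs and out-of-order timestamps.
            if not math.isnan(s.value) and (not buf or s.ts >= buf[-1].ts):
                buf.append(s)
        return list(buf)

    if __name__ == "__main__":
        batch = ingest(window_s=0.1)
        print(f"ingested {len(batch)} valid samples")

The drop-oldest policy is a deliberate trade-off for constrained environments: a fixed memory ceiling matters more than completeness when the device cannot page to disk.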

Requirements

  • 7+ years in data engineering, data analysis, or related technical roles with a hands-on data-processing focus.
  • Deep experience with time-series data (video a plus): ingestion, preprocessing, feature extraction, and quality assessment (a small example of this kind of check appears after this list).
  • Proven ability to apply diverse analytical techniques: statistical analysis, signal processing, visualization, anomaly detection.
  • Experience with iterative data workflows: hypothesis, transformation, evaluation, refinement.
  • Comfortable building and running software on Linux devices, and familiar with system-level concerns (resource usage, process management, I/O).
  • Experience with real-time or streaming data processing under latency and throughput constraints.
  • Familiarity with data preparation for ML: dataset formatting, labeling workflows, train/eval splits, data validation.
  • C++ (production development): Strong proficiency building production data pipelines and device software. Experience with modern C++, memory management, multithreading, and performance optimization.
  • Python (analysis & prototyping): Strong proficiency for data exploration, statistical analysis, visualization, and rapid prototyping. Experience with NumPy, Pandas, Matplotlib, and Jupyter notebooks.
  • Proven expertise in Linux system architecture and performance, including process design, I/O strategies, and diagnosing complex production issues.
  • Debugging & profiling: Strong skills diagnosing performance issues, memory problems, and data pipeline failures in both C++ and Python.
  • Clear, structured written communication, including customer-facing documentation of findings, processes, and technical decisions.
  • Proven ability to present complex analytical and technical results directly to customers, translating them into concrete, actionable insights for technical teams and business stakeholders.
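
As an illustration of the analysis side, the sketch below runs the kind of basic time-series quality assessment described above: gap detection against an expected sample rate, plus z-score outlier flagging, using the NumPy and Pandas stack the posting names. The column names, thresholds, and synthetic data are assumptions made for the example, not part of the posting.

    import numpy as np
    import pandas as pd

    def assess_quality(df: pd.DataFrame, expected_hz: float = 10.0,
                       z_thresh: float = 4.0) -> dict:
        # Basic quality report for a series with columns ['ts', 'value']
        # (ts in seconds): flags sampling gaps and statistical outliers.
        df = df.sort_values("ts").reset_index(drop=True)
        dt = df["ts"].diff()
        gaps = dt[dt > 1.5 / expected_hz]     # intervals with missing samples
        z = (df["value"] - df["value"].mean()) / df["value"].std(ddof=0)
        outliers = df[np.abs(z) > z_thresh]
        return {
            "n_samples": len(df),
            "n_gaps": int(gaps.count()),
            "largest_gap_s": float(gaps.max()) if not gaps.empty else 0.0,
            "n_outliers": len(outliers),
        }

    if __name__ == "__main__":
        # Synthetic 10 Hz signal with one dropout and one spike.
        ts = np.arange(0, 10, 0.1)
        value = np.sin(ts)
        value[50] = 25.0                      # inject a spike
        df = pd.DataFrame({"ts": np.delete(ts, slice(30, 40)),
                           "value": np.delete(value, slice(30, 40))})
        print(assess_quality(df))

In an iterative workflow, a report like this is the "evaluation" step: it tells you whether a transformation (resampling, interpolation, outlier removal) is warranted before the data is handed to an ML pipeline.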

Nice To Haves

  • Background in signal processing, control systems, or physics-based data analysis.
  • Experience with embedding-space analysis or other AI/ML diagnostic techniques.
  • Prior work optimizing data pipelines for resource-constrained environments.
  • Background in solutions engineering or customer-facing technical work.

Responsibilities

  • Design and develop scalable, efficient, and reliable data processing systems that handle large volumes of data
  • Collaborate with data engineers, data scientists, product managers, and other cross-functional partners to design and implement data processing systems that serve our business and our users
  • Write high-quality, maintainable code that is efficient, scalable, and reliable, using languages such as Java, Python, and Scala
  • Build on distributed computing frameworks such as Apache Spark, Hadoop, and Flink to process large volumes of data (a minimal sketch follows this list)
  • Design and implement data storage systems, such as NoSQL databases, columnar storage, and data warehouses
  • Contribute to the development of our data infrastructure, including data pipelines, data warehouses, and data lakes
  • Build data processing systems that let our data scientists focus on high-level tasks while the infrastructure handles the heavy lifting
  • Participate in code reviews and help keep the codebase maintainable, efficient, and scalable
  • Stay current with technologies and trends in data processing and infrastructure, and apply them to our systems
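
For the distributed side, here is a minimal sketch of the kind of Spark batch job these responsibilities describe: rolling raw readings up to per-device, per-minute aggregates. The input path, schema, and output location are hypothetical, invented for the example.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("sensor-rollup").getOrCreate()

    # Hypothetical input: JSON records with ts, device_id, value fields.
    readings = spark.read.json("s3://example-bucket/sensor-readings/")

    # Aggregate raw readings into per-device, per-minute summaries.
    rollup = (
        readings
        .withColumn("minute", F.date_trunc("minute", F.col("ts")))
        .groupBy("device_id", "minute")
        .agg(F.avg("value").alias("mean_value"),
             F.count("*").alias("n_samples"))
    )

    rollup.write.mode("overwrite").parquet("s3://example-bucket/rollups/")
    spark.stop()

Pre-aggregating at ingest time like this is one common way to keep downstream analysis and warehousing workloads tractable as raw sensor volume grows.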