Staff Software Engineer, Edge AI Systems

Archetype AI · Palo Alto, CA

About The Position

Archetype AI is building Newton, our multi-sensor data fusion AI platform. We are seeking a Staff Software Engineer to own data processing and analysis across our edge devices and platform, from raw sensor ingestion through prepared, validated datasets ready for AI workflows.

This role sits at the intersection of data engineering and device software. You will build high-performance data pipelines in C++ that run on small Linux devices, and apply rigorous analytical techniques in Python to explore, validate, and understand sensor and video data. You will work hands-on with real-world customer data (exploring, cleaning, transforming, and validating it) while also building the device software that makes this processing reliable and performant in constrained environments.

This is a Staff-level individual contributor role reporting to the Head of Solutions Engineering, working closely with Product, Design, Platform, and AI teams as part of the broader Go-To-Market (GTM) organization. You will frequently work directly with customers to support deployments and build production-ready solutions.

Requirements

  • 7+ years in data engineering, data analysis, or related technical roles with hands-on data processing focus.
  • Deep experience with time-series data (video a plus): ingestion, preprocessing, feature extraction, quality assessment.
  • Proven ability to apply diverse analytical techniques: statistical analysis, signal processing, visualization, anomaly detection.
  • Experience with iterative data workflows: hypothesis, transformation, evaluation, refinement.
  • Comfortable building and running software on Linux devices, and familiar with system-level concerns (resource usage, process management, I/O).
  • Experience with real-time or streaming data processing under latency and throughput constraints.
  • Familiarity with data preparation for ML: dataset formatting, labeling workflows, train/eval splits, data validation.
  • C++ (production development): Strong proficiency building production data pipelines and device software. Experience with modern C++, memory management, multithreading, and performance optimization.
  • Python (analysis & prototyping): Strong proficiency for data exploration, statistical analysis, visualization, and rapid prototyping. Experience with NumPy, Pandas, Matplotlib, and Jupyter notebooks.
  • Proven expertise in Linux system architecture and performance, including process design, I/O strategies, and diagnosing complex production issues.
  • Debugging & profiling: Strong skills diagnosing performance issues, memory problems, and data pipeline failures in both C++ and Python.
  • Clear, structured written communication, including customer-facing documentation of findings, processes, and technical decisions.
  • Proven ability to present complex analytical and technical results directly to customers, translating them into concrete, actionable insights for technical teams and business stakeholders.

Nice To Haves

  • Background in signal processing, control systems, or physics-based data analysis.
  • Experience with embedding-space analysis or other AI/ML diagnostic techniques.
  • Prior work optimizing data pipelines for resource-constrained environments.
  • Background in solutions engineering or customer-facing technical work.

Responsibilities

  • Analyze raw data using Python for statistical analysis, visualization, and exploratory techniques to understand quality, patterns, and anomalies.
  • Prepare datasets for AI workflows: cleaning, normalization, imputation, filtering, resampling, and validation.
  • Execute iterative preprocessing cycles: refine transformations, evaluate results, compare against baselines, retain improvements.
  • Build tooling for data validation, quality monitoring, and automated preprocessing.
  • Generate clear reports and visualizations that communicate findings to technical and non-technical stakeholders.
  • Build and optimize data processing software in C++ that runs on small, resource-constrained Linux devices.
  • Ensure pipelines meet real-time performance requirements: low latency, bounded memory, reliable throughput.
  • Integrate sensor inputs and manage data flow on-device: ingestion, buffering, local processing, and transmission.
  • Work within device constraints: limited CPU, memory, storage, and intermittent connectivity.
  • Contribute to device deployment, configuration, and operational tooling.
  • Partner with Solutions Engineers to assess customer data assets and deployment requirements.
  • Translate customer data challenges into reusable pipeline components and analysis workflows.