Join the Evolv Machine Learning & Sensors team as a Data Scientist focused on building a deep understanding of the sensor data, feature spaces, and data quality that power our AI/ML systems. This hands-on role emphasizes representation analysis, exploratory data insights, and data-centric improvements that directly enhance model accuracy, robustness, and generalization. You will work across classical ML and deep learning pipelines to identify blind spots, diagnose data issues, and guide data curation and collection strategies.

Success in the Role: What performance outcomes will you work toward in the first 6–12 months?

In the first 30 days:
Develop a strong understanding of Evolv’s sensor ecosystem, datasets, and ML pipelines.
Review dataset structure, labeling processes, and existing exploratory data analyses.
Run initial UMAP/PCA/t-SNE analyses to map data distributions and identify anomalies.
Identify opportunities to improve data quality, labeling consistency, and dataset coverage.

Within the first three months:
Perform deep representation analysis across sensor, time-series, and feature data.
Evaluate classical ML and deep learning models by linking model errors to data issues.
Define data quality metrics and initial dataset acceptance criteria.
Collaborate with data collection teams to guide targeted data acquisition and relabeling.
Mine existing field data to understand patterns and extract useful insights.
Design methods to improve data quality, converting noisy or unverified data into clean, verified data.

By the end of the first year:
Own data-centric insights that directly improve ML model performance.
Establish ongoing monitoring of data drift, blind spots, and label quality.
Provide strategic guidance for future data collection, annotation, and curation.
Develop automated tools and dashboards for data quality reporting and representation analysis.

The Work: What type of work will you be doing?
What assignments, requirements, or skills will you be performing on a regular basis?

Data Understanding & Representation Analysis: Analyze high-dimensional sensor and feature data using UMAP, t-SNE, PCA, and related techniques. Identify clusters, outliers, distribution gaps, and blind spots across classes and environments. Diagnose dataset shift, domain mismatch, sparsity, and representation collapse.

Model-Aware Data Analysis: Conduct data analysis aligned with both classical ML models (XGBoost, SVR, k-NN, tree-based models) and deep learning models (CNNs, Transformers). Analyze embeddings, confusion matrices, and failure cases to map model issues back to data causes.

Data Quality & Curation: Investigate imbalanced data, noisy sensor signals, and mislabeled or ambiguous samples. Develop strategies for weakly labeled or unlabeled data using clustering or pseudo-labeling. Define data quality metrics, acceptance criteria, and labeling strategies. Work with internal teams and external vendors to improve label consistency and coverage.

Insight-Driven Improvements: Translate exploratory insights into clear recommendations for data collection, relabeling, or filtering. Drive data-centric improvements instead of relying solely on algorithmic changes. Track KPIs such as data quality, data quantity, collection rate, and utilization efficiency.

Collaboration & Communication: Work closely with internal and external data collection teams to refine data pipelines. Communicate findings through visualizations, reports, and technical deep-dives.
Job Type
Full-time
Career Level
Mid Level