Arrowstreet Capital · Posted 6 days ago
$115,000 - $325,000/Yr
Full-time • Mid Level
Boston, MA
251-500 employees

We are seeking a hands-on Engineer to play a crucial role in the design, automation, and operation of our enterprise data platform. In this role, you will help develop and support the key services that make it easier to bring new data sources into the business and turn them into valuable analytics. You will work closely with colleagues across departments to understand their data needs and improve how data moves through the company.

You will build and optimize code, including data platform endpoints and flexible tools that can be adapted to a variety of internal requirements. You will design and develop CI/CD, infrastructure-as-code, reliability, security, and observability for data services, partnering with stakeholders such as business data owners and application and cloud teams. You will drive the development of data pipelines and tooling that incorporate varied data sources into a standard data warehouse/lakehouse, and you will work with business analysts to implement reporting capabilities that support the advanced analytics driving business insights.

Responsibilities:

  • Develop end-to-end CI/CD for data applications, pipelines, and platform services using GitLab CI/CD (build, test, deploy, promote, rollback, environments).
  • Automate infrastructure provisioning, configuration, and compliance with Terraform (and related tooling), implementing modular, reusable IaC patterns and GitOps workflows.
  • Design and operate a cloud-native data platform on AWS data services (networking, compute, storage, security), enabling scalable ingestion, processing, storage, and retrieval.
  • Implement platform reliability practices: monitoring, logging, tracing, and alerting, plus enhanced platform resiliency via a multi-region design.
  • Build and maintain deployment pipelines and release management for data workflows (batch/streaming), APIs, and microservices.
  • Standardize environments and packaging (Docker), manage Kubernetes (EKS) clusters, and optimize workload scheduling, autoscaling, and cost efficiency.
  • Enforce security and compliance by design: IAM, least privilege, KMS/secrets management, and alignment with security best practices.
  • Develop platform tooling and self-service interfaces that connect data producers to consumers (service templates, golden paths, catalogs, and SLAs).
  • Migrate and modernize legacy/on-prem data infrastructure to AWS, consolidating disparate systems into a unified, governed data layer.
  • Drive continuous improvement: evaluate new tools, optimize performance and cost, reduce toil through automation, and champion best practices across teams.

Qualifications:

  • 5+ years of professional experience in data engineering, DevOps, or related roles supporting data-centric systems.
  • Strong programming skills in Python; experience with testing frameworks and software engineering best practices.
  • Demonstrated expertise in SQL and data processing frameworks/libraries such as Spark and/or Pandas.
  • Hands-on experience building complex, scalable data systems and pipelines (batch and/or streaming).
  • Experience with AWS services for data and compute (e.g., S3, EC2, Lambda, Glue/EMR, Redshift/RDS, IAM, VPC); 3+ years preferred.
  • Proficiency with CI/CD and version control platforms (e.g., GitLab, GitHub, Jenkins), including pipeline design, automation, and troubleshooting.
  • Infrastructure as Code (Terraform preferred); ability to provision, manage, and secure cloud resources via IaC.
  • Experience with containerization and orchestration (Docker, Kubernetes).
  • Understanding of data storage technologies and modeling across databases, data warehouses, data lakes, and lakehouses; schema design and query optimization.
  • Experience optimizing queries and manipulating/aggregating data, including point-in-time/temporal data semantics.
  • Knowledge of monitoring and observability (e.g., CloudWatch, Prometheus/Grafana, ELK) for data pipelines and services.
  • Solid grasp of system and data security practices, especially in regulated environments (IAM, KMS, least privilege, network controls).
  • Strong design skills, with the ability to design scalable, reliable data solutions.
  • Communication and collaboration skills with a track record of building strong cross-functional relationships.
  • Familiarity with agile and SecDevOps practices and workflows.
  • Experience with orchestration tools (e.g., Airflow, Prefect) for workflow management.
  • Exposure to data lakehouse patterns, data mesh concepts, and streaming technologies.
  • Experience with additional languages for data/infra (e.g., Java, Scala, Rust, C#, C++).
  • Data quality, lineage, and governance tooling experience.