Lead Data Engineer, Associate Director

Fitch Group | New York, NY
$140,000 - $160,000 | Hybrid

About The Position

Fitch Group is currently seeking an Associate Director/Lead Data Engineer based out of our Chicago office. As a leading, global financial information services provider, Fitch Group delivers vital credit and risk insights, robust data, and dynamic tools to champion more efficient, transparent financial markets. With over 100 years of experience and colleagues in over 30 countries, Fitch Group's culture of credibility, independence, and transparency is embedded throughout its structure, which includes Fitch Ratings, one of the world's top three credit rating agencies, and Fitch Solutions, a leading provider of insights, data, and analytics. With dual headquarters in London and New York, Fitch Group is owned by Hearst.

Fitch's Technology & Data Team is a dynamic department where innovation meets impact. Our team includes the Chief Data Office, Chief Software Office, Chief Technology Office, Emerging Technology, Shared Technology Services, Technology Risk, and the Executive Program Management Office (EPMO). Driven by our investment in cutting-edge technologies like AI and cloud solutions, we're home to a diverse range of roles and backgrounds united by a shared passion for leveraging modern technology to drive projects that matter to our organization and clients. We are also proud to have been recognized by Built In as a "Best Place to Work in Technology" three years in a row. Whether you're an experienced professional or just starting your career, we offer an exciting and supportive environment where you can grow, innovate, and make a difference.

Want to learn more about a career in technology and data at Fitch? Visit: https://careers.fitch.group/content/Technology-and-Data/

Requirements

  • You have 8+ years of data engineering experience, including 3+ years in a lead role architecting large-scale data platforms.
  • You possess expert-level proficiency in Python and Java for building cloud-native data processing solutions.
  • You have deep hands-on experience with Apache Airflow, Snowflake (data warehousing, modeling, optimization), and Databricks.
  • You have strong AWS expertise, including S3, Lambda, Glue, EMR, Kinesis, EKS, and RDS.
  • You have production database experience with PostgreSQL (design, optimization, replication) and MongoDB (document modeling, sharding, replica sets).
  • You have solid experience with containerization and orchestration using Docker, Kubernetes, and AWS EKS, including cluster management and autoscaling.
  • You have proven CI/CD and GitOps experience using GitHub, GitHub Actions, and ArgoCD for automated deployments and multi-environment management.
  • You are proficient with agile tools such as JIRA for sprint management and Confluence for technical documentation and knowledge sharing.
  • You have excellent analytical, problem-solving, and communication skills, with the ability to explain complex concepts to non-technical stakeholders and drive initiatives in complex environments.
  • You have working knowledge of AI/ML frameworks (LangChain, LlamaIndex, AutoGen, etc.) and understand how Agentic AI can enhance data engineering workflows through automated data validation, intelligent orchestration, and self-healing pipelines.
  • You have practical understanding of AI integration patterns in data platforms, including prompt engineering, RAG architectures, and vector database implementations.
  • You are familiar with Model Context Protocol (MCP) or similar frameworks for enabling AI agents to interact securely and efficiently with data sources, APIs, and tools.
  • You have experience with AI-powered development tools such as GitHub Copilot and Amazon Q.

Nice To Haves

  • Experience with code quality metrics and shift-left principles.
  • Experience testing container resiliency (Docker/Kubernetes).
  • Experience designing large end-to-end performance scenarios.
  • Experience building large and high-performing data pipelines.
  • Exposure to Playwright and BDD for automated testing.
  • Exposure to the financial industry and data platforms (data warehouses, data lakes).
  • Experience with modern data stack tools, data mesh/fabric architectures, and streaming platforms (Kafka, Kinesis).
  • Proficiency with observability tools (Datadog) and data quality/governance frameworks.
  • Understanding of data security and compliance standards (GDPR, SOC 2, CCPA) and contributions to open-source data projects.
  • Relevant certifications (AWS Data Analytics/Solutions Architect, Databricks/Snowflake Data Engineer, CKA).
  • Hands-on experience building production Agentic AI systems that operate on data platforms, including multi-agent orchestration and intelligent pipeline optimization.
  • Deep expertise with Model Context Protocol (MCP) implementation, including building custom MCP servers or integration patterns for enterprise data platforms.

Responsibilities

  • Lead the design and architecture of end-to-end data pipelines and solutions on modern cloud-based platforms, including Snowflake, Databricks, and AWS.
  • Build and optimize robust, scalable data orchestration workflows using Apache Airflow and implement best practices across multiple agile squads.
  • Design and implement data solutions using PostgreSQL for relational data and MongoDB for NoSQL requirements, ensuring optimal performance and scalability.
  • Architect and deploy containerized data applications using Docker, Kubernetes, and AWS EKS, incorporating GitHub Actions for automated deployments.
  • Design and implement CI/CD pipelines using GitHub Actions, establish branching strategies, and ensure automated testing, code quality checks, and security scanning.
  • Collaborate with cross-functional teams—including Data Scientists, Analytics teams, and business stakeholders—to translate requirements into scalable technical solutions.
  • Mentor and guide data engineers by promoting technical excellence, establishing coding standards, and conducting architecture reviews.
  • Drive data platform modernization initiatives and ensure data quality, reliability, and governance across all data systems.
  • Design and implement AI-enhanced data pipelines that leverage LLMs and Agentic AI frameworks to automate data quality checks, anomaly detection, and intelligent data transformation workflows.
  • Architect data infrastructure to support AI/ML workloads, including feature stores, vector databases, and real-time inference pipelines integrated with cloud-native services.
  • Leverage established standards and best practices to integrate AI agents into data engineering workflows, including context management protocols (MCP) for seamless AI-to-data-platform communication.

Benefits

  • Hybrid Work Environment: On-site presence required two days per week.
  • A Culture of Learning & Mobility: Access to dedicated training, leadership development, and mentorship programs to support continuous learning.
  • Investing in Your Future: Retirement planning and tuition reimbursement programs to help you meet your short- and long-term goals.
  • Promoting Health & Wellbeing: Comprehensive healthcare offerings that support physical, mental, financial, social, and occupational wellbeing.
  • Supportive Parenting Policies: Family-friendly policies, including a generous global parental leave plan, designed to help you balance work and family life.
  • Inclusive Work Environment: A collaborative workplace where all voices are valued, supported by Employee Resource Groups that unite and empower colleagues worldwide.
  • Dedication to Giving Back: Paid volunteer days, matched donation programs, and ample opportunities to volunteer in your community.