Staff Data Engineer - AIOps

American Express • Phoenix, AZ

About The Position

At American Express, our culture is built on a 175-year history of innovation, shared values and Leadership Behaviors, and an unwavering commitment to back our customers, communities, and colleagues. From delivering differentiated products to providing world-class customer service, we operate with a strong risk mindset, ensuring we continue to uphold our brand promise of trust, security, and service. As part of Team Amex, you'll experience this powerful backing with comprehensive support for your holistic well-being and many opportunities to learn new skills, develop as a leader, and grow your career. Here, your voice and ideas matter, your work makes an impact, and together, you will help us define the future of American Express.

Requirements

Bachelor’s degree in Computer Science, Engineering, Data Science, or equivalent practical experience; advanced degree preferred
Strong knowledge of machine learning fundamentals, including supervised/unsupervised learning, time-series, NLP, and model evaluation techniques
Hands-on knowledge of Generative AI and LLM ecosystems, including transformers, embeddings, vector databases, prompt engineering, RAG patterns, and agentic frameworks
Deep understanding of data platform and storage technologies, including relational, NoSQL, columnar, graph, and vector stores
Knowledge of distributed systems and cloud-native architectures, including containerization, orchestration, and service-based design
Familiarity with model governance, explainability, bias detection, and AI risk management in enterprise environments
Strong understanding of data formats and APIs (JSON, Parquet, Avro, XML), schema management, and metadata systems
Significant experience in data engineering, ML engineering, or AI platform engineering roles
Strong hands-on programming experience in Python (required); experience with Java, Scala, or similar languages is a plus
Experience building and operating ML pipelines and AI platforms using tools such as Airflow, Kubeflow, MLflow, SageMaker, Vertex AI, or equivalent
Experience with GenAI frameworks and tooling (e.g., LangChain, LlamaIndex, OpenAI/Vertex APIs, vector databases like Pinecone, FAISS, or similar)
Experience designing and scaling large-scale data systems across technologies such as BigQuery, Spanner, Hive, HBase, NoSQL stores, relational databases, and streaming platforms
Experience with cloud-based data and AI platforms (AWS, GCP, Azure), including cost optimization and performance tuning for AI workloads
Proven experience leading, mentoring, and influencing senior engineers and cross-functional teams
Experience integrating AI solutions into infrastructure, observability, reliability engineering, or operational platforms is strongly preferred
Experience with production-grade CI/CD, monitoring, and automation for data and AI systems

Responsibilities

Leads and mentors engineers across Data Engineering, ML Engineering, and AI Ops, fostering a culture of technical excellence, experimentation, and production-grade AI delivery at scale
Designs, builds, and operates end-to-end AI Ops platforms supporting machine learning, generative AI, and agentic workflows, from data ingestion and feature engineering through model training, deployment, monitoring, and lifecycle management
Hands-on development of AI-enabled systems, including ML pipelines, LLM-based applications, retrieval-augmented generation (RAG), prompt pipelines, agent orchestration, and model inference services
Defines and implements scalable data and feature pipelines optimized for AI/ML workloads, ensuring high data quality, lineage, reproducibility, and compliance with enterprise governance standards
Leads MLOps and LLMOps practices, including CI/CD for models, automated testing and validation, model versioning, experiment tracking, drift detection, performance monitoring, and rollback strategies
Oversees integration of diverse structured and unstructured data sources (batch and streaming) to support analytics, ML, and GenAI use cases across global infrastructure operations
Partners closely with infrastructure, platform, security, and product teams to embed AI capabilities into operational systems, observability platforms, reliability engineering, and automation workflows
Conducts architecture and design reviews for AI platforms, data systems, and ML pipelines, ensuring solutions meet scalability, reliability, security, and cost-efficiency requirements
Drives AI Ops automation initiatives, leveraging ML and GenAI to improve incident detection, root cause analysis, capacity forecasting, anomaly detection, and self-healing infrastructure
Monitors and optimizes AI and data workflows, ensuring adherence to delivery timelines, sprint commitments, and best practices in DevOps, DataOps, and AI Ops
Influences enterprise AI strategy by evaluating emerging AI/ML technologies, frameworks, and platforms, and guiding their adoption in a regulated, production environment

Benefits

Competitive base salaries
Bonus incentives
6% Company Match on retirement savings plan
Free financial coaching and financial well-being support
Comprehensive medical, dental, vision, life insurance, and disability benefits
Flexible working model with hybrid, onsite or virtual arrangements depending on role and business need
20+ weeks paid parental leave for all parents, regardless of gender, offered for pregnancy, adoption or surrogacy
Free access to global on-site wellness centers staffed with nurses and doctors (depending on location)
Free and confidential counseling support through our Healthy Minds program
Career development and training opportunities