About The Position

This role combines hands-on data engineering with technical leadership. You will design and build data pipelines that ingest, transform, and load data from databases, APIs, and streaming platforms into our data warehouse/lake; develop and optimize data models for business and analytical needs; write performant, maintainable SQL and use SQLAlchemy for efficient database interaction; implement data quality checks and monitoring to protect data integrity; contribute to data governance policies and regulatory compliance; research and adopt new technologies to improve the platform's efficiency, scalability, and resilience; and own the deployment and monitoring of pipelines on cloud platforms such as OpenShift, ECS, or Kubernetes. The role occasionally involves non-standard shifts, including nights and/or weekends, and on-call duties in support of critical data operations. The full list of duties appears under Responsibilities below.

Requirements

  • 6+ years of hands-on experience in a Data Engineering role
  • Strong proficiency in Python (version 3.6+), with experience in Python packaging and in widely used libraries such as Pandas and NumPy.
  • Extensive experience working with relational databases, writing complex SQL, and optimizing queries for performance.
  • Proven expertise with SQLAlchemy or similar ORM libraries for efficient database interaction (see the sketch after this list).
  • Solid understanding of data warehousing concepts and experience working with large datasets, including data modeling and ETL processes.
  • Ability to guide and mentor junior developers, fostering a collaborative team environment and promoting professional growth.
  • Strong communication skills, both written and verbal, with the ability to explain complex technical concepts to both technical and non-technical audiences.
  • Proficient in industry-standard best practices such as design patterns, coding standards, code modularity, and prototyping.
  • Bachelor's degree in Computer Science, Software Engineering, or a related field.
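
For illustration only (not part of the role description), here is a minimal sketch of the kind of ORM-based database interaction the SQLAlchemy requirement points at. It assumes SQLAlchemy 2.0-style declarative mapping, and the table, columns, and connection string are hypothetical.

    # Illustrative only: SQLAlchemy 2.0-style declarative mapping and an
    # ORM aggregate query. The "orders" table and the SQLite connection
    # string are hypothetical placeholders.
    from sqlalchemy import create_engine, func, select
    from sqlalchemy.orm import DeclarativeBase, Mapped, Session, mapped_column

    class Base(DeclarativeBase):
        pass

    class Order(Base):
        __tablename__ = "orders"  # hypothetical table
        id: Mapped[int] = mapped_column(primary_key=True)
        region: Mapped[str] = mapped_column()
        amount: Mapped[float] = mapped_column()

    engine = create_engine("sqlite:///:memory:")  # placeholder connection string
    Base.metadata.create_all(engine)

    with Session(engine) as session:
        session.add_all([
            Order(id=1, region="EU", amount=10.0),
            Order(id=2, region="US", amount=20.0),
        ])
        session.commit()
        # Aggregation expressed through the ORM instead of hand-built SQL strings.
        stmt = select(Order.region, func.sum(Order.amount)).group_by(Order.region)
        for region, total in session.execute(stmt):
            print(region, total)

Keeping queries in the ORM layer like this, rather than concatenating SQL strings, is one common route to the "optimized and maintainable" queries the posting asks for.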

Nice To Haves

  • Experience with data visualization tools and techniques for presenting data insights effectively.
  • Familiarity with agile development methodologies and experience working in agile teams.
  • Experience implementing REST APIs in Python using microframeworks like Flask.
  • Experience with workflow management tools like Airflow (experience with PySpark or PyFlink is a major plus); a minimal DAG sketch follows this list.
  • Experience working in a Continuous Integration and Continuous Delivery environment and familiarity with tools like Jenkins, TeamCity, SonarQube, OpenShift, ECS, or Kubernetes.
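
As a rough, hypothetical sketch of what the Airflow experience above can look like in practice (assuming Airflow 2.x; the dag_id, schedule, and task are placeholders):

    # Hypothetical Airflow 2.x DAG with a single extract task; the dag_id,
    # schedule, and callable are illustrative placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        # A real task would pull from a source system (database, API, stream).
        print("extracting...")

    with DAG(
        dag_id="example_etl",  # hypothetical
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # Airflow >= 2.4; older releases use schedule_interval
        catchup=False,
    ):
        PythonOperator(task_id="extract", python_callable=extract)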

Responsibilities

  • Participate in design and development of data pipelines for ingestion, transformation, and loading of data from various sources (databases, APIs, streaming platforms) into our data warehouse/lake, ensuring seamless data flow and accessibility.
  • Develop data models that support business requirements and analytical needs.
  • Optimize data models for query performance and data accessibility.
  • Write optimized and maintainable SQL queries and leverage SQLAlchemy for efficient database interaction, ensuring high performance and data accuracy.
  • Implement robust data quality checks and monitoring systems to ensure data integrity and accuracy, proactively identifying and resolving data issues (a small example follows this list).
  • Contribute to the design and implementation of data governance policies and procedures, ensuring compliance with regulatory requirements and internal standards.
  • Continuously research and implement new technologies and best practices to improve the efficiency, scalability, and resilience of our data platform.
  • Take ownership of the deployment and monitoring of data pipelines and related infrastructure on cloud platforms such as OpenShift, ECS, or Kubernetes, ensuring optimal performance and reliability.
  • Occasionally work a non-standard shift, including nights and/or weekends, and/or take on-call responsibility to support critical data operations.
  • Design, develop, and maintain database schemas and models.
  • Write and optimize SQL queries for data retrieval, manipulation, and reporting.
  • Communicate technical concepts and solutions effectively to both technical and non-technical audiences.
  • Provide technical support and troubleshooting for production systems.
  • Stay up-to-date with the latest trends and technologies in Python development, database systems, and data engineering.
  • Evaluate and recommend new tools and technologies to improve development efficiency and product quality.
  • Contribute to the continuous improvement of development processes and practices.
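
Purely as an illustration of the data quality responsibility above, a minimal pandas-based check; the schema ("id", "amount") and the 1% null tolerance are hypothetical assumptions:

    # Illustrative data quality checks on a pandas DataFrame; column names
    # and thresholds are hypothetical.
    from typing import List

    import pandas as pd

    def check_quality(df: pd.DataFrame) -> List[str]:
        """Return human-readable descriptions of any data quality violations."""
        issues = []
        if df["id"].duplicated().any():
            issues.append("duplicate primary keys in 'id'")
        null_rate = df["amount"].isna().mean()
        if null_rate > 0.01:  # hypothetical 1% tolerance
            issues.append(f"'amount' null rate {null_rate:.1%} exceeds 1% threshold")
        if (df["amount"] < 0).any():
            issues.append("negative values in 'amount'")
        return issues

    # Usage with deliberately dirty data: flags the duplicate id, the nulls,
    # and the negative amount.
    df = pd.DataFrame({"id": [1, 2, 2], "amount": [10.0, None, -5.0]})
    print(check_quality(df))

In production, checks like these would typically run inside the pipeline itself (for example, as an Airflow task) and feed the monitoring systems mentioned above.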

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Number of Employees: 5,001-10,000 employees
