PySpark Big Data Senior Developer - Vice President

Citi
Mississauga, ON
Onsite

About The Position

We are building an A-team of highly skilled and autonomous engineers, and we are seeking an exceptional PySpark Big Data Senior Developer to join our dynamic and focused squads. This role is for a hands-on player/coach who thrives in a high-autonomy environment, is deeply committed to leveraging AI for maximum productivity, and possesses a profound understanding of the functional domains our work impacts. The ideal candidate will be instrumental in designing, developing, and optimizing large-scale data processing solutions using PySpark and cutting-edge big data technologies. We are looking for an AI-first thinker who can raise the bar, coach others, and strategically contribute to our evolving technology landscape.

Requirements

  • 6+ years of extensive, hands-on experience as a Senior Big Data Developer, with a strong emphasis on PySpark and the Apache Spark ecosystem, operating as a player/coach.
  • Expert proficiency in Python, with a proven track record of developing robust, scalable, and high-performance PySpark applications for large-scale data processing.
  • Deep understanding and extensive hands-on experience with Apache Spark (Spark Core, Spark SQL, Spark Streaming) and its ecosystem.
  • Experience with distributed computing frameworks such as Hadoop (HDFS, YARN).
  • Expert proficiency in SQL and extensive experience with data warehousing concepts and technologies (e.g., Hive, Snowflake, Redshift, Databricks SQL).
  • Proven experience with various data storage formats (e.g., Parquet, ORC, Avro) and data lake solutions (e.g., Delta Lake, Iceberg).
  • Strong experience with Apache Kafka for building real-time data pipelines and event-driven architectures.
  • Proven effectiveness with AI coding tools (e.g., Claude Code, Codex, Antigravity) is mandatory.
  • A strong "AI-first thinker" mindset, demonstrating how to leverage and integrate AI tools into the development workflow for continuous improvement.
  • Experience with, or a strong willingness to actively explore and implement, other AI-powered tools to optimize big data development processes.
  • Strong ability to articulate the functional domain being worked in, understand the business context, and explain "why" the technical solutions matter.
  • Advanced understanding of data structures, algorithms, and performance optimization techniques for large-scale distributed data processing.
  • Experience with RESTful API design and development for data ingestion or exposure points.
  • Familiarity with containerization technologies (e.g., Docker, Kubernetes) for deploying and managing big data applications.
  • Expert proficiency with version control systems, especially Git, and advanced branching strategies.
  • Exceptional problem-solving, analytical, and debugging skills in highly complex, distributed big data environments.
  • Superior communication and interpersonal skills, with a proven ability to work effectively and autonomously within small, high-performing teams, and to mentor others.
  • Demonstrated high autonomy and agency in tackling complex challenges and delivering impactful solutions.
  • Bachelor's or Master's degree in Computer Science, Engineering, Data Science, or a related quantitative field is required. Equivalent practical experience with a demonstrable track record of excellence will also be considered.

Nice To Haves

  • Experience with NoSQL databases (e.g., MongoDB, Cassandra, HBase) is a significant plus.
  • Demonstrated experience with big data services on major cloud platforms (e.g., AWS EMR/Glue/Redshift, Azure Databricks/Data Factory/Synapse, GCP Dataflow/Dataproc/BigQuery) is highly desirable.

Responsibilities

  • Operate end-to-end in the design, development, and implementation of robust big data solutions, ensuring optimal performance, scalability, data quality, and security.
  • Collaborate closely within small, co-located squads (4-7 person teams), fostering high communication and low coordination overhead, to translate complex business requirements into technical specifications for big data processing and analytical solutions.
  • Act as a player/coach within the team, mentoring junior members and leading by example in the development of efficient and innovative big data architectures.
  • Design, develop, and optimize large-scale data pipelines using PySpark for data ingestion, transformation, and aggregation, always with an eye towards efficiency and domain relevance.
  • Implement and manage real-time data streaming and event-driven architectures using technologies like Apache Kafka.
  • Design and implement sophisticated data warehousing solutions and dimensional models for efficient data storage and retrieval, ensuring alignment with business needs.
  • Work with various distributed data storage technologies, including distributed file systems (e.g., HDFS, S3) and NoSQL databases (e.g., MongoDB, Cassandra), selecting the right tool for the right problem.
  • Implement efficient data processing and storage strategies to optimize the performance and scalability of big data applications, with a strong focus on the "why" behind the technology choices.
  • Champion best practices in software development, including rigorous code reviews, comprehensive testing, and support for continuous integration and continuous deployment (CI/CD) pipelines.
  • Demonstrate high autonomy and agency in driving projects forward, making informed decisions, and proactively identifying areas for improvement.
  • Proactively leverage and contribute to the development of AI-powered development tools, including internal Citi AI tools like Copilot, Claude Code, Codex, and Antigravity, to significantly enhance productivity, improve code quality, and accelerate development cycles.
  • Lead technical discussions and contribute strategically to the evolution of our big data technology stack, always seeking innovative approaches.
  • Troubleshoot and resolve complex technical issues within big data environments, demonstrating strong analytical and problem-solving skills.