Data Engineer

Emerald Expositions · San Juan Capistrano, CA
$130,000 - $140,000 · Hybrid

About The Position

We are seeking a highly motivated and technically proficient Data Engineer to join our growing data and analytics team. This role involves designing, developing, and optimizing scalable data pipelines and integrations across various cloud-based and third-party platforms. The ideal candidate will have hands-on experience with Databricks, Apache Spark, PySpark, and cloud computing, along with strong problem-solving skills and a solid understanding of data architecture and integration best practices. This position is based out of our San Juan Capistrano office on a hybrid basis.

Requirements

  • 5+ years of experience in data engineering, software development, or a related role.
  • Strong hands-on experience with:
      • Databricks, Apache Spark, and PySpark
      • SQL and cloud-native or relational databases
      • Python programming for data integration and processing
  • API development and integration experience with REST APIs, Postman, and Swagger
  • Proficiency in working with AWS Cloud or similar platforms (S3, Lambda, AppFlow, etc.)
  • Strong understanding of data pipelines, ETL/ELT, and data architecture principles (a brief pipeline sketch follows this list)
  • Experience integrating data from platforms like Salesforce, HubSpot, or similar CRMs
  • Strong problem-solving skills and ability to work in fast-paced environments
  • Excellent communication and collaboration skills
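
To make the required toolset concrete, here is a minimal, hypothetical PySpark ETL sketch. The S3 paths, column names, and schema are illustrative assumptions, not a description of Emerald's actual pipelines.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical example: bucket, paths, and column names are illustrative only.
spark = SparkSession.builder.appName("orders_etl").getOrCreate()

# Extract: read raw CSV files landed in S3 (path is an assumption).
raw = spark.read.option("header", True).csv("s3://example-bucket/raw/orders/")

# Transform: deduplicate, normalize types, and derive a total column.
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_date"))
       .withColumn(
           "total_usd",
           F.col("quantity").cast("int") * F.col("unit_price").cast("double"),
       )
)

# Load: write the curated table as Parquet, partitioned by date.
orders.write.mode("overwrite").partitionBy("order_date").parquet(
    "s3://example-bucket/curated/orders/"
)
```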

Nice To Haves

  • Bachelor's degree in Computer Science, Data Engineering, or a related technical field
  • Experience with Delta Lake, Databricks DLT (Delta Live Tables), and Unity Catalog (see the DLT sketch after this list)
  • Familiarity with data governance, data cataloging, and access control mechanisms
  • Experience with GitHub, Jira, and Confluence for code management and team collaboration
  • Expertise in Scrum methodology and Agile team environments
  • Familiarity with data orchestration and transformation tools such as dbt or Airflow
  • Experience with event-driven architectures and real-time data (e.g., Kafka)
  • Certifications in Databricks, AWS, or other cloud platforms (a plus)
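
For context on the Delta Live Tables item above, a minimal DLT pipeline sketch might look like the following. It runs only inside a Databricks DLT pipeline (where the `dlt` module and `spark` session are provided), and the source path, columns, and table names are hypothetical.

```python
import dlt
from pyspark.sql import functions as F

# Sketch of a two-table Delta Live Tables pipeline; the source location
# and field names are assumptions for illustration.

@dlt.table(comment="Raw orders ingested from cloud storage.")
def orders_raw():
    return spark.read.format("json").load("s3://example-bucket/raw/orders/")

@dlt.table(comment="Cleaned orders with a basic quality check.")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")
def orders_clean():
    return dlt.read("orders_raw").withColumn("order_date", F.to_date("order_date"))
```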

Responsibilities

  • Develop and optimize data pipelines and workflows using Databricks, Apache Spark, PySpark, and cloud-native services.
  • Integrate data from internal systems and external platforms such as HubSpot, Salesforce, and other CRM systems via APIs (see the REST extract sketch after this list).
  • Implement cloud-based data architectures following data mesh principles and best practices.
  • Collaborate on data modeling, transformation, and quality assurance for analytics and reporting purposes.
  • Build and maintain APIs; use Postman and Swagger for testing and documentation.
  • Write efficient and modular code in Python and leverage SQL for data processing.
  • Follow SDLC best practices including version control, CI/CD, and code reviews.
  • Ensure data security, integrity, and governance across the full data lifecycle.
  • Use AWS (or similar platforms like Azure or GCP) for compute, storage, and orchestration services.
  • Work closely with cross-functional teams using Agile/Scrum methodologies.
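
To illustrate the CRM integration work described above, here is a hedged sketch of a cursor-paginated REST extract. The endpoint, auth scheme, pagination fields, and `CRM_API_TOKEN` environment variable are all hypothetical; real HubSpot or Salesforce APIs differ in detail.

```python
import os
import requests

# Hypothetical CRM extract: the base URL, pagination scheme, and fields
# are illustrative; production HubSpot/Salesforce clients differ.
BASE_URL = "https://api.example-crm.com/v1/contacts"
HEADERS = {"Authorization": f"Bearer {os.environ['CRM_API_TOKEN']}"}

def fetch_contacts():
    """Yield contact records, following cursor-based pagination."""
    params = {"limit": 100}
    while True:
        resp = requests.get(BASE_URL, headers=HEADERS, params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()
        yield from payload["results"]
        cursor = payload.get("next_cursor")
        if not cursor:
            break
        params["cursor"] = cursor

if __name__ == "__main__":
    for record in fetch_contacts():
        print(record["id"])  # downstream, records would land in S3/Delta
```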

Benefits

  • Unlimited vacation for exempt employees
  • 401(k) plan with a company match
  • Medical, dental, and vision coverage
  • Parental and caregiver leave
  • Dependent, commuter, and FSA benefits
  • Professional development programs
  • Mental wellness tools such as weekly guided meditation programs