3M · Posted about 1 year ago
$160,284 - $195,903/Yr
Full-time • Senior
Maplewood, MN
Nonmetallic Mineral Product Manufacturing

The Principal Data Engineer at 3M will play a crucial role in developing scalable data systems within the Corporate Research Systems Lab (CRSL). This position involves collaborating with a diverse team to design and support an Enterprise Data Mesh, enabling advanced informatics and digital technologies across various markets. The role emphasizes technical architecture, data integration, and the implementation of best practices in data engineering and analytics.

Responsibilities:

  • Architect, design, and build scalable, efficient, and fault-tolerant data operations.
  • Collaborate with senior leadership, analysts, engineers, and scientists to implement new mesh domain nodes and data initiatives.
  • Drive technical architecture for accelerated solution designs, including data integration, modeling, governance, and applications.
  • Explore and recommend new tools and technologies to optimize the data platform.
  • Improve and implement data engineering and analytics engineering best practices.
  • Collaborate with data engineering and domain node teams to design physical data models and mappings.
  • Work with scientists and informaticians to develop advanced digital solutions and promote digital transformation and technologies.
  • Perform code reviews, manage code performance improvements, and enforce code maintainability standards.
  • Develop and maintain scalable data pipelines for ingesting, transforming, and distributing data streams.
  • Advise and mentor 3M businesses, data scientists, and data consumers on data standards, pipeline development, and data consumption.
  • Provide technical guidance and mentorship, ensure adherence to best practices, and maintain high software quality through rigorous testing and code reviews.
  • Guide project planning and execution, manage timelines and resources, and facilitate effective communication between team members and stakeholders.
  • Foster a positive team environment, assist in recruitment, and provide training opportunities to address skill gaps.
Qualifications:

  • Bachelor's degree or higher in Computer Science from an accredited university.
  • Ten (10) years of professional experience in data management, data engineering, data governance, and data warehouse/lakehouse design and development with proficiency across SQL and NoSQL data management systems.
  • Five (5) years of extensive experience and proficiency with Python, Apache Spark, PySpark, and Databricks.
  • Three (3) years of hands-on experience in Python to extract data from APIs and build data pipelines.
  • Exposure to data and data types in the materials science, chemistry, computational chemistry, and physics space.
  • Proficiency in developing or architecting modern distributed cloud architecture and workloads (AWS, Databricks preferred).
  • Familiarity with data mesh style architecture design principles.
  • Proficiency in building data pipelines that integrate business applications and processes.
  • Solid understanding of advanced Databricks concepts such as Delta Lake, MLflow, advanced notebook features, custom libraries and workflows, and Unity Catalog.
  • Experience with AWS cloud services and infrastructure, developing data lakes and data pipelines with technologies such as Amazon S3, AWS Glue, and Amazon EMR.
  • Experience with stream-processing systems such as Amazon Kinesis, Spark, Storm, and Kafka.
  • Experience with data quality and validation principles, and with security principles such as data encryption, access control, and authentication & authorization.
  • Deep experience defining and implementing feature engineering.
  • Experience with Docker containers and Kubernetes, and with developing or consuming APIs.
  • Experience building data orchestration workflows with open-source tools such as Temporal.io or Apache Airflow is a plus.
  • Knowledge of data visualization tools such as Plotly Dash, Tableau, and Power BI.
  • Experience with agile development processes and concepts, leveraging project management tools such as Jira and Confluence.
Benefits:

  • Medical, Dental & Vision insurance
  • Health Savings Accounts
  • Health Care & Dependent Care Flexible Spending Accounts
  • Disability Benefits
  • Life Insurance
  • Voluntary Benefits
  • Paid Absences
  • Retirement Benefits