DataDirect Networks · posted 3 months ago
11-50 employees

This is an incredible opportunity to be part of a company that has been at the forefront of AI and high-performance data storage innovation for over two decades. DataDirect Networks (DDN) is the global leader in AI and multi-cloud data management at scale, powering many of the world's most demanding AI data centers across industries ranging from life sciences, healthcare, and financial services to autonomous vehicles, government, academia, research, and manufacturing. Our cutting-edge data intelligence platform is designed to accelerate AI workloads, enabling organizations to extract maximum value from their data. With a proven track record of performance, reliability, and scalability, DDN empowers businesses to tackle the most challenging AI and data-intensive workloads with confidence. Our commitment to innovation, customer success, and market leadership makes this an exciting and rewarding role for a driven professional looking to make a lasting impact in the world of AI and data storage.

Responsibilities:
  • Design autonomous logic for optimizing SQL and non-SQL analytic queries to leverage Infinia’s distributed infrastructure.
  • Implement high-performance indexing for structured and unstructured data, using strategies such as B-epsilon trees, full-text indexing, and vectorization.
  • Develop internal systems for high-throughput data access and transformation using formats such as Parquet, ORC, and Avro.
  • Engineer integration layers that support open engines and interfaces such as Trino, Apache Spark, Apache Iceberg, Delta Lake, HDFS, and Hive Metastore, enabling seamless compatibility with open-source clients.
  • Build and tune execution plans that leverage Infinia’s high-throughput I/O and compute capabilities for large-scale AI and analytics workloads.
  • Analyze and optimize performance of distributed query execution, data storage, caching, and memory usage.
  • Write automated tests to validate correctness and performance of analytic queries across varied cluster topologies.
  • Contribute to relevant open-source ecosystems, where appropriate, through collaboration, feature integration, or direct code contributions.
  • Stay up to date with the evolving open data lake and query engine landscape to guide architectural decisions.
  • Partner with Data Scientists, Platform Engineers, and Product Managers to deliver integrated, end-to-end solutions.
  • Provide technical leadership, mentorship, and design direction to other engineers on the team.

Qualifications:
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 12+ years of experience in software development, with 5+ years in distributed systems, data platforms, or big data technologies.
  • Expert-level knowledge of SQL, Python, and Java or Scala.
  • Experience working with Apache Spark, distributed query engines, or distributed databases.
  • Strong familiarity with HDFS, Hive Metastore, and data partitioning strategies.
  • Hands-on experience with Apache Iceberg and/or Delta Lake.
  • Deep understanding of file formats such as Parquet, ORC, and Avro, including their performance characteristics.
  • Background in real-time data streaming using tools such as Apache Kafka.
  • Prior experience with C++.
  • Prior contributions to open-source projects; committer status is a plus.
  • Proven ability to lead complex technical initiatives and mentor junior engineers.