Senior Data Scientist

HPE • Sunnyvale, CA

59d • $153,500 - $310,500 • Onsite

About The Position

Hewlett Packard Enterprise is the global edge-to-cloud company advancing the way people live and work. We help companies connect, protect, analyze, and act on their data and applications wherever they live, from edge to cloud, so they can turn insights into outcomes at the speed required to thrive in today’s complex world. Our culture thrives on finding new and better ways to accelerate what’s next. We know varied backgrounds are valued and succeed here. We have the flexibility to manage our work and personal needs. We make bold moves, together, and are a force for good. If you are looking to stretch and grow your career, our culture will embrace you. Open up opportunities with HPE.

The Senior Data Scientist will be engaged in data science research and in software application development and engineering related to our AI Datacenter technology and autonomous platform, providing unprecedented visibility into the user experience and greater operational efficiency. The Senior Data Scientist will collaborate with other engineers to build the next generation of autonomous datacenter networks leveraging big data and predictive models, and will leverage the data collected from the network to power the inference engine of our Mist platform and systems, including the Mist virtual assistant chatbot. In addition, they will use their knowledge of network communication, machine learning, and software engineering to develop and implement scalable algorithms that process large volumes of streaming data to detect anomalies, predict problems, provide root cause analysis (RCA), and classify issues in real time. The Senior Data Scientist will also be responsible for developing the software and algorithms that enhance the cloud intelligence of Marvis.

Requirements

Solid statistics and math background; good knowledge of machine learning methods such as k-Nearest Neighbors, Naive Bayes, SVM, and Decision Forests.
Excellent communication skills to articulate observations and use cases, through data visualization tools, to product managers and network domain experts who are not experienced in AI/ML.
Experience with time series data analysis, forecasting, and correlation is preferable.
Experience applying the latest AI/ML techniques, such as neural networks and Transformers, to time series data, or interest in exploring these techniques for time series data.
Analyze feature requirements from product managers and collaborate with engineers and data scientists to design solutions.
Good understanding of datacenter networking topologies and protocols.
Knowledge of multi-cloud production environments.
Ability to troubleshoot open-source data processing engines such as Apache Spark, Apache Storm, and Apache Flink.
Good knowledge of and experience with big data tool sets and with techniques for distributed storage and computation engines.
Experience developing reusable, highly scalable data processing components.
Good knowledge of and experience with cloud-based CI/CD tools, and experience working with cloud DevOps teams to collect stats and create monitors for our data processing pipelines.
Good understanding of MCP (Model Context Protocol) and agentic frameworks.
Bachelor's degree in Computer Science, Engineering, or Mathematics, or equivalent experience.
5+ years of experience in search indexing, ranking, information retrieval, and querying.
Proficient in Python and Golang.
Proficient in deploying NLP and machine learning models and algorithms to production at scale.

Nice To Haves

PhD in Statistics, Operations Research, Computer Science, or an equivalent field and 5+ years of relevant experience, or a Master's degree in these areas and at least 8 years of relevant experience.
Experience with statistical data analysis, data mining, and querying.
Experience deploying and leading end-to-end ML platforms on AWS, GCP, or Azure.

Responsibilities

Design and implement machine learning solutions that process terabytes of streaming data to detect anomalies in our customers' datacenter (DC) networks, predict problems and future trends, and provide root cause analysis (60%)
Troubleshoot production environments and customer-reported issues (20%)
Use analytical and programming skills and open-source systems such as Hadoop, Hive, Spark, Elasticsearch, and Redis to develop data processing pipelines that meet the required efficacy and latency (20%)