Senior Data Scientist

Bloomberg Industry Group • Arlington, TX

About The Position

You are a research scientist and engineer who wants to work in machine learning, natural language processing, information extraction, graphical models, summarization, information retrieval, recommender systems, and/or knowledge graphs. You will bring new ideas, evangelize them, and shepherd their adoption within the team. You will identify and extract relevant, meaningful, and actionable information from structured and unstructured data in real time, and provide advanced ways of accessing this data (such as search, summarization, and recommendations).

Requirements

Master’s degree in Data Science, Statistics, Computer Science, Mathematics, or a related field.
7+ years of applied data science experience working on large projects and diverse data sets, including preprocessing, cleansing, and verifying the integrity of data.
Strong proficiency in Python and in SQL, R, or Java, and experience with data analysis libraries (pandas, NumPy, scikit-learn).
Experience with data visualization tools (Tableau, Power BI, QuickSight, matplotlib, seaborn).
Experience with Database Management Systems (Oracle, PostgreSQL, MySQL, Redshift, etc.).
Experience with distributed computing frameworks (YARN, Spark, Hadoop), containerization and orchestration (Docker, Kubernetes), cloud-based computing, and search platforms (Apache Solr, Lucene, or Elasticsearch).
Knowledge of descriptive and inferential statistics, regression, supervised and unsupervised learning methods, and univariate and multivariate hypothesis testing.
Experience selecting features and building training sets for developing machine learning models.
Experience with machine learning engineering.
Excellent understanding of ML algorithms (e.g. KNN, Naive Bayes, decision trees, ensemble models, clustering) and the ability to select the best methods for a business problem.
Extensive knowledge of leading open-source data analysis tools, ML libraries, and NLP techniques (including topic modeling and summarization).
Knowledge of advanced concepts such as weakly supervised learning, reinforcement learning, and deep learning.
Familiarity with Databricks, Spark, or other similar big data platforms.
Effective project management skills and ability to prioritize tasks.
Ability to work quickly, accurately, and efficiently in a fast-paced environment with shifting priorities.
Strong problem-solving and analytical skills.
Excellent communication skills, with the ability to explain complex concepts to non-technical stakeholders.

Responsibilities

Write code (Python, R, SQL, Java, etc.) to obtain, clean, manipulate, and analyze data.
Retrieve, synthesize, and present critical data in a format that is immediately useful for answering specific questions or improving system performance.
Analyze and interpret historical data to identify trends and support optimal decision making.
Drive the development of machine learning and statistical models to solve specific business problems.
Leverage AI/LLM frameworks to develop internal and external solutions.
Capitalize on opportunities for improving workflows to increase data quality, accuracy, and timeliness.
Lead experiments to test hypotheses and measure the effectiveness of solutions.
Formalize assumptions about how our systems should work, create statistical definitions of outliers, and develop methods to systematically identify outliers.
Determine why such examples are outliers and whether action is needed.
Partner with engineering teams to help implement end-to-end pipelines.
Champion collaboration across teams to deliver enhancements, features, and time-sensitive projects.
Deliver reports or presentations to share insights with audiences of varying levels of technical sophistication.
Maintain a portfolio of self-directed projects that are aligned with business goals.
Take ownership of technical solutions and direct machine learning strategy for the team.
Drive team innovation through training, mentoring, and knowledge-sharing exercises with colleagues.