Software Development Engineer - ML Ops

Workday•McLean, VA

1d•Hybrid

About The Position

Your work days are brighter here. We’re obsessed with making hard work pay off, for our people, our customers, and the world around us. As a Fortune 500 company and a leading AI platform for managing people, money, and agents, we’re shaping the future of work so teams can reach their potential and focus on what matters most.

The minute you join, you’ll feel it. Not just in the products we build, but in how we show up for each other. Our culture is rooted in integrity, empathy, and shared enthusiasm. We’re in this together, tackling big challenges with bold ideas and genuine care. We look for curious minds and courageous collaborators who bring sun-drenched optimism and drive. Whether you're building smarter solutions, supporting customers, or creating a space where everyone belongs, you’ll do meaningful work with Workmates who’ve got your back. In return, we’ll give you the trust to take risks, the tools to grow, the skills to develop, and the support of a company invested in you for the long haul. So, if you want to inspire a brighter work day for everyone, including yourself, you’ve found a match in Workday, and we hope to be a match for you too.

About the Team

The Workday ML Runtime team is seeking an energetic and determined Software Engineer to design, implement, and deliver highly scalable features for our Machine Learning Runtime platform. As a member of this fast-paced group, you will have a unique and rewarding opportunity to shape and contribute to the microservices that power Workday Machine Learning features in production. You will partner with Data Scientists, ML Engineers, and other Software Engineers to create the technology that brings these features to life.

About the Role

This role will support one or more direct or indirect contracts with the U.S. Federal Government. Due to federal government security requirements, all Workday personnel working on these contracts must be United States citizens (naturalized or native).

About You

This role may require a security clearance at the TS/SCI w/CI Poly level. Applicants must have the ability to obtain and maintain a U.S. government issued security clearance. An active TS/SCI w/CI Poly is preferred.

We need creative and dedicated Software Engineers, like you, who really want to move the needle. You should enjoy working on projects that can significantly improve developer satisfaction and save the company millions of dollars. By nature, you are inquisitive and ready to question the status quo. You have a passion for exploring and implementing innovative techniques and approaches to solve complex and challenging problems. Most important of all, you are a superlative collaborator and teammate who brings out the very best in everyone. You feel happiest when working with a highly capable and motivated team of people passionate about software and technology.

Requirements

US Citizenship is required.
5 or more years of DevOps experience, including infrastructure automation and building CI/CD pipelines.
Strong system design skills and the ability to write comprehensive technical design docs.
Proficient in Python programming.
Experience designing, implementing, and maintaining robust DevOps pipelines for deploying, monitoring, and scaling machine learning runtime environments.
Experience using technologies like Kubernetes/Docker to help developers scale their efforts in creating new and innovative products.
Ability to collaborate with other Machine Learning teams to improve not just the product but also the efficiency of engineering processes.
3 or more years of DevOps experience, including infrastructure automation and building CI/CD pipelines.

Nice To Haves

Machine learning background.
Experience with communication protocols, RESTful services, service-oriented architecture, distributed systems, and microservices.
Building comprehensive monitoring services.
Prior experience with enterprise SaaS products.
Experience with monitoring tools like Grafana.
Passion for creating and maintaining documentation and fixing runbooks.
Availability for on-call support on a rotating basis.
Proficiency in infrastructure automation tools like Terraform, implementing CI/CD pipelines using Git and Jenkins, and applying continuous deployment tools such as ArgoCD.
BS/MS in Computer Science or a related technical field.
Excellent problem-solving skills with a focus on creating and maintaining accurate documentation.
Experience leading or mentoring other team members and proven team collaboration experience, including understanding group dynamics, effective communication strategies, conflict resolution techniques, and the ability to foster a positive and inclusive team.

Responsibilities

Developing frameworks, automation, and tooling to foster a culture of efficiency and innovation.
Applying technologies like Kubernetes, Docker, and Python to enhance developer scalability in creating innovative ML Runtime Inference applications.
Implementing and operating distributed systems, and carrying out software development, including the conception, specification, design, programming, documentation, testing, and bug fixing involved in creating and maintaining applications, frameworks, or other software components.
Developing products and services that empower developers to streamline their interactions with the ML platform.
Working with public cloud (IaaS) providers such as AWS and GCP and applying capacity management principles.
Deploying and orchestrating containers in production environments using technologies like Kubernetes, service mesh, ArgoCD, and related tools.
Actively engaging with Tech Leads and ML Engineers across teams to elaborate on requirements and drive technical solutions.
Owning and developing features from end to end, including infrastructure as code.
Researching, evaluating, prototyping, and driving adoption of new ML tools with reliability and scale in mind.
Proactively addressing and resolving issues, automating processes, and empowering engineers to self-service their operational needs for improved productivity.
Availability for on-call support on a rotational basis.