RL Environments Engineer

Custom Software Development Company

1d•$34,000 - $44,000•Remote

About The Position

Hello World! We are The Codest, an international tech software company with tech hubs in Poland, delivering global IT solutions and projects. Our core values rest on a “Customers and People First” approach that prioritises the needs of our customers and a collaborative environment for our employees, enabling us to deliver exceptional products and services. Our expertise centers on web development, cloud engineering, DevOps and quality.

After many years of developing our own product, Yieldbird, which was honored as a laureate of the prestigious Top25 Deloitte awards, we arrived at our mission: to help tech companies build impactful products and scale their IT teams by boosting IT delivery performance. Through our extensive experience with product development challenges, we have become experts in building digital products and scaling IT teams.

But our journey does not end here - we want to continue our growth. If you’re goal-driven and looking for new opportunities, join our team! What awaits you is an enriching and collaborative environment that fosters your growth at every step.

We are currently looking for: RL Environments Engineer

Our client builds reinforcement learning environments and training tasks for frontier AI labs. The work is technical, research-adjacent, and hands-on. We're not looking for web developers or backend engineers who have used LLM APIs.

Requirements

Experience with PyTorch or JAX at the framework level (not just importing a model)
Familiarity with RL concepts: reward functions, environment design, training loops, evaluation
Ability to read ML papers and implement them. This is a core part of the job. If someone hasn't reproduced or extended a research result, they'll struggle here.
Production Python skills: Docker, git, clean code, reproducible environments. Notebook-only experience won't be enough.
Exposure to any of: model training/finetuning, inference optimization, CUDA/Triton kernels, distributed training, model internals (attention, KV caches, tokenizers)

Nice To Haves

Publications or competitive programming background
Experience with MuJoCo, game environments, or simulation frameworks
Scientific computing (Rust, C++, numerical methods)

Responsibilities

Design and build MLE/SWE environments and diverse tasks.
Ensure the tasks target a specified language model and satisfy the required difficulty distribution.

Benefits

34 - 44k PLN (B2B/useme)
100% remote work (but we have offices in Krakow and Warsaw and we’re happy to meet there from time to time 😉)
300 PLN to use on our benefits platform, Worksmile - gift cards, medical services, sports, etc.
Our B2B contract contains provisions that allow you to obtain IP BOX support
Integration events, education opportunities and much more…
A unique opportunity to take your career to the next level - we’re looking for people who want to create an impact. You have ideas, we want to hear them!
