Machine Learning Engineer (LLMs + RL)

Company: AgileRL

Location: London Area, United Kingdom

Posted At: 3/2/2026

UK Visa Sponsorship Analytics

Occupation Type: Programmers and software development professionals

Occupation Code Skill Level: Higher Skilled

Sponsorship Salary Threshold: £54,700 (£28.05 per hour); occupation rate applies

The analytics above are generated algorithmically from job titles and may not always match the company's own job classification. You can also check detailed occupation eligibility and salary criteria on our UK Visa Eligible Occupations & Salary Thresholds page.

Description

At AgileRL, we are on a mission to accelerate reinforcement learning for building superhuman artificial intelligence systems.

We believe that reinforcement learning will form a part of every sophisticated AI system of the future. It already impacts the world we live in, from its use in creating LLMs with reasoning capabilities, to enabling autonomous vehicles to make decisions. Reinforcement learning enables AI models to plan and achieve objectives, but currently very few companies or individuals have the resources to leverage this powerful machine learning paradigm.

AgileRL offers Arena, an enterprise-grade reinforcement learning operations (RLOps) platform and a state-of-the-art open-source framework to eliminate these barriers to entry. Our framework has already achieved 10x faster training and hyperparameter optimisation than leading RL libraries. Arena, built on top of our open-source framework, is focused on four key areas: simulation, training, deployment and monitoring.

We work closely with companies across industries including finance, defence, and technology to deliver best-in-class autonomous solutions. We are looking for talented engineers to join the team and develop the systems and tools that will enable the next wave of impactful AI.

As a member of the AgileRL team, you will have the opportunity to be at the forefront of reinforcement learning innovation. We value curiosity, creativity, and a passion for pushing boundaries. Together, we will build not only state-of-the-art software but also a culture of excellence, collaboration, and continuous learning.

We are seeking a talented and experienced Machine Learning Engineer to join our team and contribute to the development of a first-of-its-kind RLOps platform. As a Machine Learning Engineer, you will be responsible for designing, implementing, and maintaining the infrastructure, tools, and services that enable businesses to build and deploy reinforcement learning models efficiently and effectively.

Responsibilities:

Collaborate with the team to understand requirements and design the architecture of the Arena platform and open-source framework.
Develop scalable and reliable infrastructure to support LLM training, reinforcement fine-tuning, model deployment, and management.
Integrate existing machine learning frameworks and libraries into the platform and open-source framework, providing a range of algorithms, environments, and tools for AI model development.
Stay up-to-date with the latest advancements in AI, MLOps, reinforcement learning algorithms, tools, and techniques, and incorporate them into the platform as appropriate.
Provide technical guidance and support to internal users and external customers using the Arena platform and open-source framework.

Requirements:

Master's or Ph.D. degree in Computer Science, Engineering, or a related field, or 3+ years of relevant industry experience.
Solid understanding of LLM training and of reinforcement learning algorithms and concepts, with hands-on experience in building and training AI models.
Strong programming skills, with experience using ML frameworks and libraries (e.g. PyTorch, TensorFlow, Ray, Gym, TRL, DeepSpeed, vLLM), and MLOps tools.
Experience in building machine learning platforms or tooling for industrial or enterprise settings.
Proficiency in data management techniques, including storage, retrieval, and pre-processing of large-scale datasets.
Familiarity with model deployment and integration, including the development of APIs and deployment pipelines, and performance optimisation.
Experience in designing and developing cloud-based infrastructure for distributed computing and scalable data processing.
Deep understanding of software engineering and machine learning principles and best practices.
Strong problem-solving and communication skills, and the ability to work independently as well as in a team environment.

Compensation:

Competitive salary + significant stock options.
30 days of holiday, plus bank holidays, per year.
Flexible working from home and 6-month remote working policies.
Enhanced parental leave.
Learning budget of £500 per calendar year for books, training courses and conferences.
Company pension scheme.
Regular team socials and quarterly all-company parties.
Cycle-to-work scheme.

Join the fast-growing AgileRL team and play a key role in the development of cutting-edge reinforcement learning tooling and infrastructure.

Learn more about AgileRL at https://agilerl.com