
Research Scientist - RL Training

Work from home Full-time role Hiring

ABOUT THE ROLE

We're looking for a Research Scientist to work on reinforcement learning for training and aligning large language models. This is a foundational research role focused on one of the most consequential open data problems in AI: how to generate the data, reward signals, and training procedures that steer LLM behavior in reliable and generalizable directions - and a core capability that directly differentiates Snorkel's data-as-a-service offering.

You'll work closely with Snorkel's research, engineering, and delivery teams to advance our RL data capabilities - translating research ideas into the preference datasets, reward models, and RL-ready corpora we produce for frontier AI labs, and contributing to a research agenda that is central to Snorkel's long-term differentiation as a provider of bespoke training data.

MAIN RESPONSIBILITIES

  • Research and implement reinforcement learning techniques - including GRPO, RLHF, RLAIF, DPO, and reward modeling - and translate them into data products (preference datasets, reward signals, verifiable rewards) that customers can use to train and fine-tune large language models.
  • Design and build data pipelines that generate high-quality training signal for RL workflows, including AI-assisted data annotation and curation pipelines that improve model generalization to unseen benchmarks.
  • Prototype and iterate on end-to-end RL training recipes that inform what data Snorkel ships as part of its data-as-a-service deliveries.
  • Work closely with research scientists, ML engineers, and delivery teams to translate RL research into customer-ready data products.
  • Stay current with the latest developments in large-scale multi-node LLM training, alignment research, and scalable RL methods (on complex environments such as Terminal-Bench), bringing relevant advances into Snorkel's data-as-a-service approach.
  • Contribute to Snorkel's research publications and internal knowledge base in RL and model training.

PREFERRED QUALIFICATIONS

  • Deep expertise in reinforcement learning from human or AI feedback, reward modeling, and credit attribution, ideally with a clear perspective on what data makes these techniques work.
  • Experience training or fine-tuning large language models (30B+) at scale, including familiarity with distributed training infrastructure.
  • Strong proficiency in Python and ML frameworks, especially PyTorch and HuggingFace, and hands-on experience with RL frameworks such as Verl and SkyRL.
  • Solid software engineering fundamentals - you can build research prototypes that others can run, extend, and integrate into data production workflows.
  • Familiarity with ML infrastructure, cloud platforms, and tools (AWS, GCP, Kubernetes, Slurm, etc.); experience with large-scale RL training pipelines is a strong plus.
  • Comfort operating in a high-iteration environment with open-ended research questions and shifting, customer-driven technical constraints.
  • Ph.D. in machine learning, reinforcement learning, or a related field strongly preferred; exceptional industry experience considered.

Salary Range $200,000-$325,000 USD
