reward-model

Star

Here are 21 public repositories matching this topic...

wendell0218 / Awesome-RL-for-Video-Generation

Star

A curated list of papers on reinforcement learning for video generation

reinforcement-learning ppo video-generation dpo reward-model grpo

Updated Oct 10, 2025

Westlake-AI / SemiReward

Star

[ICLR 2024] SemiReward: A General Reward Model for Semi-supervised Learning

machine-learning natural-language-processing computer-vision regression transformer semi-supervised-learning audio-classification weakly-supervised-learning yahoo-answers cifar-100 label-noise esc-50 vision-transformer reward-model

Updated Jun 10, 2024
Python

bobxwu / learning-from-rewards-llm-papers

Star

A comrephensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and post-inference stages.

reinforcement-learning post-training self-correction reward-learning large-language-models llm llms reward-models reward-model reward-modeling guided-decoding test-time-scaling

Updated Jun 13, 2025

tongjingqi / Awesome-Agent-RL

Star

A curated list of awesome resources about reward construction for AI agents. This repository covers cutting-edge research, and practical guides on defining and collecting rewards to build more intelligent and aligned AI agents.

agent awesome reinforcement-learning rl awesome-list llm reward-model agentic-ai rlvr agent-training

Updated Sep 1, 2025

InternLM / Spark

Star

An official implementation of "SPARK: Synergistic Policy And Reward Co-Evolving Framework"

self-improvement multi-modal large-language-models vision-language-model reward-model large-vision-language-models self-rewarding math-reasoning

Updated Oct 9, 2025
Python

yeyimilk / CrowdVLM-R1

Star

Proposed fuzzy reward model with GRPO to improve VLM's abilities in crowd counting task.

reinforcement-learning vlm crowdcounting llm reward-model r1-zero vlm-r1 multimodal-r1

Updated Apr 11, 2025
Python

rochitasundar / Generative-AI-with-Large-Language-Models

Star

This repository contains the lab work for Coursera course on "Generative AI with Large Language Models".

reinforcement-learning transformer kl-divergence proximal-policy-optimization large-language-models prompt-engineering flan-t5 instruction-finetuning low-rank-adaptation reward-model parameter-efficient-fine-tuning llm-evaluation

Updated Dec 1, 2023
Jupyter Notebook

itaychachy / RewardSDS

Star

Official PyTorch Implementation for the "RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling" paper!

ai computer-vision 3d-generation mechine-learning reward-model

Updated Jun 10, 2025
Python

NiuTrans / GRAM

Star

Code for ICML 2025 paper "GRAM: A Generative Foundation Reward Model for Reward Generalization"

generative generalization rlhf reward-model

Updated Sep 4, 2025
Python

AlignRM / CheemsRM

Star

ACL'25: Cheems: A Practical Guidance for Building and Evaluating Chinese Reward Models from Scratch

reinforcement-learning large-language-model reward-model

Updated Jun 10, 2025
Python

hlp-ai / miniChatGPT

Star

Mini ChatGPT

pytorch ppo sft gpt2 chatgpt instructgpt reward-model

Updated May 12, 2023
Python

taishan1994 / Reward-Model-Finetuning

Star

专门用于训练奖励模型的仓库。

reward-model qwen2

Updated Aug 7, 2024
Python

techandy42 / LLM_Reward_Model

Star

Developing a LLM response ranking reward model using HFRL except it's GPT-3.5 instead of human.

language-model reward-model hfrl

Updated Dec 28, 2023
Jupyter Notebook

kaicheng001 / Awesome-R1

Star

A curated list of research papers, models, and resources related to R1-style reasoning models following DeepSeek-R1's breakthrough in January 2025.

awesome thinking r1 vlm lmm llm mllm reward-model reasoning-models deepseek-r1

Updated Jul 2, 2025

m-serious / module-reward-models

Star

This project implements a multi-module reward model training system designed to further RL finetune agent performance across complex multi-turn tasks.

reinforcement-learning-agent reward-model

Updated Aug 16, 2025
Python

kantkrishan0206-crypto / AlignGPT

Star

“This project implements a mini LLM alignment pipeline using Reinforcement Learning from Human Feedback (RLHF). It includes training a reward model from human-annotated preference data, fine-tuning the language model via policy optimization, and performing ablation studies to evaluate robustness, fairness, and alignment trade-offs.”

python nlp machine-learning deep-learning transformers pytorch alignment language-models tokenization ai-safety fine-tuning preference-learning ppo policy-optimization dpo human-feedback rlhf reward-model

Updated Oct 11, 2025
Python

sebastianpinedaar / llumux

Star

Compose, train and test fast LLM routers

model-selection automl-pipeline large-language-models llms reward-model llm-training llm-inference llm-evaluation llm-pipeline llm-routing

Updated Oct 8, 2025
Python

thisisHJLee / RLHF

Star

nlp reinforcement-learning language-model ppo rlhf supervised-finetuning reward-model

Updated Jul 20, 2023

ritikdhame / Reward-Model-for-Evaluating-Machine-Translations

Star

A reward model to evaluate machine translations, focusing on English-to-Spanish sentence pairs, with applications in natural language processing (NLP), translation quality assessment, and multilingual content adaptation

nlp reinforcement-learning bert reward-model llm-inference

Updated Oct 6, 2025
Jupyter Notebook

RuvenGuna94 / Dialogue-Summary-remove-toxic-text-PPO

Star

Fine-tuning FLAN-T5 with PPO and PEFT to generate less toxic text summaries. This notebook leverages Meta AI's hate speech reward model and utilizes RLHF techniques for improved safety.

nlp toxic-comment-classification hate-speech-detection toxicity-analysis ppo-pytorch dialogue-summarization generative-ai detoxification reward-model

Updated Jan 4, 2025
Jupyter Notebook

Improve this page

Add a description, image, and links to the reward-model topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the reward-model topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

reward-model

Here are 21 public repositories matching this topic...

wendell0218 / Awesome-RL-for-Video-Generation

Westlake-AI / SemiReward

bobxwu / learning-from-rewards-llm-papers

tongjingqi / Awesome-Agent-RL

InternLM / Spark

yeyimilk / CrowdVLM-R1

rochitasundar / Generative-AI-with-Large-Language-Models

itaychachy / RewardSDS

NiuTrans / GRAM

AlignRM / CheemsRM

hlp-ai / miniChatGPT

taishan1994 / Reward-Model-Finetuning

techandy42 / LLM_Reward_Model

kaicheng001 / Awesome-R1

m-serious / module-reward-models

kantkrishan0206-crypto / AlignGPT

sebastianpinedaar / llumux

thisisHJLee / RLHF

ritikdhame / Reward-Model-for-Evaluating-Machine-Translations

RuvenGuna94 / Dialogue-Summary-remove-toxic-text-PPO

Improve this page

Add this topic to your repo