
RewardBench lead author here, a couple notes:

* We're working on your training correlation caveat :)

* Now we're at the phase where we're getting closed models added to the benchmark, to show the gap that open models need to close (because good alignment capabilities are important for good societal outcomes).

* The leaderboard reflects a design space that isn't well explored yet, so DPO models dominate simply because they're popular. I don't expect this to change too much, but more people are already training RMs since release! (A specific training blog post here: https://www.notion.so/Reward-Modeling-for-RLHF-abe03f9afdac42b9a5bee746844518d0?pvs=21)

Great summary of the benchmark. Keep up the great work.

https://huggingface.co/spaces/allenai/reward-bench


Thanks for the informative comment, and I'm looking forward to that correlation analysis (haha, but no rush!).
