Discussion about this post

Lewis Tunstall

Very nice summary, Sebastian!

FYI, from correspondence I had with the Llama 2 authors: rejection sampling is done in an “offline” fashion, where one first generates K samples per prompt in the dataset, then applies ranking + SFT.

You can read more about this here: https://huggingface.co/papers/2307.09288#64c6961115bd12e5798b9e3f
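
To make the "offline" ordering concrete, here is a minimal sketch of the data-building step: sample K completions per prompt up front, rank them, and keep the winner as an SFT target. The names `build_sft_dataset`, `generate`, and `reward` are hypothetical placeholders, not part of any Llama 2 code, and keeping only the single top-ranked sample per prompt is an assumption made for illustration.

```python
import random
from typing import Callable, List, Tuple

def build_sft_dataset(
    prompts: List[str],
    generate: Callable[[str], str],        # hypothetical: samples one completion from the policy
    reward: Callable[[str, str], float],   # hypothetical: reward-model score for (prompt, completion)
    k: int = 4,
) -> List[Tuple[str, str]]:
    """Offline rejection sampling: generate all samples first, then rank and select."""
    dataset = []
    for prompt in prompts:
        # Step 1: draw K candidate completions for this prompt.
        candidates = [generate(prompt) for _ in range(k)]
        # Step 2: rank by reward and keep the best one (assumed top-1 selection).
        best = max(candidates, key=lambda c: reward(prompt, c))
        dataset.append((prompt, best))
    # Step 3 (not shown): run ordinary SFT on the returned (prompt, completion) pairs.
    return dataset

# Toy stand-ins so the sketch runs without a real model or reward model.
_rng = random.Random(0)

def toy_generate(prompt: str) -> str:
    return prompt + " " + "x" * _rng.randint(1, 10)

def toy_reward(prompt: str, completion: str) -> float:
    return float(len(completion))  # pretend longer completions score higher

if __name__ == "__main__":
    for prompt, completion in build_sft_dataset(["Hello", "Explain RLHF"], toy_generate, toy_reward):
        print(repr(prompt), "->", repr(completion))
```

The point of the sketch is only the ordering: no gradient update happens between samples, so the whole dataset is built from a fixed policy before fine-tuning starts.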

Jek

The ability to simplify and distill complex topics is essential in today's world. Well written—thank you.
