Discussion about this post

Aarom:

Hi Sebastian!

re: instruction tuning, I'm wondering if it would ever make sense to either

1. weight the loss function to consider output tokens *more* than instruction tokens

2. train on the full task for an epoch (or thereabouts) and then on only the output
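The weighting in idea (1) can be sketched as follows. This is a minimal illustration, not from the post: the function name, the boolean `is_output` mask, and the default `instruction_weight` are all assumptions, and in practice the per-token losses would come from a cross-entropy over the model's logits.

```python
def weighted_loss(token_losses, is_output, instruction_weight=0.1):
    """Average per-token losses, counting output tokens more than instruction tokens.

    token_losses: per-token negative log-likelihoods (floats)
    is_output: True for completion/output tokens, False for instruction tokens
    instruction_weight: down-weighting factor (< 1.0) applied to instruction tokens
    """
    weights = [1.0 if out else instruction_weight for out in is_output]
    total = sum(w * loss for w, loss in zip(weights, token_losses))
    # Normalize by the total weight so the scale is comparable to a plain mean
    return total / sum(weights)
```

Setting `instruction_weight=0.0` recovers the common practice of masking instruction tokens entirely, while `instruction_weight=1.0` recovers training on the full sequence, so this interpolates between the two.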

Dominika:

Hi, great post, as always!

You mention that the authors of the instruction tuning paper do not mask the template; however, this fragment of the paper suggests otherwise: "The loss function, L, for instruction modelling calculates the negative log-likelihood for both instruction and completion tokens, excluding any prompt template tokens." So I am not sure that is true?

Am I missing something?
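The loss quoted from the paper can be sketched like this. It is a minimal illustration assuming per-token negative log-likelihoods are already available; the function name and the `token_types` labels are hypothetical, not from the paper.

```python
def instruction_modelling_loss(token_nlls, token_types):
    """Average the NLL over instruction and completion tokens,
    excluding prompt-template tokens, as the quoted loss L describes.

    token_nlls: per-token negative log-likelihoods (floats)
    token_types: one of "template", "instruction", "completion" per token
    """
    kept = [nll for nll, t in zip(token_nlls, token_types) if t != "template"]
    return sum(kept) / len(kept)
```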

