Discussion about this post

Fabian Stemmer:

Thanks for the great overview and summary of recent research papers, this is very useful.

While reading your summary of the Mixtral 8x7B model, I noticed what I believe to be a mistake in your model size computations, though.

You write:

"In total, Mixtral 8x7B comprises 47B parameters. This means that a Mistral 7B model has 9B non-feed-forward parameters" - clearly this cannot be true, right?

It seems you took the 56B parameters one might expect from the 8x7B model and deducted the actual 47B parameters to arrive at the 9B number.

Unless I'm mistaken, the correct math should be as follows:

non-FF + FF = 7B

non-FF + 8*FF = 47B

Solving this, I arrive at ~1.3B non-feed-forward parameters and ~5.7B feed-forward parameters in a Mistral 7B model, which is consistent with the stated 47B total parameters for Mixtral 8x7B and the ~13B active parameters when two experts are active.
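
For anyone who wants to check the arithmetic, here is a small sketch of the same calculation (the 7B, 47B, and 8-expert figures are the ones quoted above; everything else follows from them):

# Two equations: non_ff + ff = 7e9 (Mistral 7B) and non_ff + 8 * ff = 47e9 (Mixtral 8x7B)
total_dense = 7e9
total_moe = 47e9
n_experts = 8

ff = (total_moe - total_dense) / (n_experts - 1)  # ~5.7B feed-forward parameters
non_ff = total_dense - ff                         # ~1.3B non-feed-forward parameters
active = non_ff + 2 * ff                          # ~12.7B active parameters with two experts

print(f"FF: {ff/1e9:.2f}B, non-FF: {non_ff/1e9:.2f}B, active (2 experts): {active/1e9:.2f}B")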

Andrew Ma:

In section 2, "Tuning Language Models by Proxy," under the "Practical Considerations" subsection, you write: "b) It's useful when the large base model (1) is a "black box", and its internal weights are inaccessible.

However, there's a catch: the smaller models must share the same vocabulary as the larger target model. (In theory, if someone knows the vocabulary of GPT-4 and can access its logit outputs, they could create specialized GPT-4 models using this method.)"
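
(As an aside, here is a minimal sketch of the logit arithmetic that proxy tuning uses, as I understand it from the paper; the tensors below are just random placeholders rather than real model outputs. It also makes clear why all three models need the same vocabulary, token for token:)

import torch

# Placeholder logits over a shared vocabulary (random here, purely for illustration).
vocab_size = 32000
logits_large_base = torch.randn(vocab_size)   # black-box target model; only its logits are needed
logits_small_tuned = torch.randn(vocab_size)  # small fine-tuned "expert"
logits_small_base = torch.randn(vocab_size)   # small untuned "anti-expert"

# Proxy tuning shifts the large model's logits by the small models' tuning delta.
# This element-wise addition only makes sense if index i refers to the same token
# in all three vocabularies.
logits_proxy = logits_large_base + (logits_small_tuned - logits_small_base)
next_token_id = int(torch.argmax(logits_proxy))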

This comment reminded me of a result in March 2024, where a couple of teams were able to get the logits and the vocabulary of ChatGPT 3.5 via API calls (Please see: https://www.youtube.com/watch?v=O_eUzrFU6eQ or https://arxiv.org/abs/2403.09539)

I'm not sure whether the ChatGPT 3.5 API has since been changed to prevent this kind of access, but I wanted to share it because I thought it was relevant.

----

Anyways, thanks for the awesome newsletter! I always enjoy reading your insights into the sprawling field of AI research. Please keep up the good work!
