14 Comments

It's a real pleasure to read the summary of such a knowledgeable man as you, kudos!

Now, I'm a little bit surprised that you didn't introduce multi-modality in LLMs as a main axis of research/differentiation. Pairing text with vision is already relatively straightforward, and there is also sub-modality differentiation in the sound landscape, with speech-to-text expanding toward text-to-sound (e.g., the Bark model), but this is only the beginning...

IMHO, generalized multi-modality is a neglected yet straightforward path toward AGI, as it would solve a good part of the thorny issue of symbol grounding in these models (the other part being feedback from the world back to the models, a direct pathway toward the synthesis of genuine evolved intentionality).

With models encompassing our five senses and proprioception, they should evolve inner world representations more aligned with human ones. One can even speculate whether those LLMs would converge toward a universal underlying neural coding scheme, e.g., a kind of refinement of Grossberg's ART, as described in this paper: https://www.mdpi.com/2078-2489/14/2/82


Thanks for the kind words and suggestions! I didn't cover multi-modality here since Llama 2 and the GPT-4 API don't support it natively. However, I totally agree, and it's actually possible to extend Llama / Llama 2 for multi-modality using the Llama-Adapter method, for example, which I covered in previous issues of Ahead of AI (https://magazine.sebastianraschka.com/p/ahead-of-ai-8-the-latest-open-source#§llama-adapter-v).


Awesome write-up. I tend to try to follow the news as it occurs, but you do such a great job of distilling everything that I may just consume your newsletter instead. I wonder if you use any LLM to help you write or organize raw text.


Thanks! I sometimes use LLMs for mild rewording or grammar checking if I am not happy with a paragraph I've written. However, I haven't found it useful for writing or organizing raw text.


Thanks for the newsletter!

I am not sure if I fully understand this sentence: "That's because new knowledge is usually ingested via pretraining, not finetuning; this is also true for open-source models." My impression was that finetuning was the way to inject new knowledge, or what exactly is meant here?


That's a good point.

I think we (still) don't have any extensive studies on this. An interesting one would be: pretrain an LLM on a dataset that contains everything but medical texts. Then

a) Use it as is

b) Further pretrain it on medical data

c) Finetune it on medical data

d) Throw it out and pretrain a new model on the original data + the medical data

and evaluate it.

Currently, the common perception is that the knowledge has to be present during pretraining or has to be ingested via further pretraining. Finetuning is then used to better extract this existing knowledge from the model. In other words, with finetuning, you guide the model more precisely in terms of what to do, not what it knows.

I do think that for most models, scenario c) above will work well because the model has already seen domain-specific data during pretraining. You are basically just refining/finetuning it here.

Some examples include:

1) BloombergGPT (https://arxiv.org/abs/2303.17564), where the researchers pretrained a model on regular pretraining data with roughly 50% financial data added to the mix.

2) ProtGPT2 (https://pubmed.ncbi.nlm.nih.gov/35896542/), a transformer pretrained on protein sequences.

3) Goat (https://arxiv.org/abs/2305.14201), a finetuned Llama model that is good at arithmetic tasks, beating GPT-4. Llama models are already capable of arithmetic; here, you make them better at those tasks specifically.

If you know of any additional resources that show otherwise, I'd really appreciate it if you could share those!
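To make the pretraining-vs-finetuning distinction above concrete, here is a minimal sketch (with made-up token IDs, not tied to any specific library) of how the training signal typically differs: continued pretraining computes the next-token loss over all tokens, so the model ingests new knowledge from the raw text, while instruction finetuning usually masks out the prompt tokens and scores only the response, which steers behavior rather than injecting knowledge.

```python
IGNORE_INDEX = -100  # label value commonly ignored by cross-entropy loss implementations

def pretraining_labels(token_ids):
    # Continued pretraining: every token contributes to the loss,
    # so the model absorbs the content of the new domain text.
    return list(token_ids)

def finetuning_labels(token_ids, prompt_len):
    # Instruction finetuning: prompt tokens are masked out, so the model
    # learns *what to do* (how to respond), not new facts from the prompt.
    return [IGNORE_INDEX] * prompt_len + list(token_ids[prompt_len:])

example = [11, 22, 33, 44, 55]        # hypothetical prompt (3 tokens) + response (2 tokens)
print(pretraining_labels(example))    # -> [11, 22, 33, 44, 55]
print(finetuning_labels(example, 3))  # -> [-100, -100, -100, 44, 55]
```

This is only a schematic of the label construction; real training loops shift labels by one position and batch/pad sequences, but the masking idea is the same.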


I was curious about the Unnatural Code Llama model. Why didn't Meta release its weights? Its performance is the closest to GPT-4.


Good question, and I have no idea why they didn't share it. One reason I can think of is that the model's real-world performance is actually not that good compared to the regular model, because the HumanEval benchmark may be flawed. I.e., the model may work well on the HumanEval benchmark, but that doesn't translate well to real-world performance.


Thank you for the write-up. Enjoyed reading it.

I have a question: are there any benchmarking analyses comparing finetuning model weights vs. prompt (or prefix) tuning? My understanding is that the latter is preferred only because it is a much easier training job, but performance-wise, finetuning the model weights yields better results. Is that correct?


You are correct; finetuning generally leads to better performance (but your mileage may vary depending on the finetuning method and dataset; if you only have a handful of labeled examples, then finetuning will of course not be helpful).

There are many instances and studies scattered across the literature; off the top of my head, these two come to mind:

1) ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation (covered here: https://magazine.sebastianraschka.com/p/ai-research-highlights-in-3-sentences-738)

2) Goat: Fine-tuned LLaMA Outperforms GPT-4 on Arithmetic Tasks (covered here: https://magazine.sebastianraschka.com/p/ai-research-highlights-in-3-sentences-2a1)
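On the "much easier training job" point: a back-of-the-envelope sketch (all sizes below are illustrative round numbers, not any specific model's exact configuration) shows why prefix tuning is so much cheaper to train. It optimizes only small per-layer key/value prefixes while the base model stays frozen, whereas full finetuning updates every weight.

```python
def full_finetune_params(n_layers, d_model, d_ff, vocab):
    # Rough decoder-only transformer count: attention projections (4 * d^2),
    # feed-forward layers (2 * d * d_ff), plus the token embedding matrix.
    per_layer = 4 * d_model * d_model + 2 * d_model * d_ff
    return n_layers * per_layer + vocab * d_model

def prefix_tuning_params(n_layers, d_model, prefix_len):
    # Prefix tuning trains only key and value prefix vectors per layer;
    # all base-model weights stay frozen.
    return n_layers * 2 * prefix_len * d_model

full = full_finetune_params(n_layers=32, d_model=4096, d_ff=11008, vocab=32000)
prefix = prefix_tuning_params(n_layers=32, d_model=4096, prefix_len=20)
print(f"trainable fraction with prefix tuning: {prefix / full:.4%}")
```

With these hypothetical dimensions, the trainable fraction comes out well under 1%, which is why prefix tuning fits on much smaller hardware; the performance gap vs. full finetuning is the price you pay for that.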


Ditto... again, WOW, amazing coverage!


Thanks Abhinav and Richard, I appreciate the kind words!


As always an amazing overview!


Wow amazing coverage!

I got overwhelmed following all the AI news last week. But reading your newsletter is a relief; I feel I am on top of things now :)
