32 Comments
Dec 30, 2023 · Liked by Sebastian Raschka, PhD

I'd add the recent Medprompt paper, which demonstrated how effective prompting strategies can enable a generalist model like GPT-4 to outperform a specialized fine-tuned model such as Google's Med-PaLM: https://arxiv.org/abs/2311.16452

It shows the potential we have yet to explore with such LLMs; the same strategies can be applied to smaller models as well, substantially boosting their performance at a fraction of the size, cost, and latency.

Dec 30, 2023 · Liked by Sebastian Raschka, PhD

On the Bloomberg piece... It was confusing to me why Option 3 was different from Option 5. I sense that I missed a key contrast, perhaps between full from-scratch training and fine-tuning. Good practical point about $100 versus $millions. 👍

PS: SUPER!!! Another most-excellent textbook from SR. I got it! Minor note... Your 45% discount was not accepted since Manning already discounts the ebook by 50%.

PPS: You are missing an opportunity with this new textbook. What about a chapter on 'Beyond Language to Multi-Modal'? The term LLM is aging; it should be LxM, covering both pretraining inputs and generative outputs.

Dec 30, 2023 · Liked by Sebastian Raschka, PhD

Thanks for this; I especially appreciated the diagram for fine-tuning models on a domain-specific dataset. It would be great if you could expand on that a bit in your upcoming posts. I see these models performing increasingly well on academic datasets, but I feel it's really limiting to customize an LLM for a domain-specific dataset through prompting alone. I am also reading your book (first 2 chapters) and enjoying it.

Dec 30, 2023 · Liked by Sebastian Raschka, PhD

Thank you for all your generous contributions to my AI learning journey.

Dec 30, 2023 · Liked by Sebastian Raschka, PhD

Very insightful summary.

I was hoping that the Amber paper would make the list for the same reasons: releasing the weights, data, and methodology used to train the model.

Dec 31, 2023 · Liked by Sebastian Raschka, PhD

FWIW, although Axis of Ordinary is the only daily Substack that I regularly read, Ahead of AI has been my favorite and only must-read Substack -- and I subscribe to over three dozen AI-related Substacks.

Keep up the great work in 2024!

Dec 31, 2023 · Liked by Sebastian Raschka, PhD

I’m curious what you think of the Mamba paper (https://arxiv.org/pdf/2312.00752.pdf) and how it stacks up. It is relatively new but has shown potential for sub-quadratic scaling.

Dec 31, 2023 · Liked by Sebastian Raschka, PhD

Hi Sebastian,

Thanks for another great post, and happy New Year 2024!

Dec 30, 2023 · Liked by Sebastian Raschka, PhD

These are wonderful recommendations! I wonder where the LIMA paper ranks :)

Wish you a wonderful new year and thanks ever so much for all your work!

Dec 31, 2023 · Liked by Sebastian Raschka, PhD

Dr. Raschka,

Have you ever had this discussion? Back in the early 1990s, I worked with a few thousand other engineers and scientists within the DoD and DOE laboratory systems. We tried to change people’s minds within our community about one little thing, but we could not overcome the great weight of the uneducated but very greedy entrepreneurs who were in love with the term “Artificial Intelligence” (AI). AI sold programs. AI sucked in the investors. But there is no such thing as AI, and never will be. The best that our sciences and engineering will ever do is to Mimic Intelligence (MI). The associative engine that is the brain spews out thoughts that can only be mimicked by the best of our code writers. The papers that you provided are wonderful only insofar as their authors were able to capture and articulate the intellectual products of their own minds, i.e., real intelligence.

In all of my years of work, I never met anyone who wanted to use the term MI instead of AI even though they knew that AI was a myth. Are all of us in the scientific world so greedy that we are willing to put belief systems first even when the facts are glaringly obvious? We do all the brilliant people who have the brilliant thoughts an injustice when we lead our users into believing that the codes are intelligent, even artificially.

All the best,

David

Jan 2 · Liked by Sebastian Raschka, PhD

Love your content! Do you have any plans to post about large multimodal models (LMMs) in the near future?

Jan 1 · Liked by Sebastian Raschka, PhD

Are 7B language models in the middle of their own Moore's Law-esque curve with respect to performance? It seems like, more and more, new foundation models are being trained at around 7B parameters and outclassing 70B-parameter models.

I'm guessing there are big resource limitations on how frequently you can train a 70B parameter model, which makes me think we'll see more efficiency gains applied at smaller sizes.


I have developed a Kaggle notebook for learning TPU v3-8 + Kaggle + LLM red teaming for 20 hours/week free. Running models on TPUs is super fast!

Try out the link & share - https://www.kaggle.com/code/jaycneo/gemma-tpu-llm-red-teaming-notebook-detoxio-ai/
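For readers curious what this looks like in practice, here is a minimal, hypothetical sketch of loading a model on Kaggle's free TPU v3-8, assuming Keras 3 with the JAX backend and the keras_nlp Gemma preset; the linked notebook may be set up differently.

```python
# Hypothetical sketch (not the linked notebook): run Gemma on Kaggle's TPU v3-8
# using Keras 3 with the JAX backend. Requires accepting the Gemma license on Kaggle.
import os
os.environ["KERAS_BACKEND"] = "jax"  # must be set before importing keras

import keras
import keras_nlp

# Shard computation across the 8 TPU cores exposed by the v3-8 accelerator.
devices = keras.distribution.list_devices()
keras.distribution.set_distribution(keras.distribution.DataParallel(devices=devices))

# Load a pretrained Gemma model and generate a completion.
gemma_lm = keras_nlp.models.GemmaCausalLM.from_preset("gemma_2b_en")
print(gemma_lm.generate("Explain what red teaming an LLM means.", max_length=64))
```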
