That's a nice graph you put together. Impressive what Mistral are doing - although it's probably a bit unfortunate for them that Llama 3 70B beats out their 8x22B (though the 8x22B remains faster).
Have you played with the OpenELM models? I spent three days ORPO fine-tuning them to make a video, and they were so bad I had to resort to just putting a few notes in my newsletter. Pretty disappointing how poorly supported they are (no chat template, no GGUF support, no flash attention, no vLLM).
Ouch, that sounds frustrating. I haven't really had a chance to do anything with the OpenELM models ... I'm mostly just using Llama 3. But it sounds like I'm not missing too much 😅. That said, I thought the OpenELM paper was great!
Wow! Amazing article - clear that you put a lot of work into this. Subscribed!
Thanks for the kind words!