Discussion about this post

Dustin Small

Wow! Amazing article - clear that you put a lot of work into this. Subscribed!

Trelis Research

That's a nice graph you put together. It's impressive what Mistral are doing, although it's probably a bit unfortunate for them that Llama 3 70B beats their 8x22B (though theirs remains faster).

Have you played with the OpenELM models? I spent three days ORPO fine-tuning them to make a video, and they were so bad that I resorted to just putting a few notes in my newsletter. It's pretty disappointing how poorly documented they are (no chat template, no GGUF support, no flash attention, no vLLM support).
