Discussion about this post

Xiaolong

Thank you for sharing it!

However, in the last "overview" diagram, the "Method" entries for Molmo and NVLM seem to be filled in incorrectly: "Both + Hybrid" should correspond to NVLM rather than Molmo.

Daniel Kleine

Great overview! How do you find the time to draw all those detailed visualizations? ;)

A question about 2.1.3: for training the model, the input text must be a description of the image, right?
