A sampling of recent happenings in the multimodal space. Expect more this year.
This is AI-generated audio, made with Python and 11Labs.
Source code: https://github.com/natolambert/interconnects-tools
Original post: https://www.interconnects.ai/p/multimodal-rlhf
00:00 Multimodal LM roundup: Unified IO 2, inputs and outputs, Gemini, LLaVA-RLHF, and RLHF questions
02:46 Unified IO 2: Scaling multi-input, multi-output model pretraining
07:47 Collecting preference data for images
09:31 LLaVA-RLHF: The first experiments in multimodal RLHF fine-tuning
13:20 Multimodal RLHF questions, ideas, and resources