Multimodality and Large Multimodal Models (LMMs)

📰 Chip Huyen's Blog

For a long time, each ML model operated in a single data modality: text (translation, language modeling), image (object detection, image classification), or audio (speech recognition). Natural intelligence, however, is not limited to one modality. Humans can read, talk, and see. We listen to music to relax and listen for strange noises to detect danger. Being able to work with multimodal data is essential for us, and for any AI, to operate in the real world.

Published 10 Oct 2023