
Multimodal Foundation Models

Book
  • Format
  • Book, paperback
  • English
  • 230 pages

Description

This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal foundation models that demonstrate vision and vision-language capabilities, focusing on the transition from specialist models to general-purpose assistants.



The focus encompasses five core topics, categorized into two classes: (i) a survey of well-established research areas, namely multimodal foundation models pre-trained for specific purposes, covering two topics: methods of learning vision backbones for visual understanding, and text-to-image generation; and (ii) recent advances in exploratory, open research areas, namely multimodal foundation models that aim to play the role of general-purpose assistants, covering three topics: unified vision models inspired by large language models (LLMs), end-to-end training of multimodal LLMs, and chaining multimodal tools with LLMs.



The monograph is aimed at researchers, graduate students, and professionals in the computer vision and vision-language multimodal communities who want to learn the basics of, and recent advances in, multimodal foundation models.

Details
  • Language: English
  • Pages: 230
  • Publication date: 06-05-2024
  • ISBN-13: 9781638283362
  • Publisher: Now Publishers Inc
  • Format: Paperback
Size and weight
  • Weight: 357 g
  • Depth: 1.3 cm
  • Width: 15.6 cm
  • Height: 23.4 cm
