Vision and language Transformers, bridged by a Q-Former (Querying Transformer): BLIP-2!
The financial resources needed to pre-train both systems (vision and language) from scratch are astronomical. Let me introduce you to a clever new training method: BLIP-2. It connects a frozen image encoder to a frozen Large Language Model, enabling multimodal LLMs for visual question answering, image captioning, multimodal dialogue, and image recognition with verbal content descriptions, plus a chat function.
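The core idea of the Q-Former can be illustrated with a minimal NumPy sketch (illustrative weights and toy hidden size, not the paper's full architecture): a small, fixed set of learned query vectors cross-attends to the frozen image encoder's patch features, producing a fixed-length visual summary that can then be projected into the frozen LLM's input space. BLIP-2 uses 32 learned queries; everything else here (dimensions, random weights) is a placeholder.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, features, w_q, w_k, w_v):
    # queries: (n_queries, d) learned, trainable
    # features: (n_patches, d) from the FROZEN image encoder
    q = queries @ w_q
    k = features @ w_k
    v = features @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product attention
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 16            # toy hidden size for the sketch
n_queries = 32    # BLIP-2 uses 32 learned queries
n_patches = 257   # e.g. ViT patch tokens from the frozen encoder

learned_queries = rng.normal(size=(n_queries, d))   # the only trainable part
image_features = rng.normal(size=(n_patches, d))    # frozen encoder output
w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))

out = cross_attention(learned_queries, image_features, w_q, w_k, w_v)
print(out.shape)  # (32, 16): fixed-size summary, regardless of image size
```

Because the output always has exactly `n_queries` rows, the LLM sees a constant-length visual prefix no matter how many patches the image encoder emits; only the queries (and the Q-Former around them) are trained, which is what keeps pre-training cheap.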
All rights and credits to:
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
https://arxiv.org/abs/2301.12597
#ai
#machinelearning
#chatgpt
#vision
#llm
#BLIP2
#QFormer