In a Colab Notebook we code a visualization of the last layer of the Vision Transformer (ViT) encoder stack and analyze the visual output of each of the 12 attention heads for a specific image. This shows why a ViT that is only pre-trained (even with the DINO method) cannot always succeed on an image classification (downstream) task: the fine-tuning of the ViT is simply missing, and it is essential for better performance.
Based on the Colab notebook by Niels Rogge, HuggingFace (all rights with him):
https://colab.research.google.com/github/NielsRogge/Transformers-Tutorials/blob/master/DINO/Visualize_self_attention_of_DINO.ipynb
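For reference, here is a minimal sketch of the kind of visualization the notebook builds, assuming a DINO ViT-Base/16 checkpoint (`facebook/dino-vitb16`, which has 12 heads per layer) and the HuggingFace `transformers` API; the notebook itself may differ in model size and plotting details:

```python
import torch
import requests
import matplotlib.pyplot as plt
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

# Any RGB image works; this COCO sample is just an example.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Assumption: a Base-size DINO checkpoint (12 encoder layers, 12 heads).
model_name = "facebook/dino-vitb16"
processor = ViTImageProcessor.from_pretrained(model_name)
model = ViTModel.from_pretrained(model_name, add_pooling_layer=False)

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per encoder layer,
# each of shape (batch, num_heads, seq_len, seq_len).
last_layer_attn = outputs.attentions[-1]
num_heads = last_layer_attn.shape[1]  # 12 for ViT-Base

# Attention of the [CLS] token to the patch tokens, per head.
# Token 0 is [CLS]; the remaining 196 tokens form a 14x14 grid (224/16).
cls_attn = last_layer_attn[0, :, 0, 1:]
grid = int(cls_attn.shape[-1] ** 0.5)
cls_attn = cls_attn.reshape(num_heads, grid, grid).numpy()

# One subplot per attention head.
fig, axes = plt.subplots(3, 4, figsize=(12, 9))
for head, ax in enumerate(axes.flat):
    ax.imshow(cls_attn[head], cmap="viridis")
    ax.set_title(f"Head {head}")
    ax.axis("off")
plt.tight_layout()
plt.show()
```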
In one of my next videos we will code the fine-tuning of a pre-trained Vision Transformer (ViT) from scratch, for better image classification performance.
#ai
#vision
#technology