5,670 results
We reproduce GPT-2 (124M) from scratch. This video covers the whole process: first we build the GPT-2 network, then we ... (287,408 views, 2 weeks ago)
The Tokenizer is a necessary and pervasive component of Large Language Models (LLMs), where it translates between strings ... (526,545 views, 4 months ago)
This is a 1-hour general-audience introduction to Large Language Models: the core technical component behind systems like ... (1,981,817 views, 7 months ago)
We build a Generatively Pretrained Transformer (GPT), following the paper "Attention Is All You Need" and OpenAI's GPT-2 ... (4,441,711 views, 1 year ago)
We take the 2-layer MLP from the previous video and make it deeper with a tree-like structure, arriving at a convolutional neural ... (163,938 views)
We take the 2-layer MLP (with BatchNorm) from the previous video and backpropagate through it manually without using PyTorch ... (182,530 views)
We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward-pass activations, ... (260,942 views)
We implement a multilayer perceptron (MLP) character-level language model. In this video we also introduce many basics of ... (293,012 views)
We implement a bigram character-level language model, which we will further complexify in follow-up videos into a modern ... (625,734 views)
Prompt: "psychedelic faces". Stable Diffusion takes a noise vector as input and samples an image. To create this video I smoothly ... (34,444 views)
Since their introduction in 2017, transformers have revolutionized Natural Language Processing (NLP). Now, transformers are ... (628,228 views)
GUEST BIO: Andrej Karpathy is a legendary AI researcher, engineer, and educator. He's the former director of AI at Tesla, ... (369,467 views)
Elon Musk fires employees in Twitter Zoom meeting (dub). Elon Musk fires all employees in a Twitter meeting over random questions ... (12,624,738 views)
Head of Tesla Full Self-Driving Andrej Karpathy gives a very technical and in-depth presentation at Tesla AI Day on August 19 ... (27,407 views, 2 years ago)
A complete explanation of all the layers of a Transformer model: multi-head self-attention, positional encoding, including all the ... (327,058 views)
We can't quite leverage GPUs as well in the context of recurrent neural networks; now, if we consider transformer networks, we're ... (339,507 views, 4 years ago)
A segment on the technology powering self-driving Teslas, from Tesla's Autonomy Day 2019. (58,416 views, 3 years ago)
In this video we read the original transformer paper "Attention Is All You Need" and implement it from scratch! Attention is all you ... (294,665 views)
Large language models usually give great answers, but they're limited to the training data used to create the model. (566,516 views, 10 months ago)
Hear from Andrej Karpathy on how Tesla is using PyTorch to develop full self-driving capabilities for its vehicles, including ... (511,952 views)
All credits to Jay Alammar. Reference link: http://jalammar.github.io/illustrated-transformer/ Research paper: ... (199,756 views, streamed 3 years ago)
This is the second in a series of 3 videos where we demystify Transformer models and explain them with visuals and friendly ... (211,546 views, 9 months ago)
Speakers: Andrej Karpathy. Session information: This video is one of many sessions delivered for Microsoft Build 2023 ... (653,178 views)
OUTLINE: 0:00 - Introduction, 0:58 - Neural networks, 6:01 - Biology, 11:32 - Aliens, 21:43 - Universe, 33:34 - Transformers, 41:50 ... (3,110,505 views)
Like a Transformer, the network becomes a kind of general-purpose computer over text, so I think that's a nice way to ... (1,912 views)
Transformers are all the rage nowadays, but how do they work? This video demystifies the novel neural network architecture with ... (923,242 views)
GUEST BIO: John Carmack is a legendary programmer, co-founder of id Software, and lead programmer of many revolutionary ... (76,143 views)
Transformer neural networks are the heart of pretty much everything exciting in AI right now. ChatGPT, Google Translate, and ... (600,163 views, 11 months ago)
Here are a few other relevant resources. Build a GPT from scratch, by Andrej Karpathy: https://youtu.be/kCc8FmEb1nY If you want a ... (1,215,635 views)
This is the most step-by-step, spelled-out explanation of backpropagation and training of neural networks. It only assumes basic ... (1,676,279 views)