ViewTube


6,759,791 results

Related queries

paged attention

yannic kilcher

group query attention

kv cache

mixture of experts

cuda programming

Jia-Bin Huang
How FlashAttention Accelerates Generative AI Revolution
FlashAttention is an IO-aware algorithm for computing attention used in Transformers. It's fast, memory-efficient, and exact.
11:54 · 23,465 views · 1 year ago

Umar Jamil
Flash Attention derived and coded from first principles with Triton (Python)
In this video, I'll be deriving and coding Flash Attention from scratch. I'll be deriving every operation we do in Flash Attention using ...
7:38:18 · 70,588 views · 1 year ago

Machine Learning Studio
FlashAttention: Accelerate LLM training
In this video, we cover FlashAttention. FlashAttention is an IO-aware attention algorithm that significantly accelerates the training of ...
11:27 · 7,850 views · 1 year ago

Stanford MedAI
MedAI #54: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness | Tri Dao
Title: FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness Speaker: Tri Dao Abstract: Transformers are ...
47:47 · 20,213 views · 3 years ago

Stanford MLSys Seminars
FlashAttention - Tri Dao | Stanford MLSys #67
Episode 67 of the Stanford MLSys Seminar “Foundation Models Limited Series”! Speaker: Tri Dao Abstract: Transformers are slow ...
58:58 · 38,221 views · Streamed 2 years ago

GPU MODE
How FlashAttention 4 Works
Speaker: Charles Frye From the Modal team: https://modal.com/blog/reverse-engineer-flash-attention-4.
1:15:09 · 3,730 views · Streamed 2 months ago

Tales Of Tensors
Flash Attention: The Fastest Attention Mechanism?
This video explains FlashAttention-1, FlashAttention-2, and FlashAttention-3 in a clear, visual, step-by-step way. We look at why ...
8:43 · 614 views · 4 weeks ago

Aleksa Gordić - The AI Epiphany
Flash Attention 2.0 with Tri Dao (author)! | Discord server talks
Become The AI Epiphany Patreon ❤️ https://www.patreon.com/theaiepiphany Join our Discord community ...
1:00:25 · 22,637 views · 2 years ago

GPU MODE
Lecture 80: How FlashAttention 4 Works
Speaker: Charles Frye The source code (in CuTe) for FlashAttention 4 on Blackwell GPUs has recently been released for the ...
1:15:09 · 4,696 views · 2 months ago

Martin Is A Dad
FlashAttention V1 Deep Dive By Google Engineer | Fast and Memory-Efficient LLM Training
... #flash #maths #machinelearning 0:00 Intro 0:56 CPU and GPU Memory Hierarchy 4:29 Standard Attention 8:26 Flash Attention ...
34:38 · 1,517 views · 7 months ago

Data Science Gems
Flash Attention
Flash attention has become very popular recently for efficient training. It is an IO-aware exact attention method. It reduces ...
26:35 · 6,162 views · 2 years ago

3Blue1Brown
Attention in transformers, step-by-step | Deep Learning Chapter 6
Demystifying attention, the key mechanism inside transformers and LLMs. Instead of sponsored ad reads, these lessons are ...
26:10 · 3,505,780 views · 1 year ago

GPU MODE
Lecture 12: Flash Attention
So I'm short-selling you a bit if you wanted live coding of the fastest flash attention kernel, because I was a bit foolishly ...
1:12:14 · 7,151 views · 1 year ago

Sachin Kalsi
ELI5 FlashAttention: Understanding GPU Architecture - Part 1
In "FlashAttention - Understanding how GPU works - Part 1," we briefly unravel the mechanisms behind FlashAttention and its ...
25:46 · 10,011 views · 2 years ago

Benji’s AI Playground
How To Install Flash Attention On Windows
Learn how to install Flash Attention on Windows for your ComfyUI setup in this step-by-step tutorial! Discover the easiest method ...
3:33 · 6,433 views · 6 months ago

Unify
Flash Attention Explained
In this episode, we explore the Flash Attention algorithm with our esteemed guest speaker, Dan Fu, renowned researcher at ...
57:20 · 5,538 views · Streamed 2 years ago

GPU MODE
Lecture 36: CUTLASS and Flash Attention 3
Speaker: Jay Shah Slides: https://github.com/cuda-mode/lectures Correction by Jay: "It turns out I inserted the wrong image for the ...
1:49:16 · 8,867 views · 1 year ago

GPU MODE
Lecture 50: A learning journey CUDA, Triton, Flash Attention
Speaker: Umar Jamil.
1:20:43 · 9,098 views · 9 months ago

CodersLegacy
How to install Flash Attention 2 on Windows (Easy Solution)
Flash Attention Wheels: https://huggingface.co/lldacing/flash-attention-windows-wheel ☕️ Buy Me A Coffee and Support the ...
4:08 · 2,827 views · 6 months ago

Julien Simon
Deep dive - Better Attention layers for Transformer models
... namely Multi-Query Attention, Group-Query Attention, Sliding Window Attention, Flash Attention v1 and v2, and Paged Attention.
40:54 · 14,637 views · 1 year ago

Faradawn Yang
LLM Optimization Lecture 4: Grouped Query Attention, Paged Attention, Flash Attention
The forward pass time is 0.037 seconds. What if we swap in flash attention by commenting in this flash infer function using the QKV ...
43:48 · 427 views · 1 month ago

Efficient NLP
The KV Cache: Memory Usage in Transformers
Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io The KV cache is what takes up the bulk ...
8:33 · 91,177 views · 2 years ago

SulemanX 🇵🇸
Barry x Thea Edit | The Flash | Arrow | Attention
#barryallen #barryallenedit #barryallenedits #grantgustin #theflash #theflashedit #theflashedits #dc #theflashseason9 #theaqueen ...
0:20 · 9,767 views · 2 years ago

YanAITalk
LLM inference optimization: Architecture, KV cache and Flash attention
... proposed for different stages: there's flash attention for prefill, flash decoding, so anything with a flash ...
44:06 · 13,789 views · 1 year ago