ViewTube

4,519 results

Related queries

llama 3 architecture

flash attention

rotary position embedding

kv cache

decoder only transformer

speculative decoding

multi-head latent attention

attention mechanism

multi head attention

llama 2

self attention

transformer architecture

swiglu

Machine Learning Studio
Variants of Multi-head attention: Multi-query (MQA) and Grouped-query attention (GQA)

Explore the intricacies of Multi-Head Attention variants: Multi-Query Attention (MQA) and Grouped-Query Attention (GQA).

8:13

12,281 views

2 years ago

DataMListic
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained

In this video, we explore how the Multi-Head Attention (MHA), Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) ...

7:24

9,334 views

1 year ago

Vizuara
Multi-Query Attention Explained | Dealing with KV Cache Memory Issues Part 1

In this video, we learn everything about the Multi-Query Attention (MQA). MQA was the first solution researchers came up with to ...

37:44

3,715 views

8 months ago

Rajistics - data science, AI, and machine learning
Transformer Architecture: Fast Attention, Rotary Positional Embeddings, and Multi-Query Attention

Three major improvements to the transformer architecture that everyone should know. They include Fast Attention, Rotary ...

1:21

872 views

2 years ago

Tales Of Tensors
Why Grouped Query Attention (GQA) Outperforms Multi-head Attention

What if one architecture tweak made Llama 3 5× faster with 99.8% of the quality? In this deep dive, we break down Grouped ...

5:44

133 views

1 month ago

Umar Jamil
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

Full explanation of the LLaMA 1 and LLaMA 2 model from Meta, including Rotary Positional Embeddings, RMS Normalization, ...

1:10:55

111,269 views

2 years ago

Vizuara
Understand Grouped Query Attention (GQA) | The final frontier before latent attention

In this video, we learn everything about the Grouped Query Attention (GQA). GQA is the middle ground between Multi-Query ...

35:55

3,569 views

8 months ago

Data Science Made Easy
What is Multi Query Attention (MQA)?

Multi-Query Attention (MQA) is a variation of the traditional multi-head attention mechanism designed to improve efficiency and ... (a minimal code sketch follows this result).

1:09

3 views

1 month ago
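The snippet above describes the core idea of MQA: every query head keeps its own projection, but all heads share a single key head and a single value head, which shrinks the KV cache by roughly the head count. Below is a minimal PyTorch sketch of that idea; the names (MultiQueryAttention, d_model, n_heads) are illustrative and not taken from any of the videos listed here, and causal masking and KV caching are omitted for brevity.

```python
# Minimal Multi-Query Attention (MQA) sketch in PyTorch.
# Illustrative only: per-head queries, one shared key head, one shared value head.
import torch
import torch.nn.functional as F
from torch import nn

class MultiQueryAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)        # one query projection per head
        self.k_proj = nn.Linear(d_model, self.d_head)    # single shared key head
        self.v_proj = nn.Linear(d_model, self.d_head)    # single shared value head
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)  # (b, h, t, d)
        k = self.k_proj(x).unsqueeze(1)   # (b, 1, t, d), broadcast across all heads
        v = self.v_proj(x).unsqueeze(1)   # (b, 1, t, d)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5   # (b, h, t, t)
        out = F.softmax(scores, dim=-1) @ v                     # (b, h, t, d)
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

# Toy usage
x = torch.randn(2, 16, 64)
print(MultiQueryAttention(d_model=64, n_heads=8)(x).shape)  # torch.Size([2, 16, 64])
```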

Welch Labs
How DeepSeek Rewrote the Transformer [MLA]

Thanks to KiwiCo for sponsoring today's video! Go to https://www.kiwico.com/welchlabs and use code WELCHLABS for 50% off ...

18:09

827,242 views

9 months ago

Chris Hay
Multi-Head vs Grouped Query Attention. Claude AI, Llama-3, Gemma are choosing speed over quality?

Multi-Head vs Grouped Query Attention. Are Claude, Llama-3, and Gemma choosing speed over quality? Frontier model providers ...

20:30

1,586 views

1 year ago

The GenAI POD
Gen AI Transformer Attention - MHA, MQA & GQA

In this session, we'll explore Multi-Head Attention, Multi-Query Attention, and Group Query Attention. We'll discuss their ...

10:58

175 views

1 year ago

Sachin Kalsi
LLM Jargons Explained: Part 2 - Multi Query Attention & Group Query Attention

In this video, I'll delve into Multi Query Attention (MQA) and Grouped Query Attention (GQA), as well as touch on Multi-Head ...

15:51

1,314 views

1 year ago

Data Science Made Easy
What is Grouped Query Attention (GQA)

Grouped Query Attention (GQA) is an optimization of Multi-Head Attention designed to balance efficiency and expressiveness in ... (a minimal code sketch follows this result).

1:02

32 views

1 month ago
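The snippet above presents GQA as the middle ground between MQA and full multi-head attention: the query heads are split into groups, and each group shares one key/value head. Here is a minimal PyTorch sketch under that assumption; the names are again illustrative, and masking and caching are left out. With n_kv_heads = 1 it collapses to MQA, and with n_kv_heads = n_heads it is ordinary MHA.

```python
# Minimal Grouped-Query Attention (GQA) sketch in PyTorch.
# Illustrative only: n_kv_heads key/value heads, each shared by a group of query heads.
import torch
import torch.nn.functional as F
from torch import nn

class GroupedQueryAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert d_model % n_heads == 0 and n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.d_head)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.d_head)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.d_head)
        self.o_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)     # (b, h, t, d)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.d_head).transpose(1, 2)  # (b, kv, t, d)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.d_head).transpose(1, 2)
        # Replicate each KV head across its group of query heads
        groups = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        out = F.softmax(scores, dim=-1) @ v
        return self.o_proj(out.transpose(1, 2).reshape(b, t, -1))

# Toy usage: 8 query heads sharing 2 KV heads
x = torch.randn(2, 16, 64)
print(GroupedQueryAttention(64, n_heads=8, n_kv_heads=2)(x).shape)  # torch.Size([2, 16, 64])
```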

Super Data Science: ML & AI Podcast with Jon Krohn
Multi-Query vs Multi-Head Attention

Learn all about the open-source libraries developed by Lightning AI, as their Staff Research Engineer, Sebastian Raschka, joins ...

1:40

218 views

1 year ago

Julien Simon
Deep dive - Better Attention layers for Transformer models

00:00 Introduction 03:00 Self-attention 07:20 Multi-Head Attention (MHA) 12:32 Multi-Query Attention (MQA) 18:45 Group-Query ...

40:54

14,637 views

1 year ago

Machine Learning Courses
Query, Key and Value Matrix for Attention Mechanisms in Large Language Models

link to full course: https://www.udemy.com/course/mathematics-behind-large-language-models-and-transformers/?

18:21

28,293 views

1 year ago

Umar Jamil
Coding LLaMA 2 from scratch in PyTorch - KV Cache, Grouped Query Attention, Rotary PE, RMSNorm

Full coding of LLaMA 2 from scratch, with full explanation, including Rotary Positional Embedding, RMS Normalization, ...

3:04:11

60,994 views

2 years ago

Jia-Bin Huang
How Attention Got So Efficient [GQA/MLA/DSA]

... Attention (vector form) 04:26 Attention (matrix form) 07:07 Key-Value caching 09:42 Multi-Query Attention (MQA) 11:03 Grouped ...

29:02

45,760 views

4 weeks ago

CodeFix
and multi query attention cursor team

Download 1M+ code from https://codegive.com/e5cae12 multi-query attention (mqa) is a variant of the attention mechanism used ...

3:49

0 views

11 months ago

Lex Clips
How to make LLMs fast: KV Caching, Speculative Decoding, and Multi-Query Attention | Cursor Team

Lex Fridman Podcast full episode: https://www.youtube.com/watch?v=oFfVt3S51T4 Thank you for listening ❤ Check out our ...

15:15

11,537 views

1 year ago