ViewTube

4,509 results

Related queries

llama 3 architecture

rotary position embedding

speculative decoding

decoder only transformer

kv cache

multi-head latent attention

multi head attention

flash attention

llama 2

attention mechanism

self attention

transformer architecture

swiglu

Machine Learning Studio
Variants of Multi-head attention: Multi-query (MQA) and Grouped-query attention (GQA)

Explore the intricacies of Multi-Head Attention variants: Multi-Query Attention (MQA) and Grouped-Query Attention (GQA).

8:13

12,293 views

2 years ago

DataMListic
Multi-Head Attention (MHA), Multi-Query Attention (MQA), Grouped Query Attention (GQA) Explained

In this video, we explore how the Multi-Head Attention (MHA), Multi-Query Attention (MQA) and Grouped-Query Attention (GQA) ...

7:24

9,342 views

1 year ago

Vizuara
Multi-Query Attention Explained | Dealing with KV Cache Memory Issues Part 1

In this video, we learn everything about the Multi-Query Attention (MQA). MQA was the first solution researchers came up with to ...

37:44

3,719 views

8 months ago

Vizuara
Understand Grouped Query Attention (GQA) | The final frontier before latent attention

In this video, we learn everything about the Grouped Query Attention (GQA). GQA is the middle ground between Multi-Query ...

35:55

3,577 views

8 months ago
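
The two Vizuara entries above describe MQA as sharing a single key/value head across all query heads and GQA as the middle ground between MQA and full multi-head attention. Below is a minimal sketch of that idea, assuming a toy PyTorch setup where num_kv_heads is a free parameter: num_kv_heads equal to num_heads recovers MHA, num_kv_heads of 1 recovers MQA, and anything in between is GQA. All names and shapes here are illustrative, not drawn from the videos themselves.

```python
# Minimal grouped-query attention sketch (illustrative only).
# num_kv_heads = num_heads      -> multi-head attention (MHA)
# num_kv_heads = 1              -> multi-query attention (MQA)
# 1 < num_kv_heads < num_heads  -> grouped-query attention (GQA)
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v, num_heads, num_kv_heads):
    # q: (batch, seq, num_heads * head_dim); k, v: (batch, seq, num_kv_heads * head_dim)
    batch, seq, _ = q.shape
    head_dim = q.shape[-1] // num_heads

    q = q.view(batch, seq, num_heads, head_dim).transpose(1, 2)     # (B, H,   S, D)
    k = k.view(batch, seq, num_kv_heads, head_dim).transpose(1, 2)  # (B, Hkv, S, D)
    v = v.view(batch, seq, num_kv_heads, head_dim).transpose(1, 2)

    # Each group of query heads shares one K/V head.
    group_size = num_heads // num_kv_heads
    k = k.repeat_interleave(group_size, dim=1)                      # (B, H, S, D)
    v = v.repeat_interleave(group_size, dim=1)

    scores = q @ k.transpose(-2, -1) / head_dim ** 0.5              # (B, H, S, S)
    weights = F.softmax(scores, dim=-1)
    return (weights @ v).transpose(1, 2).reshape(batch, seq, -1)

# Toy usage: 4 query heads grouped over 2 shared K/V heads.
B, S, H, Hkv, D = 2, 5, 4, 2, 8
out = grouped_query_attention(torch.randn(B, S, H * D),
                              torch.randn(B, S, Hkv * D),
                              torch.randn(B, S, Hkv * D),
                              num_heads=H, num_kv_heads=Hkv)
print(out.shape)  # torch.Size([2, 5, 32])
```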

Tales Of Tensors
Why Grouped Query Attention (GQA) Outperforms Multi-head Attention

What if one architecture tweak made Llama 3 5× faster with 99.8% of the quality? In this deep dive, we break down Grouped ...

5:44

134 views

1 month ago

Rajistics - data science, AI, and machine learning
Transformer Architecture: Fast Attention, Rotary Positional Embeddings, and Multi-Query Attention

Three major improvements to the transformer architecture that everyone should know. They include Fast Attention, Rotary ...

1:21

872 views

2 years ago

Welch Labs
How DeepSeek Rewrote the Transformer [MLA]

Thanks to KiwiCo for sponsoring today's video! Go to https://www.kiwico.com/welchlabs and use code WELCHLABS for 50% off ...

18:09

827,955 views

9 months ago

Umar Jamil
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

Full explanation of the LLaMA 1 and LLaMA 2 model from Meta, including Rotary Positional Embeddings, RMS Normalization, ...

1:10:55

111,322 views

2 years ago

Chris Hay
Multi-Head vs Grouped Query Attention. Claude AI, Llama-3, Gemma are choosing speed over quality?

Multi-Head vs Grouped Query Attention. Are Claude, Llama-3, and Gemma choosing speed over quality? Frontier model providers ...

20:30

1,586 views

1 year ago

Machine Learning Courses
Query, Key and Value Matrix for Attention Mechanisms in Large Language Models

link to full course: https://www.udemy.com/course/mathematics-behind-large-language-models-and-transformers/?

18:21

28,336 views

1 year ago

Julien Simon
Deep dive - Better Attention layers for Transformer models

00:00 Introduction 03:00 Self-attention 07:20 Multi-Head Attention (MHA) 12:32 Multi-Query Attention (MQA) 18:45 Group-Query ...

40:54

14,640 views

1 year ago

Jia-Bin Huang
How Attention Got So Efficient [GQA/MLA/DSA]

... Attention (vector form) 04:26 Attention (matrix form) 07:07 Key-Value caching 09:42 Multi-Query Attention (MQA) 11:03 Grouped ...

29:02

46,586 views

1 month ago

Data Science Made Easy
What is Grouped Query Attention (GQA)

Grouped Query Attention (GQA) is an optimization of Multi-Head Attention designed to balance efficiency and expressiveness in ...

1:02

32 views

1 month ago

Data Science Made Easy
What is Multi Query Attention (MQA)?

Multi-Query Attention (MQA) is a variation of the traditional multi-head attention mechanism designed to improve efficiency and ...

1:09

3 views

1 month ago
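
The two short definitions above come down to how many key/value heads a layer keeps. As a rough illustration of the efficiency claim, the snippet below compares per-layer K/V projection sizes and per-token cached values for MHA, GQA, and MQA, assuming Llama-2-7B-like dimensions (d_model 4096, 32 heads of 128 dims); the figures are illustrative, not quoted from either video.

```python
# Rough comparison of K/V projection sizes per layer (illustrative numbers only;
# d_model=4096 and 32 heads of 128 dims are assumed, roughly Llama-2-7B-like).
d_model, n_heads, head_dim = 4096, 32, 128

for name, n_kv_heads in [("MHA", 32), ("GQA (8 groups)", 8), ("MQA", 1)]:
    kv_proj_params = 2 * d_model * (n_kv_heads * head_dim)  # W_K and W_V weights
    kv_cache_per_token = 2 * n_kv_heads * head_dim          # K and V entries cached per layer
    print(f"{name:15s} K/V projection params: {kv_proj_params:>12,d}  "
          f"cached values per token per layer: {kv_cache_per_token}")
```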

Super Data Science: ML & AI Podcast with Jon Krohn
Multi-Query vs Multi-Head Attention

Learn all about the open-source libraries developed by Lightning AI, as their Staff Research Engineer, Sebastian Raschka, joins ...

1:40

218 views

1 year ago

Efficient NLP
The KV Cache: Memory Usage in Transformers

Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io The KV cache is what takes up the bulk ...

8:33

91,264 views

2 years ago
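
The Efficient NLP description notes that the KV cache takes up the bulk of memory during generation; a back-of-envelope estimate makes the scale concrete. The sketch below assumes a Llama-2-7B-like configuration (32 layers, 128-dim heads, fp16) purely for illustration and shows how reducing the number of KV heads shrinks the cache.

```python
# Back-of-envelope KV-cache size (illustrative; Llama-2-7B-like numbers assumed).
n_layers, head_dim = 32, 128
bytes_per_value = 2          # fp16
seq_len, batch = 4096, 1

def kv_cache_bytes(n_kv_heads):
    # 2x for keys and values, cached at every layer for every token in the sequence.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * seq_len * batch

for name, n_kv in [("MHA (32 KV heads)", 32), ("GQA (8 KV heads)", 8), ("MQA (1 KV head)", 1)]:
    print(f"{name}: {kv_cache_bytes(n_kv) / 2**30:.2f} GiB at seq_len={seq_len}")
```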

The GenAI POD
Gen AI Transformer Attention - MHA, MQA & GQA

In this session, we'll explore Multi-Head Attention, Multi-Query Attention, and Group Query Attention. We'll discuss their ...

10:58

175 views

1 year ago

bycloud
Is Signal Processing The CURE For AI's ADHD?

Check out HubSpot's Free ChatGPT Bundle! https://clickhubspot.com/jgv5 In this video, I will be covering the latest and the hottest ...

11:53

25,919 views

1 year ago

Sachin Kalsi
LLM Jargons Explained: Part 2 - Multi Query Attention & Group Query Attention

In this video, I'll delve into Multi Query Attention (MQA) and Grouped Query Attention (GQA), as well as touch on Multi-Head ...

15:51

1,314 views

1 year ago

Google Cloud Tech
Attention mechanism: Overview

This video introduces you to the attention mechanism, a powerful technique that allows neural networks to focus on specific parts ...

5:34

221,434 views

2 years ago