ViewTube

804 results

Related queries

kv cache

grouped query attention

multi head attention transformer

position embedding transformer

Efficient NLP
Rotary Positional Embeddings: Combining Absolute and Relative

... References: RoFormer: Enhanced Transformer with Rotary Position Embedding (main paper that proposes RoPE embeddings): ...

11:17 · 67,848 views · 2 years ago

Jia-Bin Huang
How Rotary Position Embedding Supercharges Modern LLMs [RoPE]

Positional information is critical in transformers' understanding of sequences and their ability to generalize beyond training context ...

13:39 · 21,299 views · 1 year ago

DeepLearning Hero
RoPE (Rotary positional embeddings) explained: The positional workhorse of modern LLMs

Unlike sinusoidal embeddings, RoPE is well behaved and more resilient when predictions exceed the training sequence length.

14:06 · 49,339 views · 2 years ago

Outlier
Rotary Positional Embeddings Explained | Transformer

In this video I'm going through RoPE (Rotary Positional Embeddings) which is a key method in Transformer models of any ...

20:28 · 8,696 views · 4 months ago

JakZee
Rotary Position Embedding explained deeply (w/ code)

Rotary position embeddings, or RoPE for short: essentially, it's a way to embed or encode information about the positions of ...

23:26 · 5,261 views · 1 year ago
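
To complement the code walkthrough in the entry above, here is a minimal, self-contained RoPE sketch in Python/NumPy (my own illustration, not code from the video): each (even, odd) pair of query/key dimensions is rotated by an angle that grows with the token position, using the RoFormer frequency schedule. The names rope_angles and apply_rope are illustrative, not from any particular library.

    import numpy as np

    def rope_angles(positions, dim, base=10000.0):
        # One rotation frequency per pair of dimensions (RoFormer: theta_i = base^(-2i/dim)).
        inv_freq = base ** (-np.arange(0, dim, 2) / dim)   # shape (dim/2,)
        return np.outer(positions, inv_freq)               # shape (seq, dim/2)

    def apply_rope(x, positions, base=10000.0):
        # x: (seq, dim) queries or keys; rotate each (even, odd) pair by its position-dependent angle.
        ang = rope_angles(positions, x.shape[1], base)
        cos, sin = np.cos(ang), np.sin(ang)
        out = np.empty_like(x)
        out[:, 0::2] = x[:, 0::2] * cos - x[:, 1::2] * sin
        out[:, 1::2] = x[:, 0::2] * sin + x[:, 1::2] * cos
        return out

    # Usage: rotate random queries and keys for an 8-token sequence with a 16-dim head.
    rng = np.random.default_rng(0)
    q, k = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
    pos = np.arange(8)
    print(apply_rope(q, pos).shape, apply_rope(k, pos).shape)   # (8, 16) (8, 16)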

Umar Jamil
LLaMA explained: KV-Cache, Rotary Positional Embedding, RMS Norm, Grouped Query Attention, SwiGLU

Full explanation of the LLaMA 1 and LLaMA 2 model from Meta, including Rotary Positional Embeddings, RMS Normalization, ...

1:10:55 · 111,292 views · 2 years ago

BrainDrain
How positional encoding works in transformers?

... points to demonstrate. Let's build them: first, for each position we create a vector of the same size as the embeddings; the decision ...

5:36 · 36,470 views · 2 years ago
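
The snippet in the entry above describes creating, for each position, a vector of the same size as the embeddings. Assuming the video follows the standard sinusoidal construction from the original Transformer paper (an assumption on my part; the function name is mine), a minimal sketch might look like this:

    import numpy as np

    def sinusoidal_positional_encoding(seq_len, dim, base=10000.0):
        # For each position, build a vector of size `dim`: sin on even indices, cos on odd indices.
        positions = np.arange(seq_len)[:, None]              # (seq, 1)
        inv_freq = base ** (-np.arange(0, dim, 2) / dim)     # (dim/2,)
        angles = positions * inv_freq                        # (seq, dim/2)
        pe = np.zeros((seq_len, dim))
        pe[:, 0::2] = np.sin(angles)
        pe[:, 1::2] = np.cos(angles)
        return pe

    # These vectors are simply added to the token embeddings before the first layer.
    print(sinusoidal_positional_encoding(seq_len=6, dim=8).shape)   # (6, 8)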

Vizuara
Rotary Positional Encodings | Explained Visually

In this lecture, we learn about Rotary Positional Encodings (RoPE). This is the type of positional encoding used by most modern ...

34:38 · 4,836 views · 7 months ago

People also watched

Artificial Intelligence
Relative Positional Encoding for Transformers with Linear Complexity | Oral | ICML 2021

If you have any copyright issues with this video, please send us an email at khawar512@gmail.com.

17:03 · 2,882 views · 4 years ago

AI Bites
Positional Encoding and Input Embedding in Transformers - Part 3

This is video no. 3 in the 5 part video series on Transformers Neural Network Architecture. This video is about the positional ...

9:33 · 8,621 views · 2 years ago

FlyByMax
The GENIUS of Inertial Navigation Systems Explained

Moving-platform inertial navigation systems are miracles of engineering and a fantastic example of human ingenuity. This video ...

11:05 · 3,485,634 views · 3 years ago

Colin Talks Tech
A Beginner's Guide to Vector Embeddings

A high level primer on vectors, vector embeddings and vector databases. References covered in this video: What are Vector ...

8:29 · 79,896 views · 2 years ago

Jia-Bin Huang
This Simple Optimizer Is Revolutionizing How We Train AI [Muon]

The Muon optimizer has demonstrated remarkable performance in accelerating machine learning model training, often ...

17:52 · 71,416 views · 2 months ago

Yannic Kilcher
FNet: Mixing Tokens with Fourier Transforms (Machine Learning Research Paper Explained)

#fnet #attention #fourier Do we even need Attention? FNets completely drop the Attention mechanism in favor of a simple Fourier ...

34:23 · 29,834 views · 4 years ago
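
As a rough companion to the FNet entry above, this sketch (mine, simplified from the paper's description, not code from the video) shows the parameter-free mixing step that replaces self-attention: a 2D discrete Fourier transform over the sequence and hidden dimensions, keeping only the real part. The name fnet_mixing is illustrative.

    import numpy as np

    def fnet_mixing(x):
        # FNet-style token mixing: 2D DFT over the (sequence, hidden) axes, keep the real part.
        # This parameter-free step stands in for the self-attention sublayer.
        return np.fft.fft2(x).real

    rng = np.random.default_rng(0)
    x = rng.normal(size=(6, 8))          # 6 tokens, hidden size 8
    print(fnet_mixing(x).shape)          # (6, 8): each output now mixes information from every token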

Stanford MLSys Seminars
FlashAttention - Tri Dao | Stanford MLSys #67

Episode 67 of the Stanford MLSys Seminar “Foundation Models Limited Series”! Speaker: Tri Dao Abstract: Transformers are slow ...

58:58 · 38,225 views · Streamed 2 years ago

Akarsh Upadhyay
RoFormer: Enhanced Transformer with Rotary Embedding Presentation + Code Implementation

Two mistakes from my end: 1. In the video, I mentioned more about using it as a position embedding, but later I realized that it is ...

44:22 · 658 views · 2 years ago

Jia-Bin Huang
Fantastic KL Divergence and How to (Actually) Compute It

Kullback–Leibler (KL) divergence measures the difference between two probability distributions. But where does that come from?

11:46 · 24,316 views · 6 months ago
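
To go with the KL-divergence entry above, here is a small numeric sketch (not from the video) of two common ways to compute D_KL(P || Q): the exact sum for discrete distributions, and a Monte Carlo estimate that averages log p(x) - log q(x) over samples drawn from P.

    import numpy as np

    # Exact KL divergence for two discrete distributions over the same support.
    p = np.array([0.1, 0.4, 0.5])
    q = np.array([0.3, 0.3, 0.4])
    kl_exact = np.sum(p * (np.log(p) - np.log(q)))

    # Monte Carlo estimate: average log p(x) - log q(x) over samples x drawn from P.
    rng = np.random.default_rng(0)
    x = rng.choice(len(p), size=100_000, p=p)
    kl_mc = np.mean(np.log(p[x]) - np.log(q[x]))

    print(f"exact: {kl_exact:.4f}   monte carlo: {kl_mc:.4f}")   # the two agree closely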

Under The Hood
What Are Word Embeddings?

#word2vec #llm Converting text into numbers is the first step in training any machine learning model for NLP tasks. While one-hot ...

19:33 · 54,877 views · 10 months ago
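
Following up on the word-embeddings entry above, a tiny sketch (mine, not the video's) contrasts one-hot vectors with a dense embedding table: looking up a word's embedding is just selecting a row, which is the same as multiplying its one-hot vector by the table. The toy vocabulary and random table are purely illustrative.

    import numpy as np

    vocab = ["the", "cat", "sat"]
    vocab_size, embed_dim = len(vocab), 4

    one_hot = np.eye(vocab_size)                      # one-hot: as wide as the vocabulary
    rng = np.random.default_rng(0)
    table = rng.normal(size=(vocab_size, embed_dim))  # dense table: one small row per word

    idx = vocab.index("cat")
    print(np.allclose(one_hot[idx] @ table, table[idx]))   # True: lookup == one-hot times table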

Zachary Huang
Give me 30 min, I will make RoPE click forever

Text: https://github.com/The-Pocket/PocketFlow-Tutorial-Video-Generator/blob/main/docs/llm/rope.md 00:00 - Introduction 01:24 ...

29:08 · 1,615 views · 3 weeks ago

Data Science Gems
Rotary Positional Embeddings

Rotary position embedding (RoPE) combines the concepts of absolute and relative position embeddings. RoPE naturally ...

30:18 · 5,072 views · 2 years ago
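
The description above says RoPE combines absolute and relative position information. A small numeric check (my illustration, not from the video) makes this concrete: every position receives its own absolute rotation, yet the query-key dot product depends only on the offset between positions, so shifting both tokens by the same amount leaves the score unchanged.

    import numpy as np

    def rotate(x, position, base=10000.0):
        # RoPE's absolute rotation of a single vector x (even dimension) at the given position.
        inv_freq = base ** (-np.arange(0, x.shape[0], 2) / x.shape[0])
        ang = position * inv_freq
        cos, sin = np.cos(ang), np.sin(ang)
        out = np.empty_like(x)
        out[0::2] = x[0::2] * cos - x[1::2] * sin
        out[1::2] = x[0::2] * sin + x[1::2] * cos
        return out

    rng = np.random.default_rng(0)
    q, k = rng.normal(size=16), rng.normal(size=16)

    # Same relative offset (3) at very different absolute positions: identical attention scores.
    print(np.isclose(rotate(q, 10) @ rotate(k, 7), rotate(q, 110) @ rotate(k, 107)))   # True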

Discover AI
RoPE Rotary Position Embedding to 100K context length

RoPE - Rotary Position Embedding explained in simple terms for calculating self-attention in Transformers with a relative ...

39:56 · 7,372 views · 1 year ago

AI Coffee Break with Letitia
Positional embeddings in transformers EXPLAINED | Demystifying positional encodings.

What are positional embeddings and why do transformers need positional encodings? In this video, we explain why Attention is ...

9:40 · 87,520 views · 4 years ago
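
The entry above asks why transformers need positional information at all. A short numeric sketch (my own, not from the video) shows that plain dot-product self-attention is permutation-equivariant: shuffling the input tokens merely shuffles the outputs, so without positional encodings word order is invisible to the model. The simplified attention here uses Q = K = V with no learned projections.

    import numpy as np

    def self_attention(x):
        # Single-head dot-product self-attention with no positional information (Q = K = V = x).
        scores = x @ x.T / np.sqrt(x.shape[1])
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ x

    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(5, 8))      # 5 token embeddings, no positions added
    perm = rng.permutation(5)

    # Shuffling the inputs merely shuffles the outputs: word order is invisible without positions.
    print(np.allclose(self_attention(tokens)[perm], self_attention(tokens[perm])))   # True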

Jia-Bin Huang
How Attention Got So Efficient [GQA/MLA/DSA]

... https://api-docs.deepseek.com/news/news250929 - Rotary Position Embedding (RoPE): https://arxiv.org/abs/2104.09864 Video ...

29:02 · 46,301 views · 4 weeks ago

Vuk Rosić
Rotary Positional Embeddings & Rotation Matrix + Python LLM code

https://colab.research.google.com/drive/1rPV4uIZHp9B6woci1KDDlIqYT7BZ9CpN?usp=sharing On my road to become AI ...

11:05 · 523 views · 1 year ago

Gabriel Mongaras
RoFormer: Enhanced Transformer with Rotary Position Embedding Explained

Paper found here: https://arxiv.org/abs/2104.09864.

39:52 · 7,675 views · 2 years ago

Rajistics - data science, AI, and machine learning
Transformer Architecture: Fast Attention, Rotary Positional Embeddings, and Multi-Query Attention

... TRANSFORMER WITH ROTARY POSITION EMBEDDING: https://arxiv.org/pdf/2104.09864.pdf Rotary Embeddings: A Relative ...

1:21 · 872 views · 2 years ago

TFT - The Fact Treasure
Rotary Position Embedding for Dummies - PE for GPT Open Models

Rotary Position Embedding for Dummies - PE for GPT Open Models. The following video was created using NotebookLM.

6:16 · 19 views · 3 months ago

Serrano.Academy
How do Transformer Models keep track of the order of words? Positional Encoding

Transformer models can generate language really well, but how do they do it? A very important step of the pipeline is the ...

9:50 · 13,056 views · 1 year ago

Stanford Online
Stanford XCS224U: NLU I Contextual Word Representations, Part 3: Positional Encoding I Spring 2023

For more information about Stanford's Artificial Intelligence programs visit: https://stanford.io/ai This lecture is from the Stanford ...

13:02 · 14,031 views · 2 years ago

Welch Labs
How DeepSeek Rewrote the Transformer [MLA]

Thanks to KiwiCo for sponsoring today's video! Go to https://www.kiwico.com/welchlabs and use code WELCHLABS for 50% off ...

18:09 · 827,590 views · 9 months ago