ViewTube

128,295 results

Related queries

kv cache

llm inference

multi query attention

flash attention explained

beam search

IBM Technology
Faster LLMs: Accelerate Inference with Speculative Decoding

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off your exam ...

9:39 · 17,945 views · 6 months ago

Efficient NLP
Speculative Decoding: When Two LLMs are Faster than One

Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io Speculative decoding (or speculative ...

12:46 · 29,600 views · 2 years ago

Trelis Research
Speculative Decoding Explained

One Click Templates Repo (free): https://github.com/TrelisResearch/one-click-llms Advanced Inference Repo (Paid Lifetime ...

37:34 · 7,417 views · 2 years ago

Nadav Timor
EAGLE and EAGLE-2: Lossless Inference Acceleration for LLMs - Hongyang Zhang

About the seminar: https://faster-llms.vercel.app Speaker: Hongyang Zhang (Waterloo & Vector Institute) Title: EAGLE and ...

48:26 · 2,802 views · 10 months ago

Red Hat
Lossless LLM inference acceleration with Speculators

Red Hat's Mark Kurtz and Megan Flynn examine speculative decoding, a technique that uses a smaller, faster model—the ...

29:48 · 324 views · 1 month ago

Julien Simon
Deep Dive: Optimizing LLM inference

00:00 Introduction · 01:15 Decoder-only inference · 06:05 The KV cache · 11:15 Continuous batching · 16:17 Speculative decoding ...

36:12 · 42,722 views · 1 year ago

Tales Of Tensors
Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss

Speculative decoding is one of the most important performance optimizations in modern LLM serving—and most people still don't ...

7:40 · 58 views · 2 days ago

Intel Software
Speculative Decoding with OpenVINO | Intel Software

Speed up your Large Language Model by 2 or 3 times with OpenVINO's speculative decoding. Much faster inference without ...

7:00 · 196,877 views · 5 months ago

AssemblyAI
What is Speculative Sampling? | Boosting LLM inference speed

Speculative Sampling is a decoding strategy that yields 2-3x speedups in LLM inference by generating multiple tokens per model ...

6:18 · 3,684 views · 1 year ago

Jordan Boyd-Graber
Why using a dumb language model can speed up a smarter one: Speculative Decoding [Lecture]

This is a single lecture from a course. If you like the material and want more context (e.g., the lectures that came before), check ...

7:48 · 109 views · 3 weeks ago

GPU MODE
Lecture 22: Hacker's Guide to Speculative Decoding in VLLM

Abstract: We will discuss how vLLM combines continuous batching with speculative decoding with a focus on enabling external ...

1:09:25 · 11,009 views · 1 year ago

The TWIML AI Podcast with Sam Charrington
Speculative Decoding and Efficient LLM Inference with Chris Lott - 717

Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research, to discuss accelerating large language ...

1:16:02 · 1,659 views · 10 months ago

EleutherAI
ML Performance Reading Group Session 19: Speculative Decoding

Session covering an overview of speculative decoding and several seminal papers in the space, including Medusa, Eagle 1/2/3, ...

1:36:03 · 489 views · 4 days ago

GosuCoder
MASSIVELY speed up local AI models with Speculative Decoding in LM Studio

There is a lot of possibility with Speculative Decoding, allowing us normal folks to run larger and larger AI models at home. I hope ...

22:36 · 18,544 views · 9 months ago

R
Speculative Decoding for Fast LLM Inference Algorithm explained in detail

This video introduces the playlist, explains the Speculative Decoding Algorithm from the paper https://arxiv.org/pdf/2211.17192 in ...

18:45 · 19 views · 7 days ago

Lex Clips
How to make LLMs fast: KV Caching, Speculative Decoding, and Multi-Query Attention | Cursor Team

Lex Fridman Podcast full episode: https://www.youtube.com/watch?v=oFfVt3S51T4 Thank you for listening ❤ Check out our ...

15:15 · 11,535 views · 1 year ago

Genpakt
What is Speculative Decoding? How Do I Use It With vLLM

Please be patient and watch till the end of the video. More nuggets there :D Request Notebook Here: ...

12:56 · 950 views · 1 year ago

Hertz Foundation
Accelerating Inference with Staged Speculative Decoding — Ben Spector | 2023 Hertz Summer Workshop

Hertz Fellow Benjamin Spector, a doctoral student at Stanford University, presents "Accelerating Inference with Staged ...

6:45 · 1,735 views · 2 years ago

Doubleword
Behind the Stack, Ep. 13 - Faster Inference: Speculative Decoding for Batched Workloads

Speculative decoding is usually discussed as a way to make real-time LLM APIs feel faster. But what happens when you apply it to ...

19:54 · 29 views · 3 weeks ago

MLWorks
Understanding Speculative Decoding: Boosting LLM Efficiency and Speed

In this video, we're diving deep into Speculative Decoding, an advanced technique that is revolutionizing how AI language ...

14:37 · 329 views · 8 months ago