speculative decoding

9:39

Faster LLMs: Accelerate Inference with Speculative Decoding

17,911 views

6 months ago

Red Hat

Lossless LLM inference acceleration with Speculators

Red Hat's Mark Kurtz and Megan Flynn examine speculative decoding, a technique that uses a smaller, faster model—the ...

29:48

317 views

1 month ago

Nadav Timor

EAGLE and EAGLE-2: Lossless Inference Acceleration for LLMs - Hongyang Zhang

About the seminar: https://faster-llms.vercel.app Speaker: Hongyang Zhang (Waterloo & Vector Institute) Title: EAGLE and ...

48:26

2,788 views

10 months ago

Tales Of Tensors

Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss

Speculative decoding is one of the most important performance optimizations in modern LLM serving—and most people still don't ...

7:40

47 views

1 day ago

GosuCoder

MASSIVELY speed up local AI models with Speculative Decoding in LM Studio

There is a lot of possibility with Speculative Decoding allowing us normal folks to run larger and larger AI models at home. I hope ...

22:36

18,531 views

9 months ago

Intel Software

Speed up your Large Language Model by 2 or 3 times with OpenVINO's speculative decoding. Much faster inference without ...

7:00

Speculative Decoding with OpenVINO | Intel Software

196,877 views

5 months ago

Jordan Boyd-Graber

Why using a dumb language model can speed up a smarter one: Speculative Decoding [Lecture]

This is a single lecture from a course. If you like the material and want more context (e.g., the lectures that came before), check ...

7:48

109 views

3 weeks ago

The TWIML AI Podcast with Sam Charrington

Speculative Decoding and Efficient LLM Inference with Chris Lott - 717

Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research to discuss accelerating large language ...

1:16:02

1,655 views

10 months ago

R

Speculative Decoding for Fast LLM Inference Algorithm explained in detail

This video introduces the playlist, explains the Speculative Decoding Algorithm from the paper https://arxiv.org/pdf/2211.17192 in ...

18:45

18 views

5 days ago

EleutherAI

ML Performance Reading Group Session 19: Speculative Decoding

Session covering an overview of speculative decoding and several seminal papers in the space, including Medusa, Eagle 1/2/3, ...

1:36:03

458 views

2 days ago

Liechti Consulting

Speculative decoding : ACCELERATE LLM INFERENCE without sacrificing quality

Speculative decoding is a technique used to speed up LLM inference by using a small, fast model to quickly generate text. A big ...

0:42

188 views

7 months ago

MLWorks

Understanding Speculative Decoding: Boosting LLM Efficiency and Speed

In this video, we're diving deep into Speculative Decoding, an advanced technique that is revolutionizing how AI language ...

14:37

329 views

8 months ago

Paris AI Society

Speculative Decoding & Self-Speculative Decoding - AI Gold Nugget #2.1

AI Gold Nugget #2.1 – Speaker #1: Raphael Vienne Can we go faster…

23:40

33 views

7 months ago

Doubleword

Behind the Stack, Ep. 13 - Faster Inference: Speculative Decoding for Batched Workloads

Speculative decoding is usually discussed as a way to make real time LLM APIs feel faster. But what happens when you apply it to ...

19:54

28 views

3 weeks ago

Doubleword

Behind the Stack, Ep 11 - Speculative Decoding

Speculative decoding is one of the most powerful - and misunderstood - techniques for speeding up LLM inference.

17:56

54 views

1 month ago

Vuk Rosić

Generate 10 Tokens At Once - Faster LLM INFERENCE - AdaSPEC - Speculative Decoding Improvement

arxiv - https://arxiv.org/pdf/2510.19779 Become AI Researcher & Train LLM From Scratch ...

11:34

435 views

2 months ago

STARP AI

5:41

Speculative Decoding & KV Cache

6 views

1 month ago

Xiao Yang

Audio Overview: Accelerating LLM Inference with Lossless Speculative Decoding (read)

Title: Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies Authors: ...

12:49

50 views

5 months ago

Tim Lohner

What is speculative decoding—at a high level? Why is it used (what problem does it solve)? How does it differ from standard ...

3:14

Speculative Decoding in a Nutshell

37 views

4 months ago

GOSIM Foundation

【GOSIM HANGZHOU 2025】Yikai Zhu, Lukec Wang：SpecForge - Speculative Decoding Model Training Framework

Full Title：Yikai Zhu, Lukec Wang：SpecForge: Open Source Framework for Training Speculative Decoding Models Speculative ...

17:51

3 views

2 weeks ago
