ViewTube

2,747 results

Red Hat
Lossless LLM inference acceleration with Speculators
Red Hat's Mark Kurtz and Megan Flynn examine speculative decoding, a technique that uses a smaller, faster model—the ...
29:48 · 316 views · 4 weeks ago

Tales Of Tensors
Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss
Speculative decoding is one of the most important performance optimizations in modern LLM serving—and most people still don't ...
7:40 · 47 views · 1 day ago

Jordan Boyd-Graber
Why using a dumb language model can speed up a smarter one: Speculative Decoding [Lecture]
This is a single lecture from a course. If you like the material and want more context (e.g., the lectures that came before), check ...
7:48 · 109 views · 3 weeks ago

EleutherAI
ML Performance Reading Group Session 19: Speculative Decoding
Session covering an overview of speculative decoding and several seminal papers in the space, including Medusa, Eagle 1/2/3, ...
1:36:03 · 452 views · 2 days ago

R
Speculative Decoding for Fast LLM Inference Algorithm explained in detail
This video introduces the playlist, explains the Speculative Decoding Algorithm from the paper https://arxiv.org/pdf/2211.17192 in ...
18:45 · 18 views · 5 days ago

Zaharah
The Secret to Faster LLMs: How Speculative Decoding Works
Why is generating text with LLMs so slow? It's not a compute problem, it's a memory bandwidth problem. In this video, we explore ...
7:06 · 28 views · 2 weeks ago

Doubleword
Behind the Stack, Ep. 13 - Faster Inference: Speculative Decoding for Batched Workloads
Speculative decoding is usually discussed as a way to make real-time LLM APIs feel faster. But what happens when you apply it to ...
19:54 · 28 views · 3 weeks ago

Uplatz
Convergence of Concurrency: Batching and Speculative Decoding Conflict | Uplatz
Modern LLM serving systems are built for scale, relying heavily on batching and parallelism to maximize GPU utilization. But as ...
8:08 · 0 views · 18 hours ago

…をよむひと
AI's Speed Limit: Speculative Decoding EXPLAINED!
Hello everyone! Host Ni-no is back, exploring a trending article from the archives that answers a burning question: 'Is there a ...
9:22 · 0 views · 8 days ago

Entropic
NeuRIPS 2025: Conformal Sparsification for Bandwidth-Efficient Edge-Cloud Speculative Decoding
This video offers a formal and structured overview of the core ideas, methodology, and contributions of the work Conformal ...
8:40 · 10 views · 12 days ago

GOSIM Foundation
【GOSIM HANGZHOU 2025】Yikai Zhu, Lukec Wang:SpecForge - Speculative Decoding Model Training Framework
Full title: Yikai Zhu, Lukec Wang: SpecForge: Open Source Framework for Training Speculative Decoding Models Speculative ...
17:51 · 3 views · 2 weeks ago

Zaharah
Speculative Decoding for Faster LLMs
Learn how Speculative Decoding speeds up inference with a draft + verify approach. @zarabux.
0:18 · 0 views · 2 days ago
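Several of the results above describe the same core mechanism: a small draft model cheaply proposes a few tokens, the large target model verifies them, and only agreeing tokens are kept, so the output is identical to running the target model alone. A minimal sketch of the greedy variant of that loop, using toy arithmetic "models" in place of real LLMs (`draft_model`, `target_model`, and `speculative_step` are illustrative names, not any library's API):

```python
def draft_model(prefix):
    # Toy stand-in for the small, fast drafter: a cheap next-token guess.
    return (sum(prefix) + 1) % 5

def target_model(prefix):
    # Toy stand-in for the large, slow target model (the ground truth).
    # It agrees with the drafter except at every third position.
    return (sum(prefix) + 1) % 5 if len(prefix) % 3 else (sum(prefix) + 2) % 5

def speculative_step(prefix, k=4):
    """Draft k tokens, verify them with the target model, and keep the
    longest agreeing prefix plus one corrected token from the target."""
    # 1. Draft phase: k cheap autoregressive guesses from the small model.
    draft, ctx = [], list(prefix)
    for _ in range(k):
        t = draft_model(ctx)
        draft.append(t)
        ctx.append(t)
    # 2. Verify phase: the target model checks each drafted position.
    #    (In a real system this is a single batched forward pass, which is
    #    where the speedup comes from.)
    accepted, ctx = [], list(prefix)
    for t in draft:
        expected = target_model(ctx)
        if t == expected:
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(expected)  # correction keeps output lossless
            break
    else:
        accepted.append(target_model(ctx))  # bonus token if all drafts pass
    return accepted

# The accepted tokens match a plain autoregressive run of the target model,
# which is why the technique is called "lossless":
prefix = [1, 2, 3]
out = speculative_step(prefix)
check = list(prefix)
for _ in range(len(out)):
    check.append(target_model(check))
assert check[len(prefix):] == out
```

Per step, the drafter runs k times but the expensive target model effectively runs once, so the speedup scales with how often the draft tokens are accepted.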

AI Research Roundup
DEER: Diffusion Drafting for Faster LLMs
Instead of using an autoregressive drafter in speculative decoding, it uses a discrete diffusion LLM that can draft whole token ...
3:51 · 0 views · 6 days ago

IgniteGTM
Inside LinkedIn’s AI Stack. Scaling GPUs, Agents, and Efficiency Across Every Layer
... networking, scheduling, observability, training optimization, fused kernels, distillation, RL efficiency, speculative decoding, and ...
19:30 · 0 views · 2 weeks ago

NewestAIInovations
NVIDIA TiDAR
Notably, the architecture is shown to outperform both standard diffusion models and current state-of-the-art speculative decoding ...
6:40 · 0 views · 3 weeks ago

AI Research Roundup
AdaSPEC: Selective KD for Faster LLM Spec Decoding
... Selective Knowledge Distillation for Efficient Speculative Decoders' This work tackles inefficiencies in speculative decoding, ...
3:42 · 0 views · 2 weeks ago

PaperLens
NVIDIA TiDAR: 5.9x Faster LLM Inference! Diffusion Speed, AR Quality
TiDAR outperforms speculative decoding in throughput and surpasses diffusion models like Dream and Llada in efficiency and ...
5:56 · 165 views · 4 weeks ago

임커밋
How Companies Save on LLM Serving Costs
* Collaboration inquiries: commit.im@gmail.com (Please refrain from using personal emails.) The video animation was created ...
4:29 · 10,117 views · 3 weeks ago

Paper to Pod
5x Faster LLMs? How TiDAR Merges Diffusion and AR Architectures
*Beating Speculative Decoding:* Why TiDAR offers a more efficient alternative to current acceleration methods like speculative ...
7:24 · 54 views · 2 weeks ago

AWS Events
AWS re:Invent 2025 - Sustainable and cost-efficient generative AI with agentic workflows (AIM333)
... scalable development, and optimization techniques like quantization and speculative decoding. Auto-scaling, batch processing, ...
54:06 · 239 views · 2 weeks ago