2,747 results
Red Hat's Mark Kurtz and Megan Flynn examine speculative decoding, a technique that uses a smaller, faster model—the ...
316 views
4 weeks ago
Speculative decoding is one of the most important performance optimizations in modern LLM serving—and most people still don't ...
47 views
1 day ago
This is a single lecture from a course. If you like the material and want more context (e.g., the lectures that came before), check ...
109 views
3 weeks ago
Session covering an overview of speculative decoding and several seminal papers in the space, including Medusa, Eagle 1/2/3, ...
452 views
2 days ago
This video introduces the playlist, explains the Speculative Decoding Algorithm from the paper https://arxiv.org/pdf/2211.17192 in ...
18 views
5 days ago
Why is generating text with LLMs so slow? It's not a compute problem, it's a memory bandwidth problem. In this video, we explore ...
28 views
2 weeks ago
Speculative decoding is usually discussed as a way to make real time LLM APIs feel faster. But what happens when you apply it to ...
Modern LLM serving systems are built for scale, relying heavily on batching and parallelism to maximize GPU utilization. But as ...
0 views
18 hours ago
Hello everyone! Host Ni-no is back, exploring a trending article from the archives that answers a burning question: 'Is there a ...
8 days ago
This video offers a formal and structured overview of the core ideas, methodology, and contributions of the work Conformal ...
10 views
12 days ago
Full Title: Yikai Zhu, Lukec Wang: SpecForge: Open Source Framework for Training Speculative Decoding Models Speculative ...
3 views
Learn how Speculative Decoding speeds up inference with a draft + verify approach. @zarabux.
Instead of using an autoregressive drafter in speculative decoding, it uses a discrete diffusion LLM that can draft whole token ...
6 days ago
... networking, scheduling, observability, training optimization, fused kernels, distillation, RL efficiency, speculative decoding, and ...
Notably, the architecture is shown to outperform both standard diffusion models and current state-of-the-art speculative decoding ...
... Selective Knowledge Distillation for Efficient Speculative Decoders' This work tackles inefficiencies in speculative decoding, ...
TiDAR outperforms speculative decoding in throughput and surpasses diffusion models like Dream and Llada in efficiency and ...
165 views
* Collaboration inquiries: commit.im@gmail.com (Please refrain from using personal emails.) The video animation was created ...
10,117 views
Beating Speculative Decoding: Why TiDAR offers a more efficient alternative to current acceleration methods like speculative ...
54 views
... scalable development, and optimization techniques like quantization and speculative decoding. Auto-scaling, batch processing, ...
239 views