6,937 results
- One Click Templates Repo (free): https://github.com/TrelisResearch/one-click-llms Advanced Inference Repo (Paid Lifetime ... (7,411 views, 2 years ago)
- Red Hat's Mark Kurtz and Megan Flynn examine speculative decoding, a technique that uses a smaller, faster model—the ... (317 views, 1 month ago)
- About the seminar: https://faster-llms.vercel.app Speaker: Hongyang Zhang (Waterloo & Vector Institute) Title: EAGLE and ... (2,786 views, 10 months ago)
- 00:00 Introduction 01:15 Decoder-only inference 06:05 The KV cache 11:15 Continuous batching 16:17 Speculative decoding ... (42,670 views, 1 year ago)
- Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research, to discuss accelerating large language ... (1,654 views)
- There is a lot of possibility with Speculative Decoding, allowing us normal folks to run larger and larger AI models at home. I hope ... (18,531 views, 9 months ago)
- Abstract: We will discuss how vLLM combines continuous batching with speculative decoding, with a focus on enabling external ... (10,990 views)
- Session covering an overview of speculative decoding and several seminal papers in the space, including Medusa, Eagle 1/2/3, ... (457 views, 2 days ago)
- AI Gold Nugget #2.1 – Speaker #1: Raphael Vienne Can we go faster… (33 views, 7 months ago)
- Speculative Architecture 14:24 Speculative Decoding Example 15:35 Introducing Medusa 16:53 Medusa's Decoding Heads 17:32 ... (2,636 views)
- In this vLLM office hours session, we explore the latest updates in vLLM v0.6.2, including Llama 3.2 Vision support, the ... (2,886 views)
- (416 views, Streamed 3 months ago)
- In this short we implemented a cool paper - from zero to hero! Speculative decoding is a neat algorithm introduced by two different ... (1,892 views)
- We discussed the inference optimization technique known as Speculative Decoding with a world-class researcher, expert, and ... (98 views, 3 months ago)
- This talk explains why document parsing is still hard and why vision-first methods outperform OCR for complex layouts, merged ... (141 views)
- ... vLLM core committer - An Intermediate Guide to Inference Using vLLM: PagedAttention, Quantization, Speculative Decoding, ... (249 views, 2 months ago)
- In this video we discuss a technique called speculative decoding to speed up LLMs' inference time. This has a number of use ... (19 views)
- ... challenges behind them, including: Speculative Decoding, Prefix Caching, Disaggregated Prefill, and multi-accelerator support. (23,699 views)
- ... downstream evaluation, and significant inference-time speedups through techniques like speculative decoding and adaptive ... (153 views, 4 months ago)
- Note for paper: Fast Inference from Transformers via Speculative Decoding (http://arxiv.org/abs/2211.17192v2) video slides: ... (43 views)
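Several of these talks describe the same core loop: a small, fast draft model proposes a few tokens and the large target model verifies them. A minimal greedy sketch of that loop, assuming hypothetical `draft_next` and `target_next` stand-ins for real draft/target model calls (real systems score all k proposed positions in a single batched forward pass and typically use probabilistic acceptance rather than the exact-match check shown here):

```python
def speculative_decode(prefix, draft_next, target_next, k=4, max_new=16):
    """Greedy speculative decoding sketch.

    draft_next / target_next each map a token prefix to the next token.
    The draft proposes k tokens; the target keeps the longest agreeing
    run, then contributes one corrected token on a mismatch.
    """
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            ctx.append(draft_next(ctx))
            proposal.append(ctx[-1])
        # 2. Target verifies the proposals position by position (a real
        #    system checks all k positions in one batched forward pass).
        accepted = 0
        for i, tok in enumerate(proposal):
            if target_next(out + proposal[:i]) != tok:
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # 3. On a mismatch, the target supplies the correct token, so at
        #    least one token is produced per verification step.
        if accepted < k:
            out.append(target_next(out))
    return out[: len(prefix) + max_new]
```

When draft and target agree, one verification step yields up to k tokens; when they always disagree, generation degenerates to one token per target call, which is why the speedup depends on the draft's acceptance rate.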