Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ...
17,945 views
6 months ago
Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io Speculative decoding (or speculative ...
29,600 views
2 years ago
One Click Templates Repo (free): https://github.com/TrelisResearch/one-click-llms Advanced Inference Repo (Paid Lifetime ...
7,417 views
About the seminar: https://faster-llms.vercel.app Speaker: Hongyang Zhang (Waterloo & Vector Institute) Title: EAGLE and ...
2,802 views
10 months ago
Red Hat's Mark Kurtz and Megan Flynn examine speculative decoding, a technique that uses a smaller, faster model—the ...
324 views
1 month ago
00:00 Introduction 01:15 Decoder-only inference 06:05 The KV cache 11:15 Continuous batching 16:17 Speculative decoding ...
42,722 views
1 year ago
Speculative decoding is one of the most important performance optimizations in modern LLM serving—and most people still don't ...
58 views
2 days ago
Speed up your Large Language Model by 2 or 3 times with OpenVINO's speculative decoding. Much faster inference without ...
196,877 views
5 months ago
Speculative Sampling is a decoding strategy that yields 2-3x speedups in LLM inference by generating multiple tokens per model ...
3,684 views
This is a single lecture from a course. If you like the material and want more context (e.g., the lectures that came before), check ...
109 views
3 weeks ago
Abstract: We will discuss how vLLM combines continuous batching with speculative decoding with a focus on enabling external ...
11,009 views
Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research to discuss accelerating large language ...
1,659 views
Session covering an overview of speculative decoding and several seminal papers in the space, including Medusa, Eagle 1/2/3, ...
489 views
4 days ago
There is a lot of possibility with Speculative Decoding allowing us normal folks to run larger and larger AI models at home. I hope ...
18,544 views
9 months ago
This video introduces the playlist, explains the Speculative Decoding Algorithm from the paper https://arxiv.org/pdf/2211.17192 in ...
19 views
7 days ago
Lex Fridman Podcast full episode: https://www.youtube.com/watch?v=oFfVt3S51T4 Thank you for listening ❤ Check out our ...
11,535 views
Please be patient and watch till the end of the video. More nuggets there :D Request Notebook Here: ...
950 views
Hertz Fellow Benjamin Spector, a doctoral student at Stanford University, presents "Accelerating Inference with Staged ...
1,735 views
Speculative decoding is usually discussed as a way to make real time LLM APIs feel faster. But what happens when you apply it to ...
29 views
In this video, we're diving deep into Speculative Decoding, an advanced technique that is revolutionizing how AI language ...
329 views
8 months ago