ViewTube

6,937 results

Trelis Research — Speculative Decoding Explained (37:34 · 7,411 views · 2 years ago)
One Click Templates Repo (free): https://github.com/TrelisResearch/one-click-llms Advanced Inference Repo (Paid Lifetime ...

Red Hat — Lossless LLM inference acceleration with Speculators (29:48 · 317 views · 1 month ago)
Red Hat's Mark Kurtz and Megan Flynn examine speculative decoding, a technique that uses a smaller, faster model—the ...

Nadav Timor — EAGLE and EAGLE-2: Lossless Inference Acceleration for LLMs - Hongyang Zhang (48:26 · 2,786 views · 10 months ago)
About the seminar: https://faster-llms.vercel.app Speaker: Hongyang Zhang (Waterloo & Vector Institute) Title: EAGLE and ...

Julien Simon — Deep Dive: Optimizing LLM inference (36:12 · 42,670 views · 1 year ago)
00:00 Introduction 01:15 Decoder-only inference 06:05 The KV cache 11:15 Continuous batching 16:17 Speculative decoding ...

The TWIML AI Podcast with Sam Charrington — Speculative Decoding and Efficient LLM Inference with Chris Lott - 717 (1:16:02 · 1,654 views · 10 months ago)
Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research, to discuss accelerating large language ...

GosuCoder — MASSIVELY speed up local AI models with Speculative Decoding in LM Studio (22:36 · 18,531 views · 9 months ago)
There is a lot of possibility with Speculative Decoding allowing us normal folks to run larger and larger AI models at home. I hope ...

GPU MODE — Lecture 22: Hacker's Guide to Speculative Decoding in VLLM (1:09:25 · 10,990 views · 1 year ago)
Abstract: We will discuss how vLLM combines continuous batching with speculative decoding with a focus on enabling external ...

EleutherAI — ML Performance Reading Group Session 19: Speculative Decoding (1:36:03 · 457 views · 2 days ago)
Session covering an overview of speculative decoding and several seminal papers in the space, including Medusa, Eagle 1/2/3, ...

Paris AI Society — Speculative Decoding & Self-Speculative Decoding - AI Gold Nugget #2.1 (23:40 · 33 views · 7 months ago)
AI Gold Nugget #2.1 – Speaker #1: Raphael Vienne Can we go faster…

Oxen — How Medusa Works (52:16 · 2,636 views · 1 year ago)
Speculative Architecture 14:24 Speculative Decoding Example 15:35 Introducing Medusa 16:53 Medusa's Decoding Heads 17:32 ...

Neural Magic — vLLM Office Hours - Speculative Decoding in vLLM - October 3, 2024 (1:04:28 · 2,886 views · 1 year ago)
In this vLLM office hours session, we explore the latest updates in vLLM v0.6.2, including Llama 3.2 Vision support, the ...

月球大叔 — vllm + speculative decoding (1:08:04 · 416 views · Streamed 3 months ago)

One Shot Learning — Shot #14 [Hebrew]: Paper to Code - Speculative Decoding (38:39 · 1,892 views · 1 year ago)
In this shot we implemented a cool paper - from zero to hero! Speculative decoding is a neat algorithm introduced by two different ...

The Information Bottleneck — EP5: Speculative Decoding with Nadav Timor (1:02:23 · 98 views · 3 months ago)
We discussed the inference optimization technique known as Speculative Decoding with a world-class researcher, expert, and ...

Official Elastic Community — How Speculative Decoding Cuts OCR Hallucinations by 90% (22:21 · 141 views · 1 month ago)
This talk explains why document parsing is still hard and why vision-first methods outperform OCR for complex layouts, merged ...

Red Hat Community — An Intermediate Guide to Inference Using vLLM (39:58 · 249 views · 2 months ago)
... vLLM core committer - An Intermediate Guide to Inference Using vLLM: PagedAttention, Quantization, Speculative Decoding, ...

Kazem Jahanbakhsh — Speculative decoding Part I (23:01 · 19 views · 7 months ago)
In this video we discuss a technique called speculative decoding to speed up LLM inference time. This has a number of use ...

Databricks — Accelerating LLM Inference with vLLM (35:53 · 23,699 views · 1 year ago)
... challenges behind them, including: Speculative Decoding, Prefix Caching, Disaggregated Prefill, and multi-accelerator support.

AI Podcast Series. Byte Goose AI. — MatFormer: Explained. Nested Transformer for Elastic Inference. Foundation Models: LLMs. (20:16 · 153 views · 4 months ago)
... downstream evaluation, and significant inference time speedups through techniques like speculative decoding and adaptive ...

Audinote-ML — [Audio notes] Fast Inference from Transformers via Speculative Decoding (32:00 · 43 views · 1 year ago)
Note for paper: Fast Inference from Transformers via Speculative Decoding (http://arxiv.org/abs/2211.17192v2) video slides: ...