6,937 results
- One Click Templates Repo (free): https://github.com/TrelisResearch/one-click-llms Advanced Inference Repo (Paid Lifetime ... (7,411 views, 2 years ago)
- Red Hat's Mark Kurtz and Megan Flynn examine speculative decoding, a technique that uses a smaller, faster model—the ... (317 views, 1 month ago)
- About the seminar: https://faster-llms.vercel.app Speaker: Hongyang Zhang (Waterloo & Vector Institute) Title: EAGLE and ... (2,786 views, 10 months ago)
- 00:00 Introduction 01:15 Decoder-only inference 06:05 The KV cache 11:15 Continuous batching 16:17 Speculative decoding ... (42,670 views, 1 year ago)
- Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research, to discuss accelerating large language ... (1,654 views)
- There is a lot of possibility with Speculative Decoding, allowing us normal folks to run larger and larger AI models at home. I hope ... (18,531 views, 9 months ago)
- Abstract: We will discuss how vLLM combines continuous batching with speculative decoding, with a focus on enabling external ... (10,990 views)
- Session covering an overview of speculative decoding and several seminal papers in the space, including Medusa, Eagle 1/2/3, ... (457 views, 2 days ago)
- AI Gold Nugget #2.1 – Speaker #1: Raphael Vienne Can we go faster… (33 views, 7 months ago)
- Speculative Architecture 14:24 Speculative Decoding Example 15:35 Introducing Medusa 16:53 Medusa's Decoding Heads 17:32 ... (2,636 views)
- In this vLLM office hours session, we explore the latest updates in vLLM v0.6.2, including Llama 3.2 Vision support, the ... (2,886 views)
- (416 views, Streamed 3 months ago)
- In this short we implemented a cool paper - from zero to hero! Speculative decoding is a neat algorithm introduced by two different ... (1,892 views)
- We discussed the inference optimization technique known as Speculative Decoding with a world-class researcher, expert, and ... (98 views, 3 months ago)
- This talk explains why document parsing is still hard and why vision-first methods outperform OCR for complex layouts, merged ... (141 views)
- ... vLLM core committer - An Intermediate Guide to Inference Using vLLM: PagedAttention, Quantization, Speculative Decoding, ... (249 views, 2 months ago)
- In this video we discuss a technique called speculative decoding to speed up LLMs' inference time. This has a number of use ... (19 views)
- ... challenges behind them, including: Speculative Decoding, Prefix Caching, Disaggregated Prefill, and multi-accelerator support. (23,699 views)
- ... downstream evaluation, and significant inference-time speedups through techniques like speculative decoding and adaptive ... (153 views, 4 months ago)
- Note for paper: Fast Inference from Transformers via Speculative Decoding (http://arxiv.org/abs/2211.17192v2) video slides: ... (43 views)
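Several of these talks describe the same core loop: a small, fast draft model proposes a few tokens and the large target model verifies them. A minimal greedy sketch of that loop, assuming hypothetical `draft_next` and `target_next` stand-ins for real draft/target model calls (real systems score all k proposed positions in a single batched forward pass and typically use probabilistic acceptance rather than the exact-match check shown here):

```python
def speculative_decode(prefix, draft_next, target_next, k=4, max_new=16):
    """Greedy speculative decoding sketch.

    draft_next / target_next each map a token prefix to the next token.
    The draft proposes k tokens; the target keeps the longest agreeing
    run, then contributes one corrected token on a mismatch.
    """
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        # 1. Draft model proposes k tokens autoregressively (cheap).
        proposal, ctx = [], list(out)
        for _ in range(k):
            ctx.append(draft_next(ctx))
            proposal.append(ctx[-1])
        # 2. Target verifies the proposals position by position (a real
        #    system checks all k positions in one batched forward pass).
        accepted = 0
        for i, tok in enumerate(proposal):
            if target_next(out + proposal[:i]) != tok:
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # 3. On a mismatch, the target supplies the correct token, so at
        #    least one token is produced per verification step.
        if accepted < k:
            out.append(target_next(out))
    return out[: len(prefix) + max_new]
```

When draft and target agree, one verification step yields up to k tokens; when they always disagree, generation degenerates to one token per target call, which is why the speedup depends on the draft's acceptance rate.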