ViewTube

128,295 results

Related queries

kv cache

llm inference

multi query attention

flash attention explained

beam search

IBM Technology
Faster LLMs: Accelerate Inference with Speculative Decoding

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off your exam ...

9:39 · 17,945 views · 6 months ago

Efficient NLP
Speculative Decoding: When Two LLMs are Faster than One

Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io Speculative decoding (or speculative ...

12:46 · 29,600 views · 2 years ago

Trelis Research
Speculative Decoding Explained

One Click Templates Repo (free): https://github.com/TrelisResearch/one-click-llms Advanced Inference Repo (Paid Lifetime ...

37:34 · 7,417 views · 2 years ago

Nadav Timor
EAGLE and EAGLE-2: Lossless Inference Acceleration for LLMs - Hongyang Zhang

About the seminar: https://faster-llms.vercel.app Speaker: Hongyang Zhang (Waterloo & Vector Institute) Title: EAGLE and ...

48:26 · 2,802 views · 10 months ago

Red Hat
Lossless LLM inference acceleration with Speculators

Red Hat's Mark Kurtz and Megan Flynn examine speculative decoding, a technique that uses a smaller, faster model—the ...

29:48 · 324 views · 1 month ago

Julien Simon
Deep Dive: Optimizing LLM inference

00:00 Introduction · 01:15 Decoder-only inference · 06:05 The KV cache · 11:15 Continuous batching · 16:17 Speculative decoding ...

36:12 · 42,722 views · 1 year ago

Tales Of Tensors
Speculative Decoding: 3× Faster LLM Inference with Zero Quality Loss

Speculative decoding is one of the most important performance optimizations in modern LLM serving—and most people still don't ...

7:40 · 58 views · 2 days ago

Intel Software
Speculative Decoding with OpenVINO | Intel Software

Speed up your Large Language Model by 2 or 3 times with OpenVINO's speculative decoding. Much faster inference without ...

7:00 · 196,877 views · 5 months ago

AssemblyAI
What is Speculative Sampling? | Boosting LLM inference speed

Speculative Sampling is a decoding strategy that yields 2-3x speedups in LLM inference by generating multiple tokens per model ...

6:18 · 3,684 views · 1 year ago

Jordan Boyd-Graber
Why using a dumb language model can speed up a smarter one: Speculative Decoding [Lecture]

This is a single lecture from a course. If you like the material and want more context (e.g., the lectures that came before), check ...

7:48 · 109 views · 3 weeks ago

GPU MODE
Lecture 22: Hacker's Guide to Speculative Decoding in VLLM

Abstract: We will discuss how vLLM combines continuous batching with speculative decoding with a focus on enabling external ...

1:09:25 · 11,009 views · 1 year ago

The TWIML AI Podcast with Sam Charrington
Speculative Decoding and Efficient LLM Inference with Chris Lott - 717

Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research, to discuss accelerating large language ...

1:16:02 · 1,659 views · 10 months ago

EleutherAI
ML Performance Reading Group Session 19: Speculative Decoding

Session covering an overview of speculative decoding and several seminal papers in the space, including Medusa, Eagle 1/2/3, ...

1:36:03 · 489 views · 4 days ago

GosuCoder
MASSIVELY speed up local AI models with Speculative Decoding in LM Studio

There is a lot of possibility with Speculative Decoding, allowing us normal folks to run larger and larger AI models at home. I hope ...

22:36 · 18,544 views · 9 months ago

R
Speculative Decoding for Fast LLM Inference Algorithm explained in detail

This video introduces the playlist, explains the Speculative Decoding Algorithm from the paper https://arxiv.org/pdf/2211.17192 in ...

18:45 · 19 views · 7 days ago

Lex Clips
How to make LLMs fast: KV Caching, Speculative Decoding, and Multi-Query Attention | Cursor Team

Lex Fridman Podcast full episode: https://www.youtube.com/watch?v=oFfVt3S51T4 Thank you for listening ❤ Check out our ...

15:15 · 11,535 views · 1 year ago

Genpakt
What is Speculative Decoding? How Do I Use It With vLLM

Please be patient and watch till the end of the video. More nuggets there :D Request Notebook Here: ...

12:56 · 950 views · 1 year ago

Hertz Foundation
Accelerating Inference with Staged Speculative Decoding — Ben Spector | 2023 Hertz Summer Workshop

Hertz Fellow Benjamin Spector, a doctoral student at Stanford University, presents "Accelerating Inference with Staged ...

6:45 · 1,735 views · 2 years ago

Doubleword
Behind the Stack, Ep. 13 - Faster Inference: Speculative Decoding for Batched Workloads

Speculative decoding is usually discussed as a way to make real-time LLM APIs feel faster. But what happens when you apply it to ...

19:54 · 29 views · 3 weeks ago

MLWorks
Understanding Speculative Decoding: Boosting LLM Efficiency and Speed

In this video, we're diving deep into Speculative Decoding, an advanced technique that is revolutionizing how AI language ...

14:37 · 329 views · 8 months ago