ViewTube

89,925 results

Related queries: paged attention · tensorrt llm · llm inference optimization · speculative decoding · flash attention explained · llm kv cache · vllm · llm training · multi-query attention · amd inference · inference statistics

IBM Technology
AI Inference: The Secret to AI's Superpowers

Download the AI model guide to learn more → https://ibm.biz/BdaJTb Learn more about the technology → https://ibm.biz/BdaJTp ...

10:41 · 101,761 views · 1 year ago

Julien Simon
Deep Dive: Optimizing LLM inference

Open-source LLMs are great for conversational applications, but they can be difficult to scale in production and deliver latency ...

36:12 · 42,725 views · 1 year ago

IBM Technology
What is vLLM? Efficient AI Inference for Large Language Models

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ...

4:58 · 55,111 views · 6 months ago

AI Engineer
Mastering LLM Inference Optimization From Theory to Cost Effective Deployment: Mark Moyou

LLM inference is not your normal deep learning model deployment, nor is it trivial when it comes to managing scale, performance ...

33:39 · 27,890 views · 11 months ago

IBM Technology
Faster LLMs: Accelerate Inference with Speculative Decoding

Isaac Ke explains speculative decoding, a technique that accelerates LLM inference speeds by 2-4x without compromising output ...

9:39 · 17,949 views · 6 months ago
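
For readers skimming: speculative decoding has a small, cheap draft model propose several tokens, which the large target model then verifies in a single pass, so every accepted token skips an expensive decode step. A minimal greedy-verification sketch in Python; draft_next and target_next are hypothetical toy stand-ins, not code from the video:

    # Greedy speculative decoding, toy version. draft_next/target_next
    # are hypothetical stand-ins for a cheap draft / expensive target model.
    def draft_next(tokens):
        return (tokens[-1] * 31 + 7) % 100

    def target_next(tokens):
        # Agrees with the draft most of the time, diverges occasionally.
        if tokens[-1] % 3:
            return (tokens[-1] * 31 + 7) % 100
        return (tokens[-1] + 1) % 100

    def speculative_step(tokens, k=4):
        # 1) Draft k tokens autoregressively with the cheap model.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify with the target model. In a real engine this is one
        #    batched forward pass over all k positions -- the source of
        #    the 2-4x speedup; here it is a loop for clarity.
        accepted, ctx = [], list(tokens)
        for t in draft:
            correct = target_next(ctx)
            if t != correct:
                accepted.append(correct)  # fix first mismatch, discard rest
                break
            accepted.append(t)
            ctx.append(t)
        else:
            accepted.append(target_next(ctx))  # all matched: free bonus token
        return tokens + accepted

    tokens = [1]
    for _ in range(5):
        tokens = speculative_step(tokens)
    print(tokens)

Output quality is preserved because every emitted token is one the target model itself would have produced, which is why the technique is lossless under greedy decoding.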

DataCamp
Understanding LLM Inference | NVIDIA Experts Deconstruct How AI Works

In the last eighteen months, large language models (LLMs) have become commonplace. For many people, simply being able to ...

55:39 · 20,373 views · Streamed 1 year ago

PyTorch
Understanding the LLM Inference Workload - Mark Moyou, NVIDIA

Understanding how to effectively size a production-grade LLM ...

34:14 · 21,853 views · 1 year ago

New Machina
AI ML Training versus Inference

AI / ML knowledge, one concept at a time.

4:41 · 9,249 views · 1 year ago

Code to the Moon
Insanely Fast LLM Inference with this Stack

A walkthrough of some of the options developers are faced with when building applications that leverage LLMs. Includes ...

10:43 · 9,628 views · 3 months ago

AI Engineer
How fast are LLM inference engines anyway? — Charles Frye, Modal

Open weights models and open source inference servers have made massive strides in the year since we last got together at AIE ...

16:07 · 1,363 views · 5 months ago

YanAITalk
LLM inference optimization: Architecture, KV cache and Flash attention

... the increasing size of the models comes with increasing cost to train and to run inference on these large ...

44:06 · 13,788 views · 1 year ago
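
The KV-cache cost discussed here follows directly from model shape: every layer stores one key and one value vector per token per KV head. A back-of-the-envelope calculator in Python, using an assumed Llama-2-7B-like shape (32 layers, 32 KV heads, head dim 128); the exact figures in the video may differ:

    def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch,
                       bytes_per_elem=2):   # fp16 -> 2 bytes per element
        # 2x for keys AND values.
        return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

    print(kv_cache_bytes(32, 32, 128, seq_len=4096, batch=1) / 1e9)  # ~2.1 GB per 4k-token sequence
    print(kv_cache_bytes(32, 32, 128, seq_len=4096, batch=8) / 1e9)  # ~17.2 GB for batch 8

This arithmetic is why multi-query and grouped-query attention (fewer KV heads) and paged KV-cache management matter so much at serving time.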

Andrej Karpathy
Deep Dive into LLMs like ChatGPT

This is a general audience deep dive into the Large Language Model (LLM) AI technology that powers ChatGPT and related ...

3:31:24 · 4,413,106 views · 10 months ago

VERSES
Karl Friston Discusses Active Inference vs Generative AI LLM & IWAI Conference

In this month's Karl's Corner, Dr. Karl Friston, VERSES Chief Scientist, breaks down some big ideas, starting with the "bitter lesson" ...

30:05 · 1,573 views · 2 months ago

3Blue1Brown
Large Language Models explained briefly

A light intro to LLMs, chatbots, pretraining, and transformers. Dig deeper here: ...

7:58 · 4,784,592 views · 1 year ago

Richard Aragon
Defeating Nondeterminism in LLM Inference Is Impossible

Link to Document: ...

31:11 · 778 views · 3 months ago

AppliedAI
How Much GPU Memory is Needed for LLM Inference?

Discover a simple method to calculate GPU memory requirements for large language models like Llama 70B. Learn how the ...

5:28 · 957 views · 1 year ago
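
The calculation previewed here is usually approximated as weights = parameter count × bytes per parameter, scaled by a serving overhead factor. A hedged sketch of that common rule of thumb (the ~20% overhead is a widely used assumption, not necessarily the exact method in the video):

    def serving_memory_gb(params_billions, bits_per_param=16, overhead=1.2):
        # Weights: params * bytes/param. The overhead factor roughly covers
        # KV cache, activations, and framework buffers.
        return params_billions * (bits_per_param / 8) * overhead

    print(serving_memory_gb(70))      # Llama 70B, fp16  -> ~168 GB
    print(serving_memory_gb(70, 4))   # 4-bit quantized  -> ~42 GB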

Red Hat
Optimize LLM inference with vLLM

Ready to serve your large language models faster, more efficiently, and at a lower cost? Discover how vLLM, a high-throughput ...

6:13 · 7,627 views · 5 months ago
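
For context, serving a model with vLLM takes only a few lines. A minimal offline-batch sketch using vLLM's Python API; the model name is an assumed example, and the calls match recent vLLM releases but may drift:

    from vllm import LLM, SamplingParams   # pip install vllm

    # Assumed example checkpoint; most HF-compatible models work.
    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")
    params = SamplingParams(temperature=0.8, max_tokens=128)

    outputs = llm.generate(["Explain PagedAttention in one sentence."], params)
    print(outputs[0].outputs[0].text)

Under the hood this runs the PagedAttention and continuous-batching engine, which is where vLLM's throughput and cost gains come from.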

MLOps.community
Exploring the Latency/Throughput & Cost Space for LLM Inference // Timothée Lacroix // CTO Mistral

Join the MLOps Community here: mlops.community/join. Abstract: Getting the right LLM inference stack means choosing the right ...

30:25 · 26,677 views · 2 years ago

Nadav Timor
EAGLE and EAGLE-2: Lossless Inference Acceleration for LLMs - Hongyang Zhang

About the seminar: https://faster-llms.vercel.app Speaker: Hongyang Zhang (Waterloo & Vector Institute) Title: EAGLE and ...

48:26 · 2,801 views · 10 months ago

GPU MODE
Lecture 58: Disaggregated LLM Inference

Speaker: Junda Chen.

1:15:19 · 4,510 views · 7 months ago