ViewTube

1,366 results

Related queries

vllm

onnx

tensorrt tutorial

NVIDIA Developer
TensorRT LLM 1.0 Livestream: New Easy-To-Use Pythonic Runtime

TensorRT LLM v1.0 includes a new architecture build with modular Python and PyTorch to make deployment and development ...

31:35

2,465 views

Streamed 3 months ago

Google for Developers
Demo: Optimizing Gemma inference on NVIDIA GPUs with TensorRT-LLM

Even the smallest of Large Language Models are compute intensive, significantly affecting the cost of your Generative AI ...

12:21

4,736 views

1 year ago

Ask Simon!
Tensorrt Vs Vllm Which Open Source Library Wins 2025

TensorRT vs vLLM – Which Open-Source LLM Library Wins in 2025? Speed, scalability, and real-time inference — but which ...

2:03

359 views

3 months ago

Sam mokhtari
🔍 AI Serving Frameworks Explained: vLLM vs TensorRT-LLM vs Ray Serve | Which One Should You Use?

Choosing the right AI serving framework is critical for scaling large language models (LLMs) in production. In this video, we break ...

35:16

827 views

3 months ago

Modal
⚡Blazing Fast LLaMA 3: Crush Latency with TensorRT LLM

In this video, you'll learn how to serve Meta's LLaMA 3 8B model using TensorRT-LLM on Modal and bring latency down to under ...

6:51

1,406 views

7 months ago

AI Engineer
From model weights to API endpoint with TensorRT LLM: Philip Kiely and Pankaj Gupta

TensorRT-LLM is the highest-performance model serving framework, but it can have a steep learning curve when you're just ...

1:40:01

4,267 views

1 year ago

TensorFlow
NVidia TensorRT: high-performance deep learning inference accelerator (TensorFlow Meets)

In this episode of TensorFlow Meets, we are joined by Chris Gottbrath from NVidia and X.Q. from the Google Brain team to talk ...

8:07

22,382 views

7 years ago

Fahd Mirza
How-To Install TensorRT Locally to Optimize and Serve Any Model

This video installs TensorRT locally and tests it. TensorRT delivers blazing-fast GPU inference by optimizing kernels. Get 50% ...

8:38

1,741 views

1 month ago

bycloud
All You Need To Know About Running LLMs Locally

Profit TensorRT LLM [Code] https://github.com/NVIDIA/TensorRT-LLM [Getting Started Blog] https://nvda.ws/3O7f8up [Dev Blog] ...

10:30

294,920 views

1 year ago

Toronto Machine Learning Society (TMLS)
LLM Inference: A Comparative Guide to Modern Open-Source Runtimes | Aleksandr Shirokov, Wildberries

From the MLOps World | GenAI Summit 2025 — Virtual Session (October 6, 2025) Session Title: LLM Inference: A Comparative ...

51:36

445 views

2 months ago

Hummingbird
What is Pytorch, TF, TFLite, TensorRT, ONNX?

Basic ideas behind Pytorch, TF, TFLite, TensorRT, ONNX in machine learning.

3:58

4,611 views

1 year ago

WorldofAI
NVIDIA's TensorRT-LLM: Building Powerful RAG Apps! (Opensource)

In this video, we will be taking a look at NVIDIA's TensorRT-LLM and how it streamlines the deployment and optimization of ...

10:51

5,904 views

1 year ago

Baseten
Automatic LLM optimization with TensorRT-LLM Engine Builder

TensorRT-LLM is an open source performance optimization toolbox created by NVIDIA for optimizing large language model ...

7:58

1,815 views

1 year ago

PyTorch
Sponsored Session: Amazingly Fast and Incredibly Scalable Inference... - Harry Kim & Laikh Tewari

Sponsored Session: Amazingly Fast and Incredibly Scalable Inference with NVIDIA's Dynamo and TensorRT-LLM - Harry Kim ...

26:19

129 views

1 month ago

NVIDIA Developer
Beyond the Algorithm with NVIDIA: TensorRT-LLM Goes GitHub First

Join us to learn more about the TensorRT-LLM's new open-development model. In this livestream, you'll learn contribution ...

44:09

2,951 views

Streamed 7 months ago

Manny Bernabe
⚡Blazing-Fast LLaMA 3: Crush Latency with TensorRT-LLM

In this video, you'll learn how to serve Meta's LLaMA 3 8B model using TensorRT-LLM on Modal and bring latency down to under ...

6:51

514 views

7 months ago

IBM Technology
Faster LLMs: Accelerate Inference with Speculative Decoding

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ...

9:39

17,954 views

6 months ago

NVIDIA Developer
The practice of doing performance analysis/optimization with TensorRT-LLM

Learn best practices on TensorRT-LLM performance analysis and optimization. Hear from our experts on the analysis the ...

54:01

1,261 views

Streamed 4 months ago

Innoplexus
Accelerating LLM inference using TensorRT-LLM! by Megh Makwana at Pune GPU Community's meetup

Relive the insightful moments from the Pune GPU Community's meetup LLM Application Showcase & Quantum Computing at ...

39:30

622 views

1 year ago

Fireship
Nvidia CUDA in 100 Seconds

What is CUDA? And how does parallel computing on the GPU enable developers to unlock the full potential of AI? Learn the ...

3:13

1,978,083 views

1 year ago

Vuk Rosić
How To Deploy TensorRT-LLM To RunPod (Bugfix)

How to deploy TensorRT-LLM to RunPod. Fast LLM inference with NVIDIA TensorRT-LLM. Commands ...

9:25

430 views

9 months ago

NVIDIA Developer
Beyond the Algorithm with NVIDIA: The New PyTorch Architecture for TensorRT-LLM

TensorRT-LLM equips practitioners with the tools needed to achieve state-of-the-art performance for large language model (LLM) ...

52:07

3,454 views

Streamed 8 months ago

NVIDIA Developer
Implementation and optimization of MTP for DeepSeek R1 in TensorRT-LLM

Learn from our experts about how we use MTP speculative decoding method to achieve better performance in TensorRT-LLM.

44:58

1,325 views

Streamed 6 months ago

AiAndPixels
Speed Up Your Generations with TensorRT | ComfyUI Tutorial

Speed up your generations with TensorRT | ComfyUI tutorial ----- Links ...

8:31

2,038 views

1 year ago