ViewTube

2,922 results

Related queries

giskard llm

llm inference optimization

rag evaluation

deepeval

best llm

llm evaluation techniques

ragas

ai evaluation

IBM Technology
What are Large Language Model (LLM) Benchmarks?

Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdKetJ Learn more about the ...

6:21

16,290 views

1 year ago

Simplilearn
LLM Benchmarking | How one LLM is tested against another? | LLM Evaluation Benchmarks | Simplilearn

Professional Certificate Program in Generative AI and Machine Learning - IITG (India Only) ...

9:19

2,130 views

1 year ago

Adam Lucek
What Do LLM Benchmarks Actually Tell Us? (+ How to Run Your Own)

Interpreting and running standardized language model benchmarks and evaluation datasets for both generalized and task ...

30:56

7,305 views

1 year ago

bycloud
7 Popular LLM Benchmarks Explained [OpenLLM Leaderboard & Chatbot Arena]

Check out my website here! https://leaderboard.bycloud.ai/ In this video, I will be going through and explaining the benchmarks for ...

5:50

26,275 views

1 year ago

Evidently AI
LLM evaluation benchmarks

In this video, we'll talk about LLM evaluation benchmarks. 00:12 What are LLM evaluation benchmarks? 00:59 Examples of LLM ...

3:07

1,613 views

1 year ago

Stanford Online
Stanford CME295 Transformers & LLMs | Autumn 2025 | Lecture 8 - LLM Evaluation

For more information about Stanford's graduate programs, visit: https://online.stanford.edu/graduate-education November 21, ...

1:49:25

19,192 views

3 weeks ago

People also watched

Universe of AI
GPT-5.2 Just Hit 75% on ARC-AGI! How Is This Possible?

GPT-5.2 just reached 75% accuracy on ARC-AGI, one of the hardest reasoning benchmarks in AI. In this video, I break down: ...

8:01

2,372 views

3 days ago

AI Upload
Ex-OpenAI Scientist WARNS: "You Have No Idea What's Coming"

Ex-OpenAI pioneer Ilya Sutskever warns that as AI begins to self-improve, its trajectory may become "extremely unpredictable and ...

18:14

4,759,681 views

5 months ago

Arize AI
Build Your First Eval: Creating a Custom LLM Evaluator with a Golden Dataset

Building an evaluation from the ground up requires iteration and testing. In this video, we walk through how to use Arize Phoenix ...

30:49

1,843 views

4 months ago

Discover AI
LLM Quantization (Ollama, LM Studio): Any Performance Drop? TEST

A NEW benchmark and guide which quantization models to use locally on your PC or laptop. Either in Ollama or in LM Studio, ...

19:01

3,693 views

4 months ago

Peter Yang
Complete Beginner's Course on AI Evaluations in 50 Minutes (2025) | Aman Khan

Today, I want to share a new episode with Aman Khan. The best way to learn about AI evaluations is to watch 2 PMs build them ...

51:48

18,263 views

4 months ago

AISeeKing
GLM-4.7 v/s Minimax M2.1 (In-depth comparison): Which is the best open coding model?

9:49

15,183 views

2 days ago

Nicholas Renotte
How to Fine Tune your own LLM using LoRA (on a CUSTOM dataset!)

That gameboy blender animation...took 6 hours to render . Anyway, had a ton of fun coding this up and finally getting back to ...

1:01:06

34,173 views

6 months ago

AIchievable
Best Open-Source Coding Model? GLM 4.7 vs DeepSeek 3.2 vs MiniMax M2.1 vs Kimi K2

What's the best open-source model for coding right now? I tested GLM 4.7, MiniMax M2.1, DeepSeek 3.2, and Kimi K2 Thinking ...

21:27

5,464 views

1 day ago

Alex Ziskind
Local LLM Challenge | Speed vs Efficiency

I put three systems to the local LLM test. Gear Links * K9 Mini with 32GB RAM: https://amzn.to/3ZiKjcp * 🛠️ 96GB ...

16:25

256,445 views

1 year ago

Matt Pocock
Most devs don't understand how LLM tokens work

Most devs are using LLMs daily but don't have a clue about some of the fundamentals. Understanding tokens is crucial because ...

10:58

138,497 views

3 months ago

markarts
LLM evaluation - Benchmarking the benchmarks!

Done uh Blaze um he has benchmarked the benchmarks basically um against so chatbot Arena um Arena ELO here chatbot ...

0:59

588 views

1 year ago

Databricks
Evaluating LLM-based Applications

Evaluating LLM-based applications can feel like more of an art than a science. In this workshop, we'll give a hands-on introduction ...

33:50

43,624 views

2 years ago

DevDays
Evaluating LLM performance on FHIR: Practical benchmarks - Joshua Kelly | FHIR DevDays 2025

This presentation was part of #FHIRDevDays 2025, the premiere event for #FHIR implementers organized by @FirelyTeam and ...

19:47

0 views

3 weeks ago

OpenAI
A Survey of Techniques for Maximizing LLM Performance

Join us for a comprehensive survey of techniques designed to unlock the full potential of Large Language Models (LLMs).

45:32

219,372 views

2 years ago

Fahd Mirza
LLM Benchmarks for Evaluation

This video shares the list of LLM benchmarks commonly used by EleutherAI. PLEASE FOLLOW ME: ▷ LinkedIn: ...

2:36

271 views

2 years ago

Stanford Online
Stanford CS224N: NLP with Deep Learning | Spring 2024 | Lecture 11 - Benchmarking by Yann Dubois

For more information about Stanford's online Artificial Intelligence programs visit: https://stanford.io/ai This lecture covers: 1.

1:24:24

12,329 views

9 months ago

What's AI by Louis-François Bouchard
Key Metrics and Evaluation Methods for RAG

Build Your First Scalable Product with LLMs: https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev?ref=1f9b29 ...

10:43

17,959 views

1 year ago

Trelis Research
Build Custom LLM Benchmarks for your Application

Get repo access at Trelis.com/ADVANCED-evals Trelis Evals (hosted solution) - Waitlist: https://forms.gle/q2bHurzLYNLW5d1U7 ...

46:46

2,276 views

8 months ago

What's AI by Louis-François Bouchard
Master LLMs: Top Strategies to Evaluate LLM Performance

In this video, we look into how to evaluate and benchmark Large Language Models (LLMs) effectively. Learn about perplexity ...

8:42

8,309 views

2 years ago

Dave Ebbelaar
How to Systematically Setup LLM Evals (Metrics, Unit Tests, LLM-as-a-Judge)

... to Agentic AI Applications 1:54 Understanding LLM Evaluations 4:54 Core Challenges in LLM Development 7:54 Importance of ...

55:02

21,913 views

3 months ago

Prompt Engineer
SmartPlay: The Ultimate Benchmark for Evaluating LLM Agents

In this video, we dive into the world of cutting-edge AI evaluation with SmartPlay, a groundbreaking benchmark designed to put ...

3:21

269 views

2 years ago

Snorkel AI
How to Evaluate LLM Performance for Domain-Specific Use Cases

LLM evaluation is critical for generative AI in the enterprise, but measuring how well an LLM answers questions or performs tasks ...

56:43

9,923 views

1 year ago