2,922 results
giskard llm
llm inference optimization
rag evaluation
deepeval
best llm
llm evaluation techniques
ragas
ai evaluation
Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdKetJ Learn more about the ...
16,290 views
1 year ago
Professional Certificate Program in Generative AI and Machine Learning - IITG (India Only) ...
2,130 views
Interpreting and running standardized language model benchmarks and evaluation datasets for both generalized and task ...
7,305 views
Check out my website here! https://leaderboard.bycloud.ai/ In this video, I will be going through and explain the benchmarks for ...
26,275 views
In this video, we'll talk about LLM evaluation benchmarks. 00:12 What are LLM evaluation benchmarks? 00:59 Examples of LLM ...
1,613 views
For more information about Stanford's graduate programs, visit: https://online.stanford.edu/graduate-education November 21, ...
19,192 views
3 weeks ago
GPT-5.2 just reached 75% accuracy on ARC-AGI, one of the hardest reasoning benchmarks in AI. In this video, I break down: ...
2,372 views
3 days ago
Ex-OpenAI pioneer Ilya Sutskever warns that as AI begins to self-improve, its trajectory may become "extremely unpredictable and ...
4,759,681 views
5 months ago
Building an evaluation from the ground up requires iteration and testing. In this video, we walk through how to use Arize Phoenix ...
1,843 views
4 months ago
A NEW benchmark and guide to which quantization models to use locally on your PC or laptop, either in Ollama or in LM Studio, ...
3,693 views
Today, I want to share a new episode with Aman Khan. The best way to learn about AI evaluations is to watch 2 PMs build them ...
18,263 views
GLM-4.7 v/s Minimax M2.1 (In-depth comparison): Which is the best open coding model?
15,183 views
2 days ago
That Game Boy Blender animation... took 6 hours to render. Anyway, had a ton of fun coding this up and finally getting back to ...
34,173 views
6 months ago
What's the best open-source model for coding right now? I tested GLM 4.7, MiniMax M2.1, DeepSeek 3.2, and Kimi K2 Thinking ...
5,464 views
1 day ago
I put three systems to the local LLM test. Gear Links * K9 Mini with 32GB RAM: https://amzn.to/3ZiKjcp * 🛠️ 96GB ...
256,445 views
Most devs are using LLMs daily but don't have a clue about some of the fundamentals. Understanding tokens is crucial because ...
138,497 views
3 months ago
Done. Blaze has basically benchmarked the benchmarks against Chatbot Arena, Arena Elo here, Chatbot ...
588 views
Evaluating LLM-based applications can feel like more of an art than a science. In this workshop, we'll give a hands-on introduction ...
43,624 views
2 years ago
This presentation was part of #FHIRDevDays 2025, the premiere event for #FHIR implementers organized by @FirelyTeam and ...
0 views
Join us for a comprehensive survey of techniques designed to unlock the full potential of Large Language Models (LLMs).
219,372 views
This video shares the list of LLM benchmarks commonly used by EleutherAI. PLEASE FOLLOW ME: ▷ LinkedIn: ...
271 views
For more information about Stanford's online Artificial Intelligence programs visit: https://stanford.io/ai This lecture covers: 1.
12,329 views
9 months ago
Build Your First Scalable Product with LLMs: https://academy.towardsai.net/courses/beginner-to-advanced-llm-dev?ref=1f9b29 ...
17,959 views
Get repo access at Trelis.com/ADVANCED-evals Trelis Evals (hosted solution) - Waitlist: https://forms.gle/q2bHurzLYNLW5d1U7 ...
2,276 views
8 months ago
In this video, we look into how to evaluate and benchmark Large Language Models (LLMs) effectively. Learn about perplexity ...
8,309 views
... to Agentic AI Applications 1:54 Understanding LLM Evaluations 4:54 Core Challenges in LLM Development 7:54 Importance of ...
21,913 views
In this video, we dive into the world of cutting-edge AI evaluation with SmartPlay, a groundbreaking benchmark designed to put ...
269 views
LLM evaluation is critical for generative AI in the enterprise, but measuring how well an LLM answers questions or performs tasks ...
9,923 views