64,284 results
Full episode: https://www.youtube.com/watch?v=lXUZvyajciY Me on twitter: https://x.com/dwarkesh_sp Andrej Karpathy helped ...
84,782 views
2 months ago
Richard Sutton is the father of reinforcement learning, winner of the 2024 Turing Award, and author of The Bitter Lesson. And he ...
600,629 views
3 months ago
Lecture on reinforcement learning (RL) fine-tuning of large language models (LLMs). Even though we are in the RL era for ...
11,986 views
9 months ago
Lex Fridman Podcast full episode: https://www.youtube.com/watch?v=5t1vTLU7s40 Please support this podcast by checking out ...
29,263 views
1 year ago
Get access to the ADVANCED-fine-tuning Repo: https://trelis.com/ADVANCED-fine-tuning/ Consulting (Technical Assistance ...
15,003 views
10 months ago
Strengthen your technical foundations with Brilliant! Visit https://brilliant.org/AdamLucek/ to start learning for free and save 20% off ...
2,705 views
1 month ago
Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdKSby Learn more about the ...
72,263 views
Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io In this video, we see how popular LLMs ...
17,762 views
Recorded live at the Agent Engineering Session Day from the AI Engineer Summit 2025 in New York. Learn more at ...
103,746 views
In this video, I break down DeepSeek's Group Relative Policy Optimization (GRPO) from first principles, without assuming prior ...
33,772 views
It's the backbone of Reinforcement Learning with Human Feedback (RLHF) -- which helps align AI models with human ...
39,519 views
Plus, we are going through the policy gradient equation, explaining RLVR (reinforcement learning with verifiable rewards), and ...
20,742 views
6 months ago
Generative Large Language Models, like ChatGPT and DeepSeek, are trained on massive text based datasets, like the entire ...
44,890 views
7 months ago
Turns out reinforcement learning is all you need. Check out my prior video on RL: ...
11,601 views
A light intro to LLMs, chatbots, pretraining, and transformers. Dig deeper here: ...
4,817,726 views
How does Reinforcement Learning work? A short cartoon that intuitively explains this amazing machine learning approach, and ...
235,395 views
2 years ago
In this video, I will explain Reinforcement Learning from Human Feedback (RLHF) which is used to align, among others, models ...
63,531 views