reinforcement learning llm

Reinforcement learning is terrible – Andrej Karpathy

Full episode: https://www.youtube.com/watch?v=lXUZvyajciY Me on twitter: https://x.com/dwarkesh_sp Andrej Karpathy helped ...

4:10

84,782 views

2 months ago

Dwarkesh Patel

Richard Sutton – Father of RL thinks LLMs are a dead end

Richard Sutton is the father of reinforcement learning, winner of the 2024 Turing Award, and author of The Bitter Lesson. And he ...

1:07:09

600,629 views

3 months ago

Natasha Jaques

Lecture on reinforcement learning (RL) fine-tuning of large language models (LLMs). Even though we are in the RL era for ...

33:10

Reinforcement Learning (RL) for LLMs

11,986 views

9 months ago

Lex Clips

Yann LeCun: Why RL is overrated | Lex Fridman Podcast Clips

Lex Fridman Podcast full episode: https://www.youtube.com/watch?v=5t1vTLU7s40 Please support this podcast by checking out ...

5:30

29,263 views

1 year ago

Trelis Research

Get access to the ADVANCED-fine-tuning Repo: https://trelis.com/ADVANCED-fine-tuning/ Consulting (Technical Assistance ...

1:18:19

Reinforcement Learning for LLMs in 2025

15,003 views

10 months ago

Adam Lucek

Reinforcement Learning with Verifiable Rewards - Teaching LLMs to Solve Problems

Strengthen your technical foundations with Brilliant! Visit https://brilliant.org/AdamLucek/ to start learning for free and save 20% off ...

39:33

2,705 views

1 month ago

IBM Technology

Reinforcement Learning from Human Feedback (RLHF) Explained

Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdKSby Learn more about the ...

11:29

72,263 views

1 year ago

Efficient NLP

Training LLM to play chess using Deepseek GRPO reinforcement learning

Try Voice Writer - speak your thoughts and let AI handle the grammar: https://voicewriter.io In this video, we see how popular LLMs ...

29:38

17,762 views

9 months ago

AI Engineer

Reinforcement Learning for Agents - Will Brown, ML Researcher at Morgan Stanley

Recorded live at the Agent Engineering Session Day from the AI Engineer Summit 2025 in New York. Learn more at ...

18:17

103,746 views

9 months ago

Julia Turc

DeepSeek's GRPO (Group Relative Policy Optimization) | Reinforcement Learning for LLMs

In this video, I break down DeepSeek's Group Relative Policy Optimization (GRPO) from first principles, without assuming prior ...

23:16

33,772 views

9 months ago

Julia Turc

Proximal Policy Optimization (PPO) for LLMs Explained Intuitively

It's the backbone of Reinforcement Learning with Human Feedback (RLHF) -- which helps align AI models with human ...

22:03

39,519 views

9 months ago

Neural Breakdown with AVB

How I finetuned a Small LM to THINK and solve puzzles on its own (GRPO & RL!)

Plus, we are going through the policy gradient equation, explaining RLVR (reinforcement learning with verifiable rewards), and ...

51:06

20,742 views

6 months ago

StatQuest with Josh Starmer

Reinforcement Learning with Human Feedback (RLHF), Clearly Explained!!!

Generative Large Language Models, like ChatGPT and DeepSeek, are trained on massive text based datasets, like the entire ...

18:02

44,890 views

7 months ago

Adam Lucek

I Trained an LLM to Think Deeper (Here's How)

Turns out reinforcement learning is all you need Check out my prior video on RL: ...

27:04

11,601 views

10 months ago

3Blue1Brown

A light intro to LLMs, chatbots, pretraining, and transformers. Dig deeper here: ...

7:58

Large Language Models explained briefly

4,817,726 views

1 year ago

Graphics in 5 Minutes

How does Reinforcement Learning work? A short cartoon that intuitively explains this amazing machine learning approach, and ...

8:25

Reinforcement Learning from scratch

235,395 views

2 years ago

Umar Jamil

Reinforcement Learning from Human Feedback explained with math derivations and the PyTorch code.

In this video, I will explain Reinforcement Learning from Human Feedback (RLHF) which is used to align, among others, models ...

2:15:13

63,531 views

1 year ago
