ViewTube

4,627 results

Julia Turc
Mixture of Experts: How LLMs get bigger without getting slower
Mixture of Experts (MoE) is everywhere: Meta / Llama 4, DeepSeek, Mistral. But how does it actually work? Do experts specialize?
26:42 · 26,769 views · 9 months ago
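
A quick back-of-the-envelope for the "bigger without getting slower" claim: an MoE layer stores many expert feed-forward blocks but runs only a few of them per token, so stored capacity grows much faster than per-token compute. The layer sizes and counts below are illustrative assumptions, not figures from the video.

```python
# Back-of-the-envelope: why an MoE layer adds capacity without adding
# proportional per-token compute. All numbers here are illustrative.

d_model = 4096            # hidden size
d_ff = 14336              # feed-forward width per expert
n_experts = 8             # experts stored in the MoE layer
top_k = 2                 # experts actually run per token

# Parameters in one feed-forward block (two weight matrices, biases ignored)
params_per_expert = 2 * d_model * d_ff

dense_params      = params_per_expert               # a plain dense FFN layer
moe_total_params  = n_experts * params_per_expert   # what you store
moe_active_params = top_k * params_per_expert       # what one token touches

print(f"dense FFN params:        {dense_params / 1e6:.0f}M")
print(f"MoE stored params:       {moe_total_params / 1e6:.0f}M  ({n_experts}x capacity)")
print(f"MoE params used / token: {moe_active_params / 1e6:.0f}M  (~{top_k}x dense compute)")
```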

Chris Hay
MoE Models Don't Work Like You Think - Inside GPT-OSS
Many people think that mixture-of-experts models have domain experts, i.e. math experts, code experts, language experts.
18:28 · 3,772 views · 2 weeks ago
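
For intuition on why "domain experts" is the wrong mental model: the router chooses experts separately for every token from that token's hidden state, so the selection can change from word to word within one sentence. A minimal sketch of token-level top-k selection (random weights, purely illustrative; this is not GPT-OSS's actual router):

```python
# Token-level top-k routing: each token position gets its own expert set.
# Weights are random here, so the point is the mechanics, not the behavior.
import torch

torch.manual_seed(0)
d_model, n_experts, top_k = 16, 8, 2
seq_len = 6                                    # six tokens in one "sentence"

hidden = torch.randn(seq_len, d_model)         # per-token hidden states
router = torch.nn.Linear(d_model, n_experts)   # the gating network

logits = router(hidden)                        # (seq_len, n_experts)
weights, chosen = torch.topk(logits.softmax(dim=-1), k=top_k, dim=-1)

for pos in range(seq_len):
    print(f"token {pos}: experts {chosen[pos].tolist()}, "
          f"gate weights {[round(w, 2) for w in weights[pos].tolist()]}")
```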

Stanford Online
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 4: Mixture of experts
For more information about Stanford's online Artificial Intelligence programs visit: https://stanford.io/ai To learn more about ...
1:22:04 · 61,984 views · 9 months ago

SaM Solutions
Mixture-of-Experts (MoE) LLMs: The Future of Efficient AI Models
Imagine having a whole team of specialists at your disposal, each an expert in a different field, and a smart coordinator who ...
6:01 · 298 views · 5 months ago
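
The "team of specialists plus a smart coordinator" analogy maps onto a gating network (the coordinator) that scores a set of expert feed-forward networks (the specialists) and mixes the outputs of the top-scoring few. A minimal sparse MoE layer, written as a sketch of the idea rather than any production implementation:

```python
# Minimal sparse MoE layer: a router scores experts per token, only the
# top-k experts run, and their outputs are combined by the gate weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    def __init__(self, d_model=32, d_ff=64, n_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)      # the "coordinator"
        self.experts = nn.ModuleList([                   # the "specialists"
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):                                # x: (tokens, d_model)
        gates = F.softmax(self.router(x), dim=-1)
        weights, chosen = torch.topk(gates, self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):                   # loop form for clarity
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(5, 32)
print(TinyMoE()(tokens).shape)   # torch.Size([5, 32])
```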

Cerebras
Mixture of Experts Explained: How to Build, Train & Debug MoE Models in 2025
Mixture-of-Experts (MoE) models now power leading AI systems like GPT-4, Qwen3, DeepSeek-v3, and Gemini 1.5. But behind ...
4:32 · 1,569 views · 6 months ago
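
One issue the "build, train & debug" framing usually has to confront (an assumption about scope; the snippet is cut off before saying) is router collapse, where a few experts receive nearly all tokens. A common remedy is a Switch-Transformer-style load-balancing auxiliary loss:

```python
# Switch-Transformer-style load-balancing loss: nudges the fraction of tokens
# routed to each expert (f_i) and the mean router probability for that expert
# (P_i) toward the uniform 1 / n_experts.
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits: torch.Tensor) -> torch.Tensor:
    # router_logits: (n_tokens, n_experts)
    n_tokens, n_experts = router_logits.shape
    probs = F.softmax(router_logits, dim=-1)               # P over experts
    top1 = probs.argmax(dim=-1)                            # hard top-1 assignment
    f = torch.bincount(top1, minlength=n_experts).float() / n_tokens
    p = probs.mean(dim=0)                                  # mean prob per expert
    return n_experts * torch.sum(f * p)                    # equals 1.0 when balanced

logits = torch.randn(1024, 8)
print(load_balancing_loss(logits))   # close to 1.0 for random logits
```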

No Hype AI
How Did They Do It? DeepSeek V3 and R1 Explained
DeepSeek: The First Open-Weight Reasoning Model! In this video, I'll break down DeepSeek's two flagship models, V3 and R1 ...
11:15 · 47,201 views · 11 months ago

AI Research Roundup
Why Orthogonal Weights Fail in MoE Models
In this AI Research Roundup episode, Alex discusses the paper: 'Geometric Regularization in Mixture-of-Experts: The Disconnect ...
3:35 · 11 views · 2 weeks ago

Next Tech and AI
Small Language Models Under 4GB: What Actually Works?
Never get stuck without AI again. Run three Small Language Models (SLMs)—also called Local LLMs—TinyLlama, Gemma-3 and ...
5:51 · 5,878 views · 5 months ago
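
For a rough sense of what fits "under 4GB": weight memory is roughly parameter count times bytes per parameter, before the KV cache and runtime overhead. The model sizes and bit widths below are illustrative assumptions, not results from the video.

```python
# Rough weight-memory estimate: params * bits / 8, ignoring runtime overhead
# and the KV cache (which grows with context length). Figures are illustrative.

def weight_gb(params_billions: float, bits: int) -> float:
    return params_billions * 1e9 * bits / 8 / 1e9   # GB of raw weights

for name, params_b in [("~1B model", 1.1), ("~3B model", 3.0), ("~7B model", 7.0)]:
    for bits in (16, 8, 4):
        print(f"{name:>10} @ {bits:>2}-bit: ~{weight_gb(params_b, bits):.1f} GB of weights")

# A ~3B model at 4-bit (~1.5 GB) fits a 4 GB budget with room for the KV cache;
# a 7B model at 4-bit (~3.5 GB) is already tight before any cache or overhead.
```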

Anyscale
Ray + vLLM: Efficient Multi-Node Orchestration for Sparse MoE Model Serving | Ray Summit 2025
Slides: https://drive.google.com/file/d/11OSdPJLZ1v4QH2KHlEYGYCts5qEdR5gN/view?usp=sharing At Ray Summit 2025, ...
30:58 · 650 views · 2 months ago

bycloud
The REAL AI Architecture That Unifies Vision & Language
Get started now with open source & privacy focused password manager by Proton! https://proton.me/pass/bycloudai In this video, ...
10:13 · 44,624 views · 7 months ago

AI with Lena Hall
Transformers vs MoE vs RNN vs Hybrid: Intuitive LLM Architecture Guide
Most developers default to transformers without understanding the alternatives. This video breaks down the intuition behind four ...
16:56 · 19,204 views · 3 months ago

Red Hat
[vLLM Office Hours #29] Scaling MoE with llm-d
Time stamps: 00:00 bi-weekly vLLM project update (v0.9.2 and v0.10.0); 14:30 scaling MoE models with llm-d; 55:40 Q&A + ...
1:02:27 · 2,013 views · Streamed 5 months ago

Learn Meta-Analysis
Change this setting in LM Studio to run MoE LLMs faster.
I changed 2 settings in LM Studio and increased my t/s by about 4x. My 8 GB GPU (RTX 4060) now runs GPT-OSS 120B at 20 t/s and ...
8:45 · 12,377 views · 4 months ago

xCreate
How to Run LARGE AI Models Locally with Low RAM - Model Memory Streaming Explained
In this video we'll go through three methods of running SUPER LARGE AI models locally, using model streaming, model serving, ...
13:39 · 13,737 views · 3 months ago
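
One way "model memory streaming" can work is a memory-mapped weight file: the operating system pages in only the slices that are actually read, so the file can exceed RAM. A minimal sketch of the mechanism (an assumption about the general technique, not any specific tool mentioned in the video):

```python
# "Streaming" weights from disk with a memory map: the OS faults in only the
# pages you actually touch, so the file can be far larger than available RAM.
import numpy as np, os, tempfile

path = os.path.join(tempfile.mkdtemp(), "weights.bin")
shape = (1000, 4096)                        # pretend this is one weight matrix

# Write a dummy weight file once.
w = np.memmap(path, dtype=np.float16, mode="w+", shape=shape)
w[:] = 0.5
w.flush()

# Re-open read-only: nothing is loaded until a slice is touched.
weights = np.memmap(path, dtype=np.float16, mode="r", shape=shape)
expert_rows = weights[100:110]              # only these pages get paged in
print(expert_rows.shape, expert_rows.dtype, float(expert_rows.mean()))
```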

LLM Implementation
How 120B+ Parameter Models Run on One GPU (The MoE Secret)
How is it possible for a 120 billion parameter AI model to run on a single consumer GPU? This isn't magic—it's the result of ...
6:47 · 1,579 views · 5 months ago
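
The arithmetic behind the "MoE secret": only a small slice of the weights is active for any single token, so the bulk can live in CPU RAM or be streamed while the GPU handles the active experts plus attention. The total/active split and bit width below are illustrative assumptions, not figures from the video.

```python
# Why a huge MoE is cheaper per token than its parameter count suggests.
# Numbers are illustrative assumptions for a 120B-total, ~5B-active model.

total_params_b  = 120.0   # billions of parameters stored
active_params_b = 5.0     # billions touched per token (top-k experts + shared layers)
bits_per_weight = 4       # e.g. 4-bit quantization

def gb(params_b: float, bits: int) -> float:
    return params_b * 1e9 * bits / 8 / 1e9

print(f"all weights at {bits_per_weight}-bit:       ~{gb(total_params_b, bits_per_weight):.0f} GB (too big for one consumer GPU)")
print(f"weights touched per token: ~{gb(active_params_b, bits_per_weight):.1f} GB")

# The per-token working set is tiny, so keeping hot experts on the GPU and the
# rest in CPU RAM (expert offloading / streaming) makes single-GPU inference viable.
```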

New Machina
What is LLM Mixture of Experts?
You'll also learn about real-world MoE models like Mixtral and DeepSeek, which achieve state-of-the-art performance while ...
5:41 · 4,915 views · 11 months ago

Cerebras
Daria Soboleva: Training and Serving MoE Models Efficiently
... models efficiently. Before I start, a quick intro about myself: I am researching LLMs at Cerebras; one of my recent gigs is MoE ...
9:18 · 206 views · 1 month ago

Sebastian Raschka
LLM Building Blocks & Transformer Alternatives
Resources: Understanding and Coding the KV Cache in LLMs from Scratch: ...
27:09 · 14,282 views · 3 months ago
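
Since the first linked resource is about the KV cache: the idea is to store each layer's keys and values for past tokens, so decoding a new token only computes projections and attention for that token instead of re-running the whole prefix. A minimal single-head sketch (illustrative; not the code from the linked article):

```python
# Minimal single-head KV cache: append K/V for each new token and attend the
# newest query against everything cached so far.
import torch
import torch.nn.functional as F

d = 16
wq, wk, wv = (torch.randn(d, d) for _ in range(3))
k_cache, v_cache = [], []

def decode_step(x_new: torch.Tensor) -> torch.Tensor:
    # x_new: (1, d) hidden state of the newest token only
    q = x_new @ wq
    k_cache.append(x_new @ wk)
    v_cache.append(x_new @ wv)
    K = torch.cat(k_cache, dim=0)                # (t, d): keys for all tokens so far
    V = torch.cat(v_cache, dim=0)                # (t, d)
    attn = F.softmax(q @ K.T / d**0.5, dim=-1)   # (1, t)
    return attn @ V                              # (1, d) context for the new token

for t in range(5):                               # 5 decode steps, one token each
    out = decode_step(torch.randn(1, d))
print(out.shape, "with", len(k_cache), "cached K/V entries")
```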

Marktechpost AI
NVIDIA Releases Nemotron 3: Hybrid Mamba Transformer Models With Latent MoE ...
NVIDIA Nemotron 3 is an open family of hybrid Mamba Transformer MoE models, designed for agentic AI with long context and ...
5:30 · 585 views · 1 month ago

Interconnects AI
"All in on building open models in the U.S."
Arcee AI is the startup I've found to be taking the most realistic approach to monetizing their open models. With a bunch of ...
1:12:16 · 400 views · 13 hours ago