ViewTube

147,700 results

IBM Technology
What is Mixture of Experts?

Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdK8fn Learn more about the ...

7:58 · 48,818 views · 1 year ago

Maarten Grootendorst
A Visual Guide to Mixture of Experts (MoE) in LLMs

In this highly visual guide, we explore the architecture of a Mixture of Experts in Large Language Models (LLM) and Vision ...

19:44 · 48,738 views · 1 year ago

AI Papers Academy
Introduction to Mixture-of-Experts | Original MoE Paper Explained

In this video we go back to the extremely important Google paper which introduced the Mixture-of-Experts (MoE) layer with ...

4:41 · 11,544 views · 1 year ago

Julia Turc
Mixture of Experts: How LLMs get bigger without getting slower

Mixture of Experts (MoE) is everywhere: Meta / Llama 4, DeepSeek, Mistral. But how does it actually work? Do experts specialize?

26:42 · 26,800 views · 9 months ago
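
(As a rough, hypothetical illustration of the "bigger without getting slower" idea in the description above: a layer with 64 experts of 100M parameters each stores 64 × 100M = 6.4B expert parameters, but with top-2 routing only 2 × 100M = 200M of them are multiplied per token, so per-token compute tracks the 200M active parameters rather than the 6.4B total. These numbers are made up for illustration and are not taken from the video.)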

Stanford Online
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 4: Mixture of experts

For more information about Stanford's online Artificial Intelligence programs visit: https://stanford.io/ai To learn more about ...

1:22:04 · 62,088 views · 9 months ago

Chris Hay
MoE Models Don't Work Like You Think - Inside GPT-OSS

Many people think that mixture of expert models have domain experts, i.e. math experts, code experts, language experts.

18:28 · 3,780 views · 2 weeks ago

bycloud
1 Million Tiny Experts in an AI? Fine-Grained MoE Explained

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/bycloud/ . You'll also get 20% off an annual ...

12:29 · 55,233 views · 1 year ago

Cerebras
Mixture of Experts Explained: How to Build, Train & Debug MoE Models in 2025

Mixture-of-Experts (MoE) models now power leading AI systems like GPT-4, Qwen3, DeepSeek-v3, and Gemini 1.5. But behind ...

4:32 · 1,571 views · 6 months ago

Organic Mechanisms
How To Create And Use A Pharmacophore In MOE | MOE Tutorial

Molecular Operating Environment (MOE) tutorial covering how to create and use a pharmacophore in MOE. When docking ...

4:12 · 7,295 views · 3 years ago

SaM Solutions
Mixture-of-Experts (MoE) LLMs: The Future of Efficient AI Models

Imagine having a whole team of specialists at your disposal, each an expert in a different field, and a smart coordinator who ...

6:01 · 299 views · 5 months ago

AI Research Roundup
Why Orthogonal Weights Fail in MoE Models

In this AI Research Roundup episode, Alex discusses the paper: 'Geometric Regularization in Mixture-of-Experts: The Disconnect ...

3:35 · 11 views · 2 weeks ago

Stanford Online
Stanford CS25: V1 I Mixture of Experts (MoE) paradigm and the Switch Transformer

In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead ...

1:05:44 · 40,385 views · 3 years ago
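
As a rough illustration of the conditional-computation idea in the CS25 description above (different tokens are routed to different experts instead of reusing the same parameters for every input), here is a minimal top-k routing sketch in plain Python. It is a toy example with made-up sizes (hidden size 8, 4 experts, top-2 routing), not code from the lecture or from any library mentioned in these results.

    # Toy sketch of top-k mixture-of-experts routing (illustration only).
    import math
    import random

    random.seed(0)

    D, N_EXPERTS, TOP_K = 8, 4, 2  # hidden size, number of experts, experts used per token

    # Each "expert" is a toy D x D linear map; the router is a linear map D -> N_EXPERTS.
    experts = [[[random.gauss(0, 0.1) for _ in range(D)] for _ in range(D)] for _ in range(N_EXPERTS)]
    router = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(N_EXPERTS)]

    def matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    def softmax(xs):
        m = max(xs)
        exps = [math.exp(v - m) for v in xs]
        s = sum(exps)
        return [v / s for v in exps]

    def moe_layer(x):
        # The router scores every expert, but only the top-k experts actually run,
        # which is why total parameters can grow without growing per-token compute.
        logits = matvec(router, x)
        top = sorted(range(N_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
        weights = softmax([logits[i] for i in top])
        out = [0.0] * D
        for w, i in zip(weights, top):
            y = matvec(experts[i], x)
            out = [o + w * yi for o, yi in zip(out, y)]
        return out, top

    token = [random.gauss(0, 1) for _ in range(D)]
    y, chosen = moe_layer(token)
    print("experts used for this token:", chosen)

Running it prints which two of the four toy experts were selected for the sample token; a different token would generally activate a different pair.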

Paper With Video
[2024 Best AI Paper] MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

This video was created using https://paperspeech.com. If you'd like to create explainer videos for your own papers, please visit the ...

11:36 · 71 views · 1 year ago

No Hype AI
How Did They Do It? DeepSeek V3 and R1 Explained

DeepSeek: The First Open-Weight Reasoning Model! In this video, I'll break down DeepSeek's two flagship models— V3 and R1 ...

11:15 · 47,221 views · 11 months ago

Matt Williams
What are the different types of models - The Ollama Course

Dive into the world of Ollama and discover the various types of AI models at your fingertips. This comprehensive guide breaks ...

6:49 · 38,292 views · 1 year ago

AI with Lena Hall
Transformers vs MoE vs RNN vs Hybrid: Intuitive LLM Architecture Guide

Most developers default to transformers without understanding the alternatives. This video breaks down the intuition behind four ...

16:56 · 19,204 views · 3 months ago

Anyscale
Ray + vLLM Efficient Multi Node Orchestration for Sparse MoE Model Serving | Ray Summit 2025

Slides: https://drive.google.com/file/d/11OSdPJLZ1v4QH2KHlEYGYCts5qEdR5gN/view?usp=sharing At Ray Summit 2025, ...

30:58 · 653 views · 2 months ago

AI Coffee Break with Letitia
MAMBA and State Space Models explained | SSM explained

We simply explain and illustrate Mamba, State Space Models (SSMs) and Selective SSMs. SSMs match performance of ...

22:27 · 82,727 views · 1 year ago

bycloud
The REAL AI Architecture That Unifies Vision & Language

Get started now with open source & privacy focused password manager by Proton! https://proton.me/pass/bycloudai In this video, ...

10:13 · 44,629 views · 7 months ago

Marktechpost AI
NVIDIA Releases Nemotron 3: Hybrid Mamba Transformer Models With Latent MoE ...

NVIDIA Nemotron 3 is an open family of hybrid Mamba Transformer MoE models, designed for agentic AI with long context and ...

5:30 · 586 views · 1 month ago

Cerebras
Daria Soboleva Training and Serving MoE Models Efficiently

... models efficiently. Before I start, a quick intro about myself: I am researching LLMs at Cerebras; one of my recent gigs is MoE ...

9:18 · 206 views · 1 month ago

bycloud
Mamba Might Just Make LLMs 1000x Cheaper...

Check out HubSpot's ChatGPT at work bundle! https://clickhubspot.com/twc Would mamba bring a revolution to LLMs and ...

14:06 · 141,245 views · 1 year ago

LLM Implementation
How 120B+ Parameter Models Run on One GPU (The MoE Secret)

How is it possible for a 120 billion parameter AI model to run on a single consumer GPU? This isn't magic—it's the result of ...

6:47 · 1,582 views · 5 months ago

Red Hat
[vLLM Office Hours #29] Scaling MoE with llm-d

Time Stamps: 00:00 Bi-weekly vLLM project update (v0.9.2 and v0.10.0) 14:30 Scaling MoE models with llm-d 55:40 Q&A + ...

1:02:27 · 2,013 views · Streamed 5 months ago

Professor Rahul Jain
MoE AI Models Explained | Future of Scalable & Efficient Artificial Intelligence

Unlock the future of Artificial Intelligence with this quick and powerful explanation of MoE (Mixture of Experts) AI Models. In under ...

2:01 · 67 views · 1 month ago