ViewTube

179,918 results

IBM Technology
What is Mixture of Experts?

Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdK8fn Learn more about the ...

7:58

48,751 views

1 year ago

Maarten Grootendorst
A Visual Guide to Mixture of Experts (MoE) in LLMs

In this highly visual guide, we explore the architecture of a Mixture of Experts in Large Language Models (LLM) and Vision ...

19:44

48,657 views

1 year ago

AI Papers Academy
Introduction to Mixture-of-Experts | Original MoE Paper Explained

In this video we go back to the extremely important Google paper which introduced the Mixture-of-Experts (MoE) layer with ...

4:41

11,533 views

1 year ago

Julia Turc
Mixture of Experts: How LLMs get bigger without getting slower

Mixture of Experts (MoE) is everywhere: Meta / Llama 4, DeepSeek, Mistral. But how does it actually work? Do experts specialize?

26:42

26,762 views

9 months ago

Stanford Online
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 4: Mixture of experts

For more information about Stanford's online Artificial Intelligence programs visit: https://stanford.io/ai To learn more about ...

1:22:04

61,964 views

9 months ago

Chris Hay
MoE Models Don't Work Like You Think - Inside GPT-OSS

Many people think that mixture-of-experts models have domain experts, e.g. math experts, code experts, and language experts.

18:28

3,770 views

2 weeks ago

People also watched

Sam Witteveen
Kimi K2.5- The Agent Swarm

In this video, I look at Kimi K2.5, the latest model from Moonshot AI, and how it crushes tasks with Agent Swarm. Site: Blog: ...

20:24

20,094 views

20 hours ago

AI Revolution
DeepSeek Leaks MODEL1: New Flagship AI Shocks The Industry

At the same time, Zhipu AI officially released GLM-4.7-Flash, a long-context MoE model built for real coding and reasoning that ...

15:40

36,983 views

6 days ago

Basil Alabdullah
Preparation of Protein Using MOE Software

Molecular Docking: in this video we learn how to prepare the protein and create the active site so that it is ready for the docking process ...

46:36

4,178 views

2 years ago

sudoremove and 노토랩세미나
DeepSeek Engram: Adding Tables to Transformers

#Engram #deepseek Chapters: 00:00 Intro, 00:16 Model Architecture Description and Serving/Structure Optimization Problem ...

15:08

2,785 views

3 days ago

서울대학교 산업공학과 DSBA 연구실
[Paper Review] Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Presenter: 천재원 (Master's student) 1. Paper title: Mamba: Linear-Time Sequence Modeling with Selective State Spaces 2. Paper link: ...

1:21:45

20,095 views

1 year ago

Matt Williams
Optimize Your AI - Quantization Explained

Run massive AI models on your laptop! Learn the secrets of LLM quantization and how q2, q4, and q8 settings in Ollama can save ...

12:10

361,450 views

1 year ago

노정석
EP 83. Transformers: The Pilgrimage of the Reincarnated Token

This session intuitively explores why Transformers work the way they do, focusing on the journey a token undergoes when it's ...

53:56

2,379 views

1 day ago

xCreate
Let's Run GLM-4-7-Flash THINKING - Local AI Super-Intelligence? | REVIEW

Zhipu has released another update to its GLM model that tops benchmarks for its size. So let's see how well it performs locally.

16:59

3,659 views

7 days ago

Maarten Grootendorst
Intuition behind Mamba and State Space Models | Enhancing LLMs!

Mamba is an exciting LLM architecture that, when used with Transformers, might introduce new capabilities we haven't seen ...

24:06

21,485 views

10 months ago

IBM Technology
RAG vs Fine-Tuning vs Prompt Engineering: Optimizing AI Models

Ready to become a certified watsonx AI Assistant Engineer? Register now and use code IBMTechYT20 for 20% off of your exam ...

13:10

536,846 views

9 months ago

IBM Technology
Mixture of Experts: Boosting AI Efficiency with Modular Models #ai #machinelearning #moe

AI news moves fast. Sign up for a monthly newsletter for AI updates from IBM ...

0:51

4,723 views

1 year ago

bycloud
1 Million Tiny Experts in an AI? Fine-Grained MoE Explained

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/bycloud/ . You'll also get 20% off an annual ...

12:29

55,229 views

1 year ago

Organic Mechanisms
How To Create And Use A Pharmacophore In MOE | MOE Tutorial

Molecular Operating Environment (MOE) tutorial covering how to create and use a pharmacophore in MOE. When docking ...

4:12

7,290 views

2 years ago

Cerebras
Mixture of Experts Explained: How to Build, Train & Debug MoE Models in 2025

Mixture-of-Experts (MoE) models now power leading AI systems like GPT-4, Qwen3, DeepSeek-v3, and Gemini 1.5. But behind ...

4:32

1,568 views

6 months ago

MoeIsBetter
MoeIsBetter - Outta There (Audio)

Official Visualizer: https://smarturl.it/OuttaThereVisualizer Follow Moe: https://www.instagram.com/Moeisbetter/ (C) 2019 Moe ...

2:38

547,216 views

5 years ago

SaM Solutions
Mixture-of-Experts (MoE) LLMs: The Future of Efficient AI Models

Imagine having a whole team of specialists at your disposal, each an expert in a different field, and a smart coordinator who ...

6:01

298 views

5 months ago

MoeIsBetter
Moe - Outta There (Official Video)

Outta There (Prod By Ayo N Keyz) "Rich Dreamin" Available Now on ALL streaming platforms LINK BELOW ...

2:32

4,601,660 views

6 years ago

AI Research Roundup
Why Orthogonal Weights Fail in MoE Models

In this AI Research Roundup episode, Alex discusses the paper: 'Geometric Regularization in Mixture-of-Experts: The Disconnect ...

3:35

11 views

2 weeks ago

No Hype AI
How Did They Do It? DeepSeek V3 and R1 Explained

DeepSeek: The First Open-Weight Reasoning Model! In this video, I'll break down DeepSeek's two flagship models, V3 and R1 ...

11:15

47,195 views

11 months ago

Paper With Video
[2024 Best AI Paper] MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

This video was created using https://paperspeech.com. If you'd like to create explainer videos for your own papers, please visit the ...

11:36

71 views

1 year ago

Stanford Online
Stanford CS25: V1 I Mixture of Experts (MoE) paradigm and the Switch Transformer

In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead ...

1:05:44

40,376 views

3 years ago

Marktechpost AI
NVIDIA Releases Nemotron 3: Hybrid Mamba Transformer Models With Latent MoE .....

NVIDIA Nemotron 3 is an open family of hybrid Mamba Transformer MoE models, designed for agentic AI with long context and ...

5:30

585 views

1 month ago

AI with Lena Hall
Transformers vs MoE vs RNN vs Hybrid: Intuitive LLM Architecture Guide

Most developers default to transformers without understanding the alternatives. This video breaks down the intuition behind four ...

16:56

19,204 views

3 months ago