ViewTube


183,652 results

IBM Technology
What is Mixture of Experts?

Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdK8fn Learn more about the ...

7:58 · 48,801 views · 1 year ago

Maarten Grootendorst
A Visual Guide to Mixture of Experts (MoE) in LLMs

In this highly visual guide, we explore the architecture of a Mixture of Experts in Large Language Models (LLM) and Vision ...

19:44 · 48,708 views · 1 year ago

AI Papers Academy
Introduction to Mixture-of-Experts | Original MoE Paper Explained

In this video we go back to the extremely important Google paper which introduced the Mixture-of-Experts (MoE) layer with ...

4:41 · 11,540 views · 1 year ago

Julia Turc
Mixture of Experts: How LLMs get bigger without getting slower

Mixture of Experts (MoE) is everywhere: Meta / Llama 4, DeepSeek, Mistral. But how does it actually work? Do experts specialize?

26:42 · 26,787 views · 9 months ago
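
Several of the results above turn on the same point: a Mixture-of-Experts model stores many expert feed-forward blocks but runs only a few of them per token. A back-of-the-envelope sketch of that trade-off is below; the hidden sizes, expert count, and top-k value are illustrative assumptions, not figures taken from any of the listed videos.

```python
# Back-of-the-envelope parameter count for one MoE feed-forward layer.
# All sizes below are assumed for illustration only.

d_model = 4096    # hidden size
d_ff = 14336      # feed-forward inner size
n_experts = 8     # experts stored in the layer
top_k = 2         # experts actually run per token

# A standard dense FFN has two weight matrices:
dense_params = 2 * d_model * d_ff

# An MoE layer stores n_experts copies of that FFN,
# but each token only passes through top_k of them:
total_params = n_experts * dense_params
active_params = top_k * dense_params

print(f"dense FFN params      : {dense_params / 1e6:.0f}M")
print(f"MoE params stored     : {total_params / 1e6:.0f}M")
print(f"MoE params used/token : {active_params / 1e6:.0f}M")
# Capacity grows roughly n_experts times; per-token compute only top_k times.
```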

Stanford Online
Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 4: Mixture of experts

For more information about Stanford's online Artificial Intelligence programs visit: https://stanford.io/ai To learn more about ...

1:22:04 · 62,064 views · 9 months ago

Chris Hay
MoE Models Don't Work Like You Think - Inside GPT-OSS

Many people think that mixture-of-experts models have domain experts, i.e. math experts, code experts, language experts.

18:28 · 3,775 views · 2 weeks ago

bycloud
1 Million Tiny Experts in an AI? Fine-Grained MoE Explained

To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/bycloud/ . You'll also get 20% off an annual ...

12:29 · 55,231 views · 1 year ago

Organic Mechanisms
How To Create And Use A Pharmacophore In MOE | MOE Tutorial

Molecular Operating Environment (MOE) tutorial covering how to create and use a pharmacophore in MOE. When docking ...

4:12 · 7,292 views · 3 years ago

People also watched

Mo Bitar
There's no skill in AI coding

I've begun to suspect something you have probably begun to suspect as well: there's no real skill in AI coding. The models are ...

10:58 · 5,405 views · 4 hours ago

Digital Engine
AI's first kills show we're close to disaster. Godfather of AI

AI and robots make dangerous leap. Visit https://brilliant.org/digitalengine to learn more about AI. You'll also find loads of fun ...

19:17 · 476,929 views · 13 days ago

sudoremove and 노토랩세미나
DeepSeek Engram: Adding Tables to Transformers

#Engram #deepseek Chapter --- 00:00 Intro 00:16 - Model Architecture Description and Serving/Structure Optimization Problem ...

15:08 · 3,039 views · 3 days ago

노정석
EP 83. Transformers: The Pilgrimage of the Reincarnated Token

This session intuitively explores why Transformers work the way they do, focusing on the journey a token undergoes when it's ...

53:56 · 2,608 views · 2 days ago

Matt Williams
Optimize Your AI - Quantization Explained

Run massive AI models on your laptop! Learn the secrets of LLM quantization and how q2, q4, and q8 settings in Ollama can save ...

12:10 · 361,720 views · 1 year ago
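
A rough sense of why the quantization settings mentioned above matter: weight storage scales with bits per parameter. The sketch below uses an assumed 7B-parameter model and idealized bit widths; real formats such as q4_0 or q8_0 also store per-block scale metadata, so actual files run somewhat larger.

```python
# Approximate memory needed just to hold the weights of a model with an
# assumed 7B parameters at different precisions. Figures are idealized.

n_params = 7e9

for name, bits in [("fp16", 16), ("q8", 8), ("q4", 4), ("q2", 2)]:
    gib = n_params * bits / 8 / 2**30
    print(f"{name:>4}: ~{gib:.1f} GiB of weights")
```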

Sam Witteveen
Kimi K2.5 - The Agent Swarm

In this video, I look at Kimi K2.5, the latest model from Moonshot AI, and how it crushes tasks with Agent Swarm. Site: Blog: ...

20:24 · 24,645 views · 1 day ago

Welch Labs
How DeepSeek Rewrote the Transformer [MLA]

Thanks to KiwiCo for sponsoring today's video! Go to https://www.kiwico.com/welchlabs and use code WELCHLABS for 50% off ...

18:09 · 851,099 views · 10 months ago

Basil Alabdullah
Preparation of Protein Using MOE Software

Molecular Docking: in this video we learn how to prepare the protein and create the active site so that it is ready for the docking ...

46:36 · 4,187 views · 2 years ago

Donato Capitella
Kimi-K2(1T)/GLM 4.7(355B) on a 4-Node Strix Halo Cluster - 512GB of Unified Memory

In this video, I demonstrate running large-scale Mixture-of-Experts (MoE) models on a 4-node cluster of AMD Strix Halo systems.

9:36 · 7,847 views · 5 days ago

HuggingFace
MoE Token Routing Explained: How Mixture of Experts Works (with Code)

This video dives deep into Token Routing, the core algorithm of Mixture of Experts (MoE) models. Slides: ...

34:15 · 3,608 views · 6 days ago
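
For readers skimming these results, here is a minimal sketch of the top-k token routing step that several of the videos above describe, written in NumPy. The shapes, the expert count, and the softmax-then-top-k gate are common choices but are assumptions of this sketch, not a claim about any particular video's code.

```python
import numpy as np

def topk_route(x, w_gate, k=2):
    """Minimal top-k token routing: score experts with a linear gate,
    keep the k highest per token, and renormalize their weights."""
    logits = x @ w_gate                                    # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)             # softmax over experts
    topk_idx = np.argsort(-probs, axis=-1)[:, :k]          # chosen expert ids
    topk_w = np.take_along_axis(probs, topk_idx, axis=-1)
    topk_w /= topk_w.sum(axis=-1, keepdims=True)           # renormalize over the k chosen
    return topk_idx, topk_w

# Toy usage: 4 tokens, hidden size 8, 4 experts, top-2 routing (assumed sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
w_gate = rng.standard_normal((8, 4))
idx, w = topk_route(x, w_gate, k=2)
print(idx)   # which experts each token is sent to
print(w)     # the mixing weights for those experts
```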

Cerebras
Mixture of Experts Explained: How to Build, Train & Debug MoE Models in 2025

Mixture-of-Experts (MoE) models now power leading AI systems like GPT-4, Qwen3, DeepSeek-v3, and Gemini 1.5. But behind ...

4:32 · 1,569 views · 6 months ago

SaM Solutions
Mixture-of-Experts (MoE) LLMs: The Future of Efficient AI Models

Imagine having a whole team of specialists at your disposal, each an expert in a different field, and a smart coordinator who ...

6:01 · 299 views · 5 months ago

MoeIsBetter
Moe - Outta There (Official Video)

Outta There (Prod By Ayo N Keyz) "Rich Dreamin" Available Now on ALL streaming platforms LINK BELOW ...

2:32 · 4,601,857 views · 6 years ago

MoeIsBetter
MoeIsBetter - Outta There (Audio)

Official Visualizer: https://smarturl.it/OuttaThereVisualizer Follow Moe: https://www.instagram.com/Moeisbetter/ (C) 2019 Moe ...

2:38 · 547,228 views · 5 years ago

AI Research Roundup
Why Orthogonal Weights Fail in MoE Models

In this AI Research Roundup episode, Alex discusses the paper 'Geometric Regularization in Mixture-of-Experts: The Disconnect ...

3:35 · 11 views · 2 weeks ago

bycloud
Mamba Might Just Make LLMs 1000x Cheaper...

Check out HubSpot's ChatGPT at work bundle! https://clickhubspot.com/twc Would mamba bring a revolution to LLMs and ...

14:06 · 141,237 views · 1 year ago

No Hype AI
How Did They Do It? DeepSeek V3 and R1 Explained

DeepSeek: The First Open-Weight Reasoning Model! In this video, I'll break down DeepSeek's two flagship models, V3 and R1 ...

11:15 · 47,211 views · 11 months ago

Stanford Online
Stanford CS25: V1 I Mixture of Experts (MoE) paradigm and the Switch Transformer

In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead ...

1:05:44 · 40,382 views · 3 years ago

Paper With Video
[2024 Best AI Paper] MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts

This video was created using https://paperspeech.com. If you'd like to create explainer videos for your own papers, please visit the ...

11:36 · 71 views · 1 year ago

AI with Lena Hall
Transformers vs MoE vs RNN vs Hybrid: Intuitive LLM Architecture Guide

Most developers default to transformers without understanding the alternatives. This video breaks down the intuition behind four ...

16:56 · 19,204 views · 3 months ago

Marktechpost AI
NVIDIA Releases Nemotron 3: Hybrid Mamba Transformer Models With Latent MoE ...

NVIDIA Nemotron 3 is an open family of hybrid Mamba Transformer MoE models, designed for agentic AI with long context and ...

5:30 · 585 views · 1 month ago

AILinkDeepTech
Mixture of Experts (MoE) Coding | MoE Code Implementation | Mixture of Experts Model

Mixture of Experts (MoE) Coding | MoE Code Implementation | Mixture of Experts Model MoE Code: ...

7:04 · 729 views · 11 months ago
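
Finally, MoE implementations like the ones these coding videos walk through typically pair the routing step sketched earlier with an auxiliary loss that keeps tokens spread evenly across experts. The sketch below follows the style of the Switch Transformer load-balancing loss; the coefficient alpha and the toy batch are assumptions, and the listed videos may use different formulations.

```python
import numpy as np

def load_balancing_loss(probs, topk_idx, n_experts, alpha=0.01):
    """Auxiliary load-balancing loss in the style of the Switch Transformer:
    penalizes routing where both the token fraction (f) and the router
    probability mass (p) concentrate on the same few experts.
    probs: (tokens, n_experts) softmax gate outputs;
    topk_idx: (tokens, k) chosen expert ids; alpha is an assumed coefficient."""
    # f[i]: fraction of routed (token, slot) assignments sent to expert i.
    counts = np.bincount(topk_idx.ravel(), minlength=n_experts)
    f = counts / topk_idx.size
    # p[i]: mean router probability assigned to expert i over the batch.
    p = probs.mean(axis=0)
    return alpha * n_experts * float(np.sum(f * p))

# Toy usage: random gate outputs for 8 tokens, 4 experts, top-2 routing.
rng = np.random.default_rng(0)
logits = rng.standard_normal((8, 4))
probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
probs /= probs.sum(axis=-1, keepdims=True)
topk_idx = np.argsort(-probs, axis=-1)[:, :2]
print(load_balancing_loss(probs, topk_idx, n_experts=4))
```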