ViewTube

180,360 results

IBM Technology · What is Mixture of Experts? · 7:58 · 48,777 views · 1 year ago
Want to play with the technology yourself? Explore our interactive demo → https://ibm.biz/BdK8fn Learn more about the ...

Maarten Grootendorst · A Visual Guide to Mixture of Experts (MoE) in LLMs · 19:44 · 48,688 views · 1 year ago
In this highly visual guide, we explore the architecture of a Mixture of Experts in Large Language Models (LLMs) and Vision ...

AI Papers Academy · Introduction to Mixture-of-Experts | Original MoE Paper Explained · 4:41 · 11,537 views · 1 year ago
In this video we go back to the extremely important Google paper which introduced the Mixture-of-Experts (MoE) layer with ...

Julia Turc · Mixture of Experts: How LLMs get bigger without getting slower · 26:42 · 26,773 views · 9 months ago
Mixture of Experts (MoE) is everywhere: Meta / Llama 4, DeepSeek, Mistral. But how does it actually work? Do experts specialize?

Chris Hay · MoE Models Don't Work Like You Think - Inside GPT-OSS · 18:28 · 3,773 views · 2 weeks ago
Many people think that mixture-of-experts models have domain experts, e.g. math experts, code experts, language experts.

Cerebras · Mixture of Experts Explained: How to Build, Train & Debug MoE Models in 2025 · 4:32 · 1,569 views · 6 months ago
Mixture-of-Experts (MoE) models now power leading AI systems like GPT-4, Qwen3, DeepSeek-v3, and Gemini 1.5. But behind ...

bycloud · 1 Million Tiny Experts in an AI? Fine-Grained MoE Explained · 12:29 · 55,229 views · 1 year ago
To try everything Brilliant has to offer—free—for a full 30 days, visit https://brilliant.org/bycloud/ . You'll also get 20% off an annual ...

bycloud · Mamba Might Just Make LLMs 1000x Cheaper... · 14:06 · 141,233 views · 1 year ago
Check out HubSpot's ChatGPT at work bundle! https://clickhubspot.com/twc Would mamba bring a revolution to LLMs and ...

MoeIsBetter · Moe - Outta There (Official Video) · 2:32 · 4,601,709 views · 6 years ago
Outta There (Prod By Ayo N Keyz) "Rich Dreamin" Available Now on ALL streaming platforms LINK BELOW ...

Stanford Online · Stanford CS336 Language Modeling from Scratch | Spring 2025 | Lecture 4: Mixture of experts · 1:22:04 · 61,986 views · 9 months ago
For more information about Stanford's online Artificial Intelligence programs visit: https://stanford.io/ai To learn more about ...

Stanford Online · Stanford CS25: V1 I Mixture of Experts (MoE) paradigm and the Switch Transformer · 1:05:44 · 40,379 views · 3 years ago
In deep learning, models typically reuse the same parameters for all inputs. Mixture of Experts (MoE) defies this and instead ...
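
The conditional-computation idea this description alludes to can be sketched in a few lines. Below is a minimal, generic top-k routed MoE layer, not the Switch Transformer implementation covered in the lecture; the module name SimpleMoE and the sizes (d_model, d_ff, n_experts, top_k) are illustrative choices.

```python
# Minimal sketch of a top-k routed MoE layer (illustrative, not the Switch
# Transformer implementation discussed in the lecture).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoE(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)   # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                              # x: (tokens, d_model)
        logits = self.router(x)                        # (tokens, n_experts)
        weights, idx = logits.topk(self.top_k, dim=-1) # pick k experts per token
        weights = F.softmax(weights, dim=-1)           # renormalize over the chosen k
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if rows.numel():
                out[rows] += weights[rows, slots].unsqueeze(-1) * expert(x[rows])
        return out

tokens = torch.randn(10, 64)
print(SimpleMoE()(tokens).shape)   # torch.Size([10, 64])
```

Only the top_k selected experts run for each token, which is what lets the total parameter count grow without a proportional increase in per-token compute.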

Cerebras · Daria Soboleva: Training and Serving MoE Models Efficiently · 9:18 · 206 views · 1 month ago
... models efficiently. Before I start, a quick intro about myself: I am researching LLMs at Cerebras, and one of my recent gigs is MoE ...

Soumyajit Das · MOE Explained in 150 seconds · 2:32 · 20 views · 3 weeks ago
In this quick 150-second deep dive, we explore the architecture behind some of the world's most powerful AI models: Mixture of ...

SaM Solutions · Mixture-of-Experts (MoE) LLMs: The Future of Efficient AI Models · 6:01 · 298 views · 5 months ago
Imagine having a whole team of specialists at your disposal, each an expert in a different field, and a smart coordinator who ...
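
The analogy above maps onto the standard MoE formulation: the "coordinator" is a learned router g and the "specialists" are expert networks E_i. The description gives no formula, so the following is the usual textbook form, with W_r an assumed router weight matrix.

```latex
% Sparse MoE layer output: a softmax router weights a few selected experts.
g(x) = \mathrm{softmax}(W_r x), \qquad
y = \sum_{i \in \mathrm{TopK}(g(x))} g_i(x)\, E_i(x)
```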

People also watched

Hal Shin · MiniMax M2.1 vs Claude Opus 4.5: 10x Cheaper, but is it Better? · 44:54 · 2,618 views · 2 days ago
Can a significantly cheaper model actually compete with Claude Opus 4.5 for real production work? In this video, I run the exact ...

Mehul Mohan · China's new AI model - better than Opus 4.5? · 9:39 · 10,070 views · 5 days ago
Thanks to Kilo AI for sponsoring this video. Sign up on Kilo AI here: https://kilo.codes/NmehEES and use promo code MEHUL ...

Matt Maher · I Tested Claude Code: $20 vs. $200 Subscription · 18:57 · 58,777 views · 6 months ago
In this video, I put Anthropic's Claude Code to the test, comparing the $20 Sonnet subscription with the premium $200 Opus plan.

HuggingFace · MoE Token Routing Explained: How Mixture of Experts Works (with Code) · 34:15 · 3,574 views · 6 days ago
This video dives deep into Token Routing, the core algorithm of Mixture of Experts (MoE) models. Slides: ...
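
For readers who want the gist before watching: the sketch below shows one common flavor of token routing, top-1 (switch-style) assignment with an expert capacity limit, where overflow tokens are simply dropped. It is an illustration under those assumptions, not the code or slides from the video; route_tokens and its arguments are made-up names.

```python
# Illustrative top-1 token routing with an expert capacity limit (not the code
# from the video): each token goes to its highest-scoring expert, and tokens
# beyond an expert's capacity are left unassigned (dropped / passed through).
import torch
import torch.nn.functional as F

def route_tokens(router_logits, capacity):
    """router_logits: (num_tokens, num_experts). Returns per-token expert id
    (-1 if dropped) and the gate probability of the chosen expert."""
    probs = F.softmax(router_logits, dim=-1)
    gate, expert_id = probs.max(dim=-1)            # top-1 choice per token
    assignment = torch.full_like(expert_id, -1)
    counts = torch.zeros(router_logits.shape[1], dtype=torch.long)
    for t in range(router_logits.shape[0]):        # sequential fill respects capacity
        e = expert_id[t].item()
        if counts[e] < capacity:
            assignment[t] = e
            counts[e] += 1
    return assignment, gate

logits = torch.randn(16, 4)                        # 16 tokens, 4 experts
assignment, gate = route_tokens(logits, capacity=6)
print(assignment.tolist())                         # -1 marks overflow (dropped) tokens
```

Production implementations vectorize the capacity check and usually add an auxiliary load-balancing loss so the router does not send most tokens to a few experts.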

Matt Williams · Optimize Your AI - Quantization Explained · 12:10 · 361,545 views · 1 year ago
Run massive AI models on your laptop! Learn the secrets of LLM quantization and how q2, q4, and q8 settings in Ollama can save ...
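
As a rough picture of what quantization buys, here is a minimal symmetric per-tensor int8 round-trip. It is only an illustration: Ollama's q2/q4/q8 settings correspond to llama.cpp's block-wise quantization formats, which are more involved than this simple scheme.

```python
# Minimal symmetric int8 quantization round-trip (illustrative only; not the
# block-wise q2/q4/q8 formats Ollama actually uses).
import torch

def quantize_int8(w):
    scale = w.abs().max() / 127.0                        # one scale for the whole tensor
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale

def dequantize(q, scale):
    return q.float() * scale

w = torch.randn(4096, 4096)                              # an fp32 weight matrix
q, scale = quantize_int8(w)
print(w.element_size() * w.numel() / 2**20, "MiB fp32")  # 64.0 MiB
print(q.element_size() * q.numel() / 2**20, "MiB int8")  # 16.0 MiB
print((dequantize(q, scale) - w).abs().max())            # rounding error introduced
```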

Mathemaniac · The weirdest paradox in statistics (and machine learning) · 21:44 · 1,133,666 views · 3 years ago
AD: Get Exclusive NordVPN deal here ➼ https://nordvpn.com/mathemaniac. It's risk-free with Nord's 30-day money-back ...

Fahd Mirza · Minimax Drops M2-Her: A Steamy Roleplay Model with 20M Users · 9:23 · 2,744 views · 2 days ago
This video tests the M2-Her model, which supports role-playing, multi-turn conversations, and other dialogue scenarios. Get 50% ...

650 AI Lab · Research Paper Deep Dive - The Sparsely-Gated Mixture-of-Experts (MoE) · 22:39 · 3,317 views · 3 years ago
In this video we take a deep dive to learn more about the Mixture of Experts (or MoE), how it works and internal ...

sudoremove and 노토랩세미나 · DeepSeek Engram: Adding Tables to Transformers · 15:08 · 2,951 views · 3 days ago
#Engram #deepseek Chapters: 00:00 Intro · 00:16 Model Architecture Description and Serving/Structure Optimization Problem ...

Towards Data Science · Liam Fedus & Barret Zoph - AI scaling with mixture of expert models · 40:48 · 2,499 views · 3 years ago
... scientists at Google Brain, came on the podcast to talk about AI scaling, sparsity and the present and future of MoE models.

LLM Implementation · How 120B+ Parameter Models Run on One GPU (The MoE Secret) · 6:47 · 1,579 views · 5 months ago
How is it possible for a 120 billion parameter AI model to run on a single consumer GPU? This isn't magic—it's the result of ...
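
The arithmetic behind that claim is easy to make explicit. The numbers below (120B total parameters, 5B active per token, 4-bit weights) are illustrative assumptions rather than figures from the video: quantization shrinks the weights that must be stored, while MoE routing means only a small fraction of them is used for any given token.

```python
# Back-of-the-envelope memory arithmetic for a large MoE model (illustrative
# numbers, not taken from the video). All weights still have to live somewhere
# (VRAM, system RAM, or disk), but only the routed experts run per token.
GiB = 2**30

def weight_memory(n_params, bits_per_param):
    return n_params * bits_per_param / 8 / GiB

total_params  = 120e9   # all experts combined
active_params = 5e9     # parameters actually used per token (top-k experts)

print(f"fp16, all weights : {weight_memory(total_params, 16):6.1f} GiB")  # ~223.5
print(f"4-bit, all weights: {weight_memory(total_params, 4):6.1f} GiB")   # ~55.9
print(f"4-bit, active only: {weight_memory(active_params, 4):6.1f} GiB")  # ~2.3
```

The MoE structure reduces per-token compute and lets offloading schemes fetch only the experts a token actually needs, which is what makes single-GPU setups plausible for models this large.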

MoeIsBetter · MoeIsBetter - Outta There (Audio) · 2:38 · 547,227 views · 5 years ago
Official Visualizer: https://smarturl.it/OuttaThereVisualizer Follow Moe: https://www.instagram.com/Moeisbetter/ (C) 2019 Moe ...

bycloud · The REAL AI Architecture That Unifies Vision & Language · 10:13 · 44,626 views · 7 months ago
Get started now with open source & privacy focused password manager by Proton! https://proton.me/pass/bycloudai In this video, ...

Vizuara · Mixture of Experts (MoE) Introduction · 29:59 · 5,353 views · 8 months ago
In this lecture, we start looking at the second major component of the DeepSeek architecture after MLA: Mixture of Experts ...

Trend Guards · Why MoE Models Are Taking Over AI (Deep Dive) · 7:10 · 163 views · 1 month ago
The Mixture-of-Experts (MoE) architecture is transforming the entire AI industry — powering breakthrough models like ...

BrainOmega · Hands-on 2: Mixture of Experts (MoE) from Scratch · 10:00 · 6,487 views · 6 months ago
Support BrainOmega ☕ Buy Me a Coffee: https://buymeacoffee.com/brainomega Stripe: ...