ViewTube

145 results

HuggingFace
MoE Token Routing Explained: How Mixture of Experts Works (with Code)
This video dives deep into Token Routing, the core algorithm of Mixture of Experts (MoE) models. Slides: ...
34:15 · 3,596 views · 6 days ago

Durga Software Solutions
Sparse Models vs Dense Models: Smarter, Faster AI
sparse models vs dense models dense vs sparse neural networks mixture of experts explained MoE models efficient AI models ...
5:17 · 0 views · 3 days ago

AI Revolution
DeepSeek Leaks MODEL1: New Flagship AI Shocks The Industry
At the same time, Zhipu AI officially released GLM-4.7-Flash, a long-context MoE model built for real coding and reasoning that ...
15:40 · 37,124 views · 6 days ago

BioinformPaper
Conditional Memory for Sparse Transformers via Ngram
In this video, we dive into Engram, a new architecture that adds "conditional memory" to AI models. Instead of forcing models to ...
10:28 · 109 views · 6 days ago

Interconnects AI
"All in on building open models in the U.S."
Arcee AI is the startup I've found to be taking the most real approach to monetizing their open models. With a bunch of ...
1:12:16 · 514 views · 21 hours ago

AI Research Roundup
LongCat: New 560B MoE LLM for Agentic Reasoning
In this AI Research Roundup episode, Alex discusses the paper: 'LongCat-Flash-Thinking-2601 Technical Report' ...
4:26 · 24 views · 2 days ago

Donato Capitella
Kimi-K2(1T)/GLM 4.7(355B) on a 4-Node Strix Halo Cluster - 512GB of Unified Memory
In this video, I demonstrate running large-scale Mixture-of-Experts (MoE) models on a 4-node cluster of AMD Strix Halo systems.
9:36 · 7,778 views · 5 days ago

AI Podcast Series. Byte Goose AI.
[DeepSeek ENGRAM] Scaling Large Language Models. Making LLM Models Smarter and Powerful: ENGRAM.
We've been scaling Large Language Models by adding more 'experts' through Mixture-of-Experts (MoE). We've focused on ...
15:36 · 126 views · 3 days ago

Tech x Fibo
DeepSeek V3: Efficient Power || TechXFibo
DeepSeek is a powerful open-source AI model series from China that is currently challenging the global dominance of ...
6:33 · 11 views · 6 days ago

Prime Explained
Every Type of AI Model Explained Clearly
Every AI Model Type Explained in 9 Minutes LLM, VLM, SLM, MoE, RAG—these AI acronyms are everywhere. In this video, I ...
8:21 · 228 views · 3 days ago

Future Frontiers AI
DeepSeek V4 Architecture, Local LLMs, and the Future of Emotional AI 🧠
2026 is already shaping up to be a massive year for artificial intelligence. From major architectural leaks at DeepSeek to Google's ...
1:35 · 21 views · 4 days ago

Arcee AI
Code Reviews at Scale: Trinity Large Preview + Cline + OpenRouter
... Preview is a 400B-parameter sparse MoE model with just 13B active parameters at inference, designed for complex reasoning, ...
1:31 · 0 views · 2 hours ago

EZ.Encoder Academy
DeepSeek Engram Paper Walkthrough: From Memory Network to N-gram Embedding
This video walks through the DeepSeek Engram paper in full, along with the two technical threads behind it: Memory Network and N-gram. With this background understood ...
1:02:24 · 3,105 views · 6 days ago

Future Frontiers AI
DeepSeek V4 Architecture, Local LLMs, and the Future of Emotional AI 🧠
2026 is already shaping up to be a massive year for artificial intelligence. From major architectural leaks at DeepSeek to Google's ...
9:33 · 0 views · 4 days ago

Ai Verdict
DeepSeek’s "Model 1" Leaked Code, Brutal Coding AI, & Z.AI Flash (The Verdict)
Is DeepSeek preparing to reset the industry again? Today on AI Verdict, we analyze the digital paper trail left on GitHub ...
8:18 · 19 views · 6 days ago

Nexalith AI
Better Than GPT-4? DeepSeek V4 "Model1" Exposed! 🚀
In this episode of AI Revolution, we dive into the massive potential leak of DeepSeek's next flagship model. Developers have ...
2:35 · 22 views · 6 days ago

Technology Now
DeepSeek V4 Changes Everything: 1M Token Memory Is Here
... open source AI model, open source LLM, large language model, AI memory breakthrough, Engram memory, MoE model, ...
9:18 · 692 views · 5 days ago

Moe Lueker
How to Make Money Online Creating AI Ad Creatives (Artlist AI Toolkit Tutorial)
Try Artlist AI Toolkit (Special Creator Access): https://bit.ly/FreeArtlist Get FREE Access to My AI Marketing Genius Custom GPT ...
14:56 · 432 views · 5 days ago

nullmicgo
January 21, 2026 | 10 Big AI News: Chip Geopolitics, AWS Blackwell GPUs, OpenAI Age-Gating & More!
... Title: Zhipu AI Releases GLM-4.7-Flash: A 30B-A3B MoE Model for Efficient Local Coding and Agents Source: MarkTechPost AI ...
24:59 · 11 views · 7 days ago

DevOps in Action
GLM-4.7 Flash: How to use GLM 4.7 Flash for free - The New King of Local AI Coding? (30B MoE) 🚀
Zhipu AI just dropped GLM-4.7-Flash, and it's shaking up the open-source AI world. In this video, we break down why this 30B ...
10:24 · 200 views · 4 days ago