Mixture of Experts

3 videos across 2 channels

Mixture of Experts (MoE) is a scalable architecture in which a learned router sends each token to a small subset of specialized sub-networks (experts). Only a fraction of the model's parameters therefore do work for any given token, while the total parameter count stays large. This can dramatically speed up local and offline AI. Because each token touches only a few experts, expert weights can stay in CPU memory while the remaining layers are offloaded to the GPU, which can free VRAM for a longer context window. For builders aiming at private, low-latency AI deployments, MoE unlocks larger capabilities without relying on cloud APIs, as highlighted by recent demonstrations and open-weight models.
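The routing idea can be sketched in a few lines. The example below is a minimal, hypothetical top-k gate in pure Python, not the implementation used by any particular model. It assumes a linear router (one score per expert), softmax weights renormalized over the selected experts, and toy experts that simply scale their input. The names `top_k_route`, `moe_layer`, and `make_expert` are illustrative only.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k highest-scoring experts and softmax over just those scores,
    so the selected experts' weights sum to 1."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:k]
    weights = softmax([router_logits[i] for i in ranked])
    return list(zip(ranked, weights))


def moe_layer(x: Vector, router_weights: List[Vector],
              experts: List[Expert], k: int = 2) -> Vector:
    """Hypothetical MoE layer: linear router scores, top-k selection,
    weighted sum of the selected experts' outputs."""
    logits = [sum(w * v for w, v in zip(row, x)) for row in router_weights]
    out = [0.0] * len(x)
    for idx, weight in top_k_route(logits, k):
        y = experts[idx](x)  # only the selected experts execute
        out = [o + weight * y_j for o, y_j in zip(out, y)]
    return out


calls: List[int] = []  # records which experts actually ran


def make_expert(i: int) -> Expert:
    """Toy expert (assumption): scales its input by (i + 1)."""
    def expert(x: Vector) -> Vector:
        calls.append(i)
        return [(i + 1) * v for v in x]
    return expert


experts = [make_expert(i) for i in range(4)]
router_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
y = moe_layer([2.0, 1.0], router_weights, experts, k=2)
print([round(v, 4) for v in y], calls)  # prints [2.5379, 1.2689] [0, 1]
```

With four experts and `k=2`, only experts 0 and 1 run for this token, so half of the expert computation is skipped. In real MoE models the ratio of active to total experts is usually much smaller. That sparsity is what makes it practical to keep rarely touched expert weights in slower CPU memory.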

Local AI Master Class - Setup, Software, Agentic, Autocomplete, Chat

The video walks through building and running a local, private AI system to replace costly cloud API usage, emphasizing h…

00:44:57

GLM-5 is unbelievable (Opus for 20% the cost??)

The video reviews GLM-5, an open-weight model that competes with Opus and Codex, highlighting its strong performance, rea…

00:26:23