MCG — Modular Compute Governor
MCG is an AI training optimization tool developed by Fidelity Horizon. It identifies structurally redundant layers in neural networks during training and enables their physical removal, producing smaller standard models with zero quality loss. Think of it as engine tuning for AI models: every architecture reveals different savings.
Shut off the cylinders you don't need
Like a modern engine that deactivates cylinders in city driving, MCG discovers which "cylinders" in a neural network can be shut off, and which ones must stay active. Every model has a different structure, so every model reveals different savings. Unlike compression, nothing is damaged: the remaining parts keep their full original precision.
Most approaches to efficient AI start from a finished model and try to shrink it after the fact, by pruning weights, reducing precision, or distilling it into a smaller student. MCG takes a fundamentally different approach: it identifies redundancy during training itself.
Learns what matters
During training, MCG automatically discovers which parts of the network are critical and which are structurally redundant. Each architecture reveals a different optimal structure; there are no universal shortcuts.
Verified layer removal
Layers identified as low-contribution can be physically removed from the original model before inference, with no retraining required. The result is a standard model with fewer layers.
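MCG's actual interface is not public, so the following is a conceptual sketch only: a model is represented as an ordered list of layer functions, and "removal" is simply dropping the flagged indices. The remaining layers, and all their weights, are untouched.

```python
# Conceptual sketch only: MCG's real API is not published.
# The function names here (run, remove_layers) are illustrative.

def run(layers, x):
    """Apply each layer in sequence."""
    for layer in layers:
        x = layer(x)
    return x

def remove_layers(layers, redundant):
    """Return a smaller standard model: the same layers, minus the flagged ones."""
    return [layer for i, layer in enumerate(layers) if i not in redundant]

# Toy 4-layer "model": layers 1 and 2 are identity maps (redundant).
layers = [
    lambda x: x * 2.0,   # layer 0: critical
    lambda x: x,         # layer 1: redundant
    lambda x: x,         # layer 2: redundant
    lambda x: x + 1.0,   # layer 3: critical
]

pruned = remove_layers(layers, redundant={1, 2})
print(len(pruned), run(pruned, 3.0))  # 2 layers remain, same output as before
```

The removed layers simply disappear from the forward pass; nothing in the surviving layers is rescaled, quantized, or retrained.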
Architecture-agnostic
Works on CNNs (ResNet, WideResNet), Vision Transformers (ViT), and LLMs (TinyLlama, Qwen, Llama). Verified across multiple architectures and scales from 11M to 72B parameters.
Additive, stacks with existing tools
MCG has been verified alongside quantization (4-bit) and LoRA. Layer removal compounds with your existing optimization stack.
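As illustrative arithmetic (not a benchmark), the compounding effect can be sketched with the 40% FLOP figure reported for Llama-3-8B below and the 4x size reduction of 4-bit weights versus fp16. Treating the weight reduction as roughly proportional to the FLOP reduction is an assumption made only for this sketch.

```python
# Illustrative arithmetic only: shows how layer removal can compound
# with quantization. The 0.40 figure comes from the results table;
# equating weight reduction with FLOP reduction is an assumption.
fp16_bits, quant_bits = 16, 4
flop_reduction = 0.40

size_factor = quant_bits / fp16_bits           # 0.25x of original bytes per weight
remaining_weights = 1.0 - flop_reduction       # ~0.60x of original weights (assumed)
combined_memory = size_factor * remaining_weights

print(size_factor, remaining_weights, combined_memory)  # 0.25 0.6 0.15
```

Under these assumptions, the quantized, layer-removed model occupies roughly 15% of the original fp16 footprint.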
Tested across scales
During a short analysis phase, the model reveals which layers are critical and which are redundant. All results verified and reproducible.
Compute reduction during governed training
Verified on models up to 14B parameters. FLOP savings and quality measured during the governance phase vs. the unmodified dense baseline.
| Architecture | Parameters | FLOP reduction | Quality vs. baseline |
|---|---|---|---|
| ResNet-18 | 11M | 55% | Preserved |
| WideResNet-28-10 | 36M | 47% | Preserved |
| ViT-B/16 | 86M | 78% | Preserved |
| TinyLlama | 1.1B | 48% | Improved |
| Qwen-3B | 3B | 51% | Preserved |
| Qwen-7B | 7B | 48% | Improved |
| Llama-3-8B | 8B | 40% | Preserved |
| Qwen-14B | 14B | 35% | Improved |
72B: four layers removed, zero quality loss
On a 72-billion-parameter model (80 layers), MCG identified four layers that can be removed together with no quality degradation; the MMLU score improved by 0.1 percentage points. A second independent run confirmed the result: different layers identified, same outcome.
Verified on Qwen-72B-Instruct across two independent seeds. Layer removal was applied to the original, unmodified model; no retraining required.
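Some rough back-of-envelope arithmetic on the figures above, assuming the 72B parameters are spread roughly evenly over the 80 layers (this ignores embeddings and the output head, so treat it as an approximation):

```python
# Back-of-envelope arithmetic from the reported figures. Assumes an
# even per-layer parameter split, which ignores embeddings and the
# LM head -- a simplifying assumption for illustration only.
total_params = 72e9
num_layers = 80
removed = 4

params_per_layer = total_params / num_layers   # ~0.9B parameters per layer
params_removed = removed * params_per_layer    # ~3.6B parameters dropped
depth_reduction = removed / num_layers         # 5% of the model's depth

print(round(params_removed / 1e9, 1), depth_reduction)  # 3.6 0.05
```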
MCG vs. existing approaches
MCG is not pruning. It does not remove individual weights or reduce precision. It governs compute allocation: a fundamentally different approach that preserves quality where other methods fail.
Pruning and compression
Existing methods remove or simplify weights after training. At reductions of 30% or more, generation-based benchmarks collapse: the model loses coherent multi-step ability, and quality degrades.
MCG governance
MCG identifies structural redundancy during training. The original model is then physically reduced based on what MCG discovered. All weights in the remaining layers stay intact. A different paradigm, not a better pruning method.
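MCG's actual redundancy criterion is not published. A common proxy in the layer-pruning literature, shown here purely for intuition, is to score each layer by the cosine similarity between its input and its output: a score near 1.0 means the layer acts almost as an identity map and is a candidate for removal.

```python
import math

# Illustrative proxy only -- not MCG's published method. Scores each
# layer by how close its output is to its input; near-identity layers
# (cosine similarity ~ 1.0) are flagged as removal candidates.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def redundancy_scores(layers, x):
    """Run the model once, returning (layer index, input/output similarity)."""
    scores = []
    for i, layer in enumerate(layers):
        y = layer(x)
        scores.append((i, cosine(x, y)))
        x = y
    return scores

# Toy layers over 2-d "hidden states".
layers = [
    lambda v: [v[0] + v[1], v[0] - v[1]],    # mixes features: critical
    lambda v: [v[0] * 1.001, v[1] * 1.001],  # near-identity: redundant
]

scores = redundancy_scores(layers, [1.0, 2.0])
flagged = [i for i, s in scores if s > 0.999]
print(flagged)  # only the near-identity layer is flagged
```

Whatever the internal criterion, the key property described above is the same: the scoring happens during training, and the surviving layers keep their exact original weights.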
MCG — Common questions
What is MCG and how does it differ from model pruning?
MCG (Modular Compute Governor) is a training-time optimization tool that identifies structurally redundant neural network layers and enables their physical removal. Unlike pruning, which removes individual weights and degrades quality at high compression rates, MCG removes entire layers while keeping all remaining weights at full precision.
How much compute does MCG save?
FLOP reductions range from 35% to 78% depending on architecture, with quality preserved or improved in all verified cases. On a 72B parameter model, four layers were removed with zero quality loss.