AI Model Compression
Model compression reduces a model's size and compute cost through pruning, quantization, knowledge distillation, and low-rank factorization. A typical workflow first profiles the model to find bottlenecks, then applies structured compression (so the savings map onto real hardware speedups rather than just sparse storage), and finally fine-tunes the compressed model to recover lost accuracy.
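To make the first technique concrete, here is a minimal sketch of unstructured magnitude pruning in plain Python. The helper name `prune_by_magnitude` and the flat-list weight representation are illustrative choices, not a standard API; real systems operate on framework tensors and often prune in structured groups (channels, heads) for hardware speedups.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of weights.

    `weights` is a flat list of floats; returns a new list with the
    smallest entries set to 0.0 (unstructured magnitude pruning).
    """
    n_prune = int(len(weights) * sparsity)
    if n_prune == 0:
        return list(weights)
    # Threshold = magnitude of the n_prune-th smallest weight.
    threshold = sorted(abs(w) for w in weights)[n_prune - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]
```

After pruning, the model is typically fine-tuned for a few epochs so the surviving weights can compensate for the removed ones.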
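Quantization can likewise be sketched in a few lines. This example shows symmetric per-tensor int8 quantization, one common scheme among several (asymmetric, per-channel, and quantization-aware variants also exist); the function names are illustrative.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization.

    Maps floats into [-127, 127] using scale = max|x| / 127.
    Returns (quantized ints, scale); recover floats with q * scale.
    """
    scale = max(abs(v) for v in values) / 127.0
    if scale == 0.0:
        return [0] * len(values), 1.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale

def dequantize_int8(quantized, scale):
    """Map int8 values back to approximate floats."""
    return [q * scale for q in quantized]
```

The round trip is lossy, which is why fine-tuning (or calibration on representative data) usually follows quantization to recover accuracy.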
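For distillation, the core training signal is a divergence between the teacher's and student's temperature-softened output distributions. The sketch below follows the common KL-divergence formulation (scaled by T^2, as in Hinton et al.'s soft-target setup); in practice this term is combined with the ordinary hard-label loss.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits, softened by dividing by `temperature`."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Higher temperatures expose the teacher's relative confidence over
    wrong classes ("dark knowledge"), giving the student a richer signal.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return temperature ** 2 * kl
```

The loss is zero when student and teacher agree exactly and grows as their softened predictions diverge.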