Mar 23, 2025 ParetoQ: Scaling Laws in Extremely Low-bit LLM Quantization
Mar 16, 2025 Minions: Cost-efficient Collaboration Between On-device and Cloud Language Models
Feb 23, 2025 Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Feb 16, 2025 HALO: Hadamard-Assisted Lower-Precision Optimization for LLMs