Extremely Efficient, Collaborative and Private LLM Fine-Tuning

A subset of efforts on LLM fine-tuning from Dr. Praneeth Vepakomma’s CERT Lab on Collaboration, Efficiency, Responsibility and Trust.

Team: Our amazing team of students includes Kaustubh Ponkshe, Raghav Singhal, Rohit Vartak, Omar Alshamsi, Shaan Shah, John Gupta-She, Keira Mooney, Josue Castillo and more.

Alumni: PI’s previous students have been placed in top programs at CMU, Oxford, MIT, EPFL, Georgia Tech, UMich and Duke!

The lab is directed by Praneeth Vepakomma.

  • Assistant Professor, Mohamed bin Zayed University of Artificial Intelligence
  • Visiting Assistant Professor, Institute for Data, Systems, and Society (IDSS), Massachusetts Institute of Technology (MIT)
  • Research Page: https://sites.mit.edu/praneeth/

Prof. Vepakomma leads research initiatives with a major focus on collaborative ML and trustworthy/responsible AI. The ultimate goal of his work is to harness collaborative and trustworthy intelligence from networks of organizations and people in data-driven economies while achieving scale and maintaining ethics.

Focus Areas: Collaborative Learning, Federated Learning and its variants, Extremely Efficient LLM Fine-Tuning, Private Computation, Responsible/Trustworthy AI and Data Markets.

LoRA-SilverBullet: Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning

LoRA Silver Bullet (LoRA-SB) closes the optimality gap of LoRA-style methods for efficient fine-tuning of large language models. The method approximates full fine-tuning within low-rank subspaces via a principled initialization strategy.

Links to Paper, Code.

Key contributions and benefits include:

  • 27-90x reduction in trainable parameters compared to standard approaches while improving performance.
  • Theoretical demonstration of optimal conditions using LoRA-XS architecture.
  • Optimal scaling for high-rank gradient updates without hyperparameter tuning.
  • Consistent outperformance of LoRA-XS and other existing methods across benchmarks.
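
The core idea can be sketched in a few lines of numpy: initialize the frozen outer factors and the small trainable inner matrix from a truncated SVD of a (hypothetical) first-step full fine-tuning gradient, so the low-rank update starts as the best possible rank-r approximation of the full update. This is a minimal illustration under assumed shapes, not the paper's actual initialization code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 32, 4  # assumed weight dimension and adapter rank

# Hypothetical first-step full fine-tuning gradient for a weight matrix W.
G = rng.standard_normal((d, d))

# Truncated SVD gives the optimal rank-r approximation (Eckart-Young).
U, S, Vt = np.linalg.svd(G, full_matrices=False)
A = U[:, :r]        # frozen left factor  (d x r)
B = Vt[:r, :]       # frozen right factor (r x d)
R = np.diag(S[:r])  # the only trainable matrix (r x r)

# A @ R @ B is the best rank-r approximation of G, so the first
# low-rank step tracks full fine-tuning as closely as rank r allows.
approx = A @ R @ B
err = np.linalg.norm(G - approx) / np.linalg.norm(G)
print(f"relative approximation error: {err:.3f}")
```

Only the r x r matrix R is trained afterwards, which is where the 27-90x parameter reduction over standard LoRA comes from.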

ABBA: Highly Expressive Hadamard Product Adaptation for Large Language Models

LoRA, the prevailing approach to parameter-efficient fine-tuning, achieves considerable efficiency, but its expressivity is inherently constrained by its low-rank representation. We introduce ABBA, a new PEFT method that re-parameterizes the update as a Hadamard product of two independently learnable low-rank matrices. This leads to significantly higher expressivity under the same parameter efficiency budget. Empirically, ABBA achieves state-of-the-art results on arithmetic and commonsense reasoning benchmarks, consistently outperforming existing PEFT methods by a significant margin across multiple models.

Links to Paper, Code.
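
The expressivity gain is easy to see numerically: the Hadamard (elementwise) product of two rank-r matrices can have rank up to r², whereas a single LoRA product is capped at r. A minimal numpy sketch, with hypothetical variable names and shapes (the paper's exact parameterization and budget matching may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # assumed dimension and per-factor rank

# Standard LoRA update: rank is at most r.
B, A = rng.standard_normal((d, r)), rng.standard_normal((r, d))
lora_update = B @ A

# ABBA-style update: Hadamard product of two low-rank products.
B1, A1 = rng.standard_normal((d, r)), rng.standard_normal((r, d))
B2, A2 = rng.standard_normal((d, r)), rng.standard_normal((r, d))
abba_update = (B1 @ A1) * (B2 @ A2)  # elementwise (Hadamard) product

print("LoRA rank:", np.linalg.matrix_rank(lora_update))  # at most r
print("ABBA rank:", np.linalg.matrix_rank(abba_update))  # up to r * r
```

With comparable parameter counts, the Hadamard reparameterization therefore spans a much richer set of updates than a single low-rank product.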

Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning

We introduce Federated Silver Bullet (Fed-SB), a novel approach for federated fine-tuning of LLMs across multiple clients using LoRA-SB, our recently proposed low-rank adaptation method. Fed-SB achieves state-of-the-art performance across commonsense reasoning, arithmetic reasoning, and language inference tasks while reducing communication costs by up to 230x. In private settings, Fed-SB further improves performance by (1) reducing trainable parameters, thereby lowering the noise required for differential privacy, and (2) avoiding noise amplification introduced by other methods. Overall, Fed-SB establishes a new Pareto frontier in performance vs. communication cost, offering an efficient and scalable solution for both private and non-private federated fine-tuning.

Links to Paper, Code.
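
Why aggregation is both cheap and exact here can be shown in a short numpy sketch: since the outer factors A and B are frozen and shared across clients, the server only averages each client's tiny r x r matrix, and by linearity that average is exact. Shapes and variable names below are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_clients = 16, 2, 3  # assumed sizes

# Shared frozen factors, identical on every client.
A = rng.standard_normal((d, r))
B = rng.standard_normal((r, d))

# Each client trains only its own r x r matrix R_i.
Rs = [rng.standard_normal((r, r)) for _ in range(n_clients)]

# The server averages just r * r numbers per client.
R_avg = sum(Rs) / n_clients

# Because A and B are frozen and shared, averaging R is exact:
# mean(A @ R_i @ B) == A @ mean(R_i) @ B.
exact = sum(A @ R @ B for R in Rs) / n_clients
assert np.allclose(A @ R_avg @ B, exact)
print("exact aggregation; communicated", r * r, "numbers per client")
```

Communicating r² scalars instead of 2dr (or full gradients) is the source of the large communication savings, and the smaller trainable set is also what reduces the noise needed for differential privacy.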

Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models
ACL 2025 Main Conference (Oral)

Applying LoRA in federated learning environments, where data is distributed across multiple clients, presents unique challenges. Existing methods rely on traditional federated averaging of LoRA adapters, resulting in inexact updates. To address this, we propose Federated Exact LoRA, or FedEx-LoRA, which adds a residual error term to the pretrained frozen weight matrix. Our approach achieves exact updates with minimal computational and communication overhead, preserving LoRA’s efficiency. We evaluate the method on various models across arithmetic reasoning, commonsense reasoning, natural language understanding and natural language generation tasks, showing consistent performance gains over state-of-the-art methods across multiple settings.

Links to Paper, Code.
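
The inexactness of conventional federated averaging, and the residual correction that fixes it, can be sketched in numpy. Averaging the A and B factors separately is not the same as averaging the client updates; folding the difference into the frozen weight recovers the exact aggregate. A minimal illustration under assumed shapes, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_clients = 16, 2, 3  # assumed sizes

# Frozen pretrained weight and per-client LoRA factors.
W = rng.standard_normal((d, d))
Bs = [rng.standard_normal((d, r)) for _ in range(n_clients)]
As = [rng.standard_normal((r, d)) for _ in range(n_clients)]

# The exact aggregate update is the average of the client products.
exact_update = sum(B @ A for B, A in zip(Bs, As)) / n_clients

# Conventional FedAvg averages the factors, which is inexact:
# mean(B_i) @ mean(A_i) != mean(B_i @ A_i) in general.
B_avg = sum(Bs) / n_clients
A_avg = sum(As) / n_clients
inexact_update = B_avg @ A_avg

# FedEx-LoRA idea: fold the residual error into the frozen weight,
# so the aggregated model is exact while adapters stay low-rank.
residual = exact_update - inexact_update
W_new = W + residual

assert np.allclose(W_new + inexact_update, W + exact_update)
print("aggregation made exact via the residual term")
```

The residual is applied once per round to the frozen matrix, so clients keep training only low-rank adapters and the extra communication and compute overhead stays minimal.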