Extremely Efficient, Collaborative and Private LLM Fine-Tuning

A subset of efforts on LLM fine-tuning from Dr. Praneeth Vepakomma’s CERT Lab on Collaboration, Efficiency, Responsibility and Trust.

Team: Our amazing team of students includes Kaustubh Ponkshe, Raghav Singhal, Rohit Vartak, Omar Alshamsi, Shaan Shah, John Gupta-She, Keira Mooney, Josue Castillo and more.

Alumni: PI’s previous students have been placed in top programs at CMU, Oxford, MIT, EPFL, Georgia Tech, UMich and Duke!

The lab is directed by Praneeth Vepakomma.

  • Assistant Professor, Mohamed bin Zayed University of Artificial Intelligence
  • Visiting Assistant Professor, Institute for Data, Systems, and Society (IDSS), Massachusetts Institute of Technology (MIT)
  • Research Page: https://sites.mit.edu/praneeth/

Prof. Vepakomma leads research initiatives with a major focus on collaborative ML and trustworthy/responsible AI. The ultimate goal of his work is to harness collaborative and trustworthy intelligence from networks of organizations and people in data-driven economies while achieving scale and maintaining ethics.

Focus Areas: Collaborative Learning, Federated Learning and its variants, Extremely Efficient LLM Fine-Tuning, Private Computation, Responsible/Trustworthy AI and Data Markets.

LoRA-SilverBullet: Initialization using Update Approximation is a Silver Bullet for Extremely Efficient Low-Rank Fine-Tuning

LoRA Silver Bullet (LoRA-SB) closes the optimality gap of LoRA-style methods for efficient fine-tuning of large language models. The method approximates full fine-tuning within low-rank subspaces via a principled initialization strategy.

Links to Paper, Code.

Key contributions and benefits include:

  • 27-90x reduction in trainable parameters compared to standard approaches while improving performance.
  • Theoretical demonstration of optimal conditions using LoRA-XS architecture.
  • Optimal scaling for high-rank gradient updates without hyperparameter tuning.
  • Consistent outperformance of LoRA-XS and other existing methods across benchmarks.
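
The core idea can be sketched in a few lines of numpy: initialize the frozen outer factors and the small trainable inner matrix from a truncated SVD of a (hypothetical) first-step full fine-tuning gradient, so the low-rank update starts as the best possible rank-r approximation of the full update. This is a minimal illustration under assumed shapes, not the paper's actual initialization code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 32, 4  # assumed weight dimension and adapter rank

# Hypothetical first-step full fine-tuning gradient for a weight matrix W.
G = rng.standard_normal((d, d))

# Truncated SVD gives the optimal rank-r approximation (Eckart-Young).
U, S, Vt = np.linalg.svd(G, full_matrices=False)
A = U[:, :r]        # frozen left factor  (d x r)
B = Vt[:r, :]       # frozen right factor (r x d)
R = np.diag(S[:r])  # the only trainable matrix (r x r)

# A @ R @ B is the best rank-r approximation of G, so the first
# low-rank step tracks full fine-tuning as closely as rank r allows.
approx = A @ R @ B
err = np.linalg.norm(G - approx) / np.linalg.norm(G)
print(f"relative approximation error: {err:.3f}")
```

Only the r x r matrix R is trained afterwards, which is where the 27-90x parameter reduction over standard LoRA comes from.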

ABBA: Highly Expressive Hadamard Product Adaptation for Large Language Models

LoRA, the prevailing approach to parameter-efficient fine-tuning, achieves considerable efficiency, but its expressivity is inherently constrained by its low-rank representation. We introduce ABBA, a new PEFT method that re-parameterizes the update as a Hadamard product of two independently learnable low-rank matrices. This leads to significantly higher expressivity under the same parameter efficiency budget. Empirically, ABBA achieves state-of-the-art results on arithmetic and commonsense reasoning benchmarks, consistently outperforming existing PEFT methods by a significant margin across multiple models.

Links to Paper, Code.
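
The expressivity gain is easy to see numerically: the Hadamard (elementwise) product of two rank-r matrices can have rank up to r², whereas a single LoRA product is capped at r. A minimal numpy sketch, with hypothetical variable names and shapes (the paper's exact parameterization and budget matching may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 64, 4  # assumed dimension and per-factor rank

# Standard LoRA update: rank is at most r.
B, A = rng.standard_normal((d, r)), rng.standard_normal((r, d))
lora_update = B @ A

# ABBA-style update: Hadamard product of two low-rank products.
B1, A1 = rng.standard_normal((d, r)), rng.standard_normal((r, d))
B2, A2 = rng.standard_normal((d, r)), rng.standard_normal((r, d))
abba_update = (B1 @ A1) * (B2 @ A2)  # elementwise (Hadamard) product

print("LoRA rank:", np.linalg.matrix_rank(lora_update))  # at most r
print("ABBA rank:", np.linalg.matrix_rank(abba_update))  # up to r * r
```

With comparable parameter counts, the Hadamard reparameterization therefore spans a much richer set of updates than a single low-rank product.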

Fed-SB: A Silver Bullet for Extreme Communication Efficiency and Performance in (Private) Federated LoRA Fine-Tuning

We introduce Federated Silver Bullet (Fed-SB), a novel approach for federated fine-tuning of LLMs across multiple clients using LoRA-SB, our recently proposed low-rank adaptation method. Fed-SB achieves state-of-the-art performance across commonsense reasoning, arithmetic reasoning, and language inference tasks while reducing communication costs by up to 230x. In private settings, Fed-SB further improves performance by (1) reducing trainable parameters, thereby lowering the noise required for differential privacy, and (2) avoiding noise amplification introduced by other methods. Overall, Fed-SB establishes a new Pareto frontier in performance vs. communication cost, offering an efficient and scalable solution for both private and non-private federated fine-tuning.

Links to Paper, Code.
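
Why aggregation is both cheap and exact here can be shown in a short numpy sketch: since the outer factors A and B are frozen and shared across clients, the server only averages each client's tiny r x r matrix, and by linearity that average is exact. Shapes and variable names below are illustrative assumptions, not the paper's code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_clients = 16, 2, 3  # assumed sizes

# Shared frozen factors, identical on every client.
A = rng.standard_normal((d, r))
B = rng.standard_normal((r, d))

# Each client trains only its own r x r matrix R_i.
Rs = [rng.standard_normal((r, r)) for _ in range(n_clients)]

# The server averages just r * r numbers per client.
R_avg = sum(Rs) / n_clients

# Because A and B are frozen and shared, averaging R is exact:
# mean(A @ R_i @ B) == A @ mean(R_i) @ B.
exact = sum(A @ R @ B for R in Rs) / n_clients
assert np.allclose(A @ R_avg @ B, exact)
print("exact aggregation; communicated", r * r, "numbers per client")
```

Communicating r² scalars instead of 2dr (or full gradients) is the source of the large communication savings, and the smaller trainable set is also what reduces the noise needed for differential privacy.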

Exact Aggregation for Federated and Efficient Fine-Tuning of Foundation Models
ACL 2025 Main Conference (Oral)

Applying LoRA in federated learning environments, where data is distributed across multiple clients, presents unique challenges. Existing methods rely on traditional federated averaging of LoRA adapters, resulting in inexact updates. To address this, we propose Federated Exact LoRA, or FedEx-LoRA, which adds a residual error term to the pretrained frozen weight matrix. Our approach achieves exact updates with minimal computational and communication overhead, preserving LoRA’s efficiency. We evaluate the method on various models across arithmetic reasoning, commonsense reasoning, natural language understanding and natural language generation tasks, showing consistent performance gains over state-of-the-art methods across multiple settings.

Links to Paper, Code.
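
The inexactness of conventional federated averaging, and the residual correction that fixes it, can be sketched in numpy. Averaging the A and B factors separately is not the same as averaging the client updates; folding the difference into the frozen weight recovers the exact aggregate. A minimal illustration under assumed shapes, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_clients = 16, 2, 3  # assumed sizes

# Frozen pretrained weight and per-client LoRA factors.
W = rng.standard_normal((d, d))
Bs = [rng.standard_normal((d, r)) for _ in range(n_clients)]
As = [rng.standard_normal((r, d)) for _ in range(n_clients)]

# The exact aggregate update is the average of the client products.
exact_update = sum(B @ A for B, A in zip(Bs, As)) / n_clients

# Conventional FedAvg averages the factors, which is inexact:
# mean(B_i) @ mean(A_i) != mean(B_i @ A_i) in general.
B_avg = sum(Bs) / n_clients
A_avg = sum(As) / n_clients
inexact_update = B_avg @ A_avg

# FedEx-LoRA idea: fold the residual error into the frozen weight,
# so the aggregated model is exact while adapters stay low-rank.
residual = exact_update - inexact_update
W_new = W + residual

assert np.allclose(W_new + inexact_update, W + exact_update)
print("aggregation made exact via the residual term")
```

The residual is applied once per round to the frozen matrix, so clients keep training only low-rank adapters and the extra communication and compute overhead stays minimal.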