
LoRAMBo: Fighting LoRA Memory Bottlenecks with Optimized Rank Selection

ICLR Workshop 2025 · Liam Cawley · December 2024


Problem

LoRA fine-tunes large language models by injecting trainable low-rank updates $\Delta W = AB^\top$ at each layer, dramatically reducing the number of trainable parameters. The standard practice assigns a uniform rank $r$ to every layer. This is wasteful: layers vary substantially in how much they benefit from additional rank, and the parameter budget $\sum_i r_i(d_i + k_i)$ is finite.
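To make the budget arithmetic concrete, here is a minimal sketch of the parameter count $\sum_i r_i(d_i + k_i)$ for per-layer low-rank factors. The layer shapes and rank vectors are illustrative assumptions (roughly transformer-block sized), not values from the paper.

```python
def lora_param_count(shapes, ranks):
    """Trainable parameters for low-rank updates Delta W_i = A_i B_i^T,
    where A_i is (d_i, r_i) and B_i is (k_i, r_i): sum_i r_i * (d_i + k_i)."""
    return sum(r * (d + k) for (d, k), r in zip(shapes, ranks))

# Hypothetical layer shapes (d_i, k_i); e.g. attention and MLP projections.
shapes = [(768, 768), (768, 3072), (3072, 768)]

# Uniform rank r = 8 everywhere.
uniform = lora_param_count(shapes, [8, 8, 8])

# A non-uniform allocation can hit exactly the same budget.
nonuniform = lora_param_count(shapes, [8, 6, 10])
```

The point of the example is that many rank vectors satisfy the same budget; the paper's question is which one minimizes task loss.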

The question is: given a total parameter budget, how should rank be distributed across layers to minimize task loss?

Approach

We develop three complementary views of this allocation problem, then unify them.

Approximation-theoretic. The Eckart–Young–Mirsky theorem gives the optimal rank-$r$ approximation to a matrix under the Frobenius or operator norm. Treating each layer's ideal weight update $\Delta W_i^*$ as known, the per-layer squared approximation error decomposes as the sum of squared discarded singular values. The optimal allocation minimizes total error subject to a budget constraint, a variant of the classical water-filling problem.
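The water-filling allocation admits a simple greedy solution: repeatedly grant one rank unit to the layer whose next discarded singular value buys the most error reduction per parameter spent. The sketch below assumes the singular values of each $\Delta W_i^*$ are given, which the paper itself notes is unrealistic offline; it illustrates the allocation rule, not the estimation.

```python
import heapq

def waterfill_ranks(sv_lists, costs, budget):
    """Greedy water-filling: allocate rank units to minimize the total squared
    Frobenius error sum_i sum_{j > r_i} sigma_ij^2 under a parameter budget.

    sv_lists[i]: descending singular values of layer i's ideal update
    (assumed known here for illustration).
    costs[i]: d_i + k_i, the parameter cost of one rank unit at layer i.
    """
    ranks = [0] * len(sv_lists)
    spent = 0
    # Max-heap keyed by error reduction per parameter spent.
    heap = [(-sv[0] ** 2 / c, i) for i, (sv, c) in enumerate(zip(sv_lists, costs))]
    heapq.heapify(heap)
    while heap:
        _, i = heapq.heappop(heap)
        if spent + costs[i] > budget:
            continue  # this layer's rank unit no longer fits the budget
        ranks[i] += 1
        spent += costs[i]
        if ranks[i] < len(sv_lists[i]):
            # Next marginal gain for layer i is its next squared singular value.
            heapq.heappush(heap, (-sv_lists[i][ranks[i]] ** 2 / costs[i], i))
    return ranks
```

With spectra `[[3.0, 1.0], [2.0, 2.0]]`, unit costs, and a budget of two rank units, the greedy rule takes layer 0's large leading singular value first, then layer 1's.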

Curvature-aware. Not all approximation errors are equal: a rank deficit in a high-curvature layer incurs more loss than the same deficit in a flat region. We replace the Frobenius norm with a data-dependent norm weighted by the layer-wise Hessian, $\lVert \Delta W \rVert_{H_i}^2 = \mathrm{tr}(\Delta W^\top H_i \Delta W)$. The resulting allocation sends more rank to layers where the loss landscape is sharply curved.
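The weighted norm is cheap to evaluate once a curvature matrix is available. A minimal sketch, assuming a dense PSD matrix `H` stands in for the layer Hessian (in practice a proxy such as a diagonal Fisher estimate would be used; that substitution is our assumption, not stated in the text):

```python
import numpy as np

def hessian_weighted_err(delta_w, H):
    """Data-dependent error ||Delta W||_{H}^2 = tr(Delta W^T H Delta W),
    where H is a (d, d) PSD curvature matrix for the layer."""
    return float(np.trace(delta_w.T @ H @ delta_w))

# Sanity check of the definition: with H = c * I the weighted norm reduces
# to c times the squared Frobenius norm.
dw = np.array([[1.0, 2.0], [3.0, 4.0]])
err = hessian_weighted_err(dw, 2.0 * np.eye(2))
```

With $H = cI$ the curvature weighting is uniform and the allocation collapses back to the Frobenius water-filling case; non-uniform $H_i$ is what shifts rank toward sharp layers.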

Online adaptive. In practice neither the optimal updates $\Delta W_i^*$ nor the Hessians are known in advance. We formulate a greedy online algorithm that, after each training step, estimates the marginal benefit of incrementing each layer's rank by one and allocates accordingly. We show this greedy procedure achieves a $(1 - 1/e)$-approximate solution to the underlying submodular optimization problem.
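One round of the greedy allocation can be sketched as follows. The `marginal_gain(i, r)` callback is a hypothetical stand-in for the paper's benefit estimator (which is computed from training signals after each step and is not specified here); the greedy loop itself is the standard cost-benefit rule for submodular maximization under a budget.

```python
def greedy_reallocate(marginal_gain, costs, budget):
    """One round of greedy allocation: repeatedly increment the rank of the
    layer with the best estimated gain per parameter until the budget is
    exhausted or no layer offers positive gain.

    marginal_gain(i, r): estimated benefit of raising layer i's rank
    from r to r + 1 (hypothetical estimator).
    costs[i]: parameter cost d_i + k_i of one rank unit at layer i.
    """
    n = len(costs)
    ranks = [0] * n
    spent = 0
    while True:
        best, best_ratio = None, 0.0
        for i in range(n):
            if spent + costs[i] <= budget:
                ratio = marginal_gain(i, ranks[i]) / costs[i]
                if ratio > best_ratio:
                    best, best_ratio = i, ratio
        if best is None:
            break  # nothing fits, or all remaining gains are non-positive
        ranks[best] += 1
        spent += costs[best]
    return ranks
```

For the guarantee to apply, the gains must exhibit diminishing returns in each layer's rank, which is exactly the submodularity assumption the paper makes.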

Results

The combined framework (offline curvature-aware initialization + online adaptation) consistently outperforms uniform-rank LoRA under matched parameter budgets.

Setting                        | Uniform LoRA | LoRAMBo
RoBERTa-base / QNLI (acc.)     | 89.32        | 89.45
T5-base / SST-2 (acc.)         | 90.82        | 92.58
WikiText-103 PPL (12M params)  | 26.4         | 24.9
WikiText-103 PPL (24M params)  | 25.1         | 23.8

The gains are largest when the budget is tight relative to the model size, precisely the regime where LoRA is most useful. The curvature-aware term matters most in the first few hundred steps; thereafter the online algorithm dominates.

Takeaway

Uniform rank allocation is a surprisingly strong baseline, but it leaves performance on the table. The key insight is that rank allocation is a resource-allocation problem with diminishing returns, and the right abstraction is submodular optimization over a matroid constraint. The practical algorithm is simple: maintain running estimates of per-layer sensitivity and reallocate periodically.
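The "running estimates of per-layer sensitivity" can be as simple as an exponential moving average of a per-layer score, reallocated every few hundred steps. The EMA form and the score itself are illustrative assumptions; the paper does not pin down the exact statistic in this summary.

```python
class SensitivityTracker:
    """Running per-layer sensitivity via an exponential moving average of a
    per-layer score (e.g. a gradient-based quantity; the choice of score is
    an assumption for illustration)."""

    def __init__(self, n_layers, decay=0.99):
        self.ema = [0.0] * n_layers
        self.decay = decay

    def update(self, scores):
        # Standard EMA: ema <- decay * ema + (1 - decay) * score.
        for i, s in enumerate(scores):
            self.ema[i] = self.decay * self.ema[i] + (1 - self.decay) * s

tracker = SensitivityTracker(n_layers=1, decay=0.5)
tracker.update([2.0])
tracker.update([2.0])
```

Periodically, the smoothed sensitivities would be fed to the greedy allocation step as the marginal-benefit estimates.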