Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT’s orthogonal stability with LoRA-GA’s gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval with models from 7B to 405B parameters, the hybrid approach yields consistent gains across three independent runs per task and model, approaching the quality of full fine-tuning while reducing training time by about 2.1× and peak memory by nearly 50%. A compact multilingual and low-resource study on XNLI and FLORES with 32 examples per language shows consistent gains under the same budget with a small, stable footprint. These results indicate a practical and scalable path to accessible LLM fine-tuning under resource constraints.
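The abstract only summarizes the mechanism, so the sketch below illustrates the general idea rather than the authors' implementation: a frozen linear layer is wrapped with a LoRA-style low-rank branch and an orthogonal rotation branch, and the two are blended per layer by a gate driven by an exponential moving average of the adapter gradient norm. The class name `HybridPEFTLinear`, the sigmoid-of-log gate, and the dense Cayley parameterization (standing in for BOFT's butterfly-factorized orthogonal matrices and for LoRA-GA's gradient-aligned initialization) are all illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumptions noted above), not the paper's implementation.
import torch
import torch.nn as nn


class HybridPEFTLinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # frozen pretrained weight
        d_out, d_in = base.weight.shape
        # LoRA-style low-rank factors; B starts at zero so the adapter is a no-op initially
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))
        # Skew-symmetric generator for a Cayley-parameterized orthogonal rotation
        # (a dense simplification of BOFT's butterfly factorization)
        self.S = nn.Parameter(torch.zeros(d_out, d_out))
        # Running adapter-gradient norm used to gate the two branches
        self.register_buffer("grad_norm", torch.tensor(1.0))

    def _orthogonal(self) -> torch.Tensor:
        skew = self.S - self.S.t()                      # enforce skew-symmetry
        eye = torch.eye(skew.shape[0], device=skew.device)
        # Cayley transform (I + S)^-1 (I - S) is exactly orthogonal
        return torch.linalg.solve(eye + skew, eye - skew)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.base.weight
        # Gate: layers with larger recent adapter gradients lean on the stable
        # orthogonal branch; quieter layers lean on the fast low-rank branch.
        alpha = torch.sigmoid(self.grad_norm.log())
        w_eff = alpha * (self._orthogonal() @ w) + (1 - alpha) * (w + self.B @ self.A)
        return nn.functional.linear(x, w_eff, self.base.bias)

    @torch.no_grad()
    def update_gate(self):
        grads = [p.grad for p in (self.A, self.B, self.S) if p.grad is not None]
        if grads:
            norm = torch.norm(torch.stack([g.norm() for g in grads]))
            self.grad_norm.mul_(0.9).add_(0.1 * norm)   # EMA of adapter grad norm


# Usage: wrap a projection layer, train as usual, and refresh the gate
# after each optimizer step.
layer = HybridPEFTLinear(nn.Linear(64, 64), rank=4)
out = layer(torch.randn(2, 64))
out.sum().backward()
layer.update_gate()
```

At initialization the rotation is the identity and the low-rank branch is zero, so the wrapped layer reproduces the pretrained model exactly; only the small adapter tensors receive gradients, which is what keeps the memory footprint low in this style of method.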
| Published in | American Journal of Computer Science and Technology (Volume 8, Issue 4) |
| DOI | 10.11648/j.ajcst.20250804.17 |
| Page(s) | 242-255 |
| Creative Commons | This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited. |
| Copyright | Copyright © The Author(s), 2025. Published by Science Publishing Group |
| Keywords | Large Language Models, Parameter-Efficient Fine-Tuning, Low-Rank Adaptation |
| [1] | Z. Han, C. Gao, J. Liu, J. Zhang, and S. Q. Zhang, “Parameter-efficient fine-tuning for large models: A comprehensive survey,” arXiv preprint arXiv:2403.14608, 2024. |
| [2] | E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models,” in International Conference on Learning Representations, 2022. |
| [3] | W. Liu, Z. Qiu, Y. Feng, Y. Xiu, Y. Xue, L. Yu, H. Feng, Z. Liu, J. Heo, S. Peng, Y. Wen, M. J. Black, A. Weller, and B. Schölkopf, “Parameter-efficient orthogonal finetuning via butterfly factorization,” in International Conference on Learning Representations, 2024. |
| [4] | S. Wang, L. Yu, and J. Li, “LoRA-GA: Low-rank adaptation with gradient approximation,” 2024. |
| [5] | M. Arjovsky, A. Shah, and Y. Bengio, “Unitary evolution recurrent neural networks,” in International Conference on Machine Learning. PMLR, 2016, pp. 1120–1128. |
| [6] | P. He, Y. Chen, Y. Wang, and Y. Zhang, “Protum: A new method for prompt tuning based on “[mask]”,” arXiv preprint arXiv:2201.12109, 2022. |
| [7] | N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, “Parameter-efficient transfer learning for NLP,” in International Conference on Machine Learning. PMLR, 2019, pp. 2790–2799. |
| [8] | X. Liu, K. Ji, Y. Fu, W. L. Tam, Z. Du, Z. Yang, and J. Tang, “P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks,” arXiv preprint arXiv:2110.07602, 2021. |
| [9] | A. Aghajanyan, L. Zettlemoyer, and S. Gupta, “Intrinsic dimensionality explains the effectiveness of language model fine-tuning,” arXiv preprint arXiv:2012.13255, 2020. |
| [10] | T. Dao, A. Gu, M. Eichhorn, A. Rudra, and C. Ré, “Learning fast algorithms for linear transforms using butterfly factorizations,” in International Conference on Machine Learning. PMLR, 2019, pp. 1517–1527. |
| [11] | S. Wisdom, T. Powers, J. Hershey, J. Le Roux, and L. Atlas, “Full-capacity unitary recurrent neural networks,” Advances in Neural Information Processing Systems, vol. 29, 2016. |
| [12] | M. Emami, M. Sahraee Ardakan, S. Rangan, and A. K. Fletcher, “Input-output equivalence of unitary and contractive RNNs,” Advances in Neural Information Processing Systems, vol. 32, 2019. |
| [13] | L. Xu, H. Xie, S.-Z. J. Qin, X. Tao, and F. L. Wang, “Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment,” arXiv preprint arXiv:2312.12148, 2023. |
| [14] | N. Ding, Y. Qin, G. Yang, F. Wei, Z. Yang, Y. Su, S. Hu, Y. Chen, C.-M. Chan, W. Chen et al., “Parameter-efficient fine-tuning of large-scale pre-trained language models,” Nature Machine Intelligence, vol. 5, no. 3, pp. 220–235, 2023. |
| [15] | R. K. Mahabadi, S. Ruder, M. Dehghani, J. Henderson et al., “Compacter: Efficient low-rank hypercomplex adapter layers,” in Advances in Neural Information Processing Systems (NeurIPS), 2021. |
| [16] | E. Ben Zaken, Y. Goldberg, and S. Ravfogel, “BitFit: Simple parameter-efficient fine-tuning for transformers,” arXiv preprint arXiv:2106.10199, 2021. |
| [17] | S. Li, K. Jia, Y. Wen, T. Liu, and D. Tao, “Orthogonal deep neural networks,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 4, pp. 1352–1368, 2019. |
| [18] | A. Prabhu, A. Farhadi, M. Rastegari et al., “Butterfly transform: An efficient FFT-based neural architecture design,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 12021–12030. |
| [19] | Z. Wang, J. Liang, R. He, Z. Wang, and T. Tan, “LoRA-Pro: Are low-rank adapters properly optimized?” arXiv preprint arXiv:2407.18242, 2024. |
| [20] | I. Shafran, T. Bagby, and R. Skerry-Ryan, “Complex evolution recurrent neural networks (CERNNs),” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2018, pp. 5854–5858. |
| [21] | J.-P. Bernardy and S. Lappin, “Assessing the unitary RNN as an end-to-end compositional model of syntax,” arXiv preprint arXiv:2208.05719, 2022. |
| [22] | J. Pfeiffer, A. Rücklé, et al., “AdapterFusion: Nondestructive task composition for transfer learning,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2021. |
| [23] | A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, “GLUE: A multi-task benchmark and analysis platform for natural language understanding,” in Proceedings of the 2018 EMNLP Workshop BlackboxNLP, 2018, pp. 353-355. |
| [24] | K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano et al., “Training verifiers to solve math word problems,” arXiv preprint arXiv:2110.14168, 2021. |
| [25] | L. Zheng, W.-L. Chiang, Y. Sheng, S. Zhuang, Z. Wu, Y. Zhuang, Z. Lin, Z. Li, D. Li, E. P. Xing et al., “Judging LLM-as-a-judge with MT-bench and Chatbot Arena,” Advances in Neural Information Processing Systems, vol. 36, 2023. |
| [26] | M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. D. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman et al., “Evaluating large language models trained on code,” arXiv preprint arXiv:2107.03374, 2021. |
| [27] | A. Conneau, G. Lample, R. Rinott, A. Williams, S. R. Bowman, H. Schwenk, and V. Stoyanov, “XNLI: Evaluating cross-lingual sentence representations,” arXiv preprint arXiv:1809.05053, 2018. |
| [28] | T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, “QLoRA: Efficient finetuning of quantized LLMs,” arXiv preprint arXiv:2305.14314, 2023. |
| [29] | S.-Y. Liu, C.-Y. Wang, H. Yin, P. Molchanov, Y.-C. F. Wang, K.-T. Cheng, and M.-H. Chen, “DoRA: Weight-decomposed low-rank adaptation,” arXiv preprint arXiv:2402.09353, 2024. |
| [30] | H. Liu, D. Tam, M. Muqeeth, J. Mohta, T. Huang, M. Bansal, and C. Raffel, “Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning,” arXiv preprint arXiv:2205.05638, 2022. |
| [31] | N. Goyal, C. Gao, V. Chaudhary, P.-J. Chen, G. Wenzek, D. Ju, S. Krishnan, M. Ranzato, F. Guzmán, and A. Fan, “The FLORES-101 evaluation benchmark for low-resource and multilingual machine translation,” 2021. |
APA Style
Qi, H., Dai, Z., & Huang, C. (2025). Hybrid and Unitary PEFT for Resource-Efficient Large Language Models. American Journal of Computer Science and Technology, 8(4), 242–255. https://doi.org/10.11648/j.ajcst.20250804.17
ACS Style
Qi, H.; Dai, Z.; Huang, C. Hybrid and Unitary PEFT for Resource-Efficient Large Language Models. Am. J. Comput. Sci. Technol. 2025, 8(4), 242-255. doi: 10.11648/j.ajcst.20250804.17
@article{10.11648/j.ajcst.20250804.17,
author = {Haomin Qi and Zihan Dai and Chengbo Huang},
title = {Hybrid and Unitary PEFT for Resource-Efficient Large Language Models},
journal = {American Journal of Computer Science and Technology},
volume = {8},
number = {4},
pages = {242-255},
doi = {10.11648/j.ajcst.20250804.17},
url = {https://doi.org/10.11648/j.ajcst.20250804.17},
eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20250804.17},
abstract = {Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT’s orthogonal stability with LoRA-GA’s gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval with models from 7B to 405B parameters, the hybrid approach yields consistent gains across three independent runs per task and model, approaching the quality of full fine-tuning while reducing training time by about 2.1× and peak memory by nearly 50%. A compact multilingual and low-resource study on XNLI and FLORES with 32 examples per language shows consistent gains under the same budget with a small, stable footprint. These results indicate a practical and scalable path to accessible LLM fine-tuning under resource constraints.},
year = {2025}
}
TY - JOUR
T1 - Hybrid and Unitary PEFT for Resource-Efficient Large Language Models
AU - Haomin Qi
AU - Zihan Dai
AU - Chengbo Huang
Y1 - 2025/12/19
PY - 2025
N1 - https://doi.org/10.11648/j.ajcst.20250804.17
DO - 10.11648/j.ajcst.20250804.17
T2 - American Journal of Computer Science and Technology
JF - American Journal of Computer Science and Technology
JO - American Journal of Computer Science and Technology
SP - 242
EP - 255
PB - Science Publishing Group
SN - 2640-012X
UR - https://doi.org/10.11648/j.ajcst.20250804.17
AB - Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT’s orthogonal stability with LoRA-GA’s gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval with models from 7B to 405B parameters, the hybrid approach yields consistent gains across three independent runs per task and model, approaching the quality of full fine-tuning while reducing training time by about 2.1× and peak memory by nearly 50%. A compact multilingual and low-resource study on XNLI and FLORES with 32 examples per language shows consistent gains under the same budget with a small, stable footprint. These results indicate a practical and scalable path to accessible LLM fine-tuning under resource constraints.
VL - 8
IS - 4
ER -