Research Article | Peer-Reviewed

Hybrid and Unitary PEFT for Resource-Efficient Large Language Models

Received: 1 October 2025     Accepted: 20 October 2025     Published: 19 December 2025
Abstract

Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT’s orthogonal stability with LoRA-GA’s gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval with models from 7B to 405B, the hybrid approach yields consistent gains across three independent runs per task and model, approaching the quality of full fine-tuning while reducing training time by about 2.1× and peak memory by nearly 50%, indicating practical significance under resource constraints. A compact multilingual and low-resource study on XNLI and FLORES with 32 examples per language shows consistent gains under the same budget with a small, stable footprint. These results indicate a practical and scalable path to accessible LLM fine-tuning under resource constraints.
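The hybrid strategy described above computes per-layer adaptive updates guided by gradient norms, blending BOFT-style orthogonal rotations with LoRA-GA-style low-rank deltas. The paper's own implementation is not reproduced here; the sketch below is a minimal illustrative PyTorch rendering under simplifying assumptions (square weight matrices, a sigmoid gate on an exponential moving average of each layer's adapter gradient norm), and the names HybridAdapter, mixing_weight, and update_grad_stat are hypothetical rather than the authors' API.

```python
import torch
import torch.nn as nn


class HybridAdapter(nn.Module):
    """Illustrative per-layer adapter: blends an orthogonal (BOFT-style) rotation
    with a low-rank (LoRA-style) delta, gated by a running gradient-norm statistic."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        # Low-rank branch: delta_W = B @ A, with B zero-initialized so the
        # adapter starts as a no-op on the frozen base weight.
        self.A = nn.Parameter(torch.randn(rank, dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(dim, rank))
        # Orthogonal branch: R = exp(S - S^T) is orthogonal for any square S.
        self.S = nn.Parameter(torch.zeros(dim, dim))
        # Exponential moving average of this layer's adapter gradient norm.
        self.register_buffer("grad_norm_ema", torch.tensor(0.0))

    def mixing_weight(self) -> torch.Tensor:
        # Larger recent gradients favor the fast low-rank branch; smaller
        # gradients favor the stable orthogonal branch. The offset is arbitrary.
        return torch.sigmoid(self.grad_norm_ema - 1.0)

    def forward(self, weight: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        R = torch.linalg.matrix_exp(self.S - self.S.T)   # orthogonal rotation
        delta = self.B @ self.A                          # low-rank update
        alpha = self.mixing_weight()
        w_eff = (1.0 - alpha) * (weight @ R) + alpha * (weight + delta)
        return x @ w_eff.T

    @torch.no_grad()
    def update_grad_stat(self, beta: float = 0.9) -> None:
        # Refresh the per-layer gating statistic from the current adapter gradients.
        total = sum(p.grad.norm() for p in (self.A, self.B, self.S) if p.grad is not None)
        self.grad_norm_ema.mul_(beta).add_((1.0 - beta) * float(total))
```

In a training loop, update_grad_stat() would be called after each backward pass, so layers with large recent gradients lean on the fast low-rank branch while quieter layers retain the stability of the orthogonal rotation; this is one plausible reading of the gradient-norm-guided gating, not the paper's exact rule.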

Published in American Journal of Computer Science and Technology (Volume 8, Issue 4)
DOI 10.11648/j.ajcst.20250804.17
Page(s) 242-255
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Large Language Models, Parameter-Efficient Fine-Tuning, Low-Rank Adaptation

Cite This Article
  • APA Style

    Qi, H., Dai, Z., & Huang, C. (2025). Hybrid and Unitary PEFT for Resource-Efficient Large Language Models. American Journal of Computer Science and Technology, 8(4), 242-255. https://doi.org/10.11648/j.ajcst.20250804.17


  • ACS Style

    Qi, H.; Dai, Z.; Huang, C. Hybrid and Unitary PEFT for Resource-Efficient Large Language Models. Am. J. Comput. Sci. Technol. 2025, 8(4), 242-255. doi: 10.11648/j.ajcst.20250804.17


  • AMA Style

    Qi H, Dai Z, Huang C. Hybrid and Unitary PEFT for Resource-Efficient Large Language Models. Am J Comput Sci Technol. 2025;8(4):242-255. doi: 10.11648/j.ajcst.20250804.17


  • @article{10.11648/j.ajcst.20250804.17,
      author = {Haomin Qi and Zihan Dai and Chengbo Huang},
      title = {Hybrid and Unitary PEFT for Resource-Efficient Large Language Models},
      journal = {American Journal of Computer Science and Technology},
      volume = {8},
      number = {4},
      pages = {242-255},
      doi = {10.11648/j.ajcst.20250804.17},
      url = {https://doi.org/10.11648/j.ajcst.20250804.17},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20250804.17},
      abstract = {Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT’s orthogonal stability with LoRA-GA’s gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval with models from 7B to 405B, the hybrid approach yields consistent gains across three independent runs per task and model, approaching the quality of full fine-tuning while reducing training time by about 2.1× and peak memory by nearly 50%, indicating practical significance under resource constraints. A compact multilingual and low-resource study on XNLI and FLORES with 32 examples per language shows consistent gains under the same budget with a small, stable footprint. These results indicate a practical and scalable path to accessible LLM fine-tuning under resource constraints.},
      year = {2025}
    }
    


  • TY  - JOUR
    T1  - Hybrid and Unitary PEFT for Resource-Efficient Large Language Models
    
    AU  - Haomin Qi
    AU  - Zihan Dai
    AU  - Chengbo Huang
    Y1  - 2025/12/19
    PY  - 2025
    N1  - https://doi.org/10.11648/j.ajcst.20250804.17
    DO  - 10.11648/j.ajcst.20250804.17
    T2  - American Journal of Computer Science and Technology
    JF  - American Journal of Computer Science and Technology
    JO  - American Journal of Computer Science and Technology
    SP  - 242
    EP  - 255
    PB  - Science Publishing Group
    SN  - 2640-012X
    UR  - https://doi.org/10.11648/j.ajcst.20250804.17
    AB  - Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT’s orthogonal stability with LoRA-GA’s gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval with models from 7B to 405B, the hybrid approach yields consistent gains across three independent runs per task and model, approaching the quality of full fine-tuning while reducing training time by about 2.1× and peak memory by nearly 50%, indicating practical significance under resource constraints. A compact multilingual and low-resource study on XNLI and FLORES with 32 examples per language shows consistent gains under the same budget with a small, stable footprint. These results indicate a practical and scalable path to accessible LLM fine-tuning under resource constraints.
    
    VL  - 8
    IS  - 4
    ER  - 


Author Information
  • Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, United States of America

  • Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Computer Science, University of Copenhagen, Copenhagen, Denmark

  • Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Electrical Engineering, Columbia University, New York, United States of America
