Research Article | Peer-Reviewed

Hybrid and Unitary PEFT for Resource-Efficient Large Language Models

Received: 1 October 2025     Accepted: 20 October 2025     Published: 19 December 2025
Abstract

Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT’s orthogonal stability with LoRA-GA’s gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval with models from 7B to 405B, the hybrid approach yields consistent gains across three independent runs per task and model, approaching the quality of full fine-tuning while reducing training time by about 2.1× and peak memory by nearly 50%, indicating practical significance under resource constraints. A compact multilingual and low-resource study on XNLI and FLORES with 32 examples per language shows consistent gains under the same budget with a small, stable footprint. These results indicate a practical and scalable path to accessible LLM fine-tuning under resource constraints.
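The hybrid strategy described above computes per-layer adaptive updates guided by gradient norms, blending BOFT-style orthogonal rotations with LoRA-GA-style low-rank deltas. The paper's own implementation is not reproduced here; the sketch below is a minimal illustrative PyTorch rendering under simplifying assumptions (square weight matrices, a sigmoid gate on an exponential moving average of each layer's adapter gradient norm), and the names HybridAdapter, mixing_weight, and update_grad_stat are hypothetical rather than the authors' API.

```python
import torch
import torch.nn as nn


class HybridAdapter(nn.Module):
    """Illustrative per-layer adapter: blends an orthogonal (BOFT-style) rotation
    with a low-rank (LoRA-style) delta, gated by a running gradient-norm statistic."""

    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        # Low-rank branch: delta_W = B @ A, with B zero-initialized so the
        # adapter starts as a no-op on the frozen base weight.
        self.A = nn.Parameter(torch.randn(rank, dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(dim, rank))
        # Orthogonal branch: R = exp(S - S^T) is orthogonal for any square S.
        self.S = nn.Parameter(torch.zeros(dim, dim))
        # Exponential moving average of this layer's adapter gradient norm.
        self.register_buffer("grad_norm_ema", torch.tensor(0.0))

    def mixing_weight(self) -> torch.Tensor:
        # Larger recent gradients favor the fast low-rank branch; smaller
        # gradients favor the stable orthogonal branch. The offset is arbitrary.
        return torch.sigmoid(self.grad_norm_ema - 1.0)

    def forward(self, weight: torch.Tensor, x: torch.Tensor) -> torch.Tensor:
        R = torch.linalg.matrix_exp(self.S - self.S.T)   # orthogonal rotation
        delta = self.B @ self.A                          # low-rank update
        alpha = self.mixing_weight()
        w_eff = (1.0 - alpha) * (weight @ R) + alpha * (weight + delta)
        return x @ w_eff.T

    @torch.no_grad()
    def update_grad_stat(self, beta: float = 0.9) -> None:
        # Refresh the per-layer gating statistic from the current adapter gradients.
        total = sum(p.grad.norm() for p in (self.A, self.B, self.S) if p.grad is not None)
        self.grad_norm_ema.mul_(beta).add_((1.0 - beta) * float(total))
```

In a training loop, update_grad_stat() would be called after each backward pass, so layers with large recent gradients lean on the fast low-rank branch while quieter layers retain the stability of the orthogonal rotation; this is one plausible reading of the gradient-norm-guided gating, not the paper's exact rule.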

Published in American Journal of Computer Science and Technology (Volume 8, Issue 4)
DOI 10.11648/j.ajcst.20250804.17
Page(s) 242-255
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2025. Published by Science Publishing Group

Keywords

Large Language Models, Parameter-Efficient Fine-Tuning, Low-Rank Adaptation

Cite This Article
  • APA Style

    Qi, H., Dai, Z., & Huang, C. (2025). Hybrid and Unitary PEFT for Resource-Efficient Large Language Models. American Journal of Computer Science and Technology, 8(4), 242-255. https://doi.org/10.11648/j.ajcst.20250804.17


  • ACS Style

    Qi, H.; Dai, Z.; Huang, C. Hybrid and Unitary PEFT for Resource-Efficient Large Language Models. Am. J. Comput. Sci. Technol. 2025, 8(4), 242-255. doi: 10.11648/j.ajcst.20250804.17


  • AMA Style

    Qi H, Dai Z, Huang C. Hybrid and Unitary PEFT for Resource-Efficient Large Language Models. Am J Comput Sci Technol. 2025;8(4):242-255. doi: 10.11648/j.ajcst.20250804.17


  • @article{10.11648/j.ajcst.20250804.17,
      author = {Haomin Qi and Zihan Dai and Chengbo Huang},
      title = {Hybrid and Unitary PEFT for Resource-Efficient Large Language Models},
      journal = {American Journal of Computer Science and Technology},
      volume = {8},
      number = {4},
      pages = {242-255},
      doi = {10.11648/j.ajcst.20250804.17},
      url = {https://doi.org/10.11648/j.ajcst.20250804.17},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20250804.17},
      abstract = {Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT’s orthogonal stability with LoRA-GA’s gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval with models from 7B to 405B, the hybrid approach yields consistent gains across three independent runs per task and model, approaching the quality of full fine-tuning while reducing training time by about 2.1× and peak memory by nearly 50%, indicating practical significance under resource constraints. A compact multilingual and low-resource study on XNLI and FLORES with 32 examples per language shows consistent gains under the same budget with a small, stable footprint. These results indicate a practical and scalable path to accessible LLM fine-tuning under resource constraints.},
      year = {2025}
    }
    


  • TY  - JOUR
    T1  - Hybrid and Unitary PEFT for Resource-Efficient Large Language Models
    
    AU  - Haomin Qi
    AU  - Zihan Dai
    AU  - Chengbo Huang
    Y1  - 2025/12/19
    PY  - 2025
    N1  - https://doi.org/10.11648/j.ajcst.20250804.17
    DO  - 10.11648/j.ajcst.20250804.17
    T2  - American Journal of Computer Science and Technology
    JF  - American Journal of Computer Science and Technology
    JO  - American Journal of Computer Science and Technology
    SP  - 242
    EP  - 255
    PB  - Science Publishing Group
    SN  - 2640-012X
    UR  - https://doi.org/10.11648/j.ajcst.20250804.17
    AB  - Fine-tuning large language models (LLMs) remains a computational bottleneck due to their scale and memory demands. This paper presents a comprehensive evaluation of parameter-efficient fine-tuning (PEFT) techniques, including LoRA, BOFT, LoRA-GA, and uRNN, and introduces a novel hybrid strategy that dynamically integrates BOFT’s orthogonal stability with LoRA-GA’s gradient-aligned rapid convergence. By computing per-layer adaptive updates guided by gradient norms, the hybrid method achieves superior convergence efficiency and generalization across diverse tasks. We also explore, for the first time, the adaptation of unitary RNN (uRNN) principles to Transformer-based LLMs, enhancing gradient stability through structured unitary constraints. Across GLUE, GSM8K, MT-Bench, and HumanEval with models from 7B to 405B, the hybrid approach yields consistent gains across three independent runs per task and model, approaching the quality of full fine-tuning while reducing training time by about 2.1× and peak memory by nearly 50%, indicating practical significance under resource constraints. A compact multilingual and low-resource study on XNLI and FLORES with 32 examples per language shows consistent gains under the same budget with a small, stable footprint. These results indicate a practical and scalable path to accessible LLM fine-tuning under resource constraints.
    
    VL  - 8
    IS  - 4
    ER  - 


Author Information
  • Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Electrical and Computer Engineering, University of California San Diego, La Jolla, United States of America

  • Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Computer Science, University of Copenhagen, Copenhagen, Denmark

  • Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; Department of Electrical Engineering, Columbia University, New York, United States of America
