Peer-Reviewed

Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search

Received: 13 September 2021    Accepted: 4 October 2021    Published: 15 October 2021
Abstract

The core of a deep learning network is its parameters, which are updated through the learning process with samples. Whenever a sample is fed into the network, the parameters change according to the gradient. At this point, the number of samples used per update and the amount learned from each update are crucial; these are controlled by the batch size and the learning rate. Finding the optimal batch size and learning rate inevitably requires many trials, which costs a great deal of time and effort. Many papers have therefore sought to make this optimization more efficient by automatically tuning a single hyper-parameter. However, global optimality cannot be guaranteed by simply combining separately optimized parameters. This paper proposes a new, effective method for hyper-parameter optimization in which greedy search is adopted to find the optimal batch size and learning rate. In experiments with the Fashion-MNIST and Kuzushiji-MNIST datasets, the proposed algorithm shows performance similar to complete search, which suggests it is a potential alternative to complete search.
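
To illustrate the search strategy described in the abstract, the sketch below contrasts a greedy, one-hyper-parameter-at-a-time search with complete (grid) search over batch size and learning rate. It is only an illustration, not the paper's exact setup: the candidate grids, the scikit-learn MLPClassifier configuration, and the synthetic dataset are assumptions made so the example is self-contained.

    # Illustrative sketch: greedy search over batch size and learning rate for an MLP.
    # The grids, model, and data below are assumptions, not the paper's configuration.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    BATCH_SIZES = [32, 64, 128, 256]
    LEARNING_RATES = [1e-3, 3e-3, 1e-2, 3e-2]

    X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

    def evaluate(batch_size, lr):
        """Train a small MLP with the given hyper-parameters; return validation accuracy."""
        clf = MLPClassifier(hidden_layer_sizes=(64,), batch_size=batch_size,
                            learning_rate_init=lr, max_iter=50, random_state=0)
        clf.fit(X_tr, y_tr)
        return clf.score(X_val, y_val)

    # Greedy pass 1: pick the batch size with the learning rate held at a default value.
    best_bs = max(BATCH_SIZES, key=lambda bs: evaluate(bs, LEARNING_RATES[0]))
    # Greedy pass 2: pick the learning rate with the chosen batch size fixed.
    best_lr = max(LEARNING_RATES, key=lambda lr: evaluate(best_bs, lr))
    print(f"greedy choice: batch_size={best_bs}, learning_rate={best_lr}")

    # Complete search would evaluate all len(BATCH_SIZES) * len(LEARNING_RATES) = 16 pairs;
    # the greedy passes need only len(BATCH_SIZES) + len(LEARNING_RATES) = 8 training runs.

The appeal of the greedy strategy is the drop from a multiplicative to an additive number of training runs; per the abstract, the paper's experiments indicate this comes with little loss in accuracy relative to complete search.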

Published in American Journal of Computer Science and Technology (Volume 4, Issue 4)
DOI 10.11648/j.ajcst.20210404.11
Page(s) 90-96
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2021. Published by Science Publishing Group

Keywords

Hyper-parameters, Batch Size, Learning Rate, Greedy Search

Cite This Article
  • APA Style

    Bae, M. (2021). Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search. American Journal of Computer Science and Technology, 4(4), 90-96. https://doi.org/10.11648/j.ajcst.20210404.11

  • ACS Style

    Bae, M. Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search. Am. J. Comput. Sci. Technol. 2021, 4(4), 90-96. doi: 10.11648/j.ajcst.20210404.11

  • AMA Style

    Bae M. Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search. Am J Comput Sci Technol. 2021;4(4):90-96. doi: 10.11648/j.ajcst.20210404.11

  • BibTeX

    @article{10.11648/j.ajcst.20210404.11,
      author = {Mingyu Bae},
      title = {Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search},
      journal = {American Journal of Computer Science and Technology},
      volume = {4},
      number = {4},
      pages = {90-96},
      doi = {10.11648/j.ajcst.20210404.11},
      url = {https://doi.org/10.11648/j.ajcst.20210404.11},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajcst.20210404.11},
      abstract = {The core of a deep learning network is its parameters, which are updated through the learning process with samples. Whenever a sample is fed into the network, the parameters change according to the gradient. At this point, the number of samples used per update and the amount learned from each update are crucial; these are controlled by the batch size and the learning rate. Finding the optimal batch size and learning rate inevitably requires many trials, which costs a great deal of time and effort. Many papers have therefore sought to make this optimization more efficient by automatically tuning a single hyper-parameter. However, global optimality cannot be guaranteed by simply combining separately optimized parameters. This paper proposes a new, effective method for hyper-parameter optimization in which greedy search is adopted to find the optimal batch size and learning rate. In experiments with the Fashion-MNIST and Kuzushiji-MNIST datasets, the proposed algorithm shows performance similar to complete search, which suggests it is a potential alternative to complete search.},
      year = {2021}
    }
    

  • RIS

    TY  - JOUR
    T1  - Optimizing the Hyper-parameters of Multi-layer Perceptron with Greedy Search
    AU  - Mingyu Bae
    Y1  - 2021/10/15
    PY  - 2021
    N1  - https://doi.org/10.11648/j.ajcst.20210404.11
    DO  - 10.11648/j.ajcst.20210404.11
    T2  - American Journal of Computer Science and Technology
    JF  - American Journal of Computer Science and Technology
    JO  - American Journal of Computer Science and Technology
    SP  - 90
    EP  - 96
    PB  - Science Publishing Group
    SN  - 2640-012X
    UR  - https://doi.org/10.11648/j.ajcst.20210404.11
    AB  - The core of a deep learning network is its parameters, which are updated through the learning process with samples. Whenever a sample is fed into the network, the parameters change according to the gradient. At this point, the number of samples used per update and the amount learned from each update are crucial; these are controlled by the batch size and the learning rate. Finding the optimal batch size and learning rate inevitably requires many trials, which costs a great deal of time and effort. Many papers have therefore sought to make this optimization more efficient by automatically tuning a single hyper-parameter. However, global optimality cannot be guaranteed by simply combining separately optimized parameters. This paper proposes a new, effective method for hyper-parameter optimization in which greedy search is adopted to find the optimal batch size and learning rate. In experiments with the Fashion-MNIST and Kuzushiji-MNIST datasets, the proposed algorithm shows performance similar to complete search, which suggests it is a potential alternative to complete search.
    VL  - 4
    IS  - 4
    ER  - 

Author Information
  • Mingyu Bae, North London Collegiate School Jeju, Jeju, Korea
