Browsing by Author "Afolabi, O. J."

Now showing 1 - 2 of 2

A Comparative Analysis of Gradient Descent-Based Optimization Algorithms on Convolutional Neural Networks
(IEEE, 2018) Dogo, E. M.; Afolabi, O. J.; Nwulu, N. I.; Twala, B.; Aigbavboa, C. O.
In this paper, we perform a comparative evaluation of seven most commonly used first-order stochastic gradient-based optimization techniques in a simple Convolutional Neural Network (ConvNet) architectural setup. The investigated techniques are the Stochastic Gradient Descent (SGD), with vanilla (vSGD), with momentum (SGDm), with momentum and nesterov (SGDm+n)), Root Mean Square Propagation (RMSProp), Adaptive Moment Estimation (Adam), Adaptive Gradient (AdaGrad), Adaptive Delta (AdaDelta), Adaptive moment estimation Extension based on infinity norm (Adamax) and Nesterov-accelerated Adaptive Moment Estimation (Nadam). We trained the model and evaluated the optimization techniques in terms of convergence speed, accuracy and loss function using three randomly selected publicly available image classification datasets. The overall experimental results obtained show Nadam achieved better performance across the three datasets in comparison to the other optimization techniques, while AdaDelta performed the worst.
On the Relative Impact of Optimizers on Convolutional Neural Networks with Varying Depth and Width for Image Classification
(MDPI, 2022) Dogo, E. M.; Afolabi, O. J.; Twala, B.
The continued increase in computing resources is one key factor that is allowing deep learning researchers to scale, design and train new and complex convolutional neural network (CNN) architectures in terms of varying width, depth, or both width and depth to improve performance for a variety of problems. The contributions of this study include an uncovering of how different optimization algorithms impact CNN architectural setups with variations in width, depth, and both width/depth. Specifically in this study, three different CNN architectural setups in combination with nine different optimization algorithms—namely SGD vanilla, with momentum, and with Nesterov momentum, RMSProp, ADAM, ADAGrad, ADADelta, ADAMax, and NADAM—are trained and evaluated using three publicly available benchmark image classification datasets. Through extensive experimentation, we analyze the output predictions of the different optimizers with the CNN architectures using accuracy, convergence speed, and loss function as performance metrics. Findings based on the overall results obtained across the three image classification datasets show that ADAM and NADAM achieved superior performances with wider and deeper/wider setups, respectively, while ADADelta was the worst performer, especially with the deeper CNN architectural setup.