References
- Andrei, N. (2008). An unconstrained optimization test functions collection. Advanced Modeling and Optimization, 10(1), 147–161.
- Anthony, M., & Bartlett, P. L. (2009). Neural network learning: Theoretical foundations. Cambridge University Press.
- Armijo, L. (1966). Minimization of functions having Lipschitz continuous first partial derivatives. Pacific Journal of Mathematics, 16(1), 1–3. https://doi.org/10.2140/pjm.1966.16.1
- Berger, J., & He, X. (2019). Statistics at a crossroads: Challenges and opportunities in the data science era. https://hub.ki/groups/statscrossroad/overview.
- Bishop, C. M. (2006). Pattern recognition and machine learning. Springer.
- Blum, A. L., & Rivest, R. L. (1992). Training a 3-node neural network is NP-complete. Neural Networks, 5(1), 117–127. https://doi.org/10.1016/S0893-6080(05)80010-3
- Borchers, H. W. (2019). pracma: Practical numerical math functions [Computer software manual]. https://CRAN.R-project.org/package=pracma (R package version 2.2.9).
- Chen, K. K. (2016). The upper bound on knots in neural networks. arXiv:1611.09448.
- Churchland, P. S., & Sejnowski, T. J. (2016). The computational brain. MIT Press.
- Deng, Q., Cheng, Y., & Lan, G. (2018). Optimal adaptive and accelerated stochastic gradient descent. arXiv:1810.00553.
- Flach, P. (2012). Machine learning: The art and science of algorithms that make sense of data. Cambridge University Press.
- Fritsch, S., Guenther, F., Wright, M. N., Suling, M., & Mueller, S. M. (2019). neuralnet: Training of neural networks [Computer software manual]. https://CRAN.R-project.org/package=neuralnet (R package version 1.44.2).
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
- Gordon, M. (2018). My attempt to understand the backpropagation algorithm for training neural networks (Tech. Rep.). https://www.cl.cam.ac.uk/archive/mjcg/plans/Backpropagation.pdf.
- Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction. Springer.
- Hastie, T., Tibshirani, R., & Wainwright, M. (2015). Statistical learning with sparsity: The lasso and generalizations. CRC Press.
- Hebb, D. O. (1949). The organization of behavior: A neuropsychological theory. John Wiley & Sons Inc.
- Hinton, G. E., Osindero, S., & Teh, Y. W. (2006). A fast learning algorithm for deep belief nets. Neural Computation, 18(7), 1527–1554. https://doi.org/10.1162/neco.2006.18.7.1527
- Hinton, G. E., & Salakhutdinov, R. R. (2006). Reducing the dimensionality of data with neural networks. Science, 313(5786), 504–507. https://doi.org/10.1126/science.1127647
- Hornik, K. (1991). Approximation capabilities of multilayer feedforward networks. Neural Networks, 4(2), 251–257. https://doi.org/10.1016/0893-6080(91)90009-T
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2015). An introduction to statistical learning: With applications in R. Springer.
- Karatzoglou, A., Meyer, D., & Hornik, K. (2006). Support vector machines in R. Journal of Statistical Software, 15(9), 1–28. https://doi.org/10.18637/jss.v015.i09
- Kawaguchi, K. (2016). Deep learning without poor local minima. arXiv:1605.07110.
- Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6), 84–90. https://doi.org/10.1145/3065386
- Lan, G. (2009). Convex optimization under inexact first-order information [Unpublished doctoral dissertation]. Georgia Institute of Technology.
- Lan, G. (2012). An optimal method for stochastic composite optimization. Mathematical Programming, 133(1), 365–397. https://doi.org/10.1007/s10107-010-0434-y
- Le, Q. V. (2015). A tutorial on deep learning (Research Report No. 06.3). Google Brain, Google Inc.
- LISA Lab. (2015). Deep learning tutorial, release 0.1. University of Montreal. http://deeplearning.net/tutorial/contents.html.
- Magoulas, G. D., Vrahatis, M. N., & Androulakis, G. S. (1997). Effective backpropagation training with variable stepsize. Neural Networks, 10(1), 69–82. https://doi.org/10.1016/S0893-6080(96)00052-4
- Matloff, N. (2017). Statistical regression and classification: From linear models to machine learning. CRC Press.
- McCulloch, W. S., & Pitts, W. (1990). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biology, 52(1–2), 99–115. https://doi.org/10.1016/S0092-8240(05)80006-0
- Meyer, D., Dimitriadou, E., & Hornik, K. (2019). e1071: Misc Functions of the Department of Statistics, Probability Theory Group [Computer software manual]. https://CRAN.R-project.org/package=e1071 (R package version 1.7-3).
- Mitliagkas, I. (2018). Theoretical principles for deep learning (Tech. Rep.). University of Montreal.
- Moreira, M., & Fiesler, E. (1995). Neural networks with adaptive learning rate and momentum terms (Tech. Rep.). IDIAP.
- Murphy, K. P. (2012). Machine learning: A probabilistic perspective. MIT Press.
- Nesterov, Y. (1983). A method of solving a convex programming problem with convergence rate O(1/k²). Soviet Mathematics Doklady, 27(2), 372–376.
- Nesterov, Y. (2004). Introductory lectures on convex optimization: A basic course. Kluwer Academic Publishers.
- Nielsen, M. A. (2019). Neural networks and deep learning. http://neuralnetworksanddeeplearning.com/index.html.
- Petalas, Y. G., & Vrahatis, M. N. (2004). Parallel tangent methods with variable stepsize. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541) (Vol. 2, pp. 1063–1066). IEEE.
- Polyak, B. T. (1964). Some methods of speeding up the convergence of iteration methods. USSR Computational Mathematics and Mathematical Physics, 4(5), 1–17. https://doi.org/10.1016/0041-5553(64)90137-5
- Prechelt, L. (1994). Proben1: A set of neural network benchmark problems and benchmarking rules (Tech. Rep.). University of Karlsruhe.
- Press, W. H., Teukolsky, S. A., Vetterling, W. T., & Flannery, B. P. (1992). Numerical recipes in C. Cambridge University Press.
- Qian, N. (1999). On the momentum term in gradient descent learning algorithms. Neural Networks, 12(1), 145–151. https://doi.org/10.1016/S0893-6080(98)00116-6
- R Core Team (2020). R: A language and environment for statistical computing [Computer software manual]. R Foundation for Statistical Computing. https://www.R-project.org/
- Ripley, B., & Venables, W. (2020). nnet: Feed-forward neural networks and multinomial log-linear models [Computer software manual]. https://CRAN.R-project.org/package=nnet (R package version 7.3-13).
- Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536. https://doi.org/10.1038/323533a0
- Rumelhart, D. E., & McClelland, J. L. (1986). Parallel distributed processing: Explorations in the microstructure of cognition. MIT Press.
- Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think. Houghton Mifflin Harcourt.
- Shah, B. V., Buehler, R. J., & Kempthorne, O. (1964). Some algorithms for minimizing a function of several variables. Journal of the Society for Industrial and Applied Mathematics, 12(1), 74–92. https://doi.org/10.1137/0112007
- Stanford University. (2019). CS231n: Convolutional neural networks for visual recognition [Course notes]. http://cs231n.github.io.
- Wickham, H. (2014). Advanced R. Chapman & Hall/CRC.
- Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer.
- Yin, W. (2015). Optimization: Gradient methods (Tech. Rep.). Department of Mathematics, University of California, Los Angeles.