Optimum, projected, and regularized extreme learning machine methods with singular value decomposition and L$_{2}$-Tikhonov regularization

The theory and implementations of the extreme learning machine (ELM) have shown it to be a simple, efficient, and accurate machine learning method. In an ELM, the hidden nodes are randomly initialized and fixed without iterative tuning. However, the optimal number of hidden layer neurons ($L_{opt}$) is key to ELM generalization performance, and setting this number by trial and error is unsatisfactory. Optimizing the hidden layer size with leave-one-out cross validation is computationally costly. In this paper, a fast and reliable statistical approach called optimum ELM (OELM) was developed to determine the minimum hidden layer size that yields optimal performance. A second improvement, named projected ELM (PELM), exploits orthogonal projection via singular value decomposition (SVD) to tackle the randomness of the hidden layer and correlated features in the input data; it achieves more than a 2\% improvement in average accuracy. The final contribution of this paper is Tikhonov-regularized ELM (TRELM), which applies an L$_{2}$ penalty and improves the matrix computations using the L-curve criterion and SVD. Unlike iterative methods, the L-curve estimates the optimal regularization parameter from a curve of only a few points that represents the tradeoff between the training error and the norm of the output weights. The proposed TRELM was tested in three data-size scenarios: small, moderate, and big datasets. Owing to their simplicity, robustness, and low computational cost, OELM and PELM are recommended for small and even moderate amounts of data. TRELM demonstrated that enhancing ELM performance requires enlarging the number of hidden nodes ($L$); as a result, for big data, increasing $L$ in TRELM is necessary and concurrently leads to better accuracy. The proposed approaches were compared with state-of-the-art learning methods on various well-known datasets.
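For orientation, the L$_{2}$-Tikhonov (ridge-type) output-weight solution underlying regularized ELMs can be sketched as follows; this is the standard form rather than a quotation from the paper, and the symbols $\mathbf{H}$, $\mathbf{T}$, $\boldsymbol{\beta}$, and $\lambda$ are notation chosen here for illustration:
\[
\hat{\boldsymbol{\beta}}
= \arg\min_{\boldsymbol{\beta}}
\left\{ \left\| \mathbf{H}\boldsymbol{\beta} - \mathbf{T} \right\|_{2}^{2}
+ \lambda \left\| \boldsymbol{\beta} \right\|_{2}^{2} \right\}
= \left( \mathbf{H}^{\top}\mathbf{H} + \lambda \mathbf{I} \right)^{-1} \mathbf{H}^{\top}\mathbf{T},
\]
and, writing the thin SVD of the hidden-layer output matrix as $\mathbf{H} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}$ with singular values $\sigma_{i}$,
\[
\hat{\boldsymbol{\beta}}
= \sum_{i=1}^{r} \frac{\sigma_{i}}{\sigma_{i}^{2} + \lambda}\,
\mathbf{v}_{i}\,\mathbf{u}_{i}^{\top}\mathbf{T},
\]
where $\mathbf{T}$ is the target matrix, $r$ the rank of $\mathbf{H}$, and $\lambda$ the regularization parameter, selected in this paper's setting at the corner of the L-curve that balances the residual $\|\mathbf{H}\boldsymbol{\beta}-\mathbf{T}\|_{2}$ against the weight norm $\|\boldsymbol{\beta}\|_{2}$.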