Abstract—This study builds a predictive model for loan default in LendingClub using an Artificial Neural Network and compares its performance with that of a Logistic Regression model. The dataset was downloaded from the LendingClub dataset on Kaggle, and the files contain complete loan data for all loans issued from 2007 to 2015, including the current loan status (Current, Late, Fully Paid, etc.) and the latest payment information. There were 1,147 defaults out of 201,350 transactions. The dataset is highly imbalanced: the positive class (defaults) accounts for 0.570% of all transactions. The records were randomly assigned to one of two groups, a training sample and a testing sample. Both models were used to predict the risk of default in the testing sample. Receiver operating characteristics (ROCs) were calculated and compared for the two models, and a curve of predicted probability versus observed probability was plotted to assess the calibration of each model. In the training sample, a ROC of 0.73 showed that the Logistic Regression clearly performed better. In the testing sample, the ROC was 0.75 for the Logistic Regression and 0.66 for the Artificial Neural Network. Compared with the Artificial Neural Network model, the Logistic Regression had better discriminating capability and was the better model for estimating credit defaults.
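The two evaluation measures named in the abstract, discrimination (ROC) and calibration (predicted versus observed probability), can be illustrated with a minimal Python sketch. The abstract does not describe how these quantities were computed, so everything here is an assumption: the ROC value is taken to be the area under the ROC curve, computed with the rank-based (Mann-Whitney) formula, and the calibration curve is taken to use equal-width probability bins. The function names `default_rate`, `roc_auc`, and `calibration_bins` are hypothetical and do not come from the paper.

```python
def default_rate(defaults, total):
    """Share of the positive class (defaults), in percent."""
    return 100.0 * defaults / total


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) formula.

    labels: 0/1 outcomes (1 = default); scores: predicted probabilities.
    Tied scores receive their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with pairs[i].
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(label for _, label in pairs)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def calibration_bins(labels, scores, n_bins=10):
    """Group predictions into equal-width bins on [0, 1].

    Returns (mean predicted probability, observed default rate, count)
    for each non-empty bin, in increasing order of probability.
    """
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for label, score in zip(labels, scores):
        b = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        sums[b] += score
        hits[b] += label
        counts[b] += 1
    return [
        (sums[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


# Class imbalance reported in the abstract: 1,147 defaults in 201,350 records.
rate = default_rate(1147, 201350)

# Toy data, made up for illustration only (not from the LendingClub dataset).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_calibration = calibration_bins(toy_labels, toy_scores, n_bins=2)
```

With a positive class this rare, accuracy alone says little, which is one reason to compare models on a ranking measure such as the area under the ROC curve and to check calibration separately. In practice the scores passed to these functions would be each model's predicted default probabilities on the testing sample.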
Index Terms—Artificial neural network, default in peer-to-peer lending, logistic regression, predictors.
Jiaying Sun is with Miss Porter's School, USA / Ivy Analytics LLC, United States (e-mail: jsun22@missporters.org).
Cite: Jiaying Sun, "Data Mining Techniques to Predict Default in LendingClub," Journal of Economics, Business and Management, vol. 10, no. 1, pp. 60-64, 2022.
Copyright © 2022 by the authors. This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).