Customer Churn Prediction Analysis in the E-Commerce Sector Based on Transaction Behavior Using a Machine Learning Approach

Authors

  • Nadeerah Hani’ Fauziyyah, Institut Bisnis dan Teknologi Indonesia
  • I Wayan Sudiarsa, Institut Bisnis dan Teknologi Indonesia
  • Ida Ayu Eka Sastradewi, Institut Bisnis dan Teknologi Indonesia
  • Kadek Agustine Yueyin Parisya, Institut Bisnis dan Teknologi Indonesia
  • Sartika Sartika, Institut Bisnis dan Teknologi Indonesia

DOI:

https://doi.org/10.61132/jumbidter.v3i1.1228

Keywords:

Customer Churn, E-Commerce, Machine Learning, Random Forest, Transaction Behavior

Abstract

Customer churn is a critical issue for the e-commerce industry because it directly affects revenue, customer loyalty, and long-term business sustainability. A high churn rate indicates that a business is failing to retain its existing customers and must instead rely on acquiring new ones, which is more costly. An accurate analytical approach is therefore needed to identify the behavior patterns of customers who are likely to churn. This study uses machine learning methods to analyze and predict customer churn, based on the E-Commerce Customer Churn 2025 dataset obtained from Kaggle. The dataset contains 10,000 customer records and fifteen variables covering transaction behavior, customer characteristics, and churn status. The research comprised data preprocessing, descriptive analysis, exploratory data analysis (EDA), and the development of classification models using the Logistic Regression and Random Forest algorithms. The models were evaluated with a confusion matrix and a Receiver Operating Characteristic (ROC) curve to assess their accuracy and their ability to distinguish churned from non-churned customers. The results show that the Random Forest model outperformed Logistic Regression, achieving an ROC-AUC of 1.00. Feature importance analysis further revealed that days_since_last_purchase was the most dominant variable in predicting customer churn. These findings are expected to help e-commerce companies design more effective, data-driven customer retention strategies.
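The models above are evaluated with a confusion matrix and the ROC curve. As a minimal sketch, assuming churn is encoded as 1 and retention as 0 and that each model outputs a churn probability, both measures can be computed from true labels and predicted scores with the Python standard library alone. The 0.5 decision threshold, the function names, and the eight toy labels and scores are hypothetical illustrations, not the study's code or data.

```python
# Minimal sketch: confusion matrix and ROC-AUC for a binary churn classifier.
# Assumptions: labels use 1 = churned, 0 = retained; scores are predicted
# churn probabilities; the 0.5 threshold is illustrative, not from the study.


def confusion_matrix(y_true, y_score, threshold=0.5):
    """Return (tn, fp, fn, tp) after thresholding the scores."""
    tn = fp = fn = tp = 0
    for y, s in zip(y_true, y_score):
        pred = 1 if s >= threshold else 0
        if y == 1 and pred == 1:
            tp += 1
        elif y == 1:
            fn += 1
        elif pred == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_auc(y_true, y_score):
    """Probability that a random churner is scored above a random non-churner.

    Ties count as half, which equals the area under the empirical ROC curve.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs both churned and retained examples")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values for illustration only; not taken from the Kaggle dataset.
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    y_score = [0.9, 0.2, 0.65, 0.4, 0.55, 0.1, 0.8, 0.3]
    tn, fp, fn, tp = confusion_matrix(y_true, y_score)
    print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
    print(f"ROC-AUC={roc_auc(y_true, y_score):.3f}")
```

ROC-AUC is computed here through its pairwise interpretation: the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as half. Under this reading, an ROC-AUC of 1.00 means that every churned customer in the evaluation data was scored above every retained customer. The quadratic pairwise loop is only for clarity; for 10,000 records a sort-based rank computation or a library routine such as scikit-learn's `roc_auc_score` would be the practical choice.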

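The finding that days_since_last_purchase dominates churn prediction comes from a feature importance analysis, but the measure used is not specified here. The sketch below illustrates one common, model-agnostic option, permutation importance, swapped in purely for illustration: shuffle one feature column, re-score the model, and record the drop in accuracy. The `toy_model` rule, its 60-day cutoff, the `total_orders` feature, and the six toy rows are all hypothetical; a trained Random Forest would take the model's place, and its built-in impurity-based importances (for example scikit-learn's `feature_importances_`) can rank features differently.

```python
# Minimal sketch of permutation feature importance for a churn classifier.
# Assumption: the permutation method is swapped in for illustration; the
# study's own importance measure may differ. The rule-based model, the
# 60-day cutoff, and the "total_orders" feature are hypothetical.
import random

FEATURES = ["days_since_last_purchase", "total_orders"]  # total_orders is hypothetical


def toy_model(row):
    """Hypothetical stand-in classifier: predicts churn after 60 idle days."""
    return 1 if row["days_since_last_purchase"] > 60 else 0


def accuracy(model, rows, labels):
    """Fraction of rows whose prediction matches the label."""
    hits = sum(model(r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


def permutation_importance(model, rows, labels, features, n_repeats=20, seed=0):
    """Mean drop in accuracy when one feature column is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = {}
    for feat in features:
        drops = []
        for _ in range(n_repeats):
            column = [r[feat] for r in rows]
            rng.shuffle(column)
            # Copy each row and overwrite only the permuted feature.
            permuted = [{**r, feat: v} for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, permuted, labels))
        importances[feat] = sum(drops) / n_repeats
    return importances


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the Kaggle dataset.
    rows = [
        {"days_since_last_purchase": 5, "total_orders": 12},
        {"days_since_last_purchase": 90, "total_orders": 3},
        {"days_since_last_purchase": 15, "total_orders": 8},
        {"days_since_last_purchase": 120, "total_orders": 1},
        {"days_since_last_purchase": 30, "total_orders": 6},
        {"days_since_last_purchase": 75, "total_orders": 2},
    ]
    labels = [0, 1, 0, 1, 0, 1]
    scores = permutation_importance(toy_model, rows, labels, FEATURES)
    for feat, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
        print(f"{feat}: {score:.3f}")
```

Because `toy_model` ignores `total_orders`, shuffling that column leaves every prediction unchanged and its importance is exactly 0, whereas shuffling days_since_last_purchase breaks the rule and lowers accuracy, so it ranks first.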
Published

2026-01-27