INTEGRATED APPROACH TO THE ANALYSIS OF INVESTMENTS IN DIGITAL INFRASTRUCTURE BASED ON CLUSTERING AND REGRESSION MODELS
DOI: 10.31673/2409-7292.2025.041210
Abstract
This article proposes an integrated approach to analysing investments in digital infrastructure that combines clustering
algorithms with regression modelling. The study is motivated by the growing importance of digital infrastructure for regional
economic competitiveness and by the need to optimise investment decisions under resource constraints. The methodology applies
the K-Means and DBSCAN algorithms to segment regions by digital maturity, IT employment, internet penetration, and the
availability of digital public services. For each identified cluster, a separate multiple linear regression model is built to
quantify the relationship between key digital indicators and investment volumes. Model performance is evaluated with R², RMSE,
and MAE, and k-fold cross-validation is used to check the robustness of the estimates. Combining the clustering results with the
regression models yields an analytical assessment matrix that compares actual and predicted investment levels and identifies
priority areas for investment policy. The approach strengthens the evidence base for managerial decisions, makes it possible to
distinguish efficient from inefficient regions, and supports recommendations for optimising the allocation of resources in
digital infrastructure. The findings can be used by government institutions, analytical centres, and ICT enterprises for
strategic planning and forecasting within digital transformation initiatives.
Keywords: digital infrastructure; investments; clustering; K-Means; DBSCAN; regression analysis; digital economy;
machine learning.
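The K-Means segmentation step can be illustrated with a minimal, dependency-free sketch. The region names, the four indicator values, and the choice of k = 2 are hypothetical. Indicators are z-score standardised first so that indices and percentages measured on different scales contribute comparably, and the initial centroids are picked with a deterministic farthest-point rule; that seeding is an assumption made for reproducibility (libraries such as scikit-learn default to k-means++ seeding with several initialisations).

```python
import math


def standardize(rows):
    """Z-score each column so indicators measured on different scales weigh equally."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
            for j in range(d)]  # "or 1.0" guards against a constant column
    return [[(r[j] - means[j]) / stds[j] for j in range(d)] for r in rows]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-Means; returns (labels, centroids)."""
    # Assumed seeding: start from the first point, then repeatedly add the point
    # farthest from its nearest existing centroid.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each region goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical regions:
# [digital maturity index, IT employment %, internet penetration %, e-services %]
regions = {
    "A": [0.82, 4.1, 91.0, 78.0],
    "B": [0.79, 3.8, 89.0, 75.0],
    "C": [0.45, 1.2, 64.0, 40.0],
    "D": [0.41, 1.0, 60.0, 38.0],
    "E": [0.48, 1.4, 66.0, 43.0],
    "F": [0.85, 4.5, 93.0, 80.0],
}
names = list(regions)
labels, centroids = kmeans(standardize(list(regions.values())), k=2)
print(dict(zip(names, labels)))
# → {'A': 0, 'B': 0, 'C': 1, 'D': 1, 'E': 1, 'F': 0}
```

Cluster ids are arbitrary; only the grouping of regions carries meaning, and in practice k would be chosen with a criterion such as the elbow method or silhouette scores rather than fixed in advance.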
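DBSCAN, unlike K-Means, needs no preset number of clusters and labels regions that belong to no dense group as noise. The plain-Python sketch below follows the standard core-point and density-reachability definitions; the two standardised indicator scores per region and the parameters eps = 0.3 and min_pts = 3 are hypothetical and would have to be tuned on real data (for example with a k-distance plot).

```python
import math


def dbscan(points, eps, min_pts):
    """Return a cluster id (0, 1, ...) per point, or -1 for noise.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps; clusters grow by chaining core points.
    """
    def neighbours(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:
                queue.extend(nb)  # j is a core point: keep expanding
        cluster += 1
    return labels


# Hypothetical standardised scores (e.g. internet penetration, IT employment).
scores = [
    (0.10, 0.20), (0.15, 0.25), (0.20, 0.15),  # dense group 1
    (2.00, 2.10), (2.10, 2.00), (1.95, 2.05),  # dense group 2
    (5.00, -3.00),                             # isolated region
]
labels = dbscan(scores, eps=0.3, min_pts=3)
print(labels)
# → [0, 0, 0, 1, 1, 1, -1]
```

Regions labelled -1 are outliers rather than a cluster, so one reasonable choice is to examine them individually instead of fitting a regression model to them.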
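The per-cluster regressions and their evaluation can be sketched in plain Python as ordinary least squares solved through the normal equations, scored with k-fold cross-validation. Several details are assumptions rather than the article's stated procedure: folds are assigned by row index modulo k without shuffling, and R², RMSE, and MAE are computed once on the pooled out-of-fold predictions instead of being averaged per fold, which tends to be steadier when a cluster contains only a few regions. The data are synthetic and noise-free, so a cross-validated R² close to 1 serves only as a sanity check.

```python
import math


def fit_ols(X, y):
    """Multiple linear regression via the normal equations (X'X)b = X'y.

    Returns [intercept, b1, ..., bp]; solved by Gauss-Jordan elimination with
    partial pivoting, adequate for a handful of well-conditioned indicators.
    """
    A = [[1.0] + list(row) for row in X]  # intercept column first
    p = len(A[0])
    # Augmented normal-equation matrix [X'X | X'y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(p)]
         + [sum(a[i] * t for a, t in zip(A, y))] for i in range(p)]
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(p):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(coef, X):
    return [coef[0] + sum(b * x for b, x in zip(coef[1:], row)) for row in X]


def metrics(y_true, y_pred):
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {"R2": 1 - ss_res / ss_tot if ss_tot else float("nan"),
            "RMSE": math.sqrt(ss_res / n),
            "MAE": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n}


def kfold_cv(X, y, k=5):
    """Each row is predicted by a model fitted without it (out-of-fold)."""
    n = len(y)
    oof = [0.0] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        coef = fit_ols([X[i] for i in train], [y[i] for i in train])
        for i, yhat in zip(test, predict(coef, [X[i] for i in test])):
            oof[i] = yhat
    return metrics(y, oof)


def fit_per_cluster(labels, X, y, k=5):
    """Fit and cross-validate one regression per cluster label."""
    results = {}
    for c in sorted(set(labels) - {-1}):  # -1 (DBSCAN noise) gets no model
        idx = [i for i, lab in enumerate(labels) if lab == c]
        Xc, yc = [X[i] for i in idx], [y[i] for i in idx]
        results[c] = {"coef": fit_ols(Xc, yc), "cv": kfold_cv(Xc, yc, k)}
    return results


# Synthetic, noise-free data (hypothetical): two indicators per region,
# a different linear investment rule in each of two clusters.
x1 = list(range(1, 11))
x2 = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
rows = [[a, b] for a, b in zip(x1, x2)]
X = rows + rows
labels = [0] * 10 + [1] * 10
y = ([2 + 3 * a + 0.5 * b for a, b in rows]
     + [10 + a - 2 * b for a, b in rows])
results = fit_per_cluster(labels, X, y)
for c, res in results.items():
    print(c, [round(b, 3) for b in res["coef"]],
          {m: round(v, 4) for m, v in res["cv"].items()})
```

With real, noisy regional data the cross-validated metrics, not the in-sample fit, are the figures to compare across clusters.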
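The abstract does not specify the exact layout of the assessment matrix, so the sketch below shows only one hypothetical way to build it: each region's actual investment is compared with the value predicted by its cluster's model, the relative gap is labelled against an assumed ±10% tolerance, and the labels are cross-tabulated by cluster. Region names, figures (in arbitrary units), and the tolerance are illustrative assumptions.

```python
from collections import Counter

STATUSES = ["below model", "in line", "above model"]


def classify(actual, predicted, tol=0.10):
    """Label a region by the relative gap between actual and predicted investment.

    Assumes predicted > 0; tol is a hypothetical tolerance band.
    """
    gap = (actual - predicted) / predicted
    if gap > tol:
        return "above model"
    if gap < -tol:
        return "below model"
    return "in line"


def assessment_matrix(records, tol=0.10):
    """records: (region, cluster, actual, predicted) tuples.

    Returns per-region labels and a cluster x status count table whose
    columns follow STATUSES.
    """
    per_region = {r: classify(a, p, tol) for r, _, a, p in records}
    counts = Counter((c, per_region[r]) for r, c, _, _ in records)
    clusters = sorted({c for _, c, _, _ in records})
    table = {c: [counts[(c, s)] for s in STATUSES] for c in clusters}
    return per_region, table


# Hypothetical regions: (name, cluster, actual investment, predicted investment)
records = [
    ("Region A", 0, 120.0, 100.0),
    ("Region B", 0, 95.0, 100.0),
    ("Region C", 1, 40.0, 50.0),
    ("Region D", 1, 52.0, 50.0),
]
per_region, table = assessment_matrix(records)
print(per_region)
# → {'Region A': 'above model', 'Region B': 'in line', 'Region C': 'below model', 'Region D': 'in line'}
print(table)
# → {0: [0, 1, 1], 1: [1, 1, 0]}
```

Which cells count as priority areas for investment policy is a policy judgement that this sketch deliberately leaves open.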