Predicting Construction Costs with Machine Learning: A Comparative Study on Ensemble and Linear Models
Abstract
Accurate prediction of construction costs plays a pivotal role in successful project delivery, influencing budget formulation, resource allocation, and financial risk management. However, traditional estimation methods often struggle to handle the complex, nonlinear relationships inherent in construction datasets. This study proposes a process innovation by systematically evaluating six machine learning (ML) models, namely Ridge Regression, Lasso Regression, Elastic Net, K-Nearest Neighbors (KNN), XGBoost, and CatBoost, on a standardized RSMeans dataset comprising 4,477 real-world construction data points. The primary aim is to benchmark the predictive performance, generalizability, and stability of both linear and ensemble models in construction cost forecasting. Each model is subjected to rigorous hyperparameter tuning using grid search with 5-fold cross-validation. Performance is assessed using R² (coefficient of determination), RMSE (root mean squared error), and MBE (mean bias error), while confidence intervals are computed to quantify predictive uncertainty. Results indicate that linear models achieve modest accuracy (R² ≈ 0.83) but struggle to model nonlinear interactions. In contrast, ensemble-based models significantly outperform them: XGBoost and CatBoost achieve R² values of 0.988 and 0.987, respectively, RMSE values below 0.5, and near-zero MBE. Moreover, confidence interval visualization and feature importance analysis provide transparency and interpretability, enhancing the models' practical applicability. Unlike prior studies that compare models in isolation, this work introduces a unified, interpretable framework and highlights the trade-offs between accuracy, overfitting, and deployment readiness. The findings have real-world implications for contractors, project managers, and cost engineers seeking reliable, data-driven decision support systems.
In summary, this study presents a scalable and robust ML-based framework that facilitates process innovation in construction cost estimation, paving the way for more intelligent, efficient, and risk-aware construction project management.
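The tuning and evaluation procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a hand-written KNN regressor (one of the six model families studied), and synthetic data in place of the RSMeans dataset. The function names, the grid of `k` values, the 160/40 train/test split, and the sign convention for MBE (predicted minus actual) are all assumptions made for illustration.

```python
import math
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mbe(y_true, y_pred):
    """Mean bias error (assumed convention: predicted minus actual)."""
    return sum(p - t for t, p in zip(y_true, y_pred)) / len(y_true)


def knn_predict(X_train, y_train, X_test, k):
    """Predict each test point as the mean target of its k nearest training points."""
    preds = []
    for x in X_test:
        nearest = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))[:k]
        preds.append(sum(y_train[i] for i in nearest) / k)
    return preds


def kfold_indices(n, n_splits, seed=0):
    """Shuffle indices 0..n-1 and deal them into n_splits disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_splits] for i in range(n_splits)]


def grid_search_knn(X, y, k_grid, n_splits=5):
    """Return the k with the lowest mean cross-validated RMSE, plus all scores."""
    folds = kfold_indices(len(X), n_splits)
    scores = {}
    for k in k_grid:
        fold_rmse = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            preds = knn_predict([X[i] for i in train], [y[i] for i in train],
                                [X[i] for i in held_out], k)
            fold_rmse.append(rmse([y[i] for i in held_out], preds))
        scores[k] = sum(fold_rmse) / n_splits
    return min(scores, key=scores.get), scores


# Synthetic stand-in data with a nonlinear (interaction) target; NOT the RSMeans data.
rng = random.Random(42)
X = [(rng.uniform(0, 10), rng.uniform(0, 10)) for _ in range(200)]
y = [x1 * x2 + rng.gauss(0, 1) for x1, x2 in X]

# Hold out the last 40 points; tune k on the remaining 160 with 5-fold CV.
X_train, y_train, X_test, y_test = X[:160], y[:160], X[160:], y[160:]
k_grid = [1, 3, 5, 7, 9]
best_k, cv_scores = grid_search_knn(X_train, y_train, k_grid, n_splits=5)

# Refit with the selected k and report the three metrics on the held-out set.
test_pred = knn_predict(X_train, y_train, X_test, best_k)
print("best k:", best_k)
print("R2:", r2_score(y_test, test_pred))
print("RMSE:", rmse(y_test, test_pred))
print("MBE:", mbe(y_test, test_pred))
```

In practice the same pattern would be repeated for each model family with its own hyperparameter grid, and a library implementation of grid search would normally replace the hand-written loops shown here.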
Copyright (c) 2025 Lifei Chen, Sew Sun Tiang, Kim Soon Chong, Abhishek Sharma, Tarek Berghout, Wei Hong Lim

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.