Performance Evaluation of Classification Algorithms for Parkinson’s Disease Diagnosis: A Comparative Study

Dhiraj Baruah; Rizwan Rehman; Pranjal Kumar Bora; Priyakshi Mahanta; Kankana Dutta; Pinakshi Konwar

doi:10.35882/jeeemi.v7i3.713

Dhiraj Baruah, Centre for Computer Science and Applications, Dibrugarh University, Assam, India, https://orcid.org/0009-0000-7312-1231
Rizwan Rehman, Centre for Computer Science and Applications, Dibrugarh University, Assam, India, https://orcid.org/0000-0002-4725-6877
Pranjal Kumar Bora, Centre for Computer Science and Applications, Dibrugarh University, Assam, India, https://orcid.org/0000-0002-7350-7721
Priyakshi Mahanta, Department of Computer Applications, Jorhat Engineering College, Jorhat, India, https://orcid.org/0009-0009-2928-9825
Kankana Dutta, Centre for Computer Science and Applications, Dibrugarh University, Assam, India, https://orcid.org/0000-0001-5888-0029
Pinakshi Konwar, Centre for Computer Science and Applications, Dibrugarh University, Assam, India, https://orcid.org/0009-0002-1845-0280

Keywords: classification, decision tree, logistic regression, Parkinson’s disease, random forest, support vector machine (SVM).

Abstract

Selecting and implementing classification algorithms together with appropriate preprocessing methods is important for the accuracy of predictive models. This paper compares several well-known, frequently used classification algorithms and performs an in-depth analysis. We analyzed four of the most frequently used algorithms: random forest (RF), decision tree (DT), logistic regression (LR) and support vector machine (SVM). The study was conducted on the well-known Oxford Parkinson’s Disease Detection dataset obtained from the UCI Machine Learning Repository. We evaluated the algorithms' performance using six distinct approaches. First, we used the classifiers without applying any method to enhance their performance. Second, we applied Principal Component Analysis (PCA) to reduce the dimensionality of the dataset. Third, we used a collinearity-based feature elimination (CFE) method: we computed the correlation between each pair of features and, if it exceeded a threshold of 0.9, eliminated one feature of the pair. Fourth, we adopted the synthetic minority oversampling technique (SMOTE) to synthetically increase the number of minority-class instances. Fifth, we combined PCA with SMOTE, and sixth, we combined CFE with SMOTE. The study demonstrates that SVM is highly effective for Parkinson’s disease classification. SVM maintained high accuracy, precision, recall and F1-score across the preprocessing techniques, including PCA, CFE and SMOTE, making it robust and reliable for clinical applications. RF showed improved results with SMOTE, but its performance declined with PCA and CFE, indicating its dependence on the original feature interactions. DT benefited from PCA, while LR showed limited improvements and was sensitive to oversampling. These findings emphasize the importance of selecting appropriate preprocessing techniques to enhance model performance.
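
The collinearity-based elimination and SMOTE steps described in the abstract can be sketched roughly as follows. This is a minimal illustration in plain Python, not the authors' implementation. The function names, the toy feature names f1 to f3, the use of the absolute Pearson correlation, the choice to drop the later feature of each correlated pair, and the neighbour count k are all assumptions made for the example; only the 0.9 threshold comes from the abstract.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: treated as uncorrelated (assumption)
    return cov / math.sqrt(var_x * var_y)


def collinearity_elimination(columns, threshold=0.9):
    """Return the feature names kept after dropping one feature from every
    pair whose absolute correlation exceeds `threshold`.

    `columns` maps a feature name to its list of values. Assumption: the
    feature that appears later is the one dropped from a correlated pair.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


def smote(minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic minority samples, each a random point on the
    segment between a minority sample and one of its k nearest minority
    neighbours. Needs at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


if __name__ == "__main__":
    # Toy data (hypothetical): f2 is almost a multiple of f1, so it is dropped.
    features = {
        "f1": [1, 2, 3, 4, 5],
        "f2": [2, 4, 6, 8, 10.5],
        "f3": [5, 1, 4, 2, 3],
    }
    print(collinearity_elimination(features, threshold=0.9))

    # Three hypothetical minority samples, oversampled with four new points.
    print(smote([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]], n_new=4, k=2))
```

In a CFE + SMOTE pipeline of the kind the abstract describes, the elimination would run first and oversampling would then be applied to the reduced feature set. Oversampling is normally restricted to the training split so that synthetic samples do not leak into evaluation; the abstract does not state how the authors handled this.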
