Deep Learning-Based Lung Sound Classification Using Mel-Spectrogram Features for Early Detection of Respiratory Diseases

Keywords: Lung Sound, Feature Extraction, Mel-Spectrogram, Audio Classification, Convolutional Neural Network

Abstract

Respiratory diseases such as asthma, chronic obstructive pulmonary disease, and pneumonia remain among the leading causes of death globally. Traditional diagnostic approaches, including auscultation, rely heavily on the subjective expertise of medical practitioners and on the quality of the instruments used. Recent advances in artificial intelligence offer promising alternatives for automated lung sound analysis. However, audio is unstructured data that must be converted into a representation suitable for AI algorithms. Another significant challenge is the imbalanced class distribution in available datasets, which can degrade classification performance and model reliability. This study applied several preprocessing steps: random undersampling to address class imbalance, resampling the audio to 4000 Hz, and standardizing clip duration to 2.7 seconds. Features were then extracted using the Mel spectrogram method, which converts audio signals into image representations that serve as input to deep learning classifiers. Six Convolutional Neural Network (CNN) architectures were evaluated: LeNet-5, AlexNet, VGG-16, VGG-19, ResNet-50, and ResNet-152. VGG-16 achieved the highest classification accuracy among the tested models, at 75.5%. These results indicate the potential of AI-based lung sound classification as a complementary diagnostic tool that helps healthcare professionals and the general public identify respiratory abnormalities and diseases early. The findings suggest that automated lung sound analysis could improve diagnostic accessibility and support clinical decision-making in respiratory healthcare.
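
The preprocessing and feature-extraction steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' code. Only the 4000 Hz sampling rate and the 2.7 second clip length come from the abstract. The function names, the zero-padding strategy, the HTK-style mel formula, and the values `n_fft=256`, `hop=64`, and `n_mels=32` are assumptions chosen for illustration, and the audio is assumed to be already resampled to 4000 Hz.

```python
import numpy as np

SAMPLE_RATE = 4000                                # Hz, from the abstract
CLIP_SECONDS = 2.7                                # seconds, from the abstract
CLIP_SAMPLES = round(SAMPLE_RATE * CLIP_SECONDS)  # 4000 * 2.7 = 10800 samples


def fix_duration(signal, n_samples=CLIP_SAMPLES):
    """Truncate or zero-pad a 1-D signal to exactly n_samples (assumed strategy)."""
    signal = np.asarray(signal, dtype=np.float32)
    if signal.shape[0] >= n_samples:
        return signal[:n_samples]
    return np.pad(signal, (0, n_samples - signal.shape[0]))


def random_undersample(labels, seed=0):
    """Return sorted indices keeping as many items per class as the rarest class has."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_keep = counts.min()
    kept = [
        rng.choice(np.flatnonzero(labels == c), size=n_keep, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(kept))


def hz_to_mel(f):
    """HTK-style mel scale (an assumption; other variants exist)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=np.float64) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=np.float64) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to sr / 2."""
    fft_freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2))
    bank = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, center, hi = hz_points[i:i + 3]
        rising = (fft_freqs - lo) / (center - lo)
        falling = (hi - fft_freqs) / (hi - center)
        bank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mel_spectrogram(signal, sr=SAMPLE_RATE, n_fft=256, hop=64, n_mels=32):
    """Log-power mel spectrogram in dB with shape (n_mels, n_frames)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_power = power @ mel_filterbank(sr, n_fft, n_mels).T
    return 10.0 * np.log10(mel_power + 1e-10).T


# Usage: a synthetic 1-second 400 Hz tone stands in for a lung sound recording.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
clip = fix_duration(np.sin(2 * np.pi * 400.0 * t))
image = mel_spectrogram(clip)   # 2-D array that a CNN could take as input
```

With these assumed settings each 10800-sample clip yields a 32 x 165 array. The image size fed to the CNNs in the study may differ, since it depends on window, hop, and mel-band choices that the abstract does not state.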

Published
2026-01-03
How to Cite
M. Yabani, M. R. Faisal, F. Indriani, D. T. Nugrahadi, D. Kartini, and K. Satou, “Deep Learning-Based Lung Sound Classification Using Mel-Spectrogram Features for Early Detection of Respiratory Diseases,” Journal of Electronics, Electromedical Engineering, and Medical Informatics, vol. 8, no. 1, pp. 168–184, Jan. 2026.
Section
Medical Engineering