Intrinsically Modified Physio-Biological Features Driven Heterogenous Ensemble Learning Model for Cardio-Vascular Disease Prediction

  • L. Hamsaveni et al.
Keywords: Heart Disease Prediction, Data Mining, Machine Learning, SMOTE-ENN, Significant Predictor Test, Heterogenous Ensemble Learning, Computer Aided Diagnosis.

Abstract

Cardiovascular disease (CVD) and other non-communicable illnesses have been on the rise in recent years. Despite innovations in computer-aided diagnosis (CAD) and clinical decision systems, heart-disease prediction, unlike vision-based e-healthcare practices, requires learning over the different bio-physiological parameters related to the heart’s health. Dataset limitations, including class imbalance, redundant computation, and the threat of local minima and convergence, together with the resulting low accuracy, limit the real-time significance of existing CVD prediction systems. In this paper, a robust heterogenous ensemble learning based CVD prediction model driven by intrinsically modified bio-physiological parameters is proposed. We focus on both feature optimization and computational efficacy to achieve a robust CAD solution for CVD diagnosis. The proposed method uses age, gender, cholesterol, protein profiles, body mass index, stroke profile or history, electrocardiogram information, etc. from the benchmark dataset to enable a scalable CVD prediction model. To ensure semantic feature driven learning, these features were processed with Word2Vec embedding, followed by resampling with the synthetic minority over-sampling technique (SMOTE) and its variants, SMOTE-Boundary Line and SMOTE-ENN, to alleviate class imbalance. Subsequently, Principal Component Analysis (PCA), Cross-Correlation Analysis (CCRA) and Significant Predictor Test (SPT) methods were applied separately to retain the optimal feature sets. The selected feature instances were normalized using the Min-Max scaler normalization method.
The normalized features were used to train a heterogenous ensemble learning model comprising Decision Tree (DT), Support Vector Machine (SVM) variants, Naïve Bayes (NB), Logistic Regression (LOGR), Linear Regression (LR), Random Forest (RF), and Extra Tree Classifier (ETC) as base classifiers. The model used the maximum voting ensemble (MVE) method to classify each subject as CVD-positive or CVD-negative. The results show that the proposed method is robust enough for real-world CDS scenarios, as it surpasses prior state-of-the-art approaches in CVD prediction accuracy (99.93%), precision (99.69%), recall (99.53%), and F-Measure (99.60%).
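As a rough illustration of two steps named in the abstract, min-max normalization and maximum-voting aggregation, the following minimal Python sketch may help. It is not the authors' implementation: the function names, the sample cholesterol values, and the three sets of base-classifier labels are all hypothetical, and the tie-breaking rule is an arbitrary choice made for this sketch.

```python
from collections import Counter


def min_max_scale(column):
    """Rescale a list of numbers to the range [0, 1].

    A constant column is mapped to all zeros to avoid division by zero
    (an assumption of this sketch, not something stated in the abstract).
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def max_vote(predictions):
    """Return the most frequent label among the base-classifier predictions.

    Ties go to the label that was seen first, which is arbitrary.
    """
    return Counter(predictions).most_common(1)[0][0]


def ensemble_predict(base_predictions):
    """Combine predictions, where base_predictions[i][j] is the label
    assigned by classifier i to subject j."""
    return [max_vote(votes) for votes in zip(*base_predictions)]


# Hypothetical cholesterol readings, for illustration only.
cholesterol = [180, 240, 300]
scaled = min_max_scale(cholesterol)

# Hypothetical labels from three base classifiers for four subjects
# (1 = CVD-positive, 0 = CVD-negative).
base_predictions = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
]
final_labels = ensemble_predict(base_predictions)
```

In this sketch each subject receives the label chosen by the majority of base classifiers, so a single classifier's error on a subject is outvoted when the others agree.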

Author Biography

L. Hamsaveni et al.

L. Hamsaveni1, Rajesh B2, M Vinayaka Murthy3, Muralidhara B L4

1Associate Professor, DoS in Computer Science, University of Mysore, Mysuru, Karnataka, India

hamsa1367@gmail.com

2Senior Lecturer, School of Management, Mahindra University, Hyderabad, Telangana, India

Rajesh.balarama@mahindrauniversity.edu.in

3Professor, School of Computer Science, REVA University, Bengaluru, Karnataka, India.

dr.m.vinayakamurthy@gmail.com

4Senior Professor, Department of Computer Science, Bangalore University, Bangalore, Karnataka, India

murali@bub.ernet.in

Published
2024-02-04
Section
Regular Issue