Intrinsically Modified Physio-Biological Feature-Driven Heterogeneous Ensemble Learning Model for Cardiovascular Disease Prediction
Abstract
Cardiovascular disease (CVD) and other non-communicable illnesses have been rising in recent years. Computer-aided diagnosis (CAD) and clinical decision support (CDS) systems have advanced, but heart-disease prediction differs from vision-based e-healthcare tasks because it requires learning over many bio-physiological parameters related to heart health. Dataset limitations such as class imbalance, together with redundant computation, the risk of local minima and poor convergence, and the resulting low accuracy, limit the real-time usefulness of existing cardiovascular disease prediction (CDP) systems. In this paper, we propose a robust CVD prediction model that combines intrinsically modified bio-physiological features with heterogeneous ensemble learning. We focus on both feature optimization and computational efficiency to obtain a robust CAD solution for CVD diagnosis. The proposed method uses age, gender, cholesterol, protein profiles, body mass index, stroke profile or history, electrocardiogram information and other attributes from the benchmark dataset to build a scalable CVD prediction model. To support semantic, feature-driven learning, these features were first encoded with Word2Vec embeddings and then resampled with the synthetic minority over-sampling technique (SMOTE) and its variants, Borderline-SMOTE and SMOTE-ENN, to alleviate class imbalance. Subsequently, Principal Component Analysis (PCA), Cross-Correlation Analysis (CCRA) and the Significant Predictor Test (SPT) were applied separately to retain optimal feature sets. The selected features were then normalized with Min-Max scaling.
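Three of the preprocessing stages named above (SMOTE oversampling, correlation-based feature filtering and Min-Max scaling) can be illustrated with the Python standard library alone. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes plain numeric feature rows, implements only basic SMOTE (not the Borderline-SMOTE or SMOTE-ENN variants), substitutes a greedy pairwise Pearson-correlation filter for the CCRA step (an assumption about what CCRA does), and omits the Word2Vec, PCA and SPT stages. The function names `smote`, `drop_correlated`, `pearson` and `min_max_scale` are hypothetical.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Basic SMOTE (hypothetical helper): each synthetic sample lies on the
    segment between a random minority sample and one of its k nearest
    minority neighbours. Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbours of x by Euclidean distance (x itself excluded)
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: math.dist(x, p),
        )[:k]
        nn = rng.choice(neighbours)
        u = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def drop_correlated(rows, threshold=0.9):
    """Assumed stand-in for CCRA: greedily keep a feature column only if its
    |r| with every already-kept column is below the threshold."""
    cols = list(zip(*rows))
    kept = []
    for j, col in enumerate(cols):
        if all(abs(pearson(col, cols[k])) < threshold for k in kept):
            kept.append(j)
    return kept


def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(row, lo, hi)]
        for row in rows
    ]


# Toy usage on invented values (not data from the benchmark dataset).
if __name__ == "__main__":
    print(smote([[1.0, 2.0], [1.5, 1.8], [1.2, 2.4]], n_new=2, k=2))
    print(drop_correlated([[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1]]))  # [0, 2]
    print(min_max_scale([[1, 10], [3, 10], [5, 10]]))  # [[0.0, 0.0], [0.5, 0.0], [1.0, 0.0]]
```

In the toy call to `drop_correlated`, the second column is an exact multiple of the first (r = 1), so it is dropped. In a real evaluation, the scaler's minimum and maximum would normally be taken from the training split only and then reused on the test split. The sketch simply scales whatever rows it is given.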
The normalized features were used to train a heterogeneous ensemble whose base classifiers were Decision Tree (DT), Support Vector Machine (SVM) variants, Naïve Bayes (NB), Logistic Regression (LOGR), Linear Regression (LR), Random Forest (RF) and Extra Trees Classifier (ETC). A maximum voting ensemble (MVE) then classified each individual as CVD-Positive or CVD-Negative. The proposed method achieved a CVD prediction accuracy of 99.93%, precision of 99.69%, recall of 99.53% and F-Measure of 99.60%. It outperformed the prior state-of-the-art approaches it was compared against, which suggests that it is robust enough for real-world CDS scenarios.
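The maximum-voting step and the reported metrics can be sketched as follows. This is an illustrative sketch, not the authors' code. It assumes that every base classifier has already produced hard 0/1 labels, with 1 = CVD-Positive. A regressor such as LR would first need its continuous output thresholded, which is an assumption, since the source does not say how this is done. The tie-break rule and the helper names `majority_vote` and `binary_metrics` are hypothetical.

```python
def majority_vote(predictions, tie_label=1):
    """Hard (maximum) voting over base-classifier outputs.

    predictions: one list of 0/1 labels per base classifier, all the same length.
    tie_label: label returned when positive and negative votes are equal
    (a hypothetical tie-break rule; the source does not specify one).
    """
    fused = []
    for votes in zip(*predictions):  # votes for one individual across classifiers
        positives = sum(votes)
        negatives = len(votes) - positives
        if positives == negatives:
            fused.append(tie_label)
        else:
            fused.append(1 if positives > negatives else 0)
    return fused


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F-measure with CVD-Positive (1) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy predictions from three hypothetical base classifiers for four individuals.
if __name__ == "__main__":
    preds = [
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 0],
    ]
    fused = majority_vote(preds)
    print(fused)  # [1, 0, 1, 0]
    print(binary_metrics([1, 0, 0, 0], fused))  # accuracy 0.75, precision 0.5, recall 1.0
```

With an odd number of voters, as in the toy example, ties cannot occur. With an even number of base classifiers, the `tie_label` choice affects the trade-off between precision and recall.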
Letters in High Energy Physics (LHEP) is an open access journal. The articles in LHEP are distributed according to the terms of the Creative Commons license CC-BY 4.0. Under the terms of this license, copyright is retained by the author while use, distribution and reproduction in any medium are permitted provided proper credit is given to original authors and sources.
Terms of Submission
By submitting an article for publication in LHEP, the submitting author asserts that:
1. The article presents original contributions by the author(s) which have not been published previously in a peer-reviewed medium and are not subject to copyright protection.
2. The co-authors of the article, if any, as well as any institution whose approval is required, agree to the publication of the article in LHEP.