A Mathematical Model of Analytical Supervised Learning Algorithms for Stroke Prediction Using PySpark: Precision, Dispersion and Random Noise Fluctuation Analysis

Ogoegbulem Ozioma, Suit Patrick OGHENERHORO, Angela Okwuolise OKONYE

doi:10.5281/zenodo.20510833

Research Article


Contact: Ozioma.Ogoegbulem@dou.edu.ng

Journal: IJMS
Volume: 1
Issue: 1
Year: 2026
Pages: 1-20
DOI: 10.5281/zenodo.20510833
Published: Jun 02, 2026


How to Cite

Ogoegbulem Ozioma, Suit Patrick OGHENERHORO, Angela Okwuolise OKONYE. (2026). A Mathematical Model of Analytical Supervised Learning Algorithms for Stroke Prediction Using PySpark: Precision, Dispersion and Random Noise Fluctuation Analysis. Ktrend - International Journal of Mathematics and Statistics (IJMS), 1(1), 1-20. https://doi.org/10.5281/zenodo.20510833

Abstract

Stroke prediction is a significant problem in computational medicine because stroke occurrence is influenced by nonlinear interactions among demographic, physiological, and lifestyle risk variables. This paper develops a mathematical model of analytical supervised learning algorithms for stroke prediction using PySpark. The study treats stroke prediction as a binary classification problem and formulates the learning pipeline in terms of empirical risk minimization, logistic probability maps, impurity-based recursive partitioning, ensemble aggregation, gradient boosting updates, separating hyperplanes, confusion matrices, receiver operating characteristic (ROC) curves, and cross-validation. Beyond the conventional machine learning pipeline, the paper introduces a precision-dispersion and random-noise fluctuation framework for studying the stability of medical predictors. This extension is motivated by recent work on data precision and dispersion analysis of interacting simulated data with random noise fluctuation, and it is used to quantify how feature variability may influence model reliability. The analysis is presented through a TikZ analytical workflow, a performance comparison chart, a feature-importance chart, conceptual ROC curves, a three-dimensional stroke-risk surface, a precision-dispersion plot, a random-noise fluctuation plot, cross-validation graphics, and confusion-matrix heatmaps. The comparative results indicate that Random Forest and Gradient Boosted Trees provide the strongest predictive behaviour among the five supervised classifiers considered. Before cross-validation, Random Forest achieved a testing AUC of 92.41%, an accuracy of 86.64%, and an F1 score of 87.20%; after cross-validation, it maintained a testing AUC of 92.26% with an F1 score of 87.74%.

Feature-importance and risk-surface analyses indicate that age, body-mass index, average glucose level, hypertension, and heart disease are the dominant predictive factors. The paper concludes that PySpark-based ensemble learning, when supplemented with precision, dispersion, and noise-fluctuation analysis, provides a scalable mathematical framework for interpretable stroke-risk prediction. Any clinical deployment, however, requires external validation, privacy protection, fairness auditing, and professional medical oversight.
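The logistic probability map and empirical risk minimization named in the abstract can be illustrated with a minimal plain-Python sketch. It assumes a linear score b + w·x passed through the logistic function and minimizes the average binary cross-entropy by batch gradient descent. The function names, learning rate and toy feature values are hypothetical, and this is not the authors' PySpark code; in PySpark the analogous estimator is `pyspark.ml.classification.LogisticRegression`.

```python
import math

# Hypothetical helpers illustrating a logistic probability map and
# empirical risk minimization; not the paper's PySpark implementation.


def logistic(z):
    """Numerically stable logistic map: 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(weights, bias, x):
    """Estimated P(stroke = 1 | x) for the linear score b + w.x."""
    return logistic(bias + sum(w * xi for w, xi in zip(weights, x)))


def empirical_risk(weights, bias, X, y, eps=1e-12):
    """Average binary cross-entropy (log-loss) over the sample."""
    total = 0.0
    for x, label in zip(X, y):
        # Clip the probability so that log(0) is never evaluated.
        p = min(max(predict_proba(weights, bias, x), eps), 1.0 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(y)


def gradient_step(weights, bias, X, y, lr=0.5):
    """One batch gradient-descent step on the empirical risk."""
    n = len(y)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for x, label in zip(X, y):
        err = predict_proba(weights, bias, x) - label  # d(loss)/d(score)
        for j, xj in enumerate(x):
            grad_w[j] += err * xj / n
        grad_b += err / n
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - lr * grad_b


# Toy standardized features (e.g. scaled age, scaled glucose); illustrative only.
X = [[-1.2, -0.8], [-0.5, -0.3], [0.6, 0.9], [1.4, 1.1]]
y = [0, 0, 1, 1]
w, b = [0.0, 0.0], 0.0
print("initial risk:", round(empirical_risk(w, b, X, y), 4))
for _ in range(200):
    w, b = gradient_step(w, b, X, y)
print("final risk:", round(empirical_risk(w, b, X, y), 4))
```

Starting from zero weights every prediction is 0.5, so the initial risk is log 2; each descent step then lowers the average log-loss on this convex objective.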
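Impurity-based recursive partitioning, the building block of decision trees and of the Random Forest ensemble built from them, chooses each split to reduce node impurity. The sketch below computes the Gini impurity 1 - p0² - p1² of a binary node and scans the midpoints of one numeric feature for the threshold with the lowest size-weighted child impurity. It is a single-feature, single-split illustration with hypothetical names and toy values. Spark's tree classifiers choose between `gini` and `entropy` through their `impurity` parameter and search binned candidate thresholds (`maxBins`) rather than every midpoint.

```python
def gini(labels):
    """Gini impurity of a binary node: 1 - p0^2 - p1^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p1 = sum(labels) / n
    return 1.0 - p1 ** 2 - (1.0 - p1) ** 2


def best_split(values, labels):
    """Return (threshold, weighted child impurity) minimizing impurity on one feature.

    The threshold is None when no split improves on the parent node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_thr, best_score = None, gini(labels)  # baseline: no split
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical feature values cannot be separated
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_thr, best_score = (lo + hi) / 2, score
    return best_thr, best_score


# Toy ages and stroke labels; illustrative only, not study data.
ages = [45, 50, 62, 70, 78]
stroke = [0, 0, 0, 1, 1]
print(best_split(ages, stroke))  # (66.0, 0.0)
```

A full tree applies the same search recursively to each child node across all features; Random Forest repeats it on bootstrap samples with random feature subsets and aggregates the resulting trees.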
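The reported accuracy, F1 score and AUC are functions of the confusion matrix and of the ranking of predicted scores. A minimal sketch, assuming binary labels coded 1 = stroke and 0 = no stroke, computes them directly. The AUC uses the rank (Mann-Whitney) form: the probability that a randomly chosen positive receives a higher score than a randomly chosen negative, with ties counted as one half, which equals the area under the empirical ROC curve. Function names and toy values are hypothetical. In PySpark these quantities are usually obtained from `BinaryClassificationEvaluator` (metric `areaUnderROC`) and `MulticlassClassificationEvaluator`; note that the latter's `f1` metric is a class-weighted average, whereas this sketch computes F1 for the positive class.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for labels in {0, 1}, with 1 = stroke."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:  # t == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and positive-class F1 from the confusion matrix."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy labels and predicted risks; illustrative only.
y_true = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.3, 0.6, 0.8, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
print(classification_metrics(y_true, y_pred))
print(round(roc_auc(y_true, scores), 4))  # 0.8889
```

The pairwise AUC loop is quadratic in the sample size and is meant only to make the definition concrete; a sort-based computation gives the same value at scale.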
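The abstract does not define the precision-dispersion and random-noise fluctuation measures, so the sketch below is only one plausible operationalization, stated here as an assumption: each input feature of a fixed logistic risk model is perturbed with independent Gaussian noise N(0, σ²), and the mean, standard deviation (dispersion) and coefficient of variation of the resulting predicted risks are recorded for several noise levels σ. The model coefficients, feature values and function names are hypothetical. Under this reading, a dispersion that grows quickly with σ signals a predictor whose output is sensitive to measurement noise in its inputs.

```python
import math
import random
import statistics


def logistic(z):
    """Numerically stable logistic map."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def noise_fluctuation(weights, bias, x, sigma, n_draws=2000, seed=0):
    """Mean, standard deviation and coefficient of variation of predicted risk
    when every feature of x receives independent N(0, sigma^2) noise.

    Assumed scheme for illustration only; not taken from the paper.
    """
    rng = random.Random(seed)  # fixed seed so noise levels share the same draws
    risks = []
    for _ in range(n_draws):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        risks.append(logistic(bias + sum(w * xi for w, xi in zip(weights, noisy))))
    mean_risk = statistics.mean(risks)
    sd_risk = statistics.pstdev(risks)  # dispersion of the predicted risk
    return mean_risk, sd_risk, sd_risk / mean_risk


# Hypothetical coefficients for two standardized features; illustrative only.
weights, bias = [0.8, 0.5], -1.0
patient = [1.0, 0.4]
for sigma in (0.0, 0.1, 0.3, 0.5):
    mean_r, sd_r, cv = noise_fluctuation(weights, bias, patient, sigma)
    print(f"sigma={sigma:.1f}  mean={mean_r:.4f}  sd={sd_r:.4f}  cv={cv:.4f}")
```

Reusing one seed across noise levels means each σ rescales the same underlying standard-normal draws, so differences in dispersion reflect the noise level rather than sampling luck.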


📚 Journal Info

IJMS
ISSN: 3141-6438
Vol. 1, Iss. 1


✅ Quality Indicators

✓ Open Access
✓ Peer Reviewed
✓ Double-Blind Review
✓ DOI Available
✓ Ethical Publishing
