Abstract: Because of its slow progression and inconspicuous onset, chronic kidney disease (CKD) can be difficult to recognize directly. It is a global health issue with high morbidity and mortality rates, and it also induces other diseases. Although clinical investigations are carried out at the different stages of CKD, most patients do not even realize that they have the disease. If CKD is diagnosed at an early stage, timely treatment can be offered to manage its progression. Machine learning applications may provide the speed and accuracy that such diagnosis requires, and this motivated the present study, "A machine learning methodology for diagnosing chronic kidney disease." The CKD data set, which contains a very large number of missing values, was obtained from the University of California Irvine (UCI) Machine Learning Repository. The missing values were then filled in by KNN imputation. K-nearest neighbors imputation works by selecting, for each incomplete sample, several samples whose observed measurements are most similar to it and using them to estimate the missing entries. Missing data are commonplace in real-life medical settings, where some patient measurements go unrecorded under certain conditions; for example, a patient may miss a measurement, receive a prescription from the physician, and return later for another measurement. After suitable imputation of the incomplete data set, models were built with six machine learning methods: logistic regression, random forest, support vector machine, k-nearest neighbor, naive Bayes classifier, and feedforward neural network. Among these models, random forest achieved the highest accuracy.
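The KNN imputation step can be illustrated with a minimal pure-Python sketch. It assumes that missing entries are encoded as `None`, uses only fully observed rows as donors, measures Euclidean distance over the features the incomplete row actually has, and fills each gap with the mean of the `k` nearest donors. The function name `knn_impute` and these choices are illustrative assumptions, not the study's exact configuration; in practice, features would usually be scaled first, and any categorical attributes would need to be encoded numerically beforehand.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing measurement (assumed encoding)


def _distance(row: Row, donor: Row, observed: Sequence[int]) -> float:
    """Euclidean distance restricted to the features observed in `row`."""
    return math.sqrt(sum((row[j] - donor[j]) ** 2 for j in observed))


def knn_impute(rows: List[Row], k: int = 3) -> List[List[float]]:
    """Fill None entries with the mean of the k most similar complete rows."""
    # Only fully observed rows are eligible to act as donors.
    complete = [r for r in rows if all(v is not None for v in r)]
    if len(complete) < k:
        raise ValueError("need at least k complete rows to act as donors")

    imputed: List[List[float]] = []
    for row in rows:
        observed = [j for j, v in enumerate(row) if v is not None]
        if len(observed) == len(row):
            # Nothing missing: copy the row unchanged.
            imputed.append([float(v) for v in row])
            continue
        # Rank donors by similarity on the measurements this row does have.
        neighbors = sorted(complete, key=lambda d: _distance(row, d, observed))[:k]
        imputed.append([
            float(v) if v is not None else sum(n[j] for n in neighbors) / k
            for j, v in enumerate(row)
        ])
    return imputed
```

For example, with `data = [[1, 2, 3], [1, 4, 3], [9, 9, 9], [1, None, 3]]` and `k=2`, the two nearest donors of the last row are the first two rows (both at distance 0 on the observed features), so its gap is filled with (2 + 4) / 2 = 3.0.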
By learning from the errors made by these models, we designed an integrated model that combines logistic regression and random forest through a perceptron, which was chosen for its speed. We therefore speculate that this methodology could generalize to other, more complex clinical data for disease diagnosis.
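The integrated model can be sketched as a perceptron that receives, for each patient, the class-1 probabilities produced by the logistic regression and random forest models and learns a linear rule for combining them. This sketch assumes those probabilities have already been computed (ideally out-of-fold, so the combiner never sees scores fitted to its own training labels). The function names, learning rate, epoch limit, and toy scores are hypothetical, not values from the study.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[float, float]  # (p_lr, p_rf): hypothetical class-1 probabilities


def predict(w: List[float], b: float, p: Pair) -> int:
    """Output 1 (CKD) when the weighted combination of scores exceeds zero."""
    return 1 if w[0] * p[0] + w[1] * p[1] + b > 0 else 0


def train_perceptron(
    pairs: Sequence[Pair],
    labels: Sequence[int],
    epochs: int = 50,
    lr: float = 0.1,
) -> Tuple[List[float], float]:
    """Classic perceptron learning rule; labels are 0 (no CKD) or 1 (CKD)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for p, y in zip(pairs, labels):
            err = y - predict(w, b, p)  # -1, 0 or +1
            if err:
                # Shift the decision boundary toward the misclassified point.
                w[0] += lr * err * p[0]
                w[1] += lr * err * p[1]
                b += lr * err
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: stop early
            break
    return w, b
```

With the toy scores `[(0.9, 0.9), (0.8, 0.95), (0.1, 0.2), (0.2, 0.05)]` and labels `[1, 1, 0, 0]`, the learned rule reproduces all four labels after two passes. On real data that are not linearly separable, a clean pass may never occur, so the `epochs` cap is what ends training.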
Keywords: Logistic regression, random forest, support vector machine, k-nearest neighbors, naive Bayes, feedforward neural network.