Mwambi, Henry Godwell.
Omolo, Bernard Oguna.
Mohammed, Mohanad Mohammed Adam.
2022-10-31
2022-10-31
2021
2021
https://researchspace.ukzn.ac.za/handle/10413/21038
Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg.
Statistical and machine learning methods have been applied in broad domains, including the medical field. These methods have a major impact on healthcare by supporting specialists' decision making in the diagnosis and prognosis of patients' disease status and disease progression. Non-communicable diseases (NCDs) remain a major challenge the world over in the 21st century, especially in developing countries where resources are limited. Recent global public health research shows an epidemiological paradigm shift from infectious to non-communicable diseases, which include cancer. Cancer is considered the most devastating of all NCDs and is ranked second to malaria as a leading cause of death in developing countries. Cancer occurs in many different types and affects all community members. The general mechanism of cancer etiology is uncontrolled cell proliferation that leads to a malignant or cancerous tumor, together with abnormalities at the molecular level. However, early detection and accurate diagnosis of cancer symptoms increase the probability of curing the condition, which has become the best strategy for fighting the disease. In the past few years, a vast amount of cancer data has been generated through new high-throughput technologies. Traditional clinical and experimental approaches lack the capacity to handle data on such a massive scale. Therefore, computational methods have been introduced to biomedical investigations, including gene/biomarker selection for cancer types and stages of the disease. Many computational tools have been developed based on different statistical and machine learning strategies and data science approaches.
In this work, we used statistical, machine learning, and deep learning methods to predict cancer types, subtypes, and survival. First, we developed a hybrid (DNA mutation and RNA expression) signature and assessed its predictive properties for colorectal cancer (CRC) patients' mutation status and survival. In addition, we proposed a stacking ensemble deep learning approach and compared its predictive performance for cancer types (as a multi-class classification problem) with that of different standard machine and deep learning methods. Finally, we assessed the predictive performance of the Cox proportional hazards and random survival forests (RSF) methods based on a signature obtained using three gene mutations (KRAS, BRAF, and TP53). However, the most significant limitations are the small sample size and the lack of independent data for validation. Also, we did not consider different features such as methylation and mutation data. Moreover, it is unfortunate that the study did not include detailed simulation studies to compare the traditional statistical and machine learning methods. Overall, the most prominent finding to emerge from this investigation is that combining different data sources leads to more robust statistical significance. Also, the stacking approach is more reliable and promising than a single machine or deep learning model. Furthermore, the RSF is a suitable and attractive method for survival analysis since it does not depend on any model assumptions.
Iqoqa (isiZulu abstract)
Statistical and machine learning methods are widely used in broad domains, including the medical field. These methods have a major impact on healthcare because they support decision making by specialist doctors in the diagnosis and prognosis of a patient's disease status or disease progression. Non-communicable diseases (NCDs) remain a major challenge throughout the world in the 21st century, especially in developing countries where resources are scarce. Recent global public health research shows a paradigm shift in the causes of disease spread, from infection to non-communicable diseases, which include cancer. Cancer is regarded as the most devastating of all NCDs and ranks second to malaria as a leading cause of death in developing countries. Cancer occurs in many different forms and affects all members of the community. The general mechanism of cancer etiology is uncontrolled proliferation of cells, which ends in cancerous tumors, together with abnormalities at the molecular level. However, early detection and accurate diagnosis of cancer symptoms increase the chances of curing the disease, which is the leading strategy for fighting cancer. In the past few years, a vast amount of cancer data has been collected using technology. Traditional clinical and experimental approaches cannot handle such a large volume of data. Therefore, computational methods have been introduced into medical research, including gene-based selection for cancer types and disease stages. Many computational tools have been built on statistical and machine learning strategies and other data science approaches. In this study, statistical, machine learning, and deep learning methods were used for cancer types, subtypes, and survival prediction. First, a hybrid DNA mutation and RNA expression signature was constructed, and its predictive properties were assessed for colorectal cancer (CRC) and patient survival. In addition, a stacking deep learning approach was proposed to evaluate and compare the prediction of cancer types (as a classification problem) against different machine and deep learning methods. Finally, the predictive performance of the Cox proportional hazards and random survival forests methods was assessed using a signature obtained from three gene mutations (KRAS, BRAF, and TP53). Nevertheless, the main limitation is that the sample is small, and independent data were not used for validation. The study also did not consider other features such as methylation and mutation data. Moreover, it is unfortunate that the study did not include simulation studies to compare traditional statistical methods with machine learning methods. In conclusion, the most prominent finding from this study is that combining different data sources leads to more robust statistical significance. Also, stacking is reliable and promising compared with a single machine or deep learning model. Furthermore, the RSF is a suitable and commendable method for survival analysis because it does not depend on model assumptions.
en
Statistical learning methods.
Machine learning methods.
Non-communicable diseases.
Cox Proportional Hazard Model.
Random survival forests method.
Statistical and deep learning methods for cancer genomic data = Izindlela zokufunda ezijulile zezibalomidanti zemininingo yeqoqozinhlayiyafuzo lomdlavuza.
Thesis