Statistical and deep learning methods for cancer genomic data = Izindlela zokufunda ezijulile zezibalomidanti zemininingo yeqoqozinhlayiyafuzo lomdlavuza.
Date
2021
Abstract
Statistical and machine learning methods have been applied across many domains, including medicine. They have had a major impact on healthcare by supporting specialists' decision-making in the diagnosis and prognosis of a patient's disease status and disease progression. Non-communicable diseases (NCDs) remain a major challenge worldwide in the 21st century, especially in developing countries where resources are limited. Recent global public health research shows an epidemiological paradigm shift from infectious to non-communicable diseases, which include cancer.
Cancer is considered the most devastating of all NCDs and is second only to malaria as a leading cause of death in developing countries. Cancer occurs in many different types and affects all members of the community; the general mechanism of its etiology is uncontrolled cell proliferation, which leads to a malignant (cancerous) tumor, together with abnormalities at the molecular level. Early detection and accurate diagnosis of cancer symptoms increase the probability of a cure and have become the best strategy for fighting the disease. In the past few years, new high-throughput technologies have generated vast amounts of cancer data, and traditional clinical and experimental approaches lack the capacity to handle data on this scale. Computational methods have therefore been introduced into biomedical investigations, including the selection of genes/biomarkers associated with cancer types and disease stages. Many computational tools have been developed based on different statistical and machine learning strategies and data science approaches.
In this work, we used statistical, machine learning, and deep learning methods to predict cancer types, subtypes, and survival. First, we developed a hybrid (DNA mutation and RNA expression) signature and assessed its ability to predict the mutation status and survival of colorectal cancer (CRC) patients. Second, we proposed a stacking ensemble deep learning approach for classifying cancer types (a multi-class classification problem) and compared its predictive performance with that of standard machine and deep learning methods. Finally, we assessed the predictive performance of the Cox proportional hazards model and random survival forests (RSF) using a signature derived from three gene mutations (KRAS, BRAF, and TP53). The most significant limitations are the small sample size and the lack of independent data for validation. We also did not consider additional features such as methylation and mutation data. Moreover, the study did not include detailed simulation studies comparing traditional statistical and machine learning methods.
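The stacking idea described above can be illustrated with a small, dependency-free sketch. It is a generic two-level stack, not the thesis's architecture: the base learners are a nearest-centroid classifier and a k-nearest-neighbour classifier (swapped in for the deep networks the thesis stacks), the meta-learner is a multinomial logistic regression, and the data are hypothetical 2-D clusters standing in for molecular profiles. All function names are illustrative. The essential step is that the meta-learner is trained on out-of-fold base-learner probabilities, so it never learns from predictions made on samples the base learners were fitted to.

```python
import math
import random
from collections import Counter

# Hypothetical toy setup: samples are lists of floats standing in for molecular
# profiles; labels stand in for cancer types. Nothing here comes from the thesis.

def softmax(z):
    """Numerically stable softmax of a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def nearest_centroid_proba(train_X, train_y, x, classes):
    """Base learner 1: softmax over negative squared distances to class centroids."""
    scores = []
    for c in classes:
        members = [xi for xi, yi in zip(train_X, train_y) if yi == c]
        if not members:  # class absent from this fold: give it zero probability
            scores.append(float("-inf"))
            continue
        centroid = [sum(col) / len(members) for col in zip(*members)]
        scores.append(-sum((a - b) ** 2 for a, b in zip(x, centroid)))
    return softmax(scores)

def knn_proba(train_X, train_y, x, classes, k=3):
    """Base learner 2: class frequencies among the k nearest training samples."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, xi)), yi)
        for xi, yi in zip(train_X, train_y)
    )
    nearest = [yi for _, yi in dists[:k]]
    votes = Counter(nearest)
    return [votes[c] / len(nearest) for c in classes]

BASE_LEARNERS = [nearest_centroid_proba, knn_proba]

def base_features(train_X, train_y, x, classes):
    """Concatenate every base learner's class-probability vector for sample x."""
    feats = []
    for learner in BASE_LEARNERS:
        feats.extend(learner(train_X, train_y, x, classes))
    return feats

def train_meta_learner(F, y, classes, lr=0.5, epochs=300):
    """Meta-learner: multinomial logistic regression fitted by batch gradient descent."""
    W = [[0.0] * (len(F[0]) + 1) for _ in classes]  # last column is the bias
    for _ in range(epochs):
        grad = [[0.0] * len(row) for row in W]
        for f, label in zip(F, y):
            f1 = f + [1.0]  # constant input for the bias weight
            p = softmax([sum(w * v for w, v in zip(row, f1)) for row in W])
            for k, c in enumerate(classes):
                err = p[k] - (1.0 if label == c else 0.0)
                for j, v in enumerate(f1):
                    grad[k][j] += err * v
        for k in range(len(W)):
            for j in range(len(W[k])):
                W[k][j] -= lr * grad[k][j] / len(F)
    return W

def fit_stacking(X, y, n_folds=5):
    """Out-of-fold base-learner probabilities become the meta-learner's training data."""
    classes = sorted(set(y))
    F = [None] * len(X)
    for fold in range(n_folds):
        held_out = set(range(fold, len(X), n_folds))
        tr_X = [xi for i, xi in enumerate(X) if i not in held_out]
        tr_y = [yi for i, yi in enumerate(y) if i not in held_out]
        for i in held_out:
            F[i] = base_features(tr_X, tr_y, X[i], classes)
    W = train_meta_learner(F, y, classes)
    # The base learners are instance-based, so keeping the full training set
    # is equivalent to refitting them on all data for prediction time.
    return {"classes": classes, "W": W, "X": X, "y": y}

def predict_stacking(model, x):
    """Stack base-learner probabilities for x and let the meta-learner decide."""
    f1 = base_features(model["X"], model["y"], x, model["classes"]) + [1.0]
    p = softmax([sum(w * v for w, v in zip(row, f1)) for row in model["W"]])
    return model["classes"][p.index(max(p))]

def make_toy_data(seed=0, per_class=20):
    """Three well-separated 2-D Gaussian clusters labelled A, B and C."""
    rng = random.Random(seed)
    centers = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)}
    X, y = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(per_class):
            X.append([rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)])
            y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data()
    model = fit_stacking(X, y)
    print(predict_stacking(model, [0.1, -0.2]))  # expected: A
```

Using out-of-fold rather than in-sample base predictions is the usual design choice in stacking: in-sample probabilities are overconfident and would teach the meta-learner to trust whichever base learner overfits most.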
Overall, the most prominent finding of this investigation is that combining different data sources yields more robust statistical significance. The stacking approach is also more reliable and promising than a single machine learning or deep learning model. Furthermore, RSF is a suitable and attractive method for survival analysis because it does not depend on restrictive model assumptions such as proportional hazards.
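The contrast between the Cox model and RSF in the findings above rests on the proportional hazards assumption. In the standard formulation (notation assumed here, not taken from the thesis), the Cox model for patient i with covariate vector x_i, for example indicators of KRAS, BRAF, and TP53 mutation, the hazard ratio it implies, and the partial likelihood used to estimate beta are:

```latex
\begin{aligned}
h(t \mid \mathbf{x}_i) &= h_0(t)\,\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right),\\
\frac{h(t \mid \mathbf{x}_i)}{h(t \mid \mathbf{x}_j)} &= \exp\!\left(\boldsymbol{\beta}^{\top}(\mathbf{x}_i - \mathbf{x}_j)\right),\\
L(\boldsymbol{\beta}) &= \prod_{i:\,\delta_i = 1}
  \frac{\exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_i\right)}
       {\sum_{j \in R(t_i)} \exp\!\left(\boldsymbol{\beta}^{\top}\mathbf{x}_j\right)}.
\end{aligned}
```

Here h_0(t) is an unspecified baseline hazard, delta_i = 1 if patient i had an observed event, and R(t_i) is the set of patients still at risk at time t_i. The second line shows that the baseline hazard cancels, so the hazard ratio between any two patients is constant over time; if their survival curves cross, the assumption fails. A random survival forest imposes no such structure: it grows survival trees on bootstrap samples and averages their terminal-node cumulative hazard estimates (Nelson-Aalen estimates in the original formulation), which is why RSF is described above as not depending on model assumptions of this kind.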
Iqoqa (isiZulu abstract)
Statistical and machine learning methods are widely used in broad domains, including the medical field. These methods have a great impact on healthcare because they support decision-making by specialist doctors in the diagnosis and prognosis of a patient's disease status or disease progression. Non-communicable diseases (NCDs) remain a major challenge throughout the world in the 21st century, especially in developing countries where resources are scarce. Recent global public health research shows a shift in the epidemiological paradigm from infectious to non-communicable diseases, which include cancer.
Cancer is regarded as the most devastating of all NCDs, and it is second only to malaria as a cause of death in developing countries. Cancer occurs in many different forms and affects all members of the community; the general mechanism of its etiology is the uncontrolled proliferation of cells, which eventually produces cancerous tumors, together with abnormalities at the molecular level. However, early detection and accurate diagnosis of cancer symptoms increase the chances of curing the disease, which is the leading strategy for fighting cancer. In the past few years, a vast amount of cancer data has been collected using technology. Traditional clinical and experimental approaches are unable to handle such a large volume of data. Computational methods have therefore been introduced into medical research, including the genetic selection of cancer types and disease stages. Many computational tools have been developed on the basis of statistical and machine learning strategies and other data science approaches.
In this study, statistical, machine learning, and deep learning methods were applied to cancer types, subtypes, and survival prediction. First, a hybrid signature combining DNA mutation and RNA expression was constructed, and its predictive properties were assessed for colorectal cancer (CRC) and patient survival. In addition, a stacking ensemble deep learning approach was proposed to evaluate and compare the prediction of cancer types (as a multi-class classification problem) against different machine and deep learning methods. Finally, the predictive performance of the Cox proportional hazards model and random survival forests was assessed using a signature derived from three gene mutations (KRAS, BRAF, and TP53). Nevertheless, the main limitation is the small sample size, and independent data were not available for validation. The study also did not consider other features, such as methylation and mutation data. Moreover, the study unfortunately did not include simulations to compare traditional statistical methods with machine learning methods.
In conclusion, the most prominent finding of this study is that combining different data sources leads to more robust statistical significance. Stacking is also reliable and promising compared with a single machine learning or deep learning model. Furthermore, random survival forests (RSF) are a suitable and recommended method for survival analysis because they do not rely on restrictive model assumptions.
Description
Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg.