Statistical and machine learning methods of online behaviours analysis.
dc.contributor.advisor | Chifurira, Retius. | |
dc.contributor.advisor | Zewotir, Temesgen Tenaw. | |
dc.contributor.author | Soobramoney, Judah. | |
dc.date.accessioned | 2024-11-18T12:36:54Z | |
dc.date.available | 2024-11-18T12:36:54Z | |
dc.date.created | 2024 | |
dc.date.issued | 2024 | |
dc.description | Doctoral Degree. University of KwaZulu-Natal, Durban. | |
dc.description.abstract | The success of corporates is highly influenced by the effectiveness and appeal of each corporate’s website. This study was conducted on TEKmation, a South African corporate whose board of directors lacked insight into the website’s usage. The study aimed to quantify the web-traffic flow, detect the underlying browsing patterns, and validate the effectiveness of the web design. The website received 7,935 visits and 57,154 page views from 1 June 2021 to 30 June 2023 (data sourced from Google Analytics). Grubbs’ test identified outliers in visit frequency, page views per visit, and visit duration. A small degree of missingness was observed in the mobile device branding (1.24%) and operating system (0.03%) features, which were imputed using a Bayesian network model. To address a detected data shift, an artificial neural network (ANN) was proposed to flag future data shifts, with the period of the year and the volume of sessions as important predictors. Prior to clustering, feature selection methods assessed feature variability and feature association. Results indicated that low-incidence webpages and features with natural relationships should be omitted. The K-means, DBSCAN and hierarchical unsupervised machine learning methods were employed to identify the visit personas, labelled get-in-touch (12%), accidentals (11%), dropoffs (30%), engrossed (38%) and seekers (9%). It was evident that the premature drop-offs needed further exploration. The Cox proportional hazards survival model and the random survival forest (RSF) model identified the web browser, visit frequency, device category, distance, certain webpages, volume of hits, and organic searches as drop-off hazards. A tiered Markov chain model was developed to compute the transition probabilities of dropping off. The contact (63%) and clients (50%) states recorded a high likelihood of dropping off early within the visit. 
In conclusion, using statistical methods, the study informed the board of its audience and the flaws of the website, and proposed recommendations to address concerns. | |
dc.identifier.uri | https://hdl.handle.net/10413/23403 | |
dc.language.iso | en | |
dc.rights | CC0 1.0 Universal | en |
dc.rights.uri | http://creativecommons.org/publicdomain/zero/1.0/ | |
dc.subject.other | Bayesian networks. | |
dc.subject.other | Google Analytics. | |
dc.subject.other | Machine learning methods. | |
dc.subject.other | Markov chains. | |
dc.subject.other | Web personas. | |
dc.title | Statistical and machine learning methods of online behaviours analysis. | |
dc.type | Thesis | |
local.sdg | SDG4 |