Statistical and machine learning methods of online behaviours analysis.

dc.contributor.advisor: Chifurira, Retius.
dc.contributor.advisor: Zewotir, Temesgen Tenaw.
dc.contributor.author: Soobramoney, Judah.
dc.date.accessioned: 2024-11-18T12:36:54Z
dc.date.available: 2024-11-18T12:36:54Z
dc.date.created: 2024
dc.date.issued: 2024
dc.description: Doctoral Degree. University of KwaZulu-Natal, Durban.
dc.description.abstract: The success of corporates is highly influenced by the effectiveness and appeal of each corporate’s website. This study was conducted on TEKmation, a South African corporate whose board of directors lacked insight into how its website was used. The study aimed to quantify the web-traffic flow, detect the underlying browsing patterns, and validate the effectiveness of the web design. The website received 7,935 visits and 57,154 page views from 1 June 2021 to 30 June 2023 (data sourced from Google Analytics). Grubbs’ test identified outliers in visit frequency, page views per visit, and visit duration. A small degree of missingness was observed in the mobile device branding (1.24%) and operating system (0.03%) features, which were imputed using a Bayesian network model. To address a detected data shift, an artificial neural network (ANN) was proposed to flag future data shifts, with the period of the year and the volume of sessions as important predictors. Prior to clustering, feature selection methods assessed feature variability and feature association. Results indicated that low-incidence webpages and features with natural relationships should be omitted. The K-means, DBSCAN and hierarchical unsupervised machine learning methods were employed to identify the visit personas, labelled get-in-touch (12%), accidentals (11%), dropoffs (30%), engrossed (38%) and seekers (9%). It was evident that the premature drop-offs needed further exploration. The Cox proportional hazards survival model and the random survival forest (RSF) model identified the web browser, visit frequency, device category, distance, certain webpages, volume of hits, and organic searches as drop-off hazards. A tiered Markov chain model was developed to compute the transition probabilities of dropping off. The contact (63%) and clients (50%) states recorded a high likelihood of dropping off early in the visit.
In conclusion, using statistical methods, the study informed the board about its audience, identified the flaws of the website, and proposed recommendations to address these concerns.
dc.identifier.uri: https://hdl.handle.net/10413/23403
dc.language.iso: en
dc.rights: CC0 1.0 Universal
dc.rights.uri: http://creativecommons.org/publicdomain/zero/1.0/
dc.subject.other: Bayesian networks.
dc.subject.other: Google Analytics.
dc.subject.other: Machine learning methods.
dc.subject.other: Markov chains.
dc.subject.other: Web personas.
dc.title: Statistical and machine learning methods of online behaviours analysis.
dc.type: Thesis
local.sdg: SDG4

Files

Original bundle

Name: Soobramoney_Judah_2024.pdf
Size: 5.5 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.64 KB
Format: Item-specific license agreed upon to submission