Variable selection in discrete survival models

dc.contributor.advisor Bere, A.
dc.contributor.advisor Sigauke, C.
dc.contributor.author Mabvuu, Coster
dc.date 2020
dc.date.accessioned 2020-09-29T19:33:45Z
dc.date.available 2020-09-29T19:33:45Z
dc.date.issued 2020-02-27
dc.identifier.citation Mabvuu, Coster (2020) Variable selection in discrete survival models. University of Venda, South Africa. <http://hdl.handle.net/11602/1552>.
dc.identifier.uri http://hdl.handle.net/11602/1552
dc.description MSc (Statistics) en_ZA
dc.description Department of Statistics
dc.description.abstract Selection of variables is vital in high-dimensional statistical modelling as it aims to identify the right subset model. However, variable selection for discrete survival analysis poses many challenges due to a complicated data structure. Survival data might have unobserved heterogeneity, leading to biased estimates when not taken into account. Conventional variable selection methods have stability problems. A simulation approach was used to assess and compare the performance of the Least Absolute Shrinkage and Selection Operator (Lasso) and gradient boosting on discrete survival data. Parameter-related mean squared errors (MSEs) and false positive rates suggest that Lasso performs better than gradient boosting. Frailty models outperform discrete survival models that do not account for unobserved heterogeneity. The two methods were also applied to Zimbabwe Demographic Health Survey (ZDHS) 2016 data on age at first marriage and did not select exactly the same variables. Gradient boosting retained more variables in the model. Based on Lasso, place of residence, highest educational level attained and age cohort are the major influential factors of age at first marriage in Zimbabwe. en_ZA
dc.description.sponsorship NRF en_ZA
dc.format.extent 1 online resource (xviii, 83 leaves)
dc.language.iso en en_ZA
dc.rights University of Venda
dc.subject Boosting en_ZA
dc.subject Discrete-time hazard model en_ZA
dc.subject Lasso en_ZA
dc.subject Penalised variable selection methods en_ZA
dc.subject Unobserved heterogeneity en_ZA
dc.subject.ddc 519.546
dc.subject.lcsh Survival analysis (Biometry)
dc.subject.lcsh Biometry
dc.subject.lcsh Failure time data analysis
dc.title Variable selection in discrete survival models en_ZA
dc.type Dissertation en_ZA
