Variable selection in discrete survival models

dc.contributor.advisor: Bere, A.
dc.contributor.advisor: Sigauke, C.
dc.contributor.author: Mabvuu, Coster
dc.date: 2020
dc.date.accessioned: 2020-09-29T19:33:45Z
dc.date.available: 2020-09-29T19:33:45Z
dc.date.issued: 2020-02-27
dc.description: MSc (Statistics) [en_ZA]
dc.description: Department of Statistics
dc.description.abstract: Variable selection is vital in high-dimensional statistical modelling because it aims to identify the right subset model. However, variable selection for discrete survival analysis poses many challenges due to the complicated data structure. Survival data may contain unobserved heterogeneity, which leads to biased estimates when not taken into account. Conventional variable selection methods have stability problems. A simulation approach was used to assess and compare the performance of the Least Absolute Shrinkage and Selection Operator (Lasso) and gradient boosting on discrete survival data. Parameter-related mean squared errors (MSEs) and false positive rates suggest that Lasso performs better than gradient boosting. Frailty models outperform discrete survival models that do not account for unobserved heterogeneity. The two methods were also applied to Zimbabwe Demographic Health Survey (ZDHS) 2016 data on age at first marriage and did not select exactly the same variables. Gradient boosting retained more variables in the model. Based on Lasso, place of residence, highest educational level attained and age cohort are the major influential factors of age at first marriage in Zimbabwe. [en_ZA]
dc.description.sponsorship: NRF [en_ZA]
dc.format.extent: 1 online resource (xviii, 83 leaves)
dc.identifier.apacitation: Mabvuu, C. (2020). Variable selection in discrete survival models. (). . Retrieved from http://hdl.handle.net/11602/1552 [en_ZA]
dc.identifier.chicagocitation: Mabvuu, Coster. "Variable selection in discrete survival models." ., , 2020. http://hdl.handle.net/11602/1552 [en_ZA]
dc.identifier.citation: Mabvuu, Coster (2020) Variable selection in discrete survival models. University of Venda, South Africa. <http://hdl.handle.net/11602/1552>.
dc.identifier.ris: TY - Dissertation AU - Mabvuu, Coster AB - Variable selection is vital in high-dimensional statistical modelling because it aims to identify the right subset model. However, variable selection for discrete survival analysis poses many challenges due to the complicated data structure. Survival data may contain unobserved heterogeneity, which leads to biased estimates when not taken into account. Conventional variable selection methods have stability problems. A simulation approach was used to assess and compare the performance of the Least Absolute Shrinkage and Selection Operator (Lasso) and gradient boosting on discrete survival data. Parameter-related mean squared errors (MSEs) and false positive rates suggest that Lasso performs better than gradient boosting. Frailty models outperform discrete survival models that do not account for unobserved heterogeneity. The two methods were also applied to Zimbabwe Demographic Health Survey (ZDHS) 2016 data on age at first marriage and did not select exactly the same variables. Gradient boosting retained more variables in the model. Based on Lasso, place of residence, highest educational level attained and age cohort are the major influential factors of age at first marriage in Zimbabwe. DA - 2020-02-27 DB - ResearchSpace DP - Univen KW - Boosting KW - Discrete-time hazard model KW - Lasso KW - Penalised variable selection methods KW - Unobserved heterogeneity LK - https://univendspace.univen.ac.za PY - 2020 T1 - Variable selection in discrete survival models TI - Variable selection in discrete survival models UR - http://hdl.handle.net/11602/1552 ER - [en_ZA]
dc.identifier.uri: http://hdl.handle.net/11602/1552
dc.identifier.vancouvercitation: Mabvuu C. Variable selection in discrete survival models. []. , 2020 [cited yyyy month dd]. Available from: http://hdl.handle.net/11602/1552 [en_ZA]
dc.language.iso: en [en_ZA]
dc.rights: University of Venda
dc.subject: Boosting [en_ZA]
dc.subject: UCTD [en_ZA]
dc.subject: Lasso [en_ZA]
dc.subject: Penalised variable selection methods [en_ZA]
dc.subject: Unobserved heterogeneity [en_ZA]
dc.subject.ddc: 519.546
dc.subject.lcsh: Survival analysis (Biometry)
dc.subject.lcsh: Biometry
dc.subject.lcsh: Failure time data analysis
dc.title: Variable selection in discrete survival models [en_ZA]
dc.type: Dissertation [en_ZA]
Files
Original bundle
Name: Dissertation - Mabvuu, c.-.pdf
Size: 1.21 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission