Department of Mathematical and Computational Sciences
Browsing Department of Mathematical and Computational Sciences by Author "Bere, A."
Now showing 1 - 10 of 10
Item Open Access
A Bayesian multilevel model for women unemployment in South Africa (2021-08)
Ramarumo, V. P.; Bere, A.; Sigauke, Caston
The study investigates and explains the demographic and socio-economic determinants affecting women's unemployment in South Africa. Classical and Bayesian estimation approaches were applied to a multilevel logistic regression (MLR) model. Secondary data from the Demographic and Health Survey (DHS) conducted in South Africa in 2016 were used. Information criteria revealed that the random intercept model outperformed the null and random coefficient multilevel models. The Intraclass Correlation Coefficient (ICC) suggests that women's unemployment levels differ noticeably across the provinces of South Africa. The results of the classical and Bayesian MLR models indicate an elevated prevalence of unemployment among women, and the chance of being unemployed was found to decrease with increasing age, wealth index, and educational attainment.

Item Embargo
Comparison of Some Statistical and Machine Learning Models for Continuous Survival Analysis (2024-09-06)
Ndou, Sedzani Emanuel; Mulaudzi, T. B.; Bere, A.
While statistical models have been traditionally utilized, there is a growing interest in exploring the potential of machine learning techniques. Existing literature shows varying results on their performance, depending on the dataset employed. This study conducts a comparative evaluation of the predictive accuracy of statistical and machine learning models for continuous survival analysis using two distinct datasets: time to first alcohol intake and North Carolina recidivism data. LassoCV was used to select variables for both datasets by shrinking the coefficient estimates.
Kaplan-Meier survival curves, alongside the log-rank test, were used to compare the survival distributions among groups of the variables included in the model. The proposed methods include Cox Proportional Hazards (CoxPH), Lasso-regularized Cox (CoxLasso), Survival Trees, Random Survival Forest, and Neural Networks. Model performance was evaluated using the Integrated Brier Score (IBS), Area Under the Curve, and Concordance index. Our findings show consistent dominance of the Neural Network (NN) and Random Survival Forest (RSF) models across multiple metrics for both datasets. Specifically, the Neural Network demonstrates remarkable performance, closely followed by RSF; the CoxPH and CoxLasso models perform slightly worse, and the Survival Tree (ST) consistently lags behind. This study can contribute to advancing knowledge and provide practical guidance for improving survival in recidivism and alcohol intake.

Item Open Access
Discrete survival models with flexible link functions for age at first marriage among woman in Swaziland (2019-05-18)
Nevhungoni, Thambeleni Portia; Bere, A.; Sigauke, C.
This study explores the use of flexible link functions in discrete survival models through a simulation study and an application to the Swaziland Demographic and Health Survey (SDHS) data. The objective of the research is to perform simulation exercises in order to compare the effectiveness of different families of link functions and to construct a discrete multilevel survival model for age at first marriage among women in Swaziland using a flexible link function. The Pareto hazard model and the Pregibon and Gosset families of link functions were considered in models with and without unobserved heterogeneity. The Pareto model, where the family parameter is estimated from the data, was found to outperform the other models, followed by the Pregibon and the Gosset families of link functions.
The results from both the simulation study and the analysis of the SDHS data showed that misspecification of the link function biases the estimates. This demonstrates the importance of choosing the right link. The findings of this study reveal that women who are highly educated, who live in the Manzini and Shiselweni regions, or who reside in urban areas were more likely to marry later than their counterparts in Swaziland. The results also reveal that the proportion of early first marriages is declining, since the difference among birth cohorts is found to be very high, with women of younger cohorts marrying later than older women.

Item Open Access
Forecasting Foreign Direct Investment in South Africa using Non-Parametric Quantile Regression Models (2019-05-16)
Netshivhazwaulu, Nyawedzeni; Sigauke, C.; Bere, A.
Foreign direct investment plays an important role in the economic growth of the host country, since it is considered a vehicle for transferring new ideas, capital, superior technology and skills from developed to developing countries. Non-parametric quantile regression is used in this study to estimate the relationship between foreign direct investment and the factors influencing it in South Africa, using data for the period 1996 to 2015. The variables were selected using the least absolute shrinkage and selection operator technique, and all the variables were retained in the models. The developed non-parametric quantile regression models were used to forecast the future inflow of foreign direct investment into South Africa. Forecast evaluation was done for all models, and the Laplace radial basis kernel, ANOVA radial basis kernel and linear quantile regression averaging were selected as the three best models based on the accuracy measures (mean absolute percentage error, root mean square error and mean absolute error).
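Illustrative note: the three accuracy measures named above can be sketched as follows. This is a minimal sketch using the standard textbook definitions, not code from the dissertation; the function names and the sample numbers in the usage lines are assumptions made for illustration only.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean square error."""
    squared = sum((a - f) ** 2 for a, f in zip(actual, forecast))
    return math.sqrt(squared / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    ratios = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * ratios / len(actual)


# Hypothetical values, for illustration only.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
print(mae(actual, forecast), rmse(actual, forecast), mape(actual, forecast))
```

Lower values are better for all three; MAPE is scale-free, which is why it is often used when comparing models across series of different magnitudes.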
The best set of forecasts was selected based on the prediction interval coverage probability, prediction interval normalized average deviation and prediction interval normalized average width. The results showed that linear quantile regression averaging is the best model for predicting foreign direct investment, since it had 100% coverage of the predictions. Linear quantile regression averaging was also confirmed to be the best model under the forecast error distribution. One contribution of this study is to provide accurate foreign direct investment forecasts that can help policy makers develop sound policies and suitable strategic plans to promote foreign direct investment inflows into South Africa.

Item Open Access
Hierarchical forecasting of monthly electricity demand (2022-07-15)
Chauke, Ignitious; Sigauke, C.; Bere, A.
Energy demand forecasting is a vital tool for energy management, maintenance planning, environmental security, and investment decision-making in liberalised energy markets. The mini-dissertation investigates ways to forecast power usage using hierarchical time series and South African data. Approaches such as top-down, bottom-up, and optimal combination are applied. Top-down forecasting is based on disaggregating total series projections and spreading them down the hierarchy based on historical data proportions. The bottom-up strategy aggregates individual projections at lower levels, whereas the optimal combination methodology optimally combines bottom forecasts. An out-of-sample prediction performance evaluation was performed to assess the models' predictive ability. The best model was chosen using mean absolute percentage error. The top-down technique based on predicted proportions (top-down forecasted proportions) was superior to the optimal combination and bottom-up approaches.
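Illustrative note: the bottom-up and top-down ideas described above can be sketched for a two-level hierarchy as follows. This is a minimal sketch under stated assumptions, not the mini-dissertation's implementation: it uses average historical proportions (one of several top-down variants, and not the forecasted-proportions variant the study found best), and the series names and numbers are hypothetical.

```python
def bottom_up(bottom_forecasts):
    """Forecast for the total series as the sum of the bottom-level forecasts."""
    return sum(bottom_forecasts.values())


def average_historical_proportions(history):
    """Average share of each bottom series in the total over the history.

    `history` maps a series name to a list of past observations;
    all lists are assumed to have the same length.
    """
    names = list(history)
    n_periods = len(history[names[0]])
    totals = [sum(history[k][t] for k in names) for t in range(n_periods)]
    return {
        k: sum(history[k][t] / totals[t] for t in range(n_periods)) / n_periods
        for k in names
    }


def top_down(total_forecast, proportions):
    """Spread a forecast of the total down the hierarchy using fixed proportions."""
    return {k: total_forecast * p for k, p in proportions.items()}


# Hypothetical regional demand history and forecasts, for illustration only.
history = {"region_a": [30.0, 40.0], "region_b": [70.0, 60.0]}
shares = average_historical_proportions(history)
print(top_down(200.0, shares))
print(bottom_up({"region_a": 72.0, "region_b": 125.0}))
```

Top-down forecasts always add up to the forecast of the total by construction, while bottom-up forecasts preserve the detail of each lower-level model.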
To combine forecasts and build prediction intervals for the proposed models, linear quantile regression, linear regression, simple average, and median were used. The best set of forecasts was picked based on the prediction interval normalised average width. At 95%, the best model based on the prediction interval normalised average width was the simple average.

Item Open Access
Multilevel modelling of determinants of contraceptive method choice among women in South Africa (2021-04-28)
Nematswerani, Phumudzo; Bere, A.; Sigauke, C.
Multilevel models take into account various degrees of aggregation in the data. This study aims to bring together multilevel models from both frequentist and Bayesian perspectives in identifying determinants of contraceptive choices. The study uses data from the 2016 South African Demographic and Health Survey (SADHS). A multinomial logistic regression model was used to analyse the dataset; parameters of the frequentist models were estimated in SPSS. The Bayesian analyses with non-informative priors were strengthened by the use of the state-of-the-art Hamiltonian Monte Carlo (HMC) algorithm, as implemented in the RStan package in the R statistical software. The final Bayesian model was selected based on the Watanabe-Akaike information criterion (WAIC), which has been shown to outperform conventional information criteria such as the DIC. The results established that an individual woman's choice of contraception is a function of both individual characteristics and community effects. In bivariate analysis, injections showed continued dominance as the preferred choice in South Africa. Community-level education was the most useful determinant of contraceptive choices.
Thus, this study suggests that empowering women through education will have a positive effect on overall contraceptive prevalence.

Item Open Access
Probabilistic solar power forecasting using partially linear additive quantile regression models: an application to South African data (2019-05-18)
Mpfumali, Phathutshedzo; Sigauke, C.; Bere, A.; Mulaudzi, T. S.
This study discusses an application of partially linear additive quantile regression models in predicting medium-term global solar irradiance using data from the Tellerie radiometric station in South Africa for the period August 2009 to April 2010. Variables are selected using a least absolute shrinkage and selection operator (Lasso) via hierarchical interactions, and the parameters of the developed models are estimated using the Barrodale and Roberts algorithm. The best models are selected based on the Akaike information criterion (AIC), Bayesian information criterion (BIC), adjusted R squared (AdjR2) and generalised cross validation (GCV). The accuracy of the forecasts is evaluated using mean absolute error (MAE) and root mean square error (RMSE). To improve the accuracy of forecasts, a convex forecast combination algorithm is used in which the average loss suffered by the models is based on the pinball loss function. A second forecast combination method, quantile regression averaging (QRA), is also used. The best set of forecasts is selected based on the prediction interval coverage probability (PICP), prediction interval normalised average width (PINAW) and prediction interval normalised average deviation (PINAD). The results show that QRA is the best model since it produces more robust prediction intervals than the other models. The percentage improvement is calculated, and the results demonstrate that the QRA model yields a small improvement over GAM with interactions, whereas QRA yields a higher percentage improvement over the convex forecast combination model.
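Illustrative note: the pinball loss and two of the interval measures mentioned above (PICP and PINAW) can be sketched as follows. This is a minimal sketch using commonly cited definitions, not code from the dissertation; expressing PICP and PINAW as percentages and normalising PINAW by the range of the observations are assumptions about convention, and the numbers in the usage lines are hypothetical.

```python
def pinball_loss(actual, quantile_forecast, tau):
    """Average pinball (quantile) loss of a forecast of the tau-th quantile."""
    total = 0.0
    for y, q in zip(actual, quantile_forecast):
        diff = y - q
        # Under-forecasts are weighted by tau, over-forecasts by (1 - tau).
        total += tau * diff if diff >= 0 else (tau - 1.0) * diff
    return total / len(actual)


def picp(actual, lower, upper):
    """Prediction interval coverage probability, in percent."""
    hits = sum(1 for y, lo, hi in zip(actual, lower, upper) if lo <= y <= hi)
    return 100.0 * hits / len(actual)


def pinaw(actual, lower, upper):
    """Prediction interval normalised average width, in percent of the observed range."""
    observed_range = max(actual) - min(actual)
    mean_width = sum(hi - lo for lo, hi in zip(lower, upper)) / len(actual)
    return 100.0 * mean_width / observed_range


# Hypothetical observations and interval bounds, for illustration only.
actual = [10.0, 20.0, 30.0, 40.0]
lower = [8.0, 18.0, 31.0, 35.0]
upper = [12.0, 22.0, 36.0, 45.0]
print(picp(actual, lower, upper), pinaw(actual, lower, upper))
print(pinball_loss([10.0, 20.0], [8.0, 25.0], tau=0.9))
```

A good set of prediction intervals has a PICP at or above the nominal level while keeping PINAW small, so the two measures are usually read together.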
A major contribution of this dissertation is the inclusion of a non-linear trend variable and the extension of forecast combination models to include QRA.

Item Open Access
Renewable energy forecasting in South Africa (2021-06-12)
Mamphaga, Ratshilengo; Sigauke, C.; Bere, A.
Renewable energy forecasts are critical to renewable energy grids and backup plans, operational plans and short-term power purchases. This dissertation focused on forecasting solar irradiance at one radiometric station in South Africa using high-frequency data obtained from the Vuwani radiometric station (USAid Venda). The aim was to compare the predictive performance of the Genetic Algorithm (GA), recurrent neural network (RNN) and k-nearest neighbour (KNN) models in forecasting short-term solar irradiance, with KNN used as a benchmark model. The results show that the RNN is the best forecasting model in terms of the relative mean absolute error (rMAE). The forecasts of the machine learning algorithms were combined using a convex combination technique and quantile regression averaging (QRA), of which QRA was found to be the best. Prediction interval width analysis at the 95% confidence level was performed, and the results showed that QRA outperforms the RNN for forecasting solar irradiance in terms of the PICP and PINAW. The Diebold-Mariano test statistics fell between -1.96 and 1.96, so the null hypothesis of equal predictive accuracy was not rejected. A Murphy diagram was presented, showing the 95% pointwise confidence intervals.
The study will help decision-makers at the South African power utility align electricity demand and supply in an efficient way that promotes potential economic growth and environmental sustainability.

Item Open Access
Short term load forecasting using quantile regression with an application to the unit commitment problem (2018-09-21)
Lebotsa, Moshoko Emily; Sigauke, C.; Bere, A.
Generally, short term load forecasting is essential for any power generating utility. The main objective of this dissertation was to develop short term load forecasting models for the peak demand periods (i.e. from 18:00 to 20:00 hours) in South Africa. Quantile semi-parametric additive models were proposed and used to forecast electricity demand during peak hours. In addition, the forecasts obtained were used to find an optimal number of generating units to commit (switch on or off) daily in order to meet the required electricity demand at minimal cost. A mixed integer linear programming technique was used to find the optimal number of units to commit. Driving factors such as calendar effects, temperature, etc. were used as predictors in building these models. Variable selection was done using the least absolute shrinkage and selection operator (Lasso). A feasible solution to the unit commitment problem will help utilities meet the demand at minimal cost. This information will be helpful to South Africa's national power utility, Eskom.

Item Open Access
Variable selection in discrete survival models (2020-02-27)
Mabvuu, Coster; Bere, A.; Sigauke, C.
Selection of variables is vital in high dimensional statistical modelling as it aims to identify the right subset model. However, variable selection for discrete survival analysis poses many challenges due to a complicated data structure. Survival data might have unobserved heterogeneity, leading to biased estimates when not taken into account. Conventional variable selection methods have stability problems.
A simulation approach was used to assess and compare the performance of the Least Absolute Shrinkage and Selection Operator (Lasso) and gradient boosting on discrete survival data. Parameter-related mean squared errors (MSEs) and false positive rates suggest that Lasso performs better than gradient boosting. Frailty models outperform discrete survival models that do not account for unobserved heterogeneity. The two methods were also applied to Zimbabwe Demographic Health Survey (ZDHS) 2016 data on age at first marriage and did not select exactly the same variables. Gradient boosting retained more variables in the model. Based on Lasso, place of residence, highest educational level attained and age cohort are the most influential factors for age at first marriage in Zimbabwe.