Department of Mathematical and Computational Sciences
Browsing Department of Mathematical and Computational Sciences by Title
Now showing 1 - 20 of 69

Item Open Access
Alternative methods for solving nonlinear two-point boundary value problems (2018-03-18)
Ghomanjani, Fateme; Shateyi, Stanford
In this sequel, the numerical solution of nonlinear two-point boundary value problems (NTBVPs) for ordinary differential equations (ODEs) is found by the Bezier curve method (BCM) and orthonormal Bernstein polynomials (OBPs). The OBPs are constructed by the Gram-Schmidt technique. The stated methods are easier to apply and are applicable to both linear and nonlinear problems. Some numerical examples are solved, and they show accurate results.

Item Open Access
Analysis of a boundary value problem for a system on non-homogeneous ordinary differential equations (ODE), with variable coefficients (2015-01-16)
Makhabane, Paul Suunyboy; Hlomuka, V. J.; Garira, W.

Item Embargo
Assessing models for de-identification of Electronic Discharge Summary Using Machine Learning tools (2024-09-06)
Mudau, Tshilisanani; Garira, Winston; Netshikweta, Rendani
Background: De-identification is a technique that eliminates identifying information from Clinical Records in order to protect individual privacy. This procedure decreases the chance that personal information which is collected, processed, distributed, and published can be used to identify the person. Including Machine Learning techniques in the de-identification process substantially improved it over previous methods. Research Problem: The Electronic Discharge Summary (EDS) has evolved into a significantly improved technique of providing discharge summaries, though this information contains Protected Health Information (PHI), which poses a risk to patients’ privacy. This makes de-identification mandatory. There have lately been several Machine Learning approaches to de-identify data. This study focuses on applying Machine Learning techniques to determine which model can best de-identify a data set. Methods: The open source data set from Harvard Medical School was used.
This data set contains 899 Electronic Health Records (EHR), 669 for training and 220 for test purposes. The Conditional Random Fields (CRF), Long Short Term Memory (LSTM) and Random Forest models were used, and the performance of each model was assessed. Findings: To assess each model’s performance, token-level F-measure, Recall and Precision were compared to determine which Machine Learning model performed best. The Long Short Term Memory model was found to outperform both Conditional Random Fields and Random Forest, with micro-average F-measure, Recall and Precision of 99%, and macro-average F-measure of 77%, Recall of 73% and Precision of 90%.

Item Open Access
A Bayesian multilevel model for women unemployment in South Africa (2021-08)
Ramarumo, V. P.; Bere, A.; Sigauke, Caston
The study is aimed at investigating and explaining the demographic and socio-economic determinants affecting women unemployment in South Africa. The classical and the Bayesian estimation approaches were applied to a multilevel logistic regression (MLR) model. Secondary data acquired from the Demographic and Health Survey (DHS) held in South Africa in 2016 was used in the study. Information criteria revealed that the random intercept model outperformed the null and random coefficient multilevel models. The Intraclass Correlation Coefficient (ICC) suggests that there is a discernible difference in women unemployment levels across the provinces of South Africa.
The results of the classical MLR and the Bayesian MLR indicate an inflated prevalence of women unemployment, and the chance of being without employment for women was found to decrease with an increase in age, wealth index, and educational attainment.

Item Open Access
A class of efficient iterative solvers for the steady state incompressible fluid flow : a unified approach (2016-02-01)
Muzhinji, Kizito

Item Open Access
Commodity Futures Market Prices: Decomposition Approach (2023-10-05)
Antwi, Emmanuel
Financial investments in commodity markets have attracted many investigations due to their importance to the global economy and worldwide trade as a whole. The radical price changes in commodity markets, especially for agricultural, energy and industrial metal products, have significant consequences for consumers and producers. It is crucial to accurately estimate and predict volatility in commodity futures market prices, since continuous price fluctuations have dire consequences for investors, portfolio managers, dealers and policymakers in taking prudent and sustainable decisions. Commodity price component determination and forecasting are challenging due to remarkable price volatility, uncertainty, and complexity in the futures market. As a result, commodity futures price series are nonlinear and nonstationary. Various studies reported in the literature have attempted to develop models to study the persistent changes in the commodity futures price series, but these models have failed to account for the inherent complexity in the series. This study aims to use decomposition techniques, combined with back-propagation neural network (BPNN) and autoregressive integrated moving average (ARIMA) models, to address difficulties in studying commodity futures market prices.
This study utilized the decomposition methods Empirical Mode Decomposition (EMD) and Variational Mode Decomposition (VMD) to analyze the daily real price series of three commodity futures: corn from agricultural products, crude oil from energy, and gold from industrial metal, using data from 4th May 2016 to 30th April 2021. In the first part of the study, we explored the descriptive and statistical properties of the data. It was found that the three commodity futures price series were nonstationary and nonlinear. Subsequently, we performed an EMD-Granger causality test to establish the spillover effects among the three commodity markets. It was revealed that there exists a strong mutual relationship among the three commodity market price series, which implies that the price movement of one market can be used to explain the price fluctuations of the other markets. In the second part, the EMD and VMD methods were applied to decompose the daily data of each commodity price from different periods and frequencies into their respective individual intrinsic mode functions (IMFs). First, we used the Hierarchical Clustering Method and Euclidean Distance Approach to classify the IMFs, residue, and modes into high-frequency, low-frequency, and trend. Next, applying statistical measures, particularly the Pearson product-moment correlation coefficient, Kendall rank correlation, and Spearman rank correlation coefficient, we observed that the trend and low-frequency parts of the market prices are the main drivers of commodity futures price fluctuations and that special events caused the low frequency. In essence, commodity futures prices are affected by economic development rather than short-lived market variations caused by ordinary supply-demand disequilibrium.
The third part compared the EMD- and VMD-based models using three forecasting performance evaluation criteria, namely mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). We also introduced the Diebold-Mariano (DM) test in selecting the optimal models for each commodity, since MAE, RMSE and MAPE have some shortcomings. The combined models outperformed the individual back-propagation neural network (BPNN) and autoregressive integrated moving average (ARIMA) models in forecasting the corn and crude oil futures price series. At the same time, BPNN emerged as the optimal model for predicting the gold futures price series. In addition, variational mode decomposition emerged as the ideal data pre-treatment method and contributed to enhancing the predicting ability of the BPNN and the ARIMA models. The empirical results showed that models combined with decomposition methods predict commodity futures prices accurately and can easily capture the volatility in commodity futures prices. By utilizing decomposition-based models in studying commodity market prices, the study filled the following gaps in the existing literature. The pre-treatment effects of the EMD and VMD can be compared horizontally, and decomposing commodity market price series and studying the underlying components that cause the above-mentioned price fluctuations is a novel approach in studying commodity market prices. In addition, utilizing Hierarchical Clustering and Euclidean Distance Approaches, the IMFs, residue and modes were classified into their distinctive frequencies, namely high-frequency, low-frequency, and trend units. The study of the effect of these frequencies and trends on commodity market price fluctuation is the first of its kind in the literature.
Furthermore, applying statistical measures such as the Pearson product-moment correlation coefficient, Kendall rank correlation, and Spearman rank correlation coefficient to evaluate the contribution of the IMFs, residue, and modes to the net variance of the volatility of crude oil, corn, and gold market price fluctuations is an innovative approach to studying financial time series. The EMD-Causality technique proposed to study the causal relationship of corn, crude oil, and gold futures price movements is novel in the financial market. This new approach to studying the price movement of commodity markets will provide vital information about one commodity market to explain price fluctuations in other commodity markets. Also, decomposition of financial data before forecasting yields high forecasting accuracy in commodity futures price prediction. Additionally, using decomposition techniques in agriculture, energy, and industrial metal commodity futures markets effectively minimizes the prediction complexity. Furthermore, using econometric and machine learning models incorporated with decomposition methods can capture the price series information to an acceptable degree. Finally, decomposition-based predicting techniques can effectively raise the predicting performance of BPNN and ARIMA models and reduce errors; thus, the proposed novel combination method can statistically improve forecast accuracy. This study, therefore, may assist in arresting the agricultural, energy, and industrial commodity market trends and estimating volatility risk factors accurately, consequently serving as a guide for investors, government policymakers and related sectors such as agriculture, energy, and the metal industry to take prudent and sustainable planning and investment decisions.
The suggested decomposition strategy, particularly the VMD-based one, is robust in analyzing the determinants of, modeling, and forecasting commodity futures market price fluctuations, thereby improving forecasting accuracy. Remarkably, in using the decomposition approach to estimate the components of commodity price series separately, different predicting strategies can be explored. For instance, based on the features of the decomposed IMFs or modes, a suitable predicting technique can be chosen to forecast each IMF or mode; for example, the residue can be estimated by a polynomial function, while a Fourier transform can be considered for predicting low-frequency IMFs or modes. Hence, it is recommended that researchers, institutions, investors, and policymakers interested in studying commodity price movements consider using this novel technique to achieve better results. It is further suggested that the decomposition approach could be utilized in other fields of study to prove the approach’s generality. Finally, further study can extend the proposed methodology by considering decomposition techniques other than EMD and VMD and evaluating their robustness in studying financial markets, as the EMD approach has the problems of mode mixing and endpoint effects. Eventually, we propose that a new model or consolidated predicting technique should be investigated to cater for special events’ influences on commodity market prices, since no one can predict the time and the place they will occur.

Item Open Access
Comparative analysis of Machine Learning Algorithms for Estimating Global Solar Radiation at Selected Weather Stations in Vhembe District Municipality (2023-10-05)
Marandela, Mulalo Veronica; Mulaudzi, T. S.; Maluta, N. E.
Estimating and assessing the energy falling in a particular area is essential for installers of renewable technologies.
Different empirical equations have been applied as the most reliable means of estimating global solar radiation (GSR) in different climatic conditions. The main objective of this work is to estimate the global solar radiation of two stations, namely Mutale and Messina, found in Vhembe District, Limpopo Province, South Africa. Four different methods (Random Forest (RF) regression, K-nearest neighbour (K-NN), Support Vector Machines (SVM) and Extreme Gradient Boosting mechanism (XGBoost)) are used to estimate the GSR in this study. The RF model on Mutale station was found to be the best fitting model with R² = 0.9902, MSE = 0.4085 and RMSE = 0.6391, followed by XGB with R² = 0.9898, MSE = 0.4245 and RMSE = 0.6515. RF was also found to be the best for Messina station with R² = 0.9636, MSE = 1.4138 and RMSE = 1.1890, followed by the XGB model with R² = 0.9595, MSE = 1.5723 and RMSE = 1.2539. From the results, it can be concluded that RF is a better model for estimating GSR for different stations.

Item Open Access
A comparison of some methods of modeling baseline hazard function in discrete survival models (2019-09-20)
Mashabela, Mahlageng Retang; Bere, Alphonce; Sigauke, Caston
The dimension of the baseline parameter vector in a discrete-time survival model is determined by the number of time points. The larger the number of time points, the higher the dimension of the baseline parameter vector, which often leads to biased maximum likelihood estimates. One of the ways to overcome this problem is to use a simpler parametrization that contains fewer parameters. A simulation approach was used to compare the accuracy of three variants of penalised regression spline methods in smoothing the baseline hazard function. Root mean squared error (RMSE) analysis suggests that generally all the smoothing methods performed better than the model with a discrete baseline hazard function. No single smoothing method outperformed the other smoothing methods.
These methods were also applied to data on age at first alcohol intake in Thohoyandou. The results from the real data application suggest that there were no significant differences amongst the estimated models. Consumption of other drugs, having a parent who drinks, being male and having been abused in life are associated with high chances of drinking alcohol very early in life.

Item Embargo
Comparison of Some Statistical and Machine Learning Models for Continuous Survival Analysis (2024-09-06)
Ndou, Sedzani Emanuel; Mulaudzi, T. B.; Bere, A.
While statistical models have been traditionally utilized, there is a growing interest in exploring the potential of machine learning techniques. Existing literature shows varying results on their performance, depending on the dataset employed. This study conducts a comparative evaluation of the predictive accuracy of both statistical and machine learning models for continuous survival analysis utilizing two distinct datasets: time to first alcohol intake and North Carolina recidivism data. LassoCV was used to select variables for both datasets by encouraging sparse coefficient estimates. Kaplan-Meier survival curves were utilized, alongside the log-rank test, to compare the survival distributions among groups of variables incorporated in the model. The proposed methods include the Cox Proportional Hazards, Lasso-regularized Cox, Survival Trees, Random Survival Forest, and Neural Networks. Model performance was evaluated using Integrated Brier score (IBS), Area Under the Curve and Concordance index. Our findings show consistent dominance of the Neural Network (NN) and Random Survival Forest (RSF) models across multiple metrics for both datasets. Specifically, the Neural Network demonstrates remarkable performance, closely followed by RSF; the CoxPH and CoxLasso models perform slightly lower, and the Survival Tree (ST) consistently lags behind.
This study can contribute to advancing knowledge and provides practical guidance for improving survival in recidivism and alcohol intake.

Item Open Access
Computational analysis of magnetohydrodynamics boundary layer flow of nanofluid over a stretching sheet in the presence of heat generation or absorption and chemical reaction (2022-07-15)
Molaudzi, Vhutshilo; Shateyi, S.; Muzhinji, K.
In this study, we present the effect of two-dimensional magnetohydrodynamics of a nanofluid over a stretching sheet in the presence of chemical reaction, as well as heat generation or absorption. The partial differential equations are reduced to coupled nonlinear ordinary differential equations using similarity transformations, which are then solved numerically using spectral local linearization and spectral relaxation methods. The effects of different parameters, Lewis number, Eckert number, stretching, chemical reaction, local Reynolds number, Prandtl number, constant, heat source, Brownian motion, and Thermophoresis are analysed and compared. The numerical results for velocity, temperature, skin friction coefficient, concentration, Sherwood number, and Nusselt number are presented in tabular form and visualized graphically. The findings of the spectral local linearization and spectral relaxation methods are very similar to the bvp4c method’s results. When compared to the spectral relaxation method, the results from the spectral local linearization method were more effective. We found that the velocity profile increases with increasing values of the Grashof number (Gr). Since the Grashof number (Gr) is the ratio of buoyancy to viscous forces in the boundary layer, increasing it raises the buoyancy forces relative to the viscous forces, which influences the velocity in the boundary layer region. An increase in the heat source/sink parameter (S) results in an increase in velocity and temperature, but a decrease in concentration.
The concentration diffusion species were reduced due to the heat source/sink parameter (S). The results also show that heat generation increases the momentum and thermal boundary layer thickness while decreasing the nanofluid concentration boundary layer thickness.

Item Open Access
Credit Card Fraud Detection using Boosted Random Forest Algorithm (2023-10-05)
Mashamba, Thanganedzo Beverly; Chagwiza, W.; Garira, W.
Financial fraud is a growing concern with far-reaching consequences in financial institutions, government, and corporate organizations, leading to substantial monetary losses. The primary cause of financial loss is credit card fraud; it affects issuers and clients, and is a significant threat to the business as clients will move to competitors where they feel more secure. Solving fraud problems is beyond human capability, so financial institutions can utilize machine learning algorithms to detect fraudulent behaviour by learning from credit card transactions. This thesis develops the boosted random forest, integrating an adaptive boosting algorithm into a random forest algorithm, such that the performance of the model is improved in predicting fraudulent credit card transactions. The confusion matrix is used to evaluate the performance of the models, wherein random forest, adaptive boosting and boosted random forest were compared. The results indicated that the boosted random forest outperformed the individual models with an accuracy of 99.9%, which corresponded with the results from the confusion matrix. However, random forest and adaptive boosting had accuracies of 100% and 99% respectively, which did not correspond to the results of the confusion matrix, meaning the individual models need to be more accurate.
Thus, by implementing the proposed approach in a credit card management system, financial loss will be reduced to a greater extent.

Item Open Access
Determination of factors contributing towards women's unemployment in the Capricorn and Sekhukhune districts in the Limpopo Province (2017-09-18)
Maboko, Tumisho; Kyei, K. A.
See the attached abstract below

Item Open Access
Determination of factors that influence digit preference: A Case study of South African Census 2011 Age-Sex date (2020-01)
Netshiozwi, Masala; Kyei, K. A.; Moyo, S.
The age distribution of a population is one of the most important demographic factors and plays a major role in describing and making projections about the population. Age distribution determines life expectancy, fertility and migration. Its accuracy suffers mostly from age misstatement and other factors. The study sought to determine the factors that influence digit preference in age data using the South African census 2011 age-sex data. Various methods were applied to examine the objectives of the study. These were visual inspection methods (line graph and population pyramid), statistical methods (age ratio and sex ratio) and multivariate methods (generalized linear model, principal component analysis and regression analysis), all of which are reviewed in detail in the study. This study utilized a full age dataset in single years. The United Nations Age-Sex Accuracy Index was found to be 18.3, which indicates that the data collected was of good quality. Besides the results deduced from the analysis to determine the quality of data, the study found that education level, place of residence, gender and ethnic group are the factors that influence digit preference. This was evidenced by calculated p-values < 0.05 in the generalized linear model, showing a positive relationship.
Principal component analysis and regression analysis confirm the findings of the generalized linear model.

Item Open Access
The Development and Application of Coupled Multiscale Models of Malaria Disease System (2022-11-10)
Maregere, Bothwell; Garira, W.; Mathebula, D.
The purpose of this thesis is to develop coupled multi-scale dynamics of infectious disease systems. An infectious disease system consists of three interacting subsystems, which are the host, the pathogen, and the environment. Each level has two different interaction scales (micro-scale and macro-scale) and is organized into hierarchical levels of organization, from the cellular level to the macro-ecosystem level. There are two main theories of infectious diseases: (i) the transmission mechanism theory and (ii) the replication-transmission relativity theory. A significant difference exists between these theories in that (i) the transmission mechanism theory considers transmission to be the primary cause of infectious disease spread at the macro-scale, while (ii) the replication-transmission relativity theory is an extension of the first theory. It is important to consider the interaction between two scales when pathogen replication occurs within the host (micro-scale) and transmission occurs between hosts (macro-scale). Our research primarily focuses on the replication-transmission relativity theory of pathogens. The main purpose of this study is to develop coupled multi-scale models of direct vector-borne diseases using malaria as a paradigm. We have developed a basic coupled multi-scale model with a combination of two other categories of multi-scale models, which are a nested multi-scale model in the human host and an embedded multi-scale model in the mosquito host.
The developed multi-scale model consists of nonlinear differential equations that are employed to provide mathematical results on the underlying issues of the multi-scale cycle of pathogen replication and transmission of malaria disease. Stability analyses of the models were carried out to establish that the infection-free equilibrium is locally and globally asymptotically stable whenever R0 < 1, and that the endemic equilibrium exists and is globally asymptotically stable whenever R0 > 1. We applied vaccination as a control measure on the multi-scale model of malaria with mosquito life cycle, comprising the three stages of vaccination, namely pre-erythrocyte stage vaccines, blood stage vaccines and transmission stage vaccines. The impact of vaccination on malaria disease has been demonstrated. Through numerical simulation, it was found that when the comparative vaccination efficacy is high, the community pathogen load (GH and PV) decreases and the reproductive number can be reduced by 89.09%; that is, the transmission of malaria can be reduced at both the individual level and the population level. We also developed the multi-scale model with the human immune response on a within-human sub-model, which is stimulated by the malaria parasite. We investigated the effect of immune cells on reducing malaria infection at both the between-host scale and within-host scale. We incorporated an environmental factor, such as temperature, in the multi-scale model of the malaria disease system with a mosquito life cycle. We discovered that as the temperature increases the mosquito population also increases, which has the impact of increasing malaria infection at the individual level and at the community scale. We also investigated the influence of the mosquito life cycle on the multi-scale model of the malaria disease system.
An increase in the egg, larval and pupal stages of mosquitoes results in an increase in mosquito density and malaria transmission at the individual level and community scale. Therefore, the suggestion is that immature and mature mosquitoes be controlled to lessen malaria transmission. The results indicated that the combination of malaria health interventions with the highest efficacy has the influence of reducing malaria infection at the population level. Models developed and analyzed in this study can play a significant role in preventing malaria outbreaks. Using the coupled multi-scale models that were developed in this study, we made conclusions about the malaria disease system based on the results obtained. It is possible to apply the multi-scale framework in this study to other vector-borne diseases as well.

Item Open Access
Discrete survival models with flexible link functions for age at first marriage among woman in Swaziland (2019-05-18)
Nevhungoni, Thambeleni Portia; Bere, A.; Sigauke, C.
This study explores the use of flexible link functions in discrete survival models through a simulation study and an application to the Swaziland Demographic and Health Survey (SDHS) data. The objective of the research study is to perform simulation exercises in order to compare the effectiveness of different families of link functions and to construct a discrete multilevel survival model for age at first marriage among women in Swaziland using a flexible link function. The Pareto hazard model and the Pregibon and Gosset families of link functions were considered in models with and without unobserved heterogeneity. The Pareto model, where the family parameter is estimated from the data, was found to outperform the other models, followed by the Pregibon and the Gosset families of link functions. The results from both the simulation study and the real data analysis of the SDHS data illustrated that misspecification of the link function causes bias in the estimation results.
This demonstrates the importance of choosing the right link. The findings of this study reveal that women who are highly educated, who stay in the Manzini and Shiselweni regions, and who reside in urban areas were more likely to marry later compared to their counterparts in Swaziland. The results also reveal that the proportion of early first marriages is declining, since the difference among birth cohorts is found to be very high, with women of younger cohorts getting married later compared to older women.

Item Open Access
Existence and Uniqueness of a solution to a flow problem about a Rotating Obstacle at low Reynolds number (2015-05)
Nyathi, Freeman; Moyo, S.
See the attached abstract below

Item Open Access
Exploring the Multi-scale character of infectious disease dynamics (2023-05-19)
Mufoya, Blessings; Garira, W.; Mathebula, D.
This research study characterised multiscale models of infectious disease dynamics. This was achieved by establishing when it is appropriate to implement particular mathematical methods for different multiscale models. The study of infectious disease systems has been elucidated ever since the discovery of mathematical modelling. Due to the vast complexities in the dynamics of infectious disease systems, modellers are increasingly gravitating towards the multiscale modelling approach as a favourable alternative. Among the diseases that have persistently plagued most developing countries are vector-borne diseases like Malaria and directly transmitted diseases like Foot-and-Mouth disease (FMD). Globally, FMD has caused major losses in the economic sector (particularly agriculture) as well as tourism. On the other hand, Malaria remains amongst the most severe public health problems worldwide, with millions of people estimated to live at permanent risk of contracting the disease.
We developed multiscale models that can describe both local transmission and global transmission of infectious disease systems at any hierarchical level of organization, using FMD and Malaria as paradigms. The first stage in formulating the multiscale models in this study was to integrate two submodels, namely (i) the between-host submodel and (ii) the within-host submodel of an infectious disease system, using the nested approach. The outcome was a system of nonlinear ordinary differential equations which described the local transmission mechanism of the infectious disease system. The next step was to incorporate graph-theoretic methods into the system of differential equations. This approach enabled modelling the migration of humans/animals between communities (also called patches or geographically distant locations), thereby describing the global transmission mechanism of infectious disease systems. At the whole-organism level we considered the organs in a host as patches and the transmission at the within-organ scale as direct transmission represented by ordinary differential equations. However, at the between-organ scale there was movement of pathogen between the organs through the blood. This transmission mechanism, called global transmission, was represented by graph-theoretic methods. At the macro-community level we considered communities as patches and established that at the within-community scale there was direct transmission of pathogen represented by ordinary differential equations and at the between-community scale there was movement of infected individuals. Furthermore, the systems of differential equations were extended to stochastic differential equations in order to incorporate randomness in the infectious disease dynamics. By adopting a cocktail of computational and analytical tools we sufficiently analyzed the impact of the transmission mechanisms in the different multiscale models.
We established that once we used a graph-theoretic method at host level it would be difficult to extend this to community level. However, when we used different methods, it was easy to extend to community level. This was the main aspect of the characterization of multiscale models that we investigated in this thesis, which has not been done before. We also established distinctions between local transmission and global transmission mechanisms, which enable us to implement intervention strategies targeted towards both local transmission, such as vaccination, and global transmission, such as travel restrictions. In spite of the fact that the results collected in this study are restricted to FMD and Malaria, the multiscale modelling frameworks established are suitable for other directly transmitted diseases and vector-borne diseases.

Item Open Access
Factors associated with maternal mortality in South Africa (2003-2008) (2015-03-02)
Mukondeleli, Livhuwani Ellen; Amey, A. K. A.; Kyei, K. A.

Item Open Access
Forecasting Foreign Direct Investment in South Africa using Non-Parametric Quantile Regression Models (2019-05-16)
Netshivhazwaulu, Nyawedzeni; Sigauke, C.; Bere, A.
Foreign direct investment plays an important role in the economic growth process of the host country, since it is considered a vehicle for transferring new ideas, capital, superior technology and skills from developed countries to developing countries. Non-parametric quantile regression is used in this study to estimate the relationship between foreign direct investment and the factors influencing it in South Africa, using data for the period 1996 to 2015. The variables are selected using the least absolute shrinkage and selection operator technique, and all the variables were selected to be in the models. The developed non-parametric quantile regression models were used for forecasting the future inflow of foreign direct investment in South Africa.
The forecast evaluation was done for all models, and the Laplace radial basis kernel, ANOVA radial basis kernel and linear quantile regression averaging were selected as the three best models based on the accuracy measures (mean absolute percentage error, root mean square error and mean absolute error). The best set of forecasts was selected based on the prediction interval coverage probability, prediction interval normalized average deviation and prediction interval normalized average width. The results showed that linear quantile regression averaging is the best model to predict foreign direct investment since it had 100% coverage of the predictions. Linear quantile regression averaging was also confirmed to be the best model under the forecast error distribution. One of the contributions of this study was to provide accurate foreign direct investment forecasts that can help policy makers to come up with good policies and suitable strategic plans to promote foreign direct investment inflows into South Africa.
Item Open Access Forecasting hourly electricity demand in South Africa using machine learning models (2020-08-12) Thanyani, Maduvhahafani; Sigauke, Caston; Bere, Alphonce
Short-term load forecasting in South Africa using machine learning and statistical models is discussed in this study. The research is focused on carrying out a comparative analysis in forecasting hourly electricity demand. This study was carried out using South Africa’s aggregated hourly load data from Eskom. The comparison is carried out using support vector regression (SVR), stochastic gradient boosting (SGB) and artificial neural networks (NN), with a generalized additive model (GAM) as a benchmark model in forecasting hourly electricity demand. In both modelling frameworks, variable selection is done using the least absolute shrinkage and selection operator (Lasso).
The SGB model yielded the least root mean square error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE) on testing data. The SGB model also yielded the least RMSE, MAE and MAPE on training data. Forecast combination of the models’ forecasts is done using convex combination and quantile regression averaging (QRA). The QRA was found to be the best forecast combination model based on the RMSE, MAE and MAPE.
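As an illustration only, the accuracy measures named in the abstract above (RMSE, MAE, MAPE) and a convex combination of forecasts can be sketched as follows. This is a minimal sketch under stated assumptions, not the study's code: the function names, the equal weights and the three-point load series are hypothetical, and quantile regression averaging is not shown.

```python
import math

def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def convex_combination(forecasts, weights):
    """Combine forecasts with non-negative weights that sum to one."""
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    horizon = len(forecasts[0])
    return [sum(w * f[t] for w, f in zip(weights, forecasts)) for t in range(horizon)]

# Hypothetical hourly loads and two hypothetical model forecasts (not Eskom data).
actual = [100.0, 200.0, 400.0]
model_a = [110.0, 190.0, 380.0]
model_b = [90.0, 210.0, 420.0]

# Equal weights are an assumption; in practice the weights would be estimated.
combined = convex_combination([model_a, model_b], [0.5, 0.5])
```

In this made-up example the two models err in opposite directions, so the combined forecast has lower error than either model on its own, which is the usual motivation for combining forecasts.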