Theses and Dissertations

Permanent URI for this collection

https://univendspace.univen.ac.za/handle/11602/2138

Embargo
Comparison of Some Statistical and Machine Learning Models for Continuous Survival Analysis
(2024-09-06) Ndou, Sedzani Emanuel; Mulaudzi, T. B.; Bere, A.
While statistical models have traditionally been used for survival analysis, there is growing interest in exploring the potential of machine learning techniques. Existing literature shows varying results on their performance, depending on the dataset employed. This study conducts a comparative evaluation of the predictive accuracy of both statistical and machine learning models for continuous survival analysis using two distinct datasets: time to first alcohol intake and North Carolina recidivism data. LassoCV was used to select variables for both datasets by shrinking coefficient estimates. Kaplan-Meier survival curves were used, alongside the logrank test, to compare the survival distributions among groups of the variables incorporated in the model. The proposed methods include Cox Proportional Hazards (CoxPH), Lasso-regularized Cox (CoxLasso), Survival Trees, Random Survival Forest, and Neural Networks. Model performance was evaluated using the Integrated Brier Score (IBS), Area Under the Curve (AUC) and Concordance index. Our findings show consistent dominance of the Neural Network (NN) and Random Survival Forest (RSF) models across multiple metrics for both datasets. Specifically, the Neural Network demonstrates the strongest performance, closely followed by RSF, with the CoxPH and CoxLasso models performing slightly lower, while the Survival Tree (ST) consistently lags behind. This study contributes to advancing knowledge and provides practical guidance for improving survival in recidivism and alcohol intake.
Open Access
Probabilistic renewable energy modelling in South Africa
(2024-05-05) Ravele, Thakhani; Sigauke, Caston; Jhamba, Lodwell
The variability of solar power creates problems in planning and managing power system operations. It is critical to forecast accurately in order to maintain the safety and stability of large-scale integration of solar power into the grid. Accurate forecasting is vital because it prevents transmission obstruction and maintains a power equilibrium. This thesis uses robust models to solve this problem by addressing four main issues. The first issue involves the construction of quantile regression models for forecasting extreme peak electricity demand and determining the optimal number of units to commit at minimal costs for each period using the forecasts obtained from the developed models. The bounded variable mixed-integer linear programming (MILP) model solves the unit commitment (UC) problem. This is based on priority constraints where demand is first met from renewable energy sources followed by energy from fossil fuels. Secondly, the thesis discusses the modelling and prediction of extremely high quantiles of solar power. The methods used are a semi-parametric extremal mixture (SPEM), generalised additive extreme value (GAEV) or quantile regression via asymmetric Laplace distribution (QR-ALD), additive quantile regression with covariate t (AQR-1), additive quantile regression with temperature variable (AQR-2) and penalised cubic regression smoothing spline (benchmark) models. The predictions from this study are valuable to power utility decision-makers and system operators in knowing the maximum possible solar power which can be generated. This helps them make high-risk decisions and regulatory frameworks requiring high-security levels. As far as we know, this is the first application to conduct a comparative analysis of the proposed robust models using South African solar irradiance data. The interaction between global horizontal irradiance (GHI) and temperature helps determine the maximum amount of solar power generated. 
As temperature increases, GHI increases up to a point, after which it increases at a decreasing rate and then decreases. Therefore, system operators need to know the temperature range in which the maximum possible solar power can be generated. The study used multivariate adaptive regression splines and extreme value theory to determine the maximum temperature to generate the maximum GHI ceteris paribus. Lastly, the study discusses extremal dependence modelling of GHI with temperature and relative humidity (RH) using the conditional multivariate extreme value (CMEV) and copula models. Due to the nonlinearity and different structure of the dependence of GHI on temperature and RH, unlike previous literature, we use three Archimedean copula functions (Clayton, Frank and Gumbel) to model the dependence structure. This work was then extended by constructing a mixture copula model which combined the Frank and Gumbel models. One of the contributions of this thesis is the construction of additive quantile regression models for forecasting extreme quantiles of electrical load, which are then used in solving the UC problem with bounded MILP with priority constraints. The other contribution is developing a modelling framework that shows that GHI converges to its upper limit if temperature converges to the upper bound. Another contribution is constructing a mixture of some copulas for modelling the extremal dependence of GHI with temperature and RH. This thesis reveals the following key findings: (i) The additive quantile regression model is the best-fitting model for hours 18:00 and 19:00. In contrast, the linear quantile regression model is the best-fitting model for hours 20:00 and 21:00. The UC problem results show that using all the generating units, such as hydroelectric, wind power, concentrated solar power and solar photovoltaic, is less costly. 
(ii) The AQR-2 was the best-fitting model and gave the most accurate prediction of quantiles at τ = 0.95, 0.97, 0.99 and 0.999, while at the 0.9999-quantile, the GAEV model had the most accurate predictions. (iii) The marginal increases of GHI converge to 0.12 W/m² when temperature converges to 44.26 °C, and the marginal increases of GHI converge to −0.1 W/m² when RH converges to 103.26%. Conditioning on GHI, the study found that the temperature and RH variables have a negative extremal dependence on large values of GHI. (iv) The dependence structure between GHI and the variables temperature and RH is asymmetric. Furthermore, the Frank copula is the best-fitting model for the variables temperature and RH, implying the presence of extreme co-movements. The modelling framework discussed in this thesis could be useful to decision-makers in power utilities, who must optimally integrate highly intermittent renewable energies on the grid. It could be helpful to system operators that face uncertainty in GHI power production due to extreme temperatures and RH, including maintaining the minimum cost by scheduling and dispatching electricity during peak hours when the grid is constrained due to peak load demand.
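The quantile regression models named in this abstract are typically fitted by minimising the pinball (check) loss at a chosen quantile level τ. The following is a minimal sketch of that loss on made-up numbers, not the thesis's code or data; the function names and the toy sample are assumptions introduced for illustration only.

```python
def pinball_loss(y, q, tau):
    """Pinball (check) loss of predicting quantile value q for observation y."""
    if y >= q:
        return tau * (y - q)          # under-prediction is weighted by tau
    return (1.0 - tau) * (q - y)      # over-prediction is weighted by 1 - tau


def mean_pinball_loss(ys, q, tau):
    """Average pinball loss of a constant prediction q over a sample."""
    return sum(pinball_loss(y, q, tau) for y in ys) / len(ys)


# Hypothetical toy sample: the integers 0..10 (not solar or load data).
sample = list(range(11))
tau = 0.9

# The constant that minimises the mean pinball loss is the empirical tau-quantile.
best_q = min(sample, key=lambda q: mean_pinball_loss(sample, q, tau))
print(best_q)
```

With τ close to 1, under-prediction is penalised much more heavily than over-prediction, which is why the minimiser sits in the upper tail of the sample.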
Embargo
Multiscale Modelling of Foodborne Diseases
(2024-09-06) Maphiri, Azwindini Delinah; Muzhinyi, K.; Garira, W.; Mathebula, D.
Infectious disease systems are essentially multiscale complex systems wherein pathogens multiply within hosts, spread across people, and infect entire populations of hosts. The description of most biological processes involves multiple, interconnected phenomena occurring on different spatial and temporal scales in the human body. Traditional approaches for modelling infectious disease systems rely on the principles and concepts of the transmission mechanism theory, which considers transmission to be the primary cause of infectious disease spread at the macroscale. Modellers of infectious diseases are increasingly using a multiscale modelling approach in response to this challenge. Multiscale models of infectious disease systems encompass intricate structures that revolve around the interplay of three distinct subsystems: the host, the pathogen, and the environmental subsystems. The replication-transmission relativity theory is a novel theory designed for the purpose of multiscale modelling of infectious disease systems, accounting for variations in time and space by incorporating pathogen replication that leads to transmission. Replication-transmission relativity theory consists of seven distinct levels of organization within an infectious disease system, each level including the within-host scale (microscale) and between-host scale (macroscale). Five separate classifications of multiscale models can be formulated that integrate the microscale and macroscale. A research gap remains in establishing a multiscale framework for understanding the mechanisms by which foodborne pathogens cause infections in human beings and animals, as very little has been done on modelling foodborne diseases. The primary goal of this study is to create multiscale models for foodborne diseases to examine whether a mutual influence exists between the microscale and macroscale, guided by the principles of replication-transmission relativity theory. 
The multiscale models are developed by considering three environmentally transmitted diseases at host level caused by the pathogens norovirus, E. coli O157:H7 and Taenia solium. We start by developing a single-scale model of foodborne diseases caused by viruses in general, which is then extended to create a multiscale model for norovirus. We formulate a non-standard finite difference scheme for the single-scale model, norovirus, and E. coli O157:H7. For Taenia solium, we use ODE solvers in Python, specifically the odeint function in scipy.integrate. The numerical findings from the study confirm the applicability of the replication-transmission relativity theory in cases where the reciprocal impact between the within-host scale and the between-host scale involves both infection/super-infection (for the effect of the between-host scale on the within-host scale) and pathogen excretion/shedding (for the effect of the within-host scale on the between-host scale). We expect that our study will help modellers integrate microscale and macroscale dynamics across various levels of organization within infectious disease systems.
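To give a flavour of what a non-standard finite difference (NSFD) scheme looks like, here is a minimal sketch on a toy decay equation dx/dt = -λx, which is an assumption chosen for illustration and not one of the thesis's models. The step size h in the update is replaced by a denominator function φ(h) = (1 - e^(-λh))/λ, which keeps the numerical solution positive for any step size, whereas the explicit Euler scheme does not. All function names and parameter values are hypothetical.

```python
import math


def nsfd_decay(x0, lam, h, steps):
    """NSFD scheme for dx/dt = -lam * x with phi(h) = (1 - exp(-lam*h)) / lam."""
    phi = (1.0 - math.exp(-lam * h)) / lam   # denominator function replacing h
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lam * phi * xs[-1])
    return xs


def euler_decay(x0, lam, h, steps):
    """Standard explicit Euler scheme for the same equation, for comparison."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lam * h * xs[-1])
    return xs


# Deliberately large step (lam * h = 3) to expose the difference.
nsfd = nsfd_decay(100.0, 2.0, 1.5, 5)
euler = euler_decay(100.0, 2.0, 1.5, 5)
print(nsfd)   # stays positive and decays
print(euler)  # changes sign and grows in magnitude
```

For this particular equation the NSFD update reduces to multiplying by e^(-λh) at each step, so it reproduces the exact solution on the grid; for nonlinear epidemic models the scheme is generally not exact, and positivity and boundedness are the usual motivation.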
Embargo
Long term peak electricity demand forecasting in South Africa using quantile regression
(2024-09-06) Maswanganyi, Norman; Sigauke, Caston; Ranganai, Edmore
It is widely accepted that South Africa needs to maximise sustainable electricity supply growth to meet the new and growing demand for higher economic growth rates, especially in energy-intensive sectors. To diversify the energy mix, the country also needs to take urgent actions to ensure the sustainability of renewable energy and energy efficiency by 2030. Hence, it is important to provide a modelling framework for forecasting long-term peak electricity demand and quantifying uncertainty of future electricity demand for better electricity security management. In order to estimate and capture changes in long-term peak electricity demand, the study employed quantile regression (QR) based models, including hybrid models for assessing and managing electricity demand using South African data. The changes in long-term electricity demand depend on network location areas and the uncertainties within the energy sectors. Long-term peak electricity demand forecasting using QR models seems scarce in South Africa. The current study closes a gap by developing a modelling framework that can be used for future electricity demand forecasting. Although many studies have been done on short-, medium- and long-term peak electricity demand forecasting, an investigation of the extremal quantile regression (EQR) model for forecasting electricity demand (based on combined economic and weather conditions) still needs to be explored as far as we know. Accurately predicting extreme electricity demand distributions would significantly mitigate load shedding and overloading and allow energy-efficient storage. This thesis identifies weather-related and non-weather-related factors using the EQR approach to modelling and estimating the error of extremely low and high quantiles of peak electricity demand. Results from the thesis show that EQR provides a higher level of detail and can model the non-central behaviour of electricity demand better than the other models used in the study. 
The study has shown how the additive quantile regression (AQR) model can provide the highest predictive ability and create superior accuracy of the forecast results. Power systems reliability requires a probabilistic characterisation of extreme peak loads, which results in severe system stress and causes grid problems. Accurate predictions of long-term electricity demand are very important as such forecasts can be used in the timing and rate of occurrence of such extreme peak loads. The study used hybrid additive quantile regression coupled with autoregressive models and variable selection using Lasso for hierarchical interactions to examine the power system's reliability in random extreme peak loads.
Embargo
Assessing models for de-identification of Electronic Discharge Summary Using Machine Learning tools
(2024-09-06) Mudau, Tshilisanani; Garira, Winston; Netshikweta, Rendani
Background: De-identification is a technique that eliminates identifying information from clinical records in order to protect individual privacy. This procedure decreases the chance that personal information which is collected, processed, distributed, and published can be used to identify the person. Including Machine Learning techniques in the de-identification process has substantially improved it over previous methods. Research Problem: The Electronic Discharge Summary (EDS) has evolved into a significantly improved technique of providing discharge summaries, though this information contains Protected Health Information (PHI), which poses a risk to patients’ privacy. This makes de-identification mandatory. There have lately been several Machine Learning approaches to de-identify data. This study focuses on applying Machine Learning techniques to determine which model can best de-identify a data set. Methods: The open source data set from Harvard Medical School was used. This data set contains 899 Electronic Health Records (EHR), 669 for training and 220 for testing. The Conditional Random Fields (CRF), Long Short Term Memory (LSTM) and Random Forest models were used, and the performance of each model was assessed. Findings: In order to assess each model’s performance, the F-measure, Recall and Precision at token level were compared to determine which Machine Learning model performed best. The Long Short Term Memory was found to outperform both Conditional Random Fields and Random Forest, with micro average F-measure, Recall and Precision of 99%, and macro average F-measure of 77%, Recall of 73% and Precision of 90%.
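The gap between the micro average (99%) and macro average (77%) F-measure reported above is what one would expect when most tokens belong to a single non-PHI class. The sketch below illustrates the effect on invented token labels; the label names "O" and "NAME", the counts, and the helper function are all assumptions for illustration and are not taken from the thesis.

```python
from collections import Counter


def f1_summary(y_true, y_pred):
    """Return per-class F1, micro-averaged F1 and macro-averaged F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative

    def f1(tp_n, fp_n, fn_n):
        prec = tp_n / (tp_n + fp_n) if tp_n + fp_n else 0.0
        rec = tp_n / (tp_n + fn_n) if tp_n + fn_n else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    per_class = {lab: f1(tp[lab], fp[lab], fn[lab]) for lab in labels}
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = sum(per_class.values()) / len(per_class)
    return per_class, micro, macro


# Hypothetical tokens: 96 non-PHI tokens ("O") and 4 name tokens ("NAME").
y_true = ["O"] * 96 + ["NAME"] * 4
# The model gets every "O" right but finds only one of the four names.
y_pred = ["O"] * 96 + ["NAME"] + ["O"] * 3

per_class, micro, macro = f1_summary(y_true, y_pred)
print(per_class, micro, macro)
```

Micro averaging pools all tokens, so the dominant class drives the score; macro averaging weights each class equally, so a poorly recognised rare class pulls it down.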
Open Access
Comparative analysis of Machine Learning Algorithms for Estimating Global Solar Radiation at Selected Weather Stations in Vhembe District Municipality
(2023-10-05) Marandela, Mulalo Veronica; Mulaudzi, T. S.; Maluta, N. E.
Estimating and assessing the energy falling in a particular area is essential for installers of renewable technologies. Different empirical equations have been applied as the most reliable means of estimating global solar radiation (GSR) in different climatic conditions. The main objective of this work is to estimate the global solar radiation at two stations, namely Mutale and Messina, found in Vhembe District, Limpopo Province, South Africa. Four different methods (Random Forest (RF) regression, K-nearest neighbour (K-NN), Support Vector Machines (SVM) and Extreme Gradient Boosting (XGBoost)) were used to estimate the GSR in this study. The RF model on Mutale station was found to be the best-fitting model with R² = 0.9902, MSE = 0.4085 and RMSE = 0.6391, followed by XGB with R² = 0.9898, MSE = 0.4245 and RMSE = 0.6515. RF was also found to be the best for Messina station with R² = 0.9636, MSE = 1.4138 and RMSE = 1.1890, followed by the XGB model with R² = 0.9595, MSE = 1.5723 and RMSE = 1.2539. From the results, it can be concluded that RF is a better model for estimating GSR for different stations.
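Since RMSE is the square root of MSE, the reported pairs can be checked against each other. The short sketch below does this for the four figures quoted in the abstract, reading the Messina RF value as MSE = 1.4138; the tolerance and the variable names are assumptions chosen for illustration.

```python
import math

# (station, model) -> (reported MSE, reported RMSE), as quoted in the abstract.
reported = {
    ("Mutale", "RF"): (0.4085, 0.6391),
    ("Mutale", "XGB"): (0.4245, 0.6515),
    ("Messina", "RF"): (1.4138, 1.1890),
    ("Messina", "XGB"): (1.5723, 1.2539),
}


def rmse_consistent(mse, rmse, tol=5e-4):
    """True when sqrt(MSE) matches the reported RMSE up to rounding."""
    return abs(math.sqrt(mse) - rmse) <= tol


checks = {key: rmse_consistent(*vals) for key, vals in reported.items()}
for key, ok in checks.items():
    print(key, "consistent" if ok else "inconsistent")
```

All four pairs agree to the four decimal places reported, which supports reading the Messina RF figure as 1.4138 rather than 0.14138.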
Open Access
Credit Card Fraud Detection using Boosted Random Forest Algorithm
(2023-10-05) Mashamba, Thanganedzo Beverly; Chagwiza, W.; Garira, W.
Financial fraud is a growing concern with far-reaching consequences for financial institutions, government, and corporate organizations, leading to substantial monetary losses. The primary cause of financial loss is credit card fraud; it affects both issuers and clients and poses a significant threat to the business, as clients will move to competitors with whom they feel secure. Solving fraud problems is beyond human capability, so financial institutions can utilize machine learning algorithms to detect fraudulent behaviour by learning from credit card transactions. This thesis develops the boosted random forest, integrating an adaptive boosting algorithm into a random forest algorithm, such that the performance of the model is improved in predicting fraudulent credit card transactions. The confusion matrix is used to evaluate the performance of the models, wherein random forest, adaptive boosting and boosted random forest were compared. The results indicated that the boosted random forest outperformed the individual models with an accuracy of 99.9%, which corresponded with the results from the confusion matrix. However, random forest and adaptive boosting had accuracies of 100% and 99% respectively, which did not correspond to the results of the confusion matrix, meaning the individual models need to be more accurate. Thus, by implementing the proposed approach in a credit card management system, financial loss will be reduced to a greater extent.
Open Access
Solar power forecasting using Gaussian process regression
(2023-10-05) Chandiwana, Edina; Sigauke, Caston; Bere, Alphonce
Solar power forecasting has become an important aspect affecting crucial day-to-day activities in people's lives. Many African countries are now facing blackouts due to a shortage of energy. This has increased the urge to encourage people to use other energy sources, resulting in different energy inputs into the main electricity grid. When the number of power sources being fed into the main grid increases, so does the need for efficient methods of forecasting these inputs. Thus, there is a need to come up with efficient prediction techniques in order to facilitate proper grid management. The main goal of this thesis is to explore how Gaussian process predicting frameworks can be developed and used to predict global horizontal irradiance. Data on global horizontal irradiance and some weather variables collected from various meteorological stations were made available through SAURAN (Southern African Universities Radiometric Network). The length of the datasets ranged from 496 to 17325 data points. We proposed using Gaussian process regression (GPR) to predict solar power generation. In South Africa, studies based on GPR regarding forecasting solar power are still very few, and more needs to be done in this area. At first, we explored covariance function selection, and a GPR was developed using Core vector regression (CVR). The predictions produced through this method were more accurate than the benchmark models used: Gradient Boosting Regression (GBR) and Support Vector Regression. Then, we explored interval estimation: quantile regression and GPR were coupled in order to develop the modelling framework. This was also done to improve the accuracy of the GPR models. The results proved that the model performed better than the Bayesian Structural Time Series Regression. We also explored spatial dependence; spatio-temporal regression was incorporated into the modelling framework coupled with GPR. 
This was done to incorporate various weather stations' conditions into the modelling process. The spatial analysis results proved that GPR coupled with spatial analysis produced results that were superior to the Autoregressive Spatial analysis and the benchmark model used: Linear Spatial analysis. The GPR results had accuracy measures that proved superior to the benchmark models. Various other tools were used to improve the accuracy of the GPR results. These include combining forecasts and standardisation of predictions. The superior results indicate a vast economic benefit because they allow those who manage the power grid to do so effectively and efficiently. Effective power grid management reduces power blackouts, thus benefitting the nation economically and socially.
Open Access
Commodity Futures Market Prices: Decomposition Approach
(2023-10-05) Antwi, Emmanuel
Financial investments in commodity markets have attracted many investigations due to their importance to the global economy and worldwide trade as a whole. The radical price changes in commodity markets, especially for agricultural, energy and industrial metal products, have significant consequences for consumers and producers. It is crucial to accurately estimate and predict volatility in commodity futures market prices, since continuous price fluctuations have dire consequences for investors, portfolio managers, dealers and policymakers in taking prudent and sustainable decisions. Commodity price component determination and forecasting are challenging due to remarkable price volatility, uncertainty, and complexity in the futures market. As a result, commodity futures price series are nonlinear and nonstationary. Various studies reported in the literature have attempted to develop models to study the persistent changes in the commodity futures price series, but these models have failed to account for the inherent complexity in the commodity futures price series. This study aims to use decomposition techniques, combined with back-propagation neural network (BPNN) and autoregressive integrated moving average (ARIMA) models, to address difficulties in studying commodity futures market prices. This study utilized the decomposition methods Empirical Mode Decomposition (EMD) and Variational Mode Decomposition (VMD) to analyze the daily real price series of three commodity futures: corn from agricultural products, crude oil from energy, and gold from industrial metal, using data from 4th May 2016 to 30th April 2021. In the first part of the study, we explored the descriptive and statistical properties of the data. It was found that the three commodity futures price series were nonstationary and nonlinear. 
Subsequently, we performed an EMD-Granger causality test to establish the spillover effects among the three commodities’ markets. It was revealed that there exists a strong mutual relationship among the three commodity market price series, which implies that the price movement of one market can be used to explain the price fluctuations of the other markets. In the second part, the EMD and VMD methods were applied to decompose the daily data of each commodity price from different periods and frequencies into their respective individual intrinsic mode functions (IMFs). First, we used the Hierarchical Clustering Method and Euclidean Distance Approach to classify the IMFs, residue, and modes into high-frequency, low-frequency, and trend. Next, applying statistical measures, particularly the Pearson product-moment correlation coefficient, Kendall rank correlation, and Spearman rank correlation coefficient, we observed that the trend and low-frequency parts of the market prices are the main drivers of commodity futures market price fluctuations and that special events caused the low frequency. In essence, commodity futures prices are affected by economic development rather than short-lived market variations caused by ordinary supply-demand disequilibrium. The third part compared the EMD- and VMD-based models using three forecasting performance evaluation criteria, namely mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE), to compare the capabilities of the suggested models. We also introduced the Diebold-Mariano (DM) test in selecting the optimal models for each commodity, since MAE, RMSE and MAPE have some shortcomings. The combined models outperformed the individual back-propagation neural network (BPNN) and autoregressive integrated moving average (ARIMA) models in forecasting the series of corn and crude oil futures prices. 
At the same time, BPNN emerged as the optimal model for predicting the gold futures price series. In addition, variational mode decomposition emerged as the ideal data pre-treatment method and contributed to enhancing the predicting ability of the BPNN and the ARIMA models. The empirical results showed that models combined with decomposition methods predict commodity futures prices accurately and can easily capture the volatility in commodity futures prices. By utilizing decomposition-based models in studying commodity market prices, the study fills the following gaps in the existing literature. The pre-treatment effects of the EMD and VMD can be compared horizontally. Decomposing commodity market price series and studying the underlying components that cause the above-mentioned commodity market price fluctuations is a novel approach to studying commodity market prices. In addition, utilizing Hierarchical Clustering and Euclidean Distance Approaches, the IMFs, residue and modes were classified into their distinctive frequencies, namely, high-frequency, low-frequency, and trend units. The study of the effect of these frequencies and trends on commodity market price fluctuation is the first of its kind in the literature. Furthermore, applying statistical measures such as the Pearson product-moment correlation coefficient, Kendall rank correlation, and Spearman rank correlation coefficient to evaluate the contribution of the IMFs, residue, and modes to the net variance of the volatility of crude oil, corn, and gold market price fluctuations is an innovative approach to studying financial time series. The EMD-Causality technique proposed to study the causal relationship of corn, crude oil, and gold futures price movements is novel in the financial market. This new approach to studying the price movement of commodity markets will provide vital information from one commodity market to explain price fluctuations in other commodity markets. 
Also, decomposing financial data before forecasting yields high forecasting accuracy in commodity futures price prediction. Additionally, using decomposition techniques in agriculture, energy, and industrial metal commodity futures markets effectively minimizes the prediction complexity. Furthermore, using econometric and machine learning models incorporated with decomposition methods can capture the price series information to an acceptable degree. Finally, decomposition-based predicting techniques can effectively raise the predicting performance of BPNN and ARIMA models and reduce errors; thus, the proposed novel combination method can statistically improve forecast accuracy. This study, therefore, may assist in arresting the agricultural, energy, and industrial commodity market trends and in estimating volatility risk factors accurately, consequently serving as a guide for investors, government policymakers and related sectors such as agriculture, energy, and the metal industry in taking prudent and sustainable planning and investment decisions. The suggested decomposition strategy, particularly the VMD-based one, is robust in analyzing the determinants of, modelling, and forecasting commodity futures market price fluctuations, thereby improving forecasting accuracy. Remarkably, in using the decomposition approach to estimate the components of commodity price series separately, different predicting strategies can be explored. For instance, based on the features of the decomposed IMFs or modes, a suitable predicting technique can be considered to forecast each IMF or mode; for example, the residue can be estimated by utilizing a polynomial function, while the Fourier transform can be considered in predicting low-frequency IMFs or modes. Hence, it is recommended that researchers, institutions, investors, and policymakers interested in studying commodity price movements should consider using this novel technique to achieve better results. 
It is further suggested that the decomposition approach could be utilized in other fields of study to prove the approach’s generality. Finally, further study can extend the proposed methodology by considering decomposition techniques other than EMD and VMD and evaluating their robustness in studying financial markets, as the EMD approach has the problems of mode mixing and endpoint effects. Lastly, we propose that a new model or consolidated predicting technique should be investigated to cater for the influence of special events on commodity market prices, since no one can predict the time and the place at which they will occur.
Open Access
Exploring the Multi-scale character of infectious disease dynamics
(2023-05-19) Mufoya, Blessings; Garira, W.; Mathebula, D.
This research study characterised multiscale models of infectious disease dynamics. This was achieved by establishing when it is appropriate to implement particular mathematical methods for different multiscale models. Mathematical modelling has long been used to elucidate infectious disease systems. Due to the vast complexities in the dynamics of infectious disease systems, modellers are increasingly gravitating towards a multiscale modelling approach as a favourable alternative. Among the diseases that have persistently plagued most developing countries are vector-borne diseases like Malaria and directly transmitted diseases like Foot-and-Mouth disease (FMD). Globally, FMD has caused major losses in the economic sector (particularly agriculture) as well as tourism. On the other hand, Malaria remains amongst the most severe public health problems worldwide, with millions of people estimated to live at permanent risk of contracting the disease. We developed multiscale models that can describe both local transmission and global transmission of infectious disease systems at any hierarchical level of organization, using FMD and Malaria as paradigms. The first stage in formulating the multiscale models in this study was to integrate two submodels, namely (i) the between-host submodel and (ii) the within-host submodel of an infectious disease system, using the nested approach. The outcome was a system of nonlinear ordinary differential equations which described the local transmission mechanism of the infectious disease system. The next step was to incorporate graph-theoretic methods into the system of differential equations. This approach enabled modelling the migration of humans/animals between communities (also called patches or geographically distant locations), thereby describing the global transmission mechanism of infectious disease systems. 
At whole organism-level we considered the organs in a host as patches and the transmission within-organ scale as direct transmission represented by ordinary differential equations. However, at between-organ scale there was movement of pathogen between the organs through the blood. This transmission mechanism called global transmission was represented by graph-theoretic methods. At macrocommunity-level we considered communities as patches and established that at withincommunity scale there was direct transmission of pathogen represented by ordinary differental equations and at between-community scale there was movement of infected individuals. Furthermore, the systems of differential equations were extended to stochastic differential equations in order to incorporate randomness in the infectious disease dynamics. By adopting a cocktail of computational and analytical tools we sufficiently analyzed the impact of the transmission mechanisms in the different multiscale models. We established that once we used a graph-theoretic method at host level it would be difficult to extend this to community level. However, when we used different methods then it was easy to extend to community level. This was the main aspect of the characterization of multiscale models that we investigated in this thesis which has not been done before. We also established distinctions between local transmission and global transmission mechanisms which enable us to implement intervention strategies targeted torwards both local transmission such as vaccination and global transmission such as travel restrictions. In spite of the fact that the results collected in this study are restricted to FMD and Malaria, the multiscale modelling frameworks established are suitable for other directly transmitted diseases and vector-borne diseases.
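The split between local transmission (differential equations inside a patch) and global transmission (movement along the edges of a graph) can be illustrated with a small sketch. This is not the model from the thesis: it assumes a plain SIR structure in each patch, a hypothetical migration-rate matrix, and simple Euler time stepping, and all names and parameter values are illustrative.

```python
import numpy as np

def simulate_patches(beta, gamma, migration, s0, i0, dt=0.01, steps=5000):
    """Euler integration of SIR dynamics on patches linked by a migration graph.

    migration[j, k] is an assumed per-capita movement rate from patch j to
    patch k (the weighted adjacency matrix of the graph); the diagonal is ignored.
    """
    m = np.array(migration, dtype=float)
    np.fill_diagonal(m, 0.0)
    out_rate = m.sum(axis=1)          # total rate of leaving each patch
    s = np.array(s0, dtype=float)
    i = np.array(i0, dtype=float)
    r = np.zeros_like(s)
    for _ in range(steps):
        n = s + i + r
        new_inf = beta * s * i / n    # local (within-patch) transmission
        # global transmission: inflow from other patches minus outflow
        ds = -new_inf + m.T @ s - out_rate * s
        di = new_inf - gamma * i + m.T @ i - out_rate * i
        dr = gamma * i + m.T @ r - out_rate * r
        s, i, r = s + dt * ds, i + dt * di, r + dt * dr
    return s, i, r

# Two patches; infection starts only in patch 0 (hypothetical values).
s, i, r = simulate_patches(beta=0.5, gamma=0.2,
                           migration=[[0.0, 0.01], [0.01, 0.0]],
                           s0=[990.0, 1000.0], i0=[10.0, 0.0])
print(i)
```

Setting the migration matrix to zero removes the global mechanism entirely, so patch 1 would never be infected; this is the kind of distinction that separates interventions such as travel restrictions from local ones such as vaccination.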
Open Access
The Development and Application of Coupled Multiscale Models of Malaria Disease System
(2022-11-10) Maregere, Bothwell; Garira, W.; Mathebula, D.
The purpose of this thesis is to develop coupled multi-scale dynamics of infectious disease systems. An infectious disease system consists of three interacting subsystems, which are the host, the pathogen, and the environment. The system is organized into hierarchical levels of organization, from the cellular level to the macro-ecosystem level, and each level has two different interaction scales (micro-scale and macro-scale). There are two main theories of infectious diseases: (i) the transmission mechanism theory, and (ii) the replication-transmission relativity theory. A significant difference exists between these theories in that (i) the transmission mechanism theory considers transmission to be the primary cause of infectious disease spread at the macro-scale, while (ii) the replication-transmission relativity theory is an extension of the first theory. It is important to consider the interaction between the two scales when pathogen replication occurs within the host (micro-scale) and transmission occurs between hosts (macro-scale). Our research primarily focuses on the replication-transmission relativity theory of pathogens. The main purpose of this study is to develop coupled multi-scale models of direct vector-borne diseases using malaria as a paradigm. We have developed a basic coupled multi-scale model that combines two other categories of multi-scale models, which are a nested multi-scale model in the human host and an embedded multi-scale model in the mosquito host. The developed multi-scale model consists of systems of nonlinear differential equations that are used to obtain mathematical results on the underlying multi-scale cycle of pathogen replication and transmission of malaria.
Stability analyses of the models were carried out to show that the infection-free equilibrium is locally and globally asymptotically stable whenever R0 < 1, and the endemic equilibrium exists and is globally asymptotically stable whenever R0 > 1. We applied vaccination as a control measure on the multi-scale model of malaria with mosquito life cycle, comprising three stages of vaccination, namely pre-erythrocyte stage vaccines, blood stage vaccines and transmission stage vaccines. The impact of vaccination on malaria was demonstrated. Through numerical simulation, it was found that when the comparative vaccination efficacy is high, the community pathogen load (GH and PV) decreases and the reproductive number can be reduced by 89.09%, that is, malaria transmission can be reduced at both the individual level and the population level. We also extended the multi-scale model with the human immune response in a within-human sub-model, which is stimulated by the malaria parasite. We investigated the effect of immune cells on reducing malaria infection at both the between-host scale and the within-host scale. We incorporated an environmental factor, temperature, into the multi-scale model of the malaria disease system with a mosquito life cycle. We discovered that as the temperature increases the mosquito population also increases, which has the impact of increasing malaria infection at the individual level and at the community scale. We also investigated the influence of the mosquito life cycle on the multi-scale model of the malaria disease system. An increase in the egg, larval and pupal stages of mosquitoes results in an increase in mosquito density and malaria transmission at the individual level and the community scale. Therefore, the suggestion is that immature and mature mosquitoes be controlled to lessen malaria transmission.
The results indicated that the combination of malaria health interventions with the highest efficacy reduces malaria infection at the population level. Models developed and analyzed in this study can play a significant role in preventing malaria outbreaks. Using the coupled multi-scale models that were developed in this study, we drew conclusions about the malaria disease system based on the results obtained. It is possible to apply the multi-scale framework in this study to other vector-borne diseases as well.
Open Access
Time-frequency domain analysis of exchange rate market integration in Southern Africa Development Community: A Hilbert-Huang Transform approach
(2022-11-10) Adam, Anokye Mohammed; Kyei, Kwabena A.; Moyo, Simiso; Gill, Ryan S.; Gyamfi, Emmanuel N.
The desire of most African economic communities to introduce a common currency has persisted for years. As postulated by the Optimum Currency Area hypothesis, coordination of policy indicators among member countries is desirable for a stable monetary union. In this regard, the integration of exchange rate markets has been studied and cited as one of the key indicators that could signal economic integration. Therefore, analysis of similarities, interdependence, and information transfer across exchange rate markets in the Southern African Development Community (SADC) is necessary to measure the extent of integration in the region. However, the intrinsic complexity of exchange rate data generation and its stylised characteristics of non-stationarity and non-linearity influence the modelling of such data in terms of the accuracy of the analysis and the embedded policy direction. In response, this thesis proposes empirical mode decomposition-based market integration analysis to address the limitations of the existing literature, which fails to recognise the heterogeneity of market participants and data generation of the exchange rate in SADC. The data employed for the thesis are the daily real exchange rates from 15 out of 16 member countries of the SADC from 3rd January 1994 to 7th January 2019. The choice of study window and countries was based on the availability of adequate and consistent data for robust analysis and the period after South Africa, the largest economy, joined SADC. Based on these criteria, Zimbabwe was excluded from the analysis. To achieve the purpose of this thesis, a four-step approach was used. The first step reviewed and explored the non-stationarity and non-linearity stylised facts about the data and observed that exchange rate series in SADC are non-stationary and non-linear. The second stage compared the performance of two Hilbert-Huang Transforms (EMD and EEMD) in decomposing SADC exchange rate markets, of which EEMD emerged superior.
The components of the decomposed series were examined for dominance and ability to define the exchange rate trajectory in SADC. The residue of all the markets explained over 80% of the variation of the original series except Angola. The short- and long-term co-movement was analysed through the characteristics of the IMFs and residues. The analysis of the IMFs and residues obtained from EEMD showed that exchange rate markets in SADC are driven by economic fundamentals and that 12 out of 15 countries examined showed some level of similarity in the long-term trend. In the third stage, an EEMD-DCCA-based multifrequency network was introduced to study the dynamic interdependence structure of the exchange rate markets in SADC. This was done by first decomposing all series into intrinsic mode functions using EEMD and reconstructing the series into three frequency modes (high, medium, and low frequency) and a residue. The DCCA method was used to analyse the cross-correlation between the various frequencies, residues and original series. These were meant to address the non-linearity and non-stationarity in observed exchange rate data. A correlation network was formed from the cross-correlation coefficients to reveal richer information than would have been obtained from the original series. The results showed that the cross-correlation structure of the high-frequency series mimics that of the original series. There was also a significant cross-correlation of long-term trends of most SADC countries’ exchange rate markets. The final stage proposed an EEMD-effective transfer entropy-based model to study exchange rate market information transmission in SADC at various frequencies.
The combination of Ensemble Empirical Mode Decomposition (EEMD) and the Rényi effective transfer entropy techniques to investigate the multiscale information transfer helped quantify the directional flow of information at four frequency domains: high, medium, and low frequencies, representing the short, medium, and long terms, respectively, in addition to the residue (fundamental feature). This revealed a significant positive information flow in the high frequency, but negative flow in the medium and low frequencies. Based on the findings of this thesis we recommend that EEMD-based methods be used in the analysis of financial data that are susceptible to non-linearity and non-stationarity to elicit the time-frequency information. In terms of policy towards monetary formulation, we recommend a stepwise approach to monetary integration in SADC.
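The DCCA step described above can be sketched in a few lines. The function below is an illustrative implementation of a detrended cross-correlation coefficient for a single box size, assuming overlapping boxes and local linear detrending; the function name, the box size, and the synthetic input are assumptions, and the thesis may have used a different variant applied to EEMD-reconstructed series.

```python
import numpy as np

def dcca_coefficient(x, y, n):
    """Detrended cross-correlation coefficient for box size n (overlapping boxes)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    px = np.cumsum(x - x.mean())      # integrated profile of x
    py = np.cumsum(y - y.mean())      # integrated profile of y
    t = np.arange(n + 1)
    f_xy = f_xx = f_yy = 0.0
    for start in range(len(x) - n):
        bx = px[start:start + n + 1]
        by = py[start:start + n + 1]
        # residuals after removing a local linear trend in each box
        rx = bx - np.polyval(np.polyfit(t, bx, 1), t)
        ry = by - np.polyval(np.polyfit(t, by, 1), t)
        f_xy += np.mean(rx * ry)      # detrended covariance
        f_xx += np.mean(rx * rx)      # detrended variance of x
        f_yy += np.mean(ry * ry)      # detrended variance of y
    return f_xy / np.sqrt(f_xx * f_yy)

# Synthetic example: two noisy series sharing a common component.
rng = np.random.default_rng(0)
common = rng.normal(size=300)
a = common + 0.5 * rng.normal(size=300)
b = common + 0.5 * rng.normal(size=300)
print(dcca_coefficient(a, b, n=10))
```

The coefficient lies between -1 and 1. Computing it for every pair of markets at each frequency mode gives the matrix from which a correlation network can be built.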
Open Access
Share Price Prediction for Increasing Market Efficiency using Random Forest
(2022-11-10) Mbedzi, Tshinanne Angel; Chagwiza, W.; Garira, W.
A share price is the price of a single share among a collection of saleable shares, options, or other financial assets. The share price is unpredictable since it primarily depends on buyers’ and sellers’ expectations. A share is an equity security traded on primary and secondary markets. In this study we use machine learning techniques to predict the share price in order to increase market efficiency. In addition, it is important to build models and to create appropriate features that improve the performance of the models. The random forest (RF) and the recurrent neural network (RNN) are used to achieve this. To fix class imbalance, we analyse preprocessing of the data set, such as feature selection using filter and wrapper methods and selected oversampling techniques. The models’ performance is evaluated using mean absolute error (MAE), mean square error (MSE), root mean square error (RMSE), relative MAE (rMAE), and relative RMSE (rRMSE). The performance of the RNN and RF algorithms was compared for the prediction of the closing price. The RF model was found to be the best model for predicting the stock price (closing price). This research project, together with its findings, will have an impact on increasing market efficiency. This will also promote potential economic growth.
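The evaluation metrics named in the abstract can be computed as in the sketch below. The abstract does not define rMAE and rRMSE, so the code assumes one common convention in which the model's error is divided by the error of a benchmark forecast; the function name and the sample prices are hypothetical.

```python
import numpy as np

def error_metrics(actual, predicted, benchmark):
    """MAE, MSE, RMSE, plus errors relative to a benchmark forecast (assumed definition)."""
    actual, predicted, benchmark = (
        np.asarray(a, dtype=float) for a in (actual, predicted, benchmark)
    )
    err = actual - predicted
    bench_err = actual - benchmark
    mae = np.mean(np.abs(err))
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    return {
        "MAE": mae,
        "MSE": mse,
        "RMSE": rmse,
        "rMAE": mae / np.mean(np.abs(bench_err)),
        "rRMSE": rmse / np.sqrt(np.mean(bench_err ** 2)),
    }

# Hypothetical closing prices, model predictions, and a naive benchmark.
metrics = error_metrics(actual=[10, 12, 11, 13],
                        predicted=[10.5, 11.5, 11, 14],
                        benchmark=[9, 10, 12, 11])
print(metrics)
```

Under this convention a relative metric below 1 means the model beats the benchmark on that measure.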
Open Access
Fundamental Analysis for Stocks using Extreme Gradient Boosting
(2022-11-10) Gumani, Thanyani Rodney; Chagwiza, Wilbert; Kubjana, Tlou
When it comes to stock price prediction, machine learning has grown in popularity. Accurate stock prediction is a very difficult task as financial stock markets are unpredictable and non-linear in nature. With the advent of machine learning and improved computational capabilities, programmed prediction methods have proven to be more effective in stock price prediction. Extreme gradient boosting (XGBoost) is a variant of the gradient boosting machine. XGBoost, an ensemble method of classification trees, is investigated for the prediction of stock prices based on fundamental analysis. XGBoost outperformed the competing models and had higher accuracy. The developed XGBoost model proved to be an effective model that accurately predicts the stock market trend, and it is considered to be much better than conventional non-ensemble learning techniques.
Open Access
Predicting an Economic Recession Using Machine Learning Techniques
(2022-11-10) Molepo, Mashaka Ruth; Chagwiza, Wilbert; Kubjana, Tlou
Few economic downturns were predicted months in advance. This research has the ability to give the best performing models to assist businesses in navigating recession periods. The study addresses the subject of identifying the most important variables to improve the overall performance of an algorithm that would effectively predict recessions. The primary aim of this study was to improve economic recession prediction using machine learning (ML) techniques by developing an accurate and efficient prediction model in order to avoid greater government deficits, growing inequality, significantly decreased income, and higher unemployment. The study objective was to establish the relevant method for addressing imbalanced data with a suitable feature selection strategy to enhance the performance of the machine learning algorithm developed. Furthermore, an artificial neural network (ANN) and Random Forest (RF) were used in predicting economic recession. This study would not have been possible without the publicly available data from the online open source Kaggle, which provided ordinal categorical data for the specific data utilized. The major finding of this study was that the ML algorithm RF performed better at recession prediction than its rival ANN. Because only two ML algorithms were employed in this research, further ML tools can be used to improve the statistical components of the study.
Open Access
Computational analysis of magnetohydrodynamics boundary layer flow of nanofluid over a stretching sheet in the presence of heat generation or absorption and chemical reaction
(2022-07-15) Molaudzi, Vhutshilo; Shateyi, S.; Muzhinji, K.
In this study, we present the effect of two-dimensional magnetohydrodynamics of a nanofluid over a stretching sheet in the presence of chemical reaction, as well as heat generation or absorption. The partial differential equations are reduced to coupled nonlinear ordinary differential equations using similarity transformations, which are then solved numerically using spectral local linearization and spectral relaxation methods. The effects of different parameters, Lewis number, Eckert number, stretching, chemical reaction, local Reynolds number, Prandtl number, constant, heat source, Brownian motion, and thermophoresis, are analysed and compared. The numerical results for velocity, temperature, skin friction coefficient, concentration, Sherwood number, and Nusselt number are presented in tabular form and visualized graphically. The findings of the spectral local linearization and spectral relaxation methods are very similar to the bvp4c method’s results. When compared to the spectral relaxation method, the results from the spectral local linearization method were more effective. We found that the velocity profile increases with increasing values of the Grashof number (Gr). Since the Grashof number (Gr) is the ratio of buoyancy to viscous forces in the boundary layer, a larger Gr means stronger buoyancy forces relative to the viscous forces, which influences the velocity in the boundary layer region. An increase in the heat source/sink parameter (S) results in an increase in velocity and temperature, but a decrease in concentration. The concentration of the diffusing species was reduced due to the heat source/sink parameter (S). The results also show that heat generation increases the momentum and thermal boundary layer thickness while decreasing the nanofluid concentration boundary layer thickness.
Open Access
Multi-objective Loan Portfolio Optimization in Peer-to-Peer Lending Markets using Machine-Learning Techniques
(2022-07-15) Maakgetlwa, Saleme Shoky; Moyo, S.; Mphephu, N.
Portfolio optimization problems in peer-to-peer lending platforms involve selecting good (less risky) loan applications from various potential borrowers. Such loans have a lower level of risk in terms of funding and earn higher returns. The aim of this study is to find ways to maximize returns and minimize the risks associated with the investment. It becomes more complicated to optimally allocate weights to the loan applications when there is an increased number of applications for funding. This study focused on devising techniques which can be used to optimally select portfolios of loan applications for funding with desired returns on the investment. Harry Markowitz pioneered Modern Portfolio Theory, also known as mean-variance theory, to construct a portfolio, but the theory failed since it was built on assumptions that are unrealistic in real-life situations. This study explored and compared the mean-variance theory and other machine learning methods to construct a portfolio of loans from the peer-to-peer lending market in order to be able to recommend the best approach to achieving high returns with minimum risk. The study employed evolutionary algorithms (Particle Swarm Optimization and Genetic Algorithm) and a reinforcement learning algorithm.
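As a small illustration of the mean-variance idea mentioned above, the sketch below computes the global minimum-variance weights from a covariance matrix using the closed form w = Σ⁻¹1 / (1ᵀΣ⁻¹1). It is an assumption-laden example and not the study's method: the covariance values are hypothetical, no return target is imposed, and negative weights are allowed, which would not be meaningful for funding loans.

```python
import numpy as np

def min_variance_weights(cov):
    """Global minimum-variance weights: solve cov @ w = 1, then normalise to sum to 1."""
    cov = np.asarray(cov, dtype=float)
    ones = np.ones(cov.shape[0])
    raw = np.linalg.solve(cov, ones)
    return raw / raw.sum()

# Two hypothetical, uncorrelated loans with return variances 0.04 and 0.01.
cov = np.array([[0.04, 0.0],
                [0.0, 0.01]])
w = min_variance_weights(cov)
print(w)              # weights, inversely proportional to variance here
print(w @ cov @ w)    # resulting portfolio variance
```

Heuristics such as Particle Swarm Optimization or a Genetic Algorithm become attractive when constraints like non-negative weights, funding limits, or several competing objectives make a closed form unavailable.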
Open Access
Hierarchical forecasting of monthly electricity demand
(2022-07-15) Chauke, Ignitious; Sigauke, C.; Bere, A.
Energy demand forecasting is a vital tool for energy management, maintenance planning, environmental security, and investment decision-making in liberalised energy markets. The mini-dissertation investigates ways to forecast power usage using hierarchical time series and South African data. Approaches such as top-down, bottom-up, and optimal combination are applied. Top-down forecasting is based on disaggregating forecasts of the total series and distributing them down the hierarchy based on historical data proportions. The bottom-up strategy aggregates individual forecasts at lower levels, whereas the optimal combination methodology optimally combines bottom forecasts. An out-of-sample prediction performance evaluation was performed to assess the models’ predictive ability. The best model was chosen using mean absolute percentage error. The top-down technique based on forecasted proportions (Top-down forecasted proportions) was superior to the optimal combination and bottom-up approaches. To combine forecasts and build prediction intervals for the proposed models, linear quantile regression, linear regression, simple average, and median were used. The best set of forecasts was picked based on the prediction interval normalised average width. At 95%, the best model based on the prediction interval normalised average width was a simple average.
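The bottom-up and top-down strategies described above can be sketched for a two-level hierarchy as follows. This is an illustrative example under stated assumptions: the top-down function uses each series' share of the historical total, which is one of several top-down variants and not the forecasted-proportions variant the study found best, and all numbers and function names are hypothetical.

```python
import numpy as np

def bottom_up(bottom_forecasts):
    """Aggregate bottom-level forecasts (rows = series, columns = periods) to the total."""
    return np.sum(np.asarray(bottom_forecasts, dtype=float), axis=0)

def top_down(total_forecast, bottom_history):
    """Split a total forecast using each series' share of the historical total."""
    history = np.asarray(bottom_history, dtype=float)
    proportions = history.sum(axis=1) / history.sum()
    return np.outer(proportions, np.asarray(total_forecast, dtype=float))

# Hypothetical monthly demand history for two regions and a two-month total forecast.
history = [[10, 20],
           [30, 40]]
split = top_down(total_forecast=[100, 200], bottom_history=history)
print(split)             # region-level forecasts
print(bottom_up(split))  # aggregating them recovers the total forecast
```

Both functions produce coherent forecasts, meaning the bottom-level values add up to the total; they differ in which level of the hierarchy is actually modelled.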
Open Access
Modelling volatility, equity risk and extremal dependence of the BRICS Stock Markets
(2022-07-15) Mukhodobwane, Rosinah Mphedziseni; Sigauke, Caston; Chagwiza, Wilbert; Garira, Winston
With the use of empirical data of the BRICS (Brazil, Russia, India, China, and South Africa) stock markets, this thesis focuses on solving three main financial and investment issues involving returns volatility, risk and extremal dependence via robust statistical modelling. The first issue involves modelling financial returns volatility (when the true distribution is unknown) using the univariate GARCH model under the assumptions of seven error distributions. The findings, using two of the error distributions, show that the Chinese market has the highest volatility persistence, followed by the South African, Russian, Indian and Brazilian markets in that order. For risk modelling and analysis, the findings show that the Russian market has the highest risk level, followed by the South African, Chinese, Brazilian and Indian markets, respectively. For the extremal dependence modelling, using the bivariate point process and conditional multivariate extreme value (CMEV) models, the findings show varied levels of low extremal dependence structure whose outcomes are highly beneficial to investors, portfolio managers and other market participants who are interested in maximising their investment returns and financial gains. However, it is observed that the point process was able to model many more extreme observations or exceedances that contribute to the likelihood estimation and it gives more information than the threshold excess method of the CMEV model.
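Volatility persistence in a univariate GARCH model can be made concrete with a short sketch. The code assumes a GARCH(1,1) specification, which the abstract does not state explicitly, and uses hypothetical parameter values; it only runs the conditional variance recursion and does not estimate parameters or model the error distribution.

```python
import numpy as np

def garch11_variance(returns, omega, alpha, beta):
    """Conditional variance: sigma2[t] = omega + alpha * r[t-1]**2 + beta * sigma2[t-1]."""
    returns = np.asarray(returns, dtype=float)
    sigma2 = np.empty(len(returns))
    # start from the unconditional variance (requires alpha + beta < 1)
    sigma2[0] = omega / (1.0 - alpha - beta)
    for t in range(1, len(returns)):
        sigma2[t] = omega + alpha * returns[t - 1] ** 2 + beta * sigma2[t - 1]
    return sigma2

# Hypothetical parameters; persistence is alpha + beta.
omega, alpha, beta = 0.1, 0.1, 0.8
rng = np.random.default_rng(0)
r = rng.normal(size=250)
print(alpha + beta)
print(garch11_variance(r, omega, alpha, beta)[:5])
```

The closer alpha + beta is to 1, the more slowly a volatility shock dies out, which is the sense in which one market can be said to have higher volatility persistence than another.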