Theses and Dissertations
http://hdl.handle.net/11602/2138
2024-03-29T00:22:35Z
http://hdl.handle.net/11602/2654
Comparative analysis of Machine Learning Algorithms for Estimating Global Solar Radiation at Selected Weather Stations in Vhembe District Municipality
Marandela, Mulalo Veronica
Estimating and assessing the solar energy falling on a particular area is essential for installers of
renewable technologies. Different empirical equations have been applied as the most reliable models for
estimating global solar radiation (GSR) in different climatic conditions. The main objective of this
work is to estimate the global solar radiation at two stations, namely Mutale and Messina, located
in Vhembe District, Limpopo Province, South Africa. Four different methods (Random Forest (RF)
regression, K-nearest neighbour (K-NN), Support Vector Machines (SVM) and Extreme Gradient Boosting
(XGBoost)) are used to estimate the GSR in this study. The RF model at Mutale station was found to be
the best-fitting model with R² = 0.9902, MSE = 0.4085 and RMSE = 0.6391, followed by XGB with
R² = 0.9898, MSE = 0.4245 and RMSE = 0.6515. RF was also found to be the best for Messina station with
R² = 0.9636, MSE = 1.4138 and RMSE = 1.1890, followed by the XGB model with R² = 0.9595, MSE = 1.5723
and RMSE = 1.2539. From the results, it can be concluded that RF is the better model for estimating
GSR at both stations.
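The three fit statistics quoted above are linked: RMSE is the square root of MSE (for example, the square root of 0.4085 is about 0.6391), and R² compares the residual sum of squares with the total variation of the observations. A minimal sketch of how they might be computed follows; the function name `regression_metrics` and the sample values are hypothetical and are not taken from the thesis.

```python
import math

def regression_metrics(observed, predicted):
    """Return (R^2, MSE, RMSE) for paired observations and predictions."""
    n = len(observed)
    mean_obs = sum(observed) / n
    # Residual and total sums of squares.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    mse = ss_res / n
    return 1.0 - ss_res / ss_tot, mse, math.sqrt(mse)

# Hypothetical daily GSR values, for illustration only (not thesis data).
observed = [18.2, 21.5, 24.1, 19.8, 15.3]
predicted = [18.9, 20.7, 23.6, 20.4, 15.9]
r2, mse, rmse = regression_metrics(observed, predicted)
```

Because RMSE is in the same units as the target, it is usually the easier of the two error measures to interpret alongside R².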
MSc (e-Science); Department of Mathematics and Computational Sciences
2023-10-05T00:00:00Z
http://hdl.handle.net/11602/2651
Credit Card Fraud Detection using Boosted Random Forest Algorithm
Mashamba, Thanganedzo Beverly
Financial fraud is a growing concern with far-reaching consequences for financial institutions,
government, and corporate organizations, leading to substantial monetary
losses. The primary cause of financial loss is credit card fraud; it affects both issuers
and clients, and it is a significant threat to the business because clients will move to
competitors where they feel more secure. Solving fraud problems is beyond human
capability, so financial institutions can utilize machine learning algorithms to
detect fraudulent behaviour by learning through credit card transactions. This thesis
develops the boosted random forest, integrating an adaptive boosting algorithm
into a random forest algorithm, such that the performance of a model is improved in
predicting credit card fraudulent transactions. The confusion matrix is used to evaluate
the performance of the models, wherein random forest, adaptive boosting and
boosted random forest were compared. The results indicated that the boosted random
forest outperformed the individual models with an accuracy of 99.9%, which
corresponded with the results from the confusion matrix. However, random forest and
adaptive boosting had accuracies of 100% and 99% respectively, which did not correspond to the
results in their confusion matrices, meaning the individual models need to be more accurate.
Thus, implementing the proposed approach in a credit card management
system would reduce financial loss to a great extent.
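The abstract contrasts headline accuracy with what the confusion matrix shows. One plausible reason the two can disagree is class imbalance, which is an assumption here and is not stated in the abstract. The sketch below uses hypothetical labels (1 = fraud, 0 = legitimate) to show a very high accuracy alongside a confusion matrix in which most fraud cases are missed.

```python
def confusion_matrix(actual, predicted):
    """Count (tp, fp, fn, tn), treating label 1 as fraud and 0 as legitimate."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    """Share of all transactions classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def recall(tp, fn):
    """Share of actual fraud cases that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical data: 1000 transactions, 5 of them fraudulent.
actual = [1] * 5 + [0] * 995
# Hypothetical model output: only the first fraud case is flagged.
predicted = [1] + [0] * 4 + [0] * 995

tp, fp, fn, tn = confusion_matrix(actual, predicted)
acc = accuracy(tp, fp, fn, tn)   # 0.996
rec = recall(tp, fn)             # 0.2
```

In this illustration the accuracy is 99.6% even though four of the five fraud cases go undetected, which is why the confusion matrix is worth reading next to the accuracy figure.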
MSc (e-Science); Department of Mathematical and Computational Sciences
2023-10-05T00:00:00Z
http://hdl.handle.net/11602/2581
Solar power forecasting using Gaussian process regression
Chandiwana, Edina
Solar power forecasting has become an important aspect affecting crucial day-to-day
activities in people's lives. Many African countries are now facing blackouts due to a shortage of
energy. This has increased the urge to encourage people to use other energy sources, resulting in
different energy inputs into the main electricity grid. As the number of power sources feeding into
the main grid increases, so does the need for efficient methods of forecasting these inputs. Thus,
there is a need for efficient prediction techniques in order to facilitate proper grid management.
The main goal of this thesis is to explore how Gaussian process predicting frameworks can be
developed and used to predict global horizontal irradiance. Data on global horizontal irradiance and
some weather variables collected from various meteorological stations were made available through
SAURAN (Southern African Universities Radiometric Network). The length of the datasets ranged from
496 to 17325 data points. We proposed using Gaussian process regression (GPR) to predict solar power
generation. In South Africa, studies based on GPR for forecasting solar power are still very few,
and more needs to be done in this area. First, we explored covariance function selection, and a GPR
was developed using core vector regression (CVR). The predictions produced through this method were
more accurate than those of the benchmark models used: Gradient Boosting Regression (GBR) and
Support Vector Regression. Then we explored interval estimation: quantile regression and GPR were
coupled in order to develop the modelling framework. This was also done to improve the accuracy of
the GPR models. The results showed that the model performed better than Bayesian Structural Time
Series Regression. We also explored spatial dependence; spatio-temporal regression was incorporated
into the modelling framework coupled with GPR. This was done to incorporate the conditions at
various weather stations into the modelling process.
The spatial analysis results showed that GPR coupled with spatial analysis produced results superior
to Autoregressive Spatial analysis and to the benchmark model used, Linear Spatial analysis. The GPR
results had accuracy measures that proved superior to the benchmark models. Various other tools were
used to improve the accuracy of the GPR results, including combining forecasts and standardisation
of predictions. The superior results indicate a vast economic benefit because they allow those who
manage the power grid to do so effectively and efficiently. Effective power grid management reduces
power blackouts, thus benefitting the nation economically and socially.
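To make the idea of Gaussian process regression concrete, the following is a minimal sketch of the standard GPR posterior mean and variance. It is not the thesis's framework: the squared-exponential (RBF) covariance function, the hyperparameter values, the helper names and the toy data are all assumptions chosen for illustration.

```python
import math

def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance (an assumed choice of kernel)."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def gpr_predict(x_train, y_train, x_test, noise=1e-4):
    """Posterior mean and variance of a zero-mean GP at each test input."""
    n = len(x_train)
    # Covariance of the training inputs plus observation noise on the diagonal.
    k = [[rbf_kernel(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve(k, y_train)
    means, variances = [], []
    for xs in x_test:
        k_star = [rbf_kernel(xs, xi) for xi in x_train]
        v = solve(k, k_star)
        means.append(sum(ks * a for ks, a in zip(k_star, alpha)))
        var = rbf_kernel(xs, xs) - sum(ks * vi for ks, vi in zip(k_star, v))
        variances.append(max(0.0, var))  # guard against tiny negative round-off
    return means, variances

# Hypothetical standardised irradiance readings at hourly steps (not SAURAN data).
x_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [0.0, 0.84, 0.91, 0.14, -0.76, -0.96]
means, variances = gpr_predict(x_train, y_train, [2.0, 20.0])
```

The predictive variance is what makes interval estimates possible: near the training inputs it is small, while far from them it returns to the prior variance. An approximate 95% band can be formed as the mean plus or minus 1.96 times the square root of the variance.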
PhD (Statistics); Department of Mathematical and Computational Sciences
2023-10-05T00:00:00Z
http://hdl.handle.net/11602/2559
Commodity Futures Market Prices: Decomposition Approach
Antwi, Emmanuel
Financial investments in commodity markets have attracted many investigations due to their importance to
the global economy and to worldwide trade as a whole. Radical changes in commodity market
prices, especially for agricultural, energy and industrial metal products, have significant consequences for consumers
and producers. It is crucial to accurately estimate and predict volatility
in commodity futures market prices, since continuous price fluctuations have dire consequences for investors,
portfolio managers, dealers and policymakers in taking prudent and sustainable decisions. Commodity
price component determination and forecasting are challenging due to remarkable price volatility,
uncertainty, and complexity in the futures market. As a result, commodity futures price series are nonlinear
and nonstationary. Various studies reported in the literature have attempted to develop models to study
the persistent changes in commodity futures price series, but these models have failed to account for
the inherent complexity of the series. This study aims to use decomposition techniques,
combined with back-propagation neural network (BPNN) and autoregressive integrated moving
average (ARIMA) models, to address difficulties in studying commodity futures market prices.
This study utilized two decomposition methods, Empirical Mode Decomposition (EMD)
and Variational Mode Decomposition (VMD), to analyze the daily real price series of three commodity
futures: corn from agricultural products, crude oil from energy, and gold from industrial
metals, using data from 4th May 2016 to 30th April 2021.
In the first part of the study, we explored the descriptive and statistical properties of the data. It was found
that the three commodity futures price series were nonstationary and nonlinear. Subsequently,
we performed an EMD-Granger causality test to establish the spillover effects among the three commodity
markets. It was revealed that there exists a strong mutual relationship among the three commodity
market price series, which implies that the price movement of one market can be used to explain the price
fluctuations of the other markets.
In the second part, the EMD and VMD methods were applied to decompose the daily data of each commodity
price, across different periods and frequencies, into their respective intrinsic mode functions (IMFs) and modes.
First, we used the Hierarchical Clustering Method and Euclidean Distance Approach to classify the IMFs,
residue, and modes into high-frequency, low-frequency, and trend components. Next, applying statistical measures,
particularly the Pearson product-moment correlation coefficient, Kendall rank correlation, and Spearman
rank correlation coefficient, we observed that the trend and low-frequency parts of the market prices are
the main drivers of commodity futures price fluctuations and that special events caused the
low-frequency component. In essence, commodity futures prices are affected by economic development rather than
short-lived market variations caused by ordinary supply-demand disequilibrium.
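The three correlation measures named above can each be applied between a decomposed component and the original price series to gauge how strongly that component drives the price. The sketch below is a plain-Python illustration under the assumption of no tied values; the component series are hypothetical and are not EMD or VMD output.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Ranks 1..n in ascending order; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) / number of pairs; assumes no ties."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return score / (n * (n - 1) / 2)

# Hypothetical components: a slow trend and a small high-frequency oscillation.
trend = [100.0, 102.0, 104.0, 106.0, 108.0, 110.0, 112.0, 114.0]
high_freq = [0.5, -0.4, 0.3, -0.6, 0.2, -0.3, 0.6, -0.5]
price = [t + h for t, h in zip(trend, high_freq)]

trend_corr = pearson(trend, price)
high_freq_corr = pearson(high_freq, price)
```

In this toy series the trend component is almost perfectly correlated with the price while the high-frequency component is only weakly correlated, mirroring the kind of comparison the abstract describes.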
The third part compared the EMD- and VMD-based models using three forecasting performance
evaluation criteria, namely mean absolute error (MAE), root mean square error
(RMSE), and mean absolute percentage error (MAPE). We
also introduced the Diebold-Mariano (DM) test for selecting the optimal model for each commodity, since
MAE, RMSE and MAPE have some shortcomings. The combined models outperformed the individual
back-propagation neural network (BPNN) and autoregressive integrated moving average (ARIMA) models
in forecasting the corn and crude oil futures price series. At the same time, BPNN emerged as the
optimal model for predicting the gold futures price series. In addition, variational mode decomposition
emerged as the ideal data pre-treatment method and contributed to enhancing the predictive ability of the
BPNN and ARIMA models. The empirical results showed that models combined with decomposition
methods predict commodity futures prices accurately and can easily capture the volatility in commodity
futures prices.
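The evaluation criteria above can be sketched as follows. The Diebold-Mariano statistic shown is the simplest one-step-ahead form with squared-error loss and no autocovariance correction, which is an assumption and may differ from the variant used in the thesis; all price and forecast values are hypothetical.

```python
import math

def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean square error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, in percent; assumes no zero actual values."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def dm_statistic(actual, forecast_a, forecast_b):
    """One-step Diebold-Mariano statistic on squared-error loss differentials.

    Negative values favour forecast_a; under the null of equal accuracy the
    statistic is approximately standard normal in large samples.
    """
    d = [(a - fa) ** 2 - (a - fb) ** 2
         for a, fa, fb in zip(actual, forecast_a, forecast_b)]
    n = len(d)
    d_mean = sum(d) / n
    d_var = sum((x - d_mean) ** 2 for x in d) / n
    return d_mean / math.sqrt(d_var / n)

# Hypothetical prices and two hypothetical competing forecasts.
actual = [100.0, 102.0, 101.0, 105.0, 107.0]
model_a = [99.0, 103.0, 102.0, 104.0, 108.0]
model_b = [97.0, 105.0, 99.0, 108.0, 104.0]

dm = dm_statistic(actual, model_a, model_b)
```

Point metrics such as MAE and RMSE rank the models but do not say whether the difference is statistically meaningful, which is the gap a test like DM is meant to fill. A five-point sample is far too small for inference and is used here only to keep the arithmetic easy to follow.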
By utilizing decomposition-based models to study commodity market prices, the study fills the
following gaps in the existing literature. The pre-treatment effects of EMD and VMD can
be compared horizontally, and decomposing commodity market price series to study the underlying
components that cause the above-mentioned price fluctuations is a novel approach
to studying commodity market prices. In addition, utilizing Hierarchical Clustering and Euclidean Distance
Approaches, the IMFs, residue and modes were classified into their distinctive frequencies, namely
high-frequency, low-frequency, and trend units. The analysis of the effect of these frequencies and trends on commodity
market price fluctuations is the first of its kind in the literature. Furthermore, applying statistical measures
such as the Pearson product-moment correlation coefficient, Kendall rank correlation, and Spearman
rank correlation coefficient to evaluate the contribution of the IMFs, residue, and modes to the net variance
of crude oil, corn, and gold market price fluctuations is an innovative approach to
studying financial time series. The EMD-Causality technique proposed to study the causal relationship
among corn, crude oil, and gold futures price movements is novel in the financial market. This new approach
to studying price movements in commodity markets provides vital information about how one commodity
market can explain price fluctuations in other commodity markets. Also, decomposing
financial data before forecasting yields high forecasting accuracy in commodity futures
price prediction. Additionally, using decomposition techniques in agricultural, energy, and industrial metal
commodity futures markets effectively minimizes prediction complexity. Furthermore, econometric
and machine learning models incorporating decomposition methods can capture the price series
information to an acceptable degree. Finally, decomposition-based predicting techniques can effectively
raise the predictive performance of BPNN and ARIMA models and reduce errors; thus, the
proposed novel combination method can statistically improve forecast accuracy. This study, therefore,
may assist in capturing trends in the agricultural, energy, and industrial commodity markets and in estimating
volatility risk factors accurately, consequently serving as a guide for investors, government policymakers
and related sectors such as agriculture, energy, and the metal industry in making prudent and sustainable planning
and investment decisions.
The suggested decomposition strategy, particularly the VMD-based one, is robust in analyzing the determinants of,
modeling, and forecasting commodity futures market price fluctuations, thereby improving forecasting
accuracy. Notably, when the decomposition approach is used to estimate the components of commodity
price series separately, different predicting strategies can be explored. For instance, based
on the features of the decomposed IMFs or modes, a suitable predicting technique can be chosen to forecast
each IMF or mode; for example, the residue can be estimated with a polynomial function,
while a Fourier transform can be considered for predicting low-frequency IMFs or modes. Hence, it is recommended
that researchers, institutions, investors, and policymakers interested in studying commodity
price movements consider using this technique to achieve better results. It is further suggested
that the decomposition approach be applied in other fields of study to test the approach’s
generality.
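The component-wise strategy described above (forecast each component with a method suited to it, then add the forecasts) can be sketched as follows. This is only an illustration under stated assumptions: the components are hypothetical inputs rather than EMD or VMD output, the residue is extrapolated with a degree-1 polynomial, and a seasonal-naive rule stands in for the Fourier-based forecast mentioned in the abstract.

```python
def fit_line(values):
    """Least-squares line y = a + b*t over t = 0..n-1; returns (a, b)."""
    n = len(values)
    t_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(values))
    b = sxy / sxx
    return y_mean - b * t_mean, b

def forecast_residue(residue, steps=1):
    """Extrapolate the residue (trend) with the fitted degree-1 polynomial."""
    a, b = fit_line(residue)
    n = len(residue)
    return [a + b * (n + h) for h in range(steps)]

def forecast_periodic(component, period, steps=1):
    """Seasonal-naive stand-in: repeat the value observed one period earlier."""
    n = len(component)
    return [component[n - period + (h % period)] for h in range(steps)]

def combined_forecast(residue, oscillation, period, steps=1):
    """Forecast each component separately, then sum the component forecasts."""
    r = forecast_residue(residue, steps)
    o = forecast_periodic(oscillation, period, steps)
    return [x + y for x, y in zip(r, o)]

# Hypothetical components of a price series (not EMD/VMD output).
residue = [50.0 + 0.5 * t for t in range(8)]
oscillation = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]

forecast = combined_forecast(residue, oscillation, period=4, steps=2)
```

The design point is that each component is simpler than the raw series, so a simple forecaster per component can be enough; swapping in a different forecaster for one component does not affect the others.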
Finally, further study can extend the proposed methodology by considering decomposition techniques
other than EMD and VMD and evaluating their robustness in studying financial markets, as
the EMD approach suffers from mode mixing and endpoint effects. We also propose that a new
model or consolidated predicting technique be investigated to account for the influence of special events
on commodity market prices, since no one can predict when and where they will occur.
PhD (Statistics)
2023-10-05T00:00:00Z