Monday, June 14, 2021

Journal of Information and Communication Technology (JICT) Vol.20, No.3, July 2021

Oluwatobi Shadrach Akanji, Opeyemi Aderiike Abisoye & Mohammed Awwal Iliyasu
Hashibah Hamid, Nor Idayu Mahat & Safwati Ibrahim
Suraya Alias, Mohd Shamrie Sainin, Siti Khaotijah Mohammad
Anita Ramalingam & Subalalitha Chinnaudayar Navaneethakrishnan
Narender Kumar & Dharmender Kumar
Hayder Naser Khraibet AL-Behadili & Ku Ruhana Ku-Mahamud
Adil Yaseen Taha, Sabrina Tiun, Abdul Hadi Abd Rahman & Ali Sabah

Distributed Denial of Service (DDoS) attacks have been among the most persistent forms of attack on information technology infrastructure connected to public networks, owing to the easy availability of DDoS attack tools. Researchers have developed several techniques to curb volumetric DDoS attacks, which overwhelm the target with a large number of request packets. However, compared with volumetric DDoS, little research has addressed the mitigation of slow DDoS. Attackers have resorted to slow DDoS because it mimics the behaviour of a slow legitimate client while still causing service unavailability. This paper presents an approach to boosting service availability in web servers under slow Hypertext Transfer Protocol (HTTP) DDoS attacks, using a Genetic Algorithm and a Support Vector Machine for attack detection, which in turn facilitates attack mitigation in a Software-Defined Networking (SDN) environment simulated in GNS3. The Genetic Algorithm was used to select the NetFlow features that indicate the presence of an attack and to determine appropriate values for the Support Vector Machine's regularization parameter, C, and gamma parameter. Results showed that the classifier achieved a detection accuracy, Area Under the Receiver Operating Characteristic Curve (AUC), true positive rate, false positive rate, and false negative rate of 99.89%, 99.89%, 99.95%, 0.18%, and 0.05%, respectively. The algorithm for subsequently implementing the selective adaptive bubble burst mitigation mechanism is also presented. This study contributes to ongoing research on detecting and mitigating slow HTTP DDoS attacks, with an emphasis on machine learning classification and meta-heuristic algorithms.
Keywords: Genetic Algorithm, Slow DDoS Mitigation, Slow Distributed Denial of Service, Software Defined Network, Support Vector Machine.
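The detection figures reported in this abstract are standard confusion-matrix ratios. As a minimal sketch, they can be computed as follows from hypothetical counts (not the paper's data); AUC is left out because it needs ranked classifier scores rather than counts. The true positive and false negative rates are complements, which matches the reported 99.95% and 0.05%.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from a binary attack/benign classifier (attack = positive class)."""
    tp: int  # attack flows correctly flagged as attack
    fp: int  # benign flows wrongly flagged as attack
    tn: int  # benign flows correctly passed as benign
    fn: int  # attack flows missed (classified as benign)


def detection_metrics(c: ConfusionCounts) -> dict:
    """Return accuracy, true positive rate, false positive rate and false negative rate."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "tpr": c.tp / (c.tp + c.fn),  # share of attacks detected
        "fpr": c.fp / (c.fp + c.tn),  # share of benign traffic flagged
        "fnr": c.fn / (c.tp + c.fn),  # share of attacks missed; equals 1 - tpr
    }


# Hypothetical counts for illustration only (not the paper's data).
metrics = detection_metrics(ConfusionCounts(tp=1999, fp=2, tn=998, fn=1))
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```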
1&2 School of Quantitative Sciences, UUM College of Arts and Sciences, 06010 UUM Sintok, Kedah
3Institute of Engineering Mathematics, Universiti Malaysia Perlis, 02600 UniMAP Arau, Perlis
This paper examines a strategy for extracting a reduced set of mixed variables when building a Linear Discriminant Analysis (LDA) model. Two methods were employed to extract crucial variables from a dataset containing both categorical and continuous variables: multiple correspondence analysis (MCA) and principal component analysis (PCA). However, neither MCA nor PCA can be applied directly to mixed variables because of restrictions on the data structure each method can handle. This paper therefore makes some adjustments, including a strategy for managing mixed variables so that they are equivalent in values, which allows MCA and PCA to be performed on the mixed variables simultaneously. The variables extracted with this strategy were then used to construct the LDA model, which was subsequently applied to classify objects. The proposed models were tested on three real medical datasets, and the results indicated that combining MCA and PCA for extraction with LDA could reduce the model's size, improve classification performance, and minimise the leave-one-out error rate. Accordingly, the proposed models, including the adapted strategy, produced better results than the full LDA model. The indicators used to extract and retain variables in the model, namely cumulative variance explained (CVE), eigenvalue, and a non-significant shift in the CVE (constant change), could serve as a useful reference or guideline for practitioners facing similar problems in future.
Keywords: Classification, Linear discriminant analysis, Multiple correspondence analysis, Mixed variables, Principal component analysis.
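The retention indicators named in this abstract (CVE, eigenvalue, and a non-significant shift in CVE) can be illustrated on a list of component eigenvalues. The sketch below assumes eigenvalues sorted in descending order and uses illustrative thresholds (80% CVE, eigenvalue of at least 1, minimum CVE gain of 0.05) that are assumptions, not the paper's settings. It does not reproduce the paper's mixed-variable adjustment or the MCA/PCA computations themselves.

```python
from typing import Dict, List


def cumulative_variance(eigenvalues: List[float]) -> List[float]:
    """Cumulative proportion of variance explained (CVE) by the first k components."""
    total = sum(eigenvalues)
    cve: List[float] = []
    running = 0.0
    for ev in eigenvalues:
        running += ev
        cve.append(running / total)
    return cve


def retained_components(eigenvalues: List[float], cve_threshold: float = 0.8,
                        min_eigenvalue: float = 1.0, min_gain: float = 0.05) -> Dict[str, int]:
    """Count leading components kept under three retention rules.

    Eigenvalues must be sorted in descending order. All three thresholds
    are illustrative assumptions, not values taken from the paper.
    """
    cve = cumulative_variance(eigenvalues)
    # Rule 1: smallest k whose CVE reaches the threshold (all components if never reached).
    by_cve = next((k + 1 for k, v in enumerate(cve) if v >= cve_threshold), len(cve))
    # Rule 2: keep components whose eigenvalue is at least min_eigenvalue.
    by_eigenvalue = sum(1 for ev in eigenvalues if ev >= min_eigenvalue)
    # Rule 3: stop once the next component adds less than min_gain to the CVE.
    by_gain, previous = 0, 0.0
    for v in cve:
        if v - previous < min_gain:
            break
        by_gain += 1
        previous = v
    return {"cve": by_cve, "eigenvalue": by_eigenvalue, "cve_gain": by_gain}


# Hypothetical eigenvalues (sorted, summing to 8) for illustration.
print(retained_components([4.0, 2.0, 1.0, 0.5, 0.25, 0.25]))
# → {'cve': 3, 'eigenvalue': 3, 'cve_gain': 4}
```

The three rules need not agree: here the CVE gain rule keeps one component more than the other two, which is why practitioners often consult several indicators together.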
1Suraya Alias, 1Mohd Shamrie Sainin, 2Siti Khaotijah Mohammad
1Faculty of Computing and Informatics, Universiti Malaysia Sabah, Malaysia
2School of Computer Sciences, Universiti Sains Malaysia, Malaysia
In the Automatic Text Summarization domain, a Sentence Compression (SC) technique is applied to summary sentences to remove unnecessary words or phrases. The purpose of SC is to preserve the important information in a sentence and remove the unnecessary parts without sacrificing grammaticality. Malay Natural Language Processing (NLP) tools are still under development and offer limited open access. The issue is the lack of a Malay-language benchmark dataset for evaluating summary quality and validating the compressed sentences produced by a summarizer model. Hence, this paper outlines a Syntactic-based Sentence Validation technique for Malay sentences that refers to the Malay Grammar Pattern. In this work, we propose a new derived set of Syntactic Rules based on the main Malay Word Classes to validate a Malay sentence that undergoes the SC procedure. We experimented with a Malay dataset of 100 news articles covering the Natural Disaster and Events domain to find the optimal compression rate and its effect on the summary content. An automatic evaluation using ROUGE (Recall-Oriented Understudy for Gisting Evaluation) produced an average F-measure of 0.5826 and an average Recall of 0.5925 at the optimum compression rate, corresponding to a Confidence (Conf) value of 0.5. Furthermore, a manual evaluation of the compressed summary sentences by a group of Malay experts produced a grammaticality score of 4.11 and a readability score of 4.12 out of 5. These results indicate the reliability of the proposed technique for validating Malay sentences, with promising summary content and readability.
Keywords: Malay Text Summarization, Sentence Compression, Syntactic rules, POS, Parser.
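ROUGE scores like those reported in this abstract are based on n-gram overlap between a system summary and a reference summary. A minimal ROUGE-N sketch follows, assuming whitespace tokenisation and no stemming or stopword removal; the official toolkit offers such options, and the abstract does not state which ROUGE variant was used. The Malay sentences are hypothetical examples.

```python
from collections import Counter


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    """ROUGE-N recall, precision and F1 from clipped n-gram overlap (whitespace tokens)."""
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    overlap = sum((cand & ref).values())  # & keeps the minimum count of each shared n-gram
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"recall": recall, "precision": precision, "f1": f1}


# Hypothetical compressed sentence vs. reference sentence.
scores = rouge_n("banjir melanda kampung itu", "banjir besar melanda kampung")
print(scores)
```

Three of the four unigrams overlap in this example, so recall, precision, and F1 are all 0.75.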

1Anita Ramalingam & 2Subalalitha Chinnaudayar Navaneethakrishnan
1,2Department of Computer Science and Engineering, SRM Institute of Science and Technology, India
Tamil literature contains many valuable thoughts that can help people lead successful and happy lives. Tamil literary works are abundantly available and searched on the World Wide Web (WWW), but existing search systems follow a keyword-based matching strategy that fails to satisfy user needs. This creates the need for a focused Information Retrieval System that semantically analyses Tamil literary text, which should ultimately improve search performance. This paper proposes a novel Information Retrieval framework that uses discourse processing techniques to aid the semantic analysis and representation of Tamil literary text. The proposed framework has been tested on two ancient literary works, the Thirukkural and the Naladiyar, which were written around 300 BCE. The Thirukkural comprises 1330 couplets, each 7 words long, while the Naladiyar consists of 400 quatrains, each 15 words long. Tested with all 1330 Thirukkural couplets and 400 Naladiyar quatrains, the proposed system achieved a mean average precision (MAP) score of 89%. Its performance was compared with Google Tamil search and with a keyword-based search, which is a reduced version of the proposed framework. Google Tamil search achieved a MAP score of 56% and the keyword-based method a MAP score of 62%, which shows that discourse processing techniques improve the search performance of an Information Retrieval system.
Keywords: Discourse Parser, Morphological Analyzer, Inverted indexing, Ranking, Tamil Information Retrieval.
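Mean average precision, the measure used to compare the systems in this abstract, averages per-query average precision. The sketch below uses hypothetical query and couplet identifiers; it assumes binary relevance judgements and, following the usual convention, counts relevant documents that were never retrieved as contributing zero precision.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which a relevant document appears."""
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for k, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / k
    # Relevant documents that were never retrieved add nothing to the sum.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """MAP over all queries that have relevance judgements."""
    return sum(average_precision(runs.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)


# Hypothetical ranked results and relevance judgements for two queries.
runs = {"q1": ["k1", "k7", "k3", "k9"], "q2": ["k5", "k2"]}
qrels = {"q1": {"k1", "k3"}, "q2": {"k2"}}
print(round(mean_average_precision(runs, qrels), 4))
```

Query q1 scores (1/1 + 2/3) / 2 and query q2 scores 1/2, giving a MAP of 2/3.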
1 Hayder Naser Khraibet AL-Behadili & 2Ku Ruhana Ku-Mahamud
1Computer Science Department, Shatt Al-Arab University College, Iraq
2School of Computing, Universiti Utara Malaysia, Kedah, Malaysia
Diabetes classification is one of the most crucial applications of healthcare diagnosis. Although various studies have addressed this application, the classification problem remains challenging. Fuzzy logic techniques have recently achieved impressive results in various application domains, especially medical diagnosis. However, fuzzy logic techniques cannot cope with data containing a large number of input variables when constructing a classification model. In this research, a fuzzy logic technique combined with greedy hill-climbing feature selection was proposed for the classification of diabetes. A dataset of 520 patients from the Hospital of Sylhet in Bangladesh was used to train and evaluate the proposed classifier, and six classification criteria were used to validate its results. Comparative analysis demonstrated the effectiveness of the proposed classifier against Naive Bayes, support vector machine, K-nearest neighbour, decision tree, and multilayer perceptron neural network classifiers. The results demonstrated the potential of fuzzy logic for analyzing diabetes patterns across all classification criteria.
Keywords: Data mining, Diabetes, Feature selection, Fuzzy logic, Machine learning.
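Greedy hill-climbing feature selection, as named in this abstract, can be sketched as forward selection. The search starts from an empty subset, repeatedly adds the single feature that most improves a score, and stops at a local optimum. The `score` function and the per-feature weights below are hypothetical placeholders. In a setting like the paper's, the score would presumably be the fuzzy classifier's validation performance, and the abstract does not say whether the authors searched forward, backward, or both.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def greedy_forward_selection(features: Iterable[int],
                             score: Callable[[FrozenSet[int]], float]) -> Tuple[List[int], float]:
    """Hill-climb over feature subsets: add the single best feature while the score improves."""
    remaining = set(features)
    selected: List[int] = []
    best_score = score(frozenset())
    while remaining:
        # Evaluate every one-feature extension of the current subset.
        candidate, candidate_score = max(
            ((f, score(frozenset(selected + [f]))) for f in remaining),
            key=lambda pair: pair[1],
        )
        if candidate_score <= best_score:
            break  # local optimum: no single addition improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = candidate_score
    return selected, best_score


# Hypothetical score: per-feature usefulness minus a 0.1 penalty per selected feature.
weights = {0: 0.5, 1: 0.0, 2: 0.3, 3: 0.05}


def toy_score(subset: FrozenSet[int]) -> float:
    return sum(weights[f] for f in subset) - 0.1 * len(subset)


selected, best = greedy_forward_selection(weights, toy_score)
print(selected, round(best, 3))  # → [0, 2] 0.6
```

Keeping the input space small in this way is what lets a rule-based fuzzy classifier stay tractable, since the number of candidate rules grows quickly with the number of inputs.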
Adil Yaseen Taha, Sabrina Tiun, Abdul Hadi Abd Rahman & Ali Sabah
Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Malaysia
Simultaneous multiple labelling of documents, also known as multilabel text classification, does not perform optimally when the classes are highly imbalanced. Class imbalance entails skewness in the underlying data distribution, which makes classification more difficult. Random over-sampling and under-sampling are common approaches to the class imbalance problem. However, these approaches have several drawbacks: under-sampling is likely to discard useful data, whereas over-sampling can increase the risk of overfitting. A new method that avoids both discarding useful data and overfitting is therefore needed. This study proposes a method to tackle the class imbalance problem by combining multilabel over-sampling and under-sampling with class alignment (ML-OUSCA). Instead of using all the training instances, ML-OUSCA draws a new training set by over-sampling small classes and under-sampling large classes. ML-OUSCA was evaluated using average precision, average recall, and average F-measure on three benchmark datasets: Reuters-21578, Bibtex, and Enron. Experimental results showed that ML-OUSCA outperformed the chosen baseline resampling approaches, K-means SMOTE and KNN-US. Based on these results, designing a resampling method around class imbalance together with class alignment improves multilabel classification more than random resampling alone.
Keywords: Data mining, multilabel text classification, class imbalanced problem, resampling method, class alignment.
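The basic idea of drawing a new training set by over-sampling small classes and under-sampling large ones can be sketched with plain random resampling toward the mean label size. This is not ML-OUSCA itself. The class-alignment step is not described in the abstract and is not modelled, each label is resampled independently, and using the mean label size as the target is an assumption. Label names and instance ids are hypothetical.

```python
import random
from typing import Dict, List, Sequence


def balance_label_sets(label_instances: Dict[str, Sequence[int]],
                       seed: int = 0) -> Dict[str, List[int]]:
    """Resample each label's instance list toward the mean label size (assumed target).

    Smaller labels keep all their instances and gain random duplicates
    (over-sampling); larger labels are randomly thinned without replacement
    (under-sampling). Each label is treated independently, so label
    co-occurrence and class alignment are NOT modelled here.
    """
    rng = random.Random(seed)  # seeded for reproducible resampling
    target = round(sum(len(v) for v in label_instances.values()) / len(label_instances))
    balanced: Dict[str, List[int]] = {}
    for label, items in label_instances.items():
        pool = list(items)
        if not pool or len(pool) == target:
            balanced[label] = pool  # nothing to duplicate, or already at target
        elif len(pool) < target:
            extra = rng.choices(pool, k=target - len(pool))  # sample with replacement
            balanced[label] = pool + extra
        else:
            balanced[label] = rng.sample(pool, target)  # sample without replacement
    return balanced


# Hypothetical instance ids grouped by three Reuters-21578-style topic labels.
labels = {"earn": list(range(10)), "grain": [10, 11], "ship": [12, 13, 14, 15, 16, 17]}
balanced = balance_label_sets(labels)
print({label: len(ids) for label, ids in balanced.items()})
```

In a real multilabel corpus, duplicating a document for one label also duplicates its other labels, and thinning one label can remove examples of another; this interaction is exactly what a plain per-label sketch ignores.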
