From Complexity to Clarity: Improving Microarray Classification with Correlation-Based Feature Selection
DOI: https://doi.org/10.62486/latia202584
Keywords: Feature selection, gene expression data, Correlation-based Feature Selection algorithm, Decision Table, JRip, OneR
Abstract
Gene microarray classification remains a difficult task because of the high dimensionality of the data and the limited number of available samples. Efficient selection of a gene subset is therefore necessary to reduce computational cost and improve classification performance. Accordingly, this study employs the Correlation-based Feature Selection (CFS) algorithm to identify a subset of informative genes, reducing data dimensionality and isolating discriminative features. Three classifiers, Decision Table, JRip, and OneR, were then used to assess classification performance. The strategy was applied to eleven microarray datasets, and the results on the reduced datasets were compared with those obtained using the complete gene set. The results indicate that CFS efficiently eliminates irrelevant, redundant, and noisy features. The method showed strong predictive potential and effectively differentiated relevant genes in the datasets. In terms of average accuracy across all datasets, JRip outperformed Decision Table and OneR. Overall, the approach offers several advantages and improves the classification of multi-class problems that involve large numbers of genes and high time complexity.
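To illustrate how CFS scores a candidate gene subset, here is a minimal Python sketch; it is not the authors' implementation. It uses Hall's CFS merit heuristic, Merit = k * mean|r_cf| / sqrt(k + k(k-1) * mean|r_ff|), where r_cf is the gene-class correlation and r_ff the gene-gene correlation, so subsets of genes that are strongly correlated with the class but weakly correlated with each other score highest. Assumptions: a binary class coded 0/1, absolute Pearson correlation as the correlation measure (CFS implementations may use other measures, such as symmetric uncertainty on discretized attributes), and a simple greedy forward search in place of whatever search strategy the study used. The function names `pearson`, `cfs_merit`, and `cfs_forward_select` are hypothetical.

```python
import math
from itertools import combinations

# Minimal CFS sketch (hypothetical helper names, not the authors' code).
# Assumptions: binary class labels coded 0/1, absolute Pearson correlation
# as the correlation measure, and a greedy forward search.


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(subset, genes, labels):
    """Merit = k * mean|r_cf| / sqrt(k + k*(k-1) * mean|r_ff|)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Average gene-class correlation (relevance).
    r_cf = sum(abs(pearson(genes[i], labels)) for i in subset) / k
    # Average gene-gene correlation (redundancy); zero for a single gene.
    pairs = list(combinations(subset, 2))
    r_ff = sum(abs(pearson(genes[i], genes[j])) for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward_select(genes, labels):
    """Add the gene that most increases the merit; stop when no gene improves it."""
    selected, best = [], 0.0
    remaining = set(range(len(genes)))
    while remaining:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        cand = max(sorted(remaining), key=lambda i: cfs_merit(selected + [i], genes, labels))
        score = cfs_merit(selected + [cand], genes, labels)
        if score <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = score
    return selected, best


if __name__ == "__main__":
    # Toy data: rows are genes, columns are six samples.
    genes = [
        [1.0, 1.0, 1.0, 5.0, 5.0, 5.0],  # separates the two classes perfectly
        [1.0, 1.0, 5.0, 5.0, 5.0, 5.0],  # noisier, partly redundant with gene 0
        [2.0, 1.0, 3.0, 2.0, 1.0, 3.0],  # uncorrelated with the class
    ]
    labels = [0, 0, 0, 1, 1, 1]
    print(cfs_forward_select(genes, labels))
```

On this toy matrix the search keeps only gene 0: adding the partly redundant gene 1 lowers the merit (from 1.0 to about 0.924), and the noise gene 2 contributes no class correlation. This is how CFS discards redundant and irrelevant features before a classifier such as JRip is trained on the reduced gene set.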
License
Copyright (c) 2025 Muhyeeddin Alqaraleh, Mowafaq Salem Alzboon, Mohammad Subhi Al-Batah, Hatim Solayman Migdadi (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
Unless otherwise stated, associated published material is distributed under the same licence.