Research & Studies

Automated research methodology classification using machine learning

iASK researcher Zsolt T. Kosztyán has co-authored a new article, published in 2025 in the journal Engineering Applications of Artificial Intelligence.

Abstract

Scientific papers have become the primary means for disseminating scientific research, and thus, the ability to classify research papers based on different aspects has become essential. Therefore, many works have developed classification approaches; however, they focused solely on research topic-based classification. In addition, no solution has been developed to classify papers based on the applied methodology, and finally, the accuracy of the existing paper classification methods is not satisfactory. In this study, a novel automated classification methodology using a refined Extreme Gradient Boosting (XGBoost) model is presented to classify the research methods employed in scientific papers. Three article sets, including quantitative and qualitative research methods, were collected from the topics of tourism, medical science and information systems, consisting of 229, 557 and 787 papers, respectively. The classification problem was considered a binary classification task to maintain interpretability. The developed model was trained and tested on article sets 1 (tourism) and 2 (medical science), and then the proposed model was applied to article set 3 (information systems and tourism). The high accuracy achieved in different research fields (90%–95% accuracy on average) indicates that the proposed classification model is generalizable because it can be successfully applied in many disciplines. The automated classifier enables the rapid acquisition of vital information and the identification of significant differences among the applied methodologies in various research domains. A future development direction will be to increase the scalability of the proposed model to achieve efficient operations on large volumes of research papers.

Keywords: Machine learning, Classification, Applied research methods, Extreme Gradient Boosting, Large language models
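
To give a rough feel for the kind of task the abstract describes, the following is a minimal sketch of binary text classification (quantitative versus qualitative) with gradient boosting. It is not the authors' refined XGBoost model: it uses only the Python standard library and plain first-order gradient boosting over one-word decision stumps, whereas XGBoost adds second-order information and regularization. The toy corpus, its labels and all function names (fit, predict_proba, classify) are invented for illustration.

```python
import math

# Invented toy corpus; 1 = quantitative, 0 = qualitative (labels are illustrative only).
DOCS = [
    ("we ran a regression on survey data from a large sample", 1),
    ("a statistical test confirmed the hypothesis with regression analysis", 1),
    ("the survey sample was analysed with a statistical model", 1),
    ("semi-structured interviews were coded into themes", 0),
    ("a case study based on interviews and field observation", 0),
    ("thematic analysis of focus group interviews and observation", 0),
]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tokens(text):
    # Bag-of-words presence features: the set of lowercase whitespace tokens.
    return set(text.lower().split())

def mean(values):
    return sum(values) / len(values)

def fit(docs, rounds=20, lr=0.5):
    """Gradient boosting with logistic loss; each weak learner is a one-word stump."""
    X = [tokens(text) for text, _ in docs]
    y = [label for _, label in docs]
    vocab = sorted(set().union(*X))
    scores = [0.0] * len(docs)  # current log-odds for each training document
    stumps = []
    for _ in range(rounds):
        # Negative gradient of the logistic loss: label minus predicted probability.
        resid = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        best = None
        for word in vocab:
            present = [r for x, r in zip(X, resid) if word in x]
            absent = [r for x, r in zip(X, resid) if word not in x]
            if not present or not absent:
                continue  # this word does not split the data
            m_in, m_out = mean(present), mean(absent)
            # Squared error of fitting the residuals with this stump.
            sse = (sum((r - m_in) ** 2 for r in present)
                   + sum((r - m_out) ** 2 for r in absent))
            if best is None or sse < best[0]:
                best = (sse, word, lr * m_in, lr * m_out)
        if best is None:
            break
        _, word, v_in, v_out = best
        stumps.append((word, v_in, v_out))
        scores = [s + (v_in if word in x else v_out) for x, s in zip(X, scores)]
    return stumps

def predict_proba(stumps, text):
    # Probability that the text describes a quantitative method.
    toks = tokens(text)
    return sigmoid(sum(v_in if w in toks else v_out for w, v_in, v_out in stumps))

def classify(stumps, text):
    return 1 if predict_proba(stumps, text) >= 0.5 else 0
```

Calling fit(DOCS) returns a list of (word, value if present, value if absent) stumps, and classify(model, text) then labels a new text. A realistic setup would need a far larger labelled corpus, richer text features and a held-out test set, as in the train-then-apply design the abstract outlines.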

The full text of the article is available HERE.