Code Plagiarism Detection Using Graphic Neural Network Based On Abstract Syntax Tree

Authors

  • Fitra Affandi Hasibuan Universitas Muhammadiyah Sumatera Utara
  • Al-Khowarizmi Universitas Muhammadiyah Sumatera Utara

DOI:

https://doi.org/10.62123/aqila.v3i1.177

Keywords:

Code Plagiarism, Graph Neural Network, Abstract Syntax Tree, Cosine Similarity, Similarity Detection

Abstract

Code plagiarism is common in education and software development, and text-based approaches struggle to detect it accurately. Conventional methods such as Term Frequency–Inverse Document Frequency (TF-IDF) with cosine similarity compare only token overlap, which makes them less effective when the structure of the code has been changed. This study therefore develops a structure-based code plagiarism detection system that uses the Abstract Syntax Tree (AST) and a Graph Neural Network (GNN). The proposed method parses source code into an AST, represents the AST as a graph, and processes the graphs with a GNN model in a pairwise scheme. A baseline method based on TF-IDF and cosine similarity is evaluated for comparison. The dataset combines synthetic and real data and is divided into training and testing sets. The GNN model achieves an accuracy of 0.9946, precision of 0.9949, recall of 0.9974, and F1-score of 0.9962, while the baseline method achieves an accuracy of only 0.7392 and a recall of 0.6343. These results indicate that the GNN model is more effective at detecting plagiarism, especially when the code has been structurally modified. The structure-based approach using AST and GNN therefore outperforms the text-based approach in code plagiarism detection.
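
The contrast between the token-based baseline and the AST-as-graph representation can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the abstract: the analyzed code is Python and is parsed with the standard-library `ast` module, plain token-count cosine similarity stands in for the TF-IDF baseline, and exact graph equality stands in for the trained GNN, which is not implemented here. The helper names `ast_to_graph` and `token_cosine` are hypothetical.

```python
import ast
import math
import re
from collections import Counter


def ast_to_graph(source):
    """Parse Python source and return (node_labels, edges) for its AST."""
    tree = ast.parse(source)
    labels, edges = [], []

    def visit(node, parent_index):
        index = len(labels)
        labels.append(type(node).__name__)  # node label = AST node type
        if parent_index is not None:
            edges.append((parent_index, index))  # parent -> child edge
        for child in ast.iter_child_nodes(node):
            visit(child, index)

    visit(tree, None)
    return labels, edges


def token_cosine(a, b):
    """Cosine similarity of raw token counts (simplified stand-in for TF-IDF)."""
    ta = Counter(re.findall(r"\w+", a))
    tb = Counter(re.findall(r"\w+", b))
    dot = sum(ta[t] * tb[t] for t in ta)
    norm_a = math.sqrt(sum(v * v for v in ta.values()))
    norm_b = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Two snippets with the same logic; only the identifiers were renamed.
original = (
    "def total(values):\n"
    "    result = 0\n"
    "    for v in values:\n"
    "        result += v\n"
    "    return result\n"
)
renamed = (
    "def add_all(items):\n"
    "    acc = 0\n"
    "    for x in items:\n"
    "        acc += x\n"
    "    return acc\n"
)

# Node types and parent-child edges ignore identifier names.
same_structure = ast_to_graph(original) == ast_to_graph(renamed)
# Token overlap is limited to keywords and the literal 0.
token_score = token_cosine(original, renamed)

print("identical AST graph:", same_structure)
print("token cosine similarity:", round(token_score, 3))
```

In this toy pair the renamed copy shares few tokens with the original, yet its AST graph has the same node types and edges. A pairwise GNN would take two such graphs as input and learn a similarity decision from labeled pairs rather than relying on exact equality.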

Published

2026-06-29

How to Cite

Hasibuan, F. A., & Al-Khowarizmi. (2026). Code Plagiarism Detection Using Graphic Neural Network Based On Abstract Syntax Tree. Acceleration, Quantum, Information Technology and Algorithm Journal, 3(1), 29–34. https://doi.org/10.62123/aqila.v3i1.177
