Research Article

Plagiarism Detection using Sequential Pattern Mining

by Ali El-matarawy, Mohammad El-ramly, Reem Bahgat
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 5 - Number 2
Year of Publication: 2013
Authors: Ali El-matarawy, Mohammad El-ramly, Reem Bahgat
10.5120/ijais12-450846

Ali El-matarawy, Mohammad El-ramly, Reem Bahgat. Plagiarism Detection using Sequential Pattern Mining. International Journal of Applied Information Systems. 5, 2 (January 2013), 24-29. DOI=10.5120/ijais12-450846

@article{ 10.5120/ijais12-450846,
author = { Ali El-matarawy, Mohammad El-ramly, Reem Bahgat },
title = { Plagiarism Detection using Sequential Pattern Mining },
journal = { International Journal of Applied Information Systems },
issue_date = { January 2013 },
volume = { 5 },
number = { 2 },
month = { January },
year = { 2013 },
issn = { 2249-0868 },
pages = { 24-29 },
numpages = {9},
url = { https://www.ijais.org/archives/volume5/number2/416-0846/ },
doi = { 10.5120/ijais12-450846 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T16:01:01.999309+05:30
%A Ali El-matarawy
%A Mohammad El-ramly
%A Reem Bahgat
%T Plagiarism Detection using Sequential Pattern Mining
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 5
%N 2
%P 24-29
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents EgyCD, a new plagiarism detection technique based on sequential pattern mining. Over the last decade, many techniques and tools for software clone detection have been proposed, including textual, lexical, syntactic, and semantic approaches. This paper explores the potential of data mining techniques in plagiarism detection. In particular, it proposes a plagiarism detection technique based on sequential pattern mining (SPM): words or statements are treated as a sequence of transactions, which the SPM algorithm processes to find frequent itemsets. An experiment on discovering copied-and-pasted passages in text sources gave good results in a reasonable and acceptable time.
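To make the idea in the abstract concrete, the following is a minimal sketch, not the authors' EgyCD implementation. It assumes each document is a sequence of lowercase, whitespace-separated words and mines contiguous word sequences that occur in at least `min_support` documents, growing them level by level with Apriori-style pruning. The function name `frequent_word_sequences` and the parameters `min_support` and `max_length` are hypothetical, chosen only for illustration. Restricting the patterns to contiguous runs is a simplification suited to copy/paste detection; general SPM algorithms also allow gaps between items.

```python
from collections import defaultdict


def frequent_word_sequences(documents, min_support=2, max_length=5):
    """Map each contiguous word sequence that appears in at least
    `min_support` documents to the set of document indices containing it.

    Illustrative sketch only: names and parameters are assumptions,
    not taken from the paper.
    """
    token_lists = [doc.lower().split() for doc in documents]
    frequent = {}
    previous = None  # frequent sequences found at the previous length

    for length in range(1, max_length + 1):
        occurrences = defaultdict(set)
        for doc_id, tokens in enumerate(token_lists):
            for start in range(len(tokens) - length + 1):
                candidate = tuple(tokens[start:start + length])
                # Apriori property: a sequence can only be frequent if both
                # of its (length - 1) contiguous sub-sequences are frequent.
                if previous is not None and (
                    candidate[:-1] not in previous
                    or candidate[1:] not in previous
                ):
                    continue
                occurrences[candidate].add(doc_id)

        current = {
            seq: docs
            for seq, docs in occurrences.items()
            if len(docs) >= min_support
        }
        if not current:
            break  # no frequent sequence of this length, so none longer
        frequent.update(current)
        previous = current

    return frequent


if __name__ == "__main__":
    docs = [
        "the quick brown fox jumps over the lazy dog",
        "a quick brown fox jumps high",
        "nothing in common here",
    ]
    patterns = frequent_word_sequences(docs, min_support=2)
    longest = max(patterns, key=len)
    print(" ".join(longest), sorted(patterns[longest]))
```

In this sketch, the longest frequent sequence shared by two or more documents marks a candidate copied passage; a real detector would also need normalization (punctuation, stemming) and a way to report positions, which are omitted here.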

Index Terms

Computer Science
Information Sciences

Keywords

Plagiarism Detector, Plagiarized Clones, Textual Approach, Lexical Approach, Syntactic Approach, Data Mining, Apriori Property, Sequential Pattern Mining