Tokenization and Filtering Process in RapidMiner

Tanu Verma, Renu, Deepti Gaur Published in Information Sciences

International Journal of Applied Information Systems
Year of Publication: 2014
© 2013 by IJAIS Journal
  1. Tanu Verma, Renu and Deepti Gaur. Article: Tokenization and Filtering Process in RapidMiner. International Journal of Applied Information Systems 7(2):16-18, April 2014.

BibTeX:

    	author = "Tanu Verma and Renu and Deepti Gaur",
    	title = "Article: Tokenization and Filtering Process in RapidMiner",
    	journal = "International Journal of Applied Information Systems",
    	year = 2014,
    	volume = 7,
    	number = 2,
    	pages = "16-18",
    	month = "April",
    	note = "Published by Foundation of Computer Science, New York, USA"


Text mining is a knowledge-intensive process in which a user interacts with a document collection. As in data mining [2, 4, 9], text mining seeks to extract useful information from data sources through the identification and exploration of interesting patterns. A key element of text mining is its focus on the document collection, which can be any grouping of text-based documents. Most text mining solutions aim to discover patterns across very large document collections, ranging from many thousands to millions of documents. This paper shows how text mining is implemented in RapidMiner.


  1. R. Agrawal and R. Srikant, "Fast algorithms for mining association rules", in Proceedings of the 20th International Conference on Very Large Databases (VLDB-94), Chile, Sept. 1994.
  2. Margaret H. Dunham, "Data Mining: Introductory and Advanced Topics".
  3. R. Baeza-Yates and B. Ribeiro-Neto, "Modern Information Retrieval", ACM Press, New York, 1999.
  4. R. Agrawal, T. Imielinski and A. Swami, "Database mining: A performance perspective", IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6.
  5. M. E. Califf, editor, Papers from the Sixteenth National Conference on Artificial Intelligence (AAAI-99) Workshop on Machine Learning for Information Extraction, Orlando, FL, 1999. AAAI Press.
  6. M. E. Califf and R. J. Mooney, "Relational learning of pattern-match rules for information extraction", in Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99), pages 328–334, Orlando, FL, July 1999.
  7. C. Cardie, "Empirical methods in information extraction", AI Magazine, 18(4):65–79, 1997.
  8. C. Cardie and R. J. Mooney, "Machine learning and natural language (Introduction to special issue on natural language learning)", Machine Learning, 34:5–9, 1999.
  9. Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 722.
  10. Y. M. Yang, "An evaluation of statistical approaches to text categorization", Technical Report CMU-CS-97-127, Computer Science Department, Carnegie Mellon University, 1997.
  11. C. Choi and Y. Park, "R&D proposal screening system based on text-mining approach", Int. J. Technol. Intell. Plan., vol. 2, no. 1, pp. 61-72, 2006.
  12. H. C. Yang and C. H. Lee, "A text mining approach for automatic construction of hypertexts", Expert Syst. Appl., vol. 29, no. 4, pp. 723-734, 2005.
  13. R. Agrawal, T. Imielinski and A. Swami, "Mining association rules between sets of items in large databases", Washington, DC: SIGMOD, 1993, pp. 207-216.


Text mining, Tokenize, Filtering, Stop words, Stemming.
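
The preprocessing steps named in the keywords (tokenizing, filtering, stop-word removal, stemming) can be illustrated outside RapidMiner with a minimal Python sketch. This is not RapidMiner's implementation or the authors' process: the stop-word list, the length bounds, and the suffix-stripping rule are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list (an assumption for illustration).
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it on any run of non-letter characters."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def filter_tokens(tokens, min_len=3, max_len=25):
    """Drop stop words and tokens whose length is outside [min_len, max_len]."""
    return [
        t for t in tokens
        if t not in STOP_WORDS and min_len <= len(t) <= max_len
    ]


def naive_stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Chain the three steps: tokenize, filter, then stem."""
    return [naive_stem(t) for t in filter_tokens(tokenize(text))]


if __name__ == "__main__":
    print(preprocess("Text mining is the mining of documents."))
```

The naive stemmer over-strips some words (for example, "mining" becomes "min"), which is one reason practical pipelines rely on an established stemming algorithm rather than a fixed suffix list.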