Tokenization and Filtering Process in RapidMiner

Tanu Verma, Renu, Deepti Gaur. Published in Information Sciences

International Journal of Applied Information Systems
Year of Publication: 2014
© 2013 by IJAIS Journal
  1. Tanu Verma, Renu and Deepti Gaur. Article: Tokenization and Filtering Process in RapidMiner. International Journal of Applied Information Systems 7(2):16-18, April 2014. BibTeX

    	author = "Tanu Verma and Renu and Deepti Gaur",
    	title = "Article: Tokenization and Filtering Process in RapidMiner",
    	journal = "International Journal of Applied Information Systems",
    	year = 2014,
    	volume = 7,
    	number = 2,
    	pages = "16-18",
    	month = "April",
    	note = "Published by Foundation of Computer Science, New York, USA"


Text mining is defined as a knowledge-intensive process in which a user interacts with a document collection. As in data mining [2,4,9], text mining seeks to extract useful information from data sources through the identification and exploration of interesting patterns. A key element of text mining is its focus on the document collection, which can be any grouping of text-based documents. Most text mining solutions aim to discover patterns across very large document collections, ranging from many thousands to millions of documents. In this paper, we show how text mining is implemented in RapidMiner.


Text mining, Tokenize, Filtering, Stop words, Stemming.