Tokenization and Filtering Process in RapidMiner

Tanu Verma; Renu; Deepti Gaur

Research Article


International Journal of Applied Information Systems

Foundation of Computer Science (FCS), NY, USA

Volume 7 - Number 2

Year of Publication: 2014

Authors: Tanu Verma, Renu, Deepti Gaur

10.5120/ijais14-451139

Tanu Verma, Renu, Deepti Gaur. Tokenization and Filtering Process in RapidMiner. International Journal of Applied Information Systems. 7, 2 (April 2014), 16-18. DOI=10.5120/ijais14-451139

@article{ 10.5120/ijais14-451139,

author = { Tanu Verma, Renu, Deepti Gaur },

title = { Tokenization and Filtering Process in RapidMiner },

journal = { International Journal of Applied Information Systems },

issue_date = { April 2014 },

volume = { 7 },

number = { 2 },

month = { April },

year = { 2014 },

issn = { 2249-0868 },

pages = { 16-18 },

numpages = {9},

url = { https://www.ijais.org/archives/volume7/number2/620-1139/ },

doi = { 10.5120/ijais14-451139 },

publisher = {Foundation of Computer Science (FCS), NY, USA},

address = {New York, USA}

}

%0 Journal Article

%1 2023-07-05T18:54:38.635493+05:30

%A Tanu Verma

%A Renu

%A Deepti Gaur

%T Tokenization and Filtering Process in RapidMiner

%J International Journal of Applied Information Systems

%@ 2249-0868

%V 7

%N 2

%P 16-18

%D 2014

%I Foundation of Computer Science (FCS), NY, USA

Abstract

Text mining is a knowledge-intensive process in which a user interacts with a document collection. As in data mining [2,4,9], text mining seeks to extract useful information from data sources by identifying and exploring interesting patterns. A key element of text mining is its focus on the document collection, which can be any grouping of text-based documents. Most text mining solutions aim to discover patterns across very large collections, ranging from many thousands to millions of documents. This paper shows how text mining is implemented in RapidMiner.
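To make the preprocessing steps named in the title and keywords concrete (tokenize, filter stop words, filter by length, stem), the following is a minimal Python sketch of such a pipeline. It is an illustration only and not RapidMiner's implementation: the stop-word list, the length bounds, the function names, and the crude suffix-stripping stemmer are all assumptions made for this example, and a real workflow would use a full stop-word list and an established stemmer such as Porter's.

```python
import re

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "in", "of", "to", "with", "which"}


def tokenize(text):
    """Split on any run of non-letter characters and lowercase each token."""
    return [t.lower() for t in re.split(r"[^A-Za-z]+", text) if t]


def filter_stop_words(tokens, stop_words=STOP_WORDS):
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in stop_words]


def filter_by_length(tokens, min_chars=3, max_chars=25):
    """Keep tokens whose length lies within the (assumed) bounds."""
    return [t for t in tokens if min_chars <= len(t) <= max_chars]


def naive_stem(token):
    """Strip one common suffix; a crude stand-in for a real stemmer."""
    if token.endswith("ss"):
        return token
    for suffix in ("ing", "es", "s"):
        # Only strip when at least three characters remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Run the four steps in order and return the resulting terms."""
    tokens = tokenize(text)
    tokens = filter_stop_words(tokens)
    tokens = filter_by_length(tokens)
    return [naive_stem(t) for t in tokens]


if __name__ == "__main__":
    sample = "Text mining is a process in which a user interacts with documents."
    print(preprocess(sample))
```

The stemmer here is deliberately simplistic and will over-strip some words (for example, it reduces "mining" to "min"), which is one reason established stemming algorithms are preferred in practice.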

References

R. Agrawal and R. Srikant. Fast algorithms for mining association rules in Proceedings of the 20th International Conference on Very Large Databases (VLDB-94), Chile, Sept. 1994.
Margaret H. Dunham, Data Mining "Introduction and Advanced Topics".
R. Baeza-Yates and B. Ribeiro-Neto, "Modern Information Retrieval" ACM Press, New York, 1999.
R. Agrawal, T. Imielinski and A. Swami, "Database mining: A performance perspective", IEEE Transactions on Knowledge and Data Eng., vol. 5, no. 6.
M. E. Califf, editor. Papers from the Sixteenth National Conference on Artificial Intelligence (AAAI-99) Workshop on Machine Learning for Information Extraction, Orlando, FL, 1999. AAAI Press.
M. E. Califf and R. J. Mooney, "Relational learning of pattern-match rules for information extraction" in Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99), pages 328–334, Orlando, FL, July 1999.
C. Cardie, "Empirical methods in information extraction", AI Magazine, 18(4):65–79, 1997.
C. Cardie and R. J. Mooney, "Machine learning and natural language (Introduction to special issue on natural language learning)" Machine Learning, 34:5–9, 1999.
Jiawei Han and Micheline Kamber, "Data Mining Concepts and Techniques", Morgan Kaufmann Publisher, 722
Yang Y M, "An evaluation of statistical approach to text categorization [R]" in Technical Report CMU-CS-97-127, Computer Science Department, Carnegie Mellon University, 1997.
C. Choi and Y. Park, "R&D proposal screening system based on text-mining approach", Int. J. Technol. Intell. Plan., vol. 2, no. 1, pp. 61-72, 2006.
H. C. Yang and C. H. Lee, "A text mining approach for automatic construction of hypertexts", Expert Syst. Appl., vol. 29, no. 4, pp. 723-734, 2005.
Agrawal R, Imielinski T and Swami A, "Mining association rules between sets of items in large database [M]", Washington, DC: SIGMOD, 1993, 207-216.

Index Terms

Computer Science

Information Sciences

Keywords

Text mining, Tokenize, Filtering, Stop words, Stemming