Research Article

Tokenization and Filtering Process in RapidMiner

by Tanu Verma, Renu, Deepti Gaur
International Journal of Applied Information Systems
Foundation of Computer Science (FCS), NY, USA
Volume 7 - Number 2
Year of Publication: 2014
Authors: Tanu Verma, Renu, Deepti Gaur
DOI: 10.5120/ijais14-451139

Tanu Verma, Renu, Deepti Gaur. Tokenization and Filtering Process in RapidMiner. International Journal of Applied Information Systems. 7, 2 (April 2014), 16-18. DOI=10.5120/ijais14-451139

@article{ 10.5120/ijais14-451139,
author = { Tanu Verma and Renu and Deepti Gaur },
title = { Tokenization and Filtering Process in RapidMiner },
journal = { International Journal of Applied Information Systems },
issue_date = { April 2014 },
volume = { 7 },
number = { 2 },
month = { April },
year = { 2014 },
issn = { 2249-0868 },
pages = { 16-18 },
numpages = {9},
url = { https://www.ijais.org/archives/volume7/number2/620-1139/ },
doi = { 10.5120/ijais14-451139 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2023-07-05T18:54:38.635493+05:30
%A Tanu Verma
%A Renu
%A Deepti Gaur
%T Tokenization and Filtering Process in RapidMiner
%J International Journal of Applied Information Systems
%@ 2249-0868
%V 7
%N 2
%P 16-18
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text mining is a knowledge-intensive process in which a user interacts with a document collection. As in data mining [2, 4, 9], text mining seeks to extract useful information from data sources by identifying and exploring interesting patterns. A key element of text mining is its focus on the document collection, which can be any grouping of text-based documents. Most text mining solutions aim to discover patterns across very large collections, ranging from many thousands to millions of documents. This paper shows how text mining is implemented in RapidMiner.
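The preprocessing steps named in the title and keywords (tokenizing, then filtering by stop words and token length) can be illustrated with a minimal Python sketch. This is not RapidMiner's implementation or API; the function names, the tiny stop-word list, and the minimum token length of 3 are all assumptions made for illustration, and stemming is left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "with"}


def tokenize(text):
    """Lowercase the text and split it on any run of non-letter characters."""
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]


def filter_tokens(tokens, min_len=3, stop_words=STOP_WORDS):
    """Drop stop words and tokens shorter than min_len characters."""
    return [t for t in tokens if t not in stop_words and len(t) >= min_len]


def term_counts(documents):
    """Count the filtered tokens across a whole document collection."""
    counts = Counter()
    for doc in documents:
        counts.update(filter_tokens(tokenize(doc)))
    return counts


# A two-document "collection" built from sentences in the abstract.
docs = [
    "Text mining is the exploration of patterns in a document collection.",
    "A document collection can be any grouping of text-based documents.",
]
counts = term_counts(docs)
print(counts.most_common(5))
```

Splitting on non-letters is the simplest tokenization rule; a real pipeline would also need to decide how to treat digits, hyphenated words, and inflected forms such as "document" and "documents", which this sketch counts separately because no stemming is applied.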

References
  1. R. Agrawal and R. Srikant, "Fast algorithms for mining association rules", in Proceedings of the 20th International Conference on Very Large Databases (VLDB-94), Chile, Sept. 1994.
  2. Margaret H. Dunham, "Data Mining: Introductory and Advanced Topics".
  3. R. Baeza-Yates and B. Ribeiro-Neto, "Modern Information Retrieval", ACM Press, New York, 1999.
  4. R. Agrawal, T. Imielinski and A. Swami, "Database mining: A performance perspective", IEEE Transactions on Knowledge and Data Engineering, vol. 5, no. 6.
  5. M. E. Califf, editor, Papers from the Sixteenth National Conference on Artificial Intelligence (AAAI-99) Workshop on Machine Learning for Information Extraction, Orlando, FL, 1999. AAAI Press.
  6. M. E. Califf and R. J. Mooney, "Relational learning of pattern-match rules for information extraction", in Proceedings of the 16th National Conference on Artificial Intelligence (AAAI-99), pages 328–334, Orlando, FL, July 1999.
  7. C. Cardie, "Empirical methods in information extraction", AI Magazine, 18(4):65–79, 1997.
  8. C. Cardie and R. J. Mooney, "Machine learning and natural language (Introduction to special issue on natural language learning)", Machine Learning, 34:5–9, 1999.
  9. Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 722
  10. Yang Y M, "An evaluation of statistical approach to text categorization", Technical Report CMU-CS-97-127, Computer Science Department, Carnegie Mellon University, 1997.
  11. C. Choi and Y. Park, "R&D proposal screening system based on text-mining approach", Int. J. Technol. Intell. Plan., vol. 2, no. 1, pp. 61-72, 2006.
  12. H. C. Yang and C. H. Lee, "A text mining approach for automatic construction of hypertexts", Expert Syst. Appl., vol. 29, no. 4, pp. 723-734, 2005.
  13. Agrawal R, Imielinski T and Swami A, "Mining association rules between sets of items in large databases", Washington, DC: SIGMOD, 1993, pp. 207-216.
Index Terms

Computer Science
Information Sciences

Keywords

Text mining, Tokenize, Filtering, Stop words, Stemming.