

Approach for Transforming Monolingual Text Corpus into XML Corpus

Deepak Sharma, Prakash R. Devale

Published in International Journal of Applied Information Systems
Year of Publication 2012
© 2010 by IJAIS Journal
Co-published with IJCA
  1. Deepak Sharma and Prakash R. Devale. Approach for Transforming Monolingual Text Corpus into XML Corpus. International Journal of Applied Information Systems 1(9):1-5, April 2012. BibTeX

    	author = "Deepak Sharma and Prakash R. Devale",
    	title = "Approach for Transforming Monolingual Text Corpus into XML Corpus",
    	journal = "International Journal of Applied Information Systems",
    	year = 2012,
    	volume = 1,
    	number = 9,
    	pages = "1-5",
    	month = "April",
    	note = "Published by Foundation of Computer Science, New York, USA"


In this paper, we present an approach for converting a text-based monolingual corpus into an XML corpus. The text is first Part-Of-Speech tagged with a standard tagging tool to produce a tagged file, and the tagged file is then converted into XML format according to a defined DTD (Document Type Definition). The tagged text document is parsed by the conversion logic to generate the corpus in XML, which can be further used for Information Retrieval, Text-To-Speech conversion, and Word Sense Disambiguation. It is also useful as a preprocessing step for parsing, because assigning a unique tag to each word reduces the number of parses.
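
The conversion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the DTD, the tagger, or the tagged-file format, so the sketch assumes `word/TAG` tokens with one sentence per line, and the element and attribute names (`corpus`, `sentence`, `w`, `pos`) and the `UNK` placeholder tag are hypothetical. The keywords point to a Java DOM parser; Python's standard-library DOM (`xml.dom.minidom`) is swapped in here to show the same idea.

```python
from xml.dom.minidom import getDOMImplementation


def tagged_text_to_xml(tagged_text: str) -> str:
    """Convert 'word/TAG' tokens (one sentence per line) into an XML string.

    Assumed input format and hypothetical element names; the paper's
    actual DTD is not given in the abstract.
    """
    impl = getDOMImplementation()
    doc = impl.createDocument(None, "corpus", None)  # hypothetical root element
    root = doc.documentElement

    sentence_id = 0
    for line in tagged_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        sentence_id += 1
        sentence = doc.createElement("sentence")
        sentence.setAttribute("id", str(sentence_id))
        for token in tokens:
            # Split on the last '/' so a word containing '/' keeps it.
            word, sep, tag = token.rpartition("/")
            if not sep:
                word, tag = token, "UNK"  # untagged token: placeholder tag
            w = doc.createElement("w")
            w.setAttribute("pos", tag)
            w.appendChild(doc.createTextNode(word))
            sentence.appendChild(w)
        root.appendChild(sentence)
    return doc.toprettyxml(indent="  ")


if __name__ == "__main__":
    print(tagged_text_to_xml("The/DT dog/NN barks/VBZ\nIt/PRP runs/VBZ"))
```

Because each word element carries exactly one tag, a downstream parser or retrieval component can read the `pos` attribute directly instead of re-deriving the word's category.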




Part-of-Speech Tagging, Java XML Library, DOM Parser