
A Comparative Analysis of Feature Extraction Techniques and Classifiers Inaccuracies for Bilingual Printed Documents (Gujarati-English)

Shailesh A. Chaudhari, Ravi M. Gulati
Published in Information Sciences

IJAIS Proceedings on International Conference on Communication Computing and Virtualization
Year of Publication: 2016
© 2015 by IJAIS Journal
  1. Shailesh A Chaudhari and Ravi M Gulati. Article: A Comparative Analysis of Feature Extraction Techniques and Classifiers Inaccuracies for Bilingual Printed Documents (Gujarati-English). IJAIS Proceedings on International Conference on Communication Computing and Virtualization ICCCV 2016(1):16-20, July 2016. BibTeX

    @article{key:article,
    	author = "Shailesh A. Chaudhari and Ravi M. Gulati",
    	title = "Article: A Comparative Analysis of Feature Extraction Techniques and Classifiers Inaccuracies for Bilingual Printed Documents (Gujarati-English)",
    	journal = "IJAIS Proceedings on International Conference on Communication Computing and Virtualization",
    	year = 2016,
    	volume = "ICCCV 2016",
    	number = 1,
    	pages = "16-20",
    	month = "July",
    	note = "Published by Foundation of Computer Science, New York, USA"
    }
    

Abstract

In a bilingual or multilingual optical character recognition (OCR) system, script identification is a challenging task. A considerable body of research on script identification has been reported in both Indian and non-Indian contexts. Many commercial and official regional documents in the states of India are bilingual: they contain the regional language of the respective state, interspersed with English words. Script identification is therefore one of the primary tasks in multi-script document recognition. This paper presents script identification for Gujarati and English at the word level. Two approaches are used for feature extraction: the first extracts statistical features, and the second extracts Gabor features of a word using Gabor filters with suitable frequencies and orientations. The proposed system uses two classifiers, k-NN and SVM with different kernel functions, to assign the extracted features to one of the two scripts. The experiments indicate that SVM outperforms k-NN.
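The Gabor feature approach and the k-NN classifier mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the frequencies (0.1 and 0.25 cycles per pixel), the four orientations, the kernel size, the mean-absolute-response feature, and the synthetic striped images standing in for word images are all assumptions chosen for illustration. The SVM classifier is omitted; in practice it would usually come from a machine learning library.

```python
import math
from collections import Counter


def gabor_kernel(freq, theta, sigma=2.0, half=3):
    """Real (cosine) Gabor kernel: Gaussian envelope times an oriented sinusoid."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction in which the sinusoid oscillates.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * freq * xr))
        kernel.append(row)
    return kernel


def mean_abs_response(image, kernel):
    """Average |filter response| over every position where the kernel fits."""
    k = len(kernel)
    rows, cols = len(image), len(image[0])
    total, count = 0.0, 0
    for i in range(rows - k + 1):
        for j in range(cols - k + 1):
            resp = sum(kernel[a][b] * image[i + a][j + b]
                       for a in range(k) for b in range(k))
            total += abs(resp)
            count += 1
    return total / count


def gabor_features(image, freqs=(0.1, 0.25), n_orient=4):
    """One feature per (frequency, orientation) pair; values are assumptions."""
    return [mean_abs_response(image, gabor_kernel(f, o * math.pi / n_orient))
            for f in freqs for o in range(n_orient)]


def knn_predict(train, query, k=3):
    """Majority label among the k training vectors closest to the query."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def stripes(vertical, shift, size=12):
    """Hypothetical stand-in for a word image: stripes with a 4-pixel period."""
    return [[1.0 if ((j if vertical else i) + shift) % 4 < 2 else 0.0
             for j in range(size)] for i in range(size)]


# Toy training set: three shifted samples per synthetic "script".
train = [(gabor_features(stripes(v, s)), "vertical" if v else "horizontal")
         for v in (True, False) for s in range(3)]

# Classify two held-out samples with a shift not seen during training.
print(knn_predict(train, gabor_features(stripes(True, 3))))
print(knn_predict(train, gabor_features(stripes(False, 3))))
```

Averaging the absolute response over all positions makes each feature largely insensitive to where the strokes fall inside the image, which is why the held-out shift is still assigned to the matching class. Real word images would need binarization and size normalization before filtering, and the filter parameters would have to be tuned on actual Gujarati and English samples.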

Keywords

Gabor Filter, Support Vector Machine, Feature Extraction.