A Comparative Analysis of Feature Extraction Techniques and Classifiers Inaccuracies for Bilingual Printed Documents (Gujarati-English)

Published in July 2016 by Shailesh A. Chaudhari, Ravi M. Gulati

International Conference on Communication Computing and Virtualization

Foundation of Computer Science USA

ICCCV2016 - Number 1

July 2016

Shailesh A. Chaudhari, Ravi M. Gulati. A Comparative Analysis of Feature Extraction Techniques and Classifiers Inaccuracies for Bilingual Printed Documents (Gujarati-English). International Conference on Communication Computing and Virtualization. ICCCV2016, 1 (July 2016), 0-0.

@article{
author = {Shailesh A. Chaudhari and Ravi M. Gulati},
title = {A Comparative Analysis of Feature Extraction Techniques and Classifiers Inaccuracies for Bilingual Printed Documents (Gujarati-English)},
journal = {International Conference on Communication Computing and Virtualization},
issue_date = {July 2016},
volume = {ICCCV2016},
number = {1},
month = {July},
year = {2016},
issn = {2249-0868},
pages = {0-0},
numpages = {1},
url = {/proceedings/icccv2016/number1/914-1654/},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}

%0 Proceeding Article
%1 International Conference on Communication Computing and Virtualization
%A Shailesh A. Chaudhari
%A Ravi M. Gulati
%T A Comparative Analysis of Feature Extraction Techniques and Classifiers Inaccuracies for Bilingual Printed Documents (Gujarati-English)
%J International Conference on Communication Computing and Virtualization
%@ 2249-0868
%V ICCCV2016
%N 1
%P 0-0
%D 2016
%I International Journal of Applied Information Systems

Abstract

In a bilingual or multilingual optical character recognition (OCR) system, script identification is a challenging task. Substantial research on script identification has been reported in both Indian and non-Indian contexts. Many commercial and official documents issued in the different states of India are bilingual, containing the regional language of the respective state together with English, an international language. English words are commonly interspersed within the regional-language text of such documents, so script identification is one of the primary tasks in multi-script document recognition. This paper presents word-level script identification of Gujarati and English. Two approaches are used for feature extraction: the first extracts statistical features of a word, and the second extracts Gabor features using Gabor filters with suitable frequencies and orientations. The proposed system uses two classifiers, k-nearest neighbours (k-NN) and a support vector machine (SVM) with different kernel functions, to classify the extracted features of each word as one of the two scripts. The experiments show that SVM outperforms k-NN.
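The Gabor-feature route followed by nearest-neighbour classification, as outlined in the abstract, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the helper names (`gabor_kernel`, `filter_valid`, `gabor_features`, `knn_predict`) are hypothetical, and the kernel size, `sigma`, frequencies, four orientations, and the mean/standard-deviation summary of each filter response are all assumed values. The statistical-feature approach and the SVM classifier are not reproduced. Word images are assumed to be binarized 2-D lists of floats at least as large as the kernel.

```python
import math

# Hypothetical helpers for illustration; all parameter values are assumptions,
# not the settings used in the paper.

def gabor_kernel(frequency, theta, sigma=2.0, size=7):
    """Real part of a 2-D Gabor kernel with an isotropic Gaussian envelope.

    frequency is in cycles per pixel; theta is the orientation in radians.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Project (x, y) onto theta so the cosine carrier varies along that direction.
            x_r = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            row.append(envelope * math.cos(2.0 * math.pi * frequency * x_r))
        kernel.append(row)
    return kernel


def filter_valid(image, kernel):
    """Slide the kernel over the image without padding ('valid' mode).

    This is correlation; for the point-symmetric real Gabor kernel it equals convolution.
    The image must be at least as large as the kernel in both dimensions.
    """
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(image[i + u][j + v] * kernel[u][v]
                           for u in range(kh) for v in range(kw)))
        out.append(row)
    return out


def gabor_features(image, frequencies=(0.125, 0.25), n_orientations=4):
    """Mean and standard deviation of |response| for each (frequency, orientation) pair."""
    features = []
    for f in frequencies:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            response = filter_valid(image, gabor_kernel(f, theta))
            mags = [abs(v) for row in response for v in row]
            mean = sum(mags) / len(mags)
            std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
            features.extend([mean, std])
    return features


def knn_predict(train, query, k=1):
    """Majority vote among the k training vectors nearest to query (Euclidean distance).

    train is a list of (feature_vector, label) pairs. A tied vote goes to the label
    that appears first in nearest-first order.
    """
    neighbours = sorted((math.dist(vec, query), label) for vec, label in train)[:k]
    votes = {}
    for _, label in neighbours:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


if __name__ == "__main__":
    size = 12
    # Toy stand-ins for two scripts: horizontal vs vertical stripes with a 4-pixel period.
    horiz = [[1.0 if (y // 2) % 2 == 0 else 0.0 for x in range(size)] for y in range(size)]
    vert = [[1.0 if (x // 2) % 2 == 0 else 0.0 for x in range(size)] for y in range(size)]
    # Query: horizontal stripes shifted by one pixel.
    query = [[1.0 if ((y + 1) // 2) % 2 == 0 else 0.0 for x in range(size)] for y in range(size)]
    freqs = (0.25,)
    train = [(gabor_features(horiz, freqs), "A"), (gabor_features(vert, freqs), "B")]
    print(knn_predict(train, gabor_features(query, freqs)))  # A
```

The striped toy images only show that oriented filter energy separates differently oriented texture; real Gujarati and English word images would presumably need segmentation and size normalization first. Swapping `knn_predict` for an SVM with a chosen kernel would mirror the paper's second classifier.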

Index Terms

Computer Science

Information Sciences

Keywords

Gabor Filter, Support Vector Machine, Feature Extraction