MANOJ MANUJA, DEEPAK GARG

Intelligent text classification system based on self-administered ontology

Over the last couple of decades, web classification has gradually transitioned from a syntax-centered to a semantics-centered approach that classifies text based on domain ontologies. These ontologies are either built manually or populated automatically using machine learning techniques. A prerequisite for building such systems is the availability of an ontology, which may be either a full-fledged domain ontology or a seed ontology that can be enriched automatically. This is a dependency for any semantics-based text classification system. We share the details of a proof of concept of a web classification system that is self-governed in terms of ontology population and does not require any prebuilt ontology, either full-fledged or seed. It starts from a user query, builds a seed ontology from it, and automatically enriches it by extracting concepts from the downloaded documents only. The evaluated parameters, namely precision (85%), accuracy (86%), AUC (convex), and MCC (high positive), demonstrate better performance of the proposed system compared with similar automated text classification systems.
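As a reminder of how three of the reported measures relate to a binary confusion matrix, the following minimal sketch computes precision, accuracy, and the Matthews correlation coefficient (MCC). The function name and the counts are illustrative assumptions; they are not taken from the paper's experiments.

```python
import math


def classification_metrics(tp, fp, tn, fn):
    """Return (precision, accuracy, mcc) for a binary confusion matrix."""
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Accuracy: share of all decisions that are correct.
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # MCC ranges from -1 to +1; it is set to 0 when a marginal total is zero.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, accuracy, mcc


# Hypothetical counts for illustration only (not the paper's results).
precision, accuracy, mcc = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(f"precision={precision:.2f} accuracy={accuracy:.2f} mcc={mcc:.2f}")
```

An MCC close to +1 indicates strong agreement between predicted and true labels, which is the sense in which a "high positive" MCC is usually read.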

Keywords:

Ontology, support vector machine, resource description framework, text classification

___

area, our framework is independent of any specific domain area and focuses purely on the user query, picking up the domain at run time. Fixing the domain in advance also requires [12] to generate a seed ontology first and only then proceed to the enrichment process. Our framework has no such dependency, which makes it a ‘self-governed’ learning system.
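The idea of starting from the user query alone can be illustrated with a toy sketch. It is not the authors' algorithm: the stopword list, the function names, and the frequency-based enrichment rule are all assumptions chosen for brevity, and the sketch stores concepts in a plain set rather than in RDF.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with", "are"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def build_seed_ontology(query):
    """Seed concepts are simply the content words of the user query."""
    return set(tokenize(query))


def enrich(ontology, documents, top_k=2):
    """Add the top_k most frequent new terms found in documents
    that mention at least one concept already in the ontology."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        if ontology & set(tokens):
            counts.update(t for t in tokens if t not in ontology)
    return ontology | {term for term, _ in counts.most_common(top_k)}


docs = [
    "Solar panels convert sunlight. Solar panels need sunlight.",
    "Energy from panels is stored in batteries.",
    "Football scores from last night.",
]
seed = build_seed_ontology("solar energy")
enriched = enrich(seed, docs)
print(sorted(enriched))
```

No domain is fixed in advance here: the seed comes from the query, and only documents that touch an existing concept contribute new ones, so the unrelated third document is ignored.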
Conclusion and future work
References
Christopher CS, Tylman J. Enterprise information portals. Electron Libr 1998; 18: 354–362.
Sheth A. Computing for human experience: semantics-empowered sensors, services, and social computing on the ubiquitous Web. IEEE Internet Comput 2010; 14: 88–91.
Sebastiani F. Machine learning in automated text categorization. ACM Comput Surv 2002; 34: 1–47.
Gupta V, Lehal G. A survey of text mining techniques and applications. Journal of Emerging Technologies in Web Intelligence 2009; 1: 60–76.
Navigli R, Faralli S, Soroa A, de Lacalle OL, Agirre E. Two birds with one stone: learning semantic models for text categorization and word sense disambiguation. In: 20th ACM Conference on Information and Knowledge Management; 24–28 October 2011; Glasgow, UK. New York, NY, USA: ACM. pp. 2317–2320.
Bizer C, Heath T, Berners-Lee T. Linked data—the story so far. Int J Semant Web Inf 2009; 5: 1–22.
Heath T, Bizer C. Linked Data: Evolving the Web into a Global Data Space. San Rafael, CA USA: Morgan & Claypool Publishers, 2011.
Buitelaar P, Cimiano P, Magnini B. Ontology Learning from Text: Methods, Evaluation and Applications. Philadelphia, PA, USA: IOS Press, 2005.
Maedche A, Staab S. Ontology learning for the semantic web. IEEE Intell Syst 2001; 16: 72–79.
Navigli R, Velardi P, Gangemi A. Ontology learning and its application to automated terminology translation. IEEE Intell Syst 2003; 18: 22–31.
Wei GY, Wu GX, Gu YY, Ling Y. An ontology based approach for Chinese web texts classification. Inform Technol J 2008; 7: 796–801.
Luong HP, Gauch S, Wang Q. Ontology learning through focused crawling and information extraction. In: International Conference on Knowledge and Systems Engineering; 13–17 October 2009; Hanoi, Vietnam. New York, NY, USA: IEEE. pp. 106–112.
Brank J, Mladenic D, Grobelnik M. Large-scale hierarchical text classification using SVM and coding matrices. In: Large-Scale Hierarchical Classification Workshop of the ECIR 2010; 28–31 March 2010; Milton Keynes, UK.
Speretta M, Gauch S. Using text mining to enrich the vocabulary of domain ontologies. In: IEEE International Conference on Web Intelligence and Intelligent Agent Technology; 9–12 December 2008; Sydney, Australia. New York, NY, USA: IEEE. pp. 549–552.
Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Inform Process Manag 1988; 24: 513–523.
Abe S. Support vector machines for pattern classification. New York, NY, USA: Springer-Verlag, 2005.
RDF Working Group. Resource Description Framework (RDF). W3C–Semantic Web, 2004.
Chang C, Lin C. LIBSVM: A Library for Support Vector Machines. Software, 2001.
Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: International Joint Conference on Artificial Intelligence; 1995; Montreal, Canada. pp. 1137–1145.
Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 1975; 405: 442–451.
Lewis D. The Reuters-21578 Text Categorization Test Collection, 1997.
Lang K. NewsWeeder: Learning to filter netnews. In: 12th International Conference on Machine Learning; 9–12 July 1995; Lake Tahoe, CA, USA. Washington, DC, USA: IMLS. pp. 331–339.
Craven M, DiPasquo D, Freitag D, McCallum A, Mitchell T, Nigam K, Slattery S. Learning to construct knowledge bases from the World Wide Web. Artif Intell 2000; 118: 69–114.
Siolas G, d’Alche Buc F. Support vector machines based on semantic kernel for text categorization. In: International Joint Conference on Neural Networks; 27 July 2000; Como, Italy. New York, NY, USA: IEEE. pp. 205–209.
Cristianini N, Shawe-Taylor J, Lodhi H. Latent semantic kernels. J Intell Inf Syst 2002; 18: 127–152.
Wang P, Domeniconi C. Building semantic kernels for text classification using Wikipedia. In: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 24–27 August 2008; Las Vegas, NV, USA. New York, NY, USA: ACM Press. pp. 713–721.