Intelligent text classification system based on self-administered ontology

Over the last couple of decades, web classification has gradually shifted from a syntax-centered to a semantics-centered approach that classifies text using domain ontologies. These ontologies are either built manually or populated automatically using machine learning techniques. Building such a system presupposes that an ontology is available, either a full-fledged domain ontology or a seed ontology that can be enriched automatically; every semantics-based text classification system has this dependency. We describe a proof of concept of a web classification system that is self-governed in terms of ontology population and requires no prebuilt ontology, whether full-fledged or seed. It starts from a user query, builds a seed ontology from it, and automatically enriches that ontology by extracting concepts solely from the downloaded documents. The evaluated measures, namely precision (85%), accuracy (86%), AUC (convex), and MCC (high positive), indicate that the proposed system performs better than similar automated text classification systems.
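
For readers unfamiliar with the reported measures, the following is a minimal sketch of how precision, accuracy, and the Matthews correlation coefficient (MCC) are computed from a binary confusion matrix. The function names and the toy labels are illustrative assumptions and are not taken from the paper; the sketch only shows the standard definitions, not the authors' evaluation code or data.

```python
from math import sqrt


def confusion_counts(y_true, y_pred):
    """Count (tp, tn, fp, fn) for binary labels, where 1 means 'relevant'."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def precision(tp, tn, fp, fn):
    """Fraction of documents labelled relevant that really are relevant."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def accuracy(tp, tn, fp, fn):
    """Fraction of all documents that were labelled correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient, in [-1, 1]; 0 if undefined."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels, for illustration only (not the paper's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
counts = confusion_counts(y_true, y_pred)

print("precision:", precision(*counts))
print("accuracy: ", accuracy(*counts))
print("MCC:      ", mcc(*counts))
```

An MCC close to +1 indicates strong agreement between predicted and true labels, which is the sense in which a "high positive" MCC is a favourable result.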

___

  • area, our framework is independent of any specific domain and focuses purely on the user query, picking up the domain at run time. Fixing the domain in advance also requires [12] to generate a seed ontology first and only then proceed to the enrichment process. Our framework has no such dependencies, which makes it a ‘self-governed’ learning system.
  • Conclusion and future work
  • Christopher CS, Tylman J. Enterprise information portals. Electron Libr 1998; 18: 354–362.
  • Sheth A. Computing for human experience: semantics-empowered sensors, services, and social computing on the ubiquitous Web. IEEE Internet Comput 2010; 14: 88–91.
  • Sebastiani F. Machine learning in automated text categorization. Comput Surv 2002; 34: 1–47.
  • Gupta V, Lehal G. A survey of text mining techniques and applications. Journal of Emerging Technologies in Web Intelligence 2009; 1: 60–76.
  • Navigli R, Faralli S, Soroa A, de Lacalle OL, Agirre E. Two birds with one stone: learning semantic models for text categorization and word sense disambiguation. In: 20th ACM Conference on Information and Knowledge Management; 24–28 October 2011; Glasgow, UK. New York, NY, USA: ACM. pp. 2317–2320.
  • Bizer C, Heath T, Berners-Lee T. Linked data—the story so far. Int J Semant Web Inf 2009; 5: 1–22.
  • Heath T, Bizer C. Linked Data: Evolving the Web into a Global Data Space. San Rafael, CA USA: Morgan & Claypool Publishers, 2011.
  • Buitelaar P, Cimiano P, Magnini B. Ontology Learning from Text: Methods, Evaluation and Applications. Philadelphia, PA, USA: IOS Press, 2005.
  • Maedche A, Staab S. Ontology learning for the semantic web. IEEE Intell Syst 2001; 16: 72–79.
  • Navigli R, Velardi P, Gangemi A. Ontology learning and its application to automated terminology translation. IEEE Intell Syst 2003; 18: 22–31.
  • Wei GY, Wu GX, Gu YY, Ling Y. An ontology based approach for Chinese web texts classification. Inform Technol J 2008; 7: 796–801.
  • Luong HP, Gauch S, Wang Q. Ontology learning through focused crawling and information extraction. In: International Conference on Knowledge and Systems Engineering; 13–17 October 2009; Hanoi, Vietnam. New York, NY, USA: IEEE. pp. 106–112.
  • Brank J, Mladenic D, Grobelnik M. Large-scale hierarchical text classification using SVM and coding matrices. In: Large-Scale Hierarchical Classification Workshop of the ECIR 2010; 28–31 March 2010; Milton Keynes, UK.
  • Speretta M, Gauch S. Using text mining to enrich the vocabulary of domain ontologies. In: IEEE International Conference on Web Intelligence and Intelligent Agent Technology; 9–12 December 2008; Sydney, Australia. New York, NY, USA: IEEE. pp. 549–552.
  • Salton G, Buckley C. Term-weighting approaches in automatic text retrieval. Inform Process Manag 1988; 24: 513–523.
  • Abe S. Support vector machines for pattern classification. New York, NY, USA: Springer-Verlag, 2005.
  • RDF Working Group. Resource Description Framework (RDF). W3C–Semantic Web, 2004.
  • Chang C, Lin C. LIBSVM: A Library for Support Vector Machines. Software, 2001.
  • Kohavi R. A study of cross-validation and bootstrap for accuracy estimation and model selection. In: International Joint Conference on Artificial Intelligence; 1995; Montreal, Canada. pp. 1137–1145.
  • Matthews BW. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochim Biophys Acta 1975; 405: 442–451.
  • Lewis D. The Reuters-21578 Text Categorization Test Collection, 1997.
  • Lang K. NewsWeeder: Learning to filter netnews. In: 12th International Conference on Machine Learning; 9–12 July 1995; Lake Tahoe, CA, USA. Washington, DC, USA: IMLS. pp. 331–339.
  • Craven M, DiPasquo D, Freitag D, McCallum A, Mitchell T, Nigam K, Slattery S. Learning to construct knowledge bases from the World Wide Web. Artif Intell 2000; 118: 69–114.
  • Siolas G, d’Alche Buc F. Support vector machines based on semantic kernel for text categorization. In: International Joint Conference on Neural Networks; 27 July 2000; Como, Italy. New York, NY, USA: IEEE. pp. 205–209.
  • Cristianini N, Shawe-Taylor J, Lodhi H. Latent semantic kernels. J Intell Inf Syst 2002; 18: 127–152.
  • Wang P, Domeniconi C. Building semantic kernels for text classification using Wikipedia. In: 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 24–27 August 2008; Las Vegas, NV, USA. New York, NY, USA: ACM Press. pp. 713–721.
Turkish Journal of Electrical Engineering and Computer Science
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK