Selective word encoding for effective text representation

Determining the category of a text document from its semantic content is a well-motivated problem in the literature and has been studied extensively in various applications. Compact representation of text is a fundamental step toward precise results in these applications, and a large body of work concentrates on improving its performance. In particular, techniques that aggregate word-level representations are the mainstream approach to the problem. In this paper, we tackle text representation to achieve high performance in different text classification tasks. Three critical contributions are presented. First, to encode the word-level representations of each text, we adapt a trainable orderless aggregation algorithm that transforms word vectors into a more discriminative abstract text-level representation. Second, we propose an effective term-weighting scheme that computes the relative importance of words from their context, learned end-to-end jointly with the classification objective. Third, we present a weighted loss function to mitigate the class-imbalance problem between categories. To evaluate performance, we collect two distinct datasets whose data is available on the web: Turkish parliament records (i.e. written speeches of four major political parties, with 30731/7683 train/test documents) and newspaper articles (i.e. daily articles by columnists, with 16000/3200 train/test documents). The results show that the proposed method introduces significant performance improvements over the baseline techniques (i.e. VLAD and Fisher vector), achieving true prediction accuracies of 0.823 for party membership and 0.878 for article category estimation. This performance validates that the proposed contributions (i.e. trainable word-encoding model, trainable term-weighting scheme, and weighted loss function) significantly outperform the baselines.
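As a rough illustration of the first two contributions, the sketch below shows what a NetVLAD-style trainable, orderless aggregation over word embeddings might look like when combined with a learned term-weighting head trained end-to-end with the classifier. This is a minimal PyTorch sketch, not the authors' implementation: the class name, the sigmoid gating of term weights, and the layer sizes (300-dimensional embeddings, 16 clusters) are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' code) of trainable orderless
# aggregation of word vectors with a learned per-word importance score.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelectiveWordEncoder(nn.Module):          # hypothetical name
    def __init__(self, embed_dim=300, num_clusters=16, num_classes=4):
        super().__init__()
        # Soft assignment of each word vector to K learnable centroids,
        # in the spirit of NetVLAD, applied here to word embeddings.
        self.assign = nn.Linear(embed_dim, num_clusters)
        self.centroids = nn.Parameter(torch.randn(num_clusters, embed_dim))
        # Learned term weighting: one scalar importance score per word,
        # trained jointly with the classification objective.
        self.term_weight = nn.Linear(embed_dim, 1)
        self.classifier = nn.Linear(num_clusters * embed_dim, num_classes)

    def forward(self, words):                   # words: (B, N, D)
        w = torch.sigmoid(self.term_weight(words))            # (B, N, 1)
        a = F.softmax(self.assign(words), dim=-1)             # (B, N, K)
        # Weighted residuals to each centroid, summed over words;
        # the sum makes the aggregation orderless.
        resid = words.unsqueeze(2) - self.centroids           # (B, N, K, D)
        vlad = (w.unsqueeze(-1) * a.unsqueeze(-1) * resid).sum(dim=1)  # (B, K, D)
        vlad = F.normalize(vlad, dim=-1)        # intra-cluster normalization
        vlad = F.normalize(vlad.flatten(1), dim=-1)            # (B, K*D)
        return self.classifier(vlad)

# Usage: a batch of 2 documents, each truncated/padded to 50 word vectors.
logits = SelectiveWordEncoder()(torch.randn(2, 50, 300))       # (2, 4)
```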
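The third contribution, the weighted loss, can be approximated with a class-weighted cross-entropy. The inverse-frequency weighting below is a common choice and an assumption on my part, as the abstract does not specify the weighting; the per-class document counts in the usage line are hypothetical values that merely sum to the 30731 training documents.

```python
# Sketch of a class-imbalance-aware loss: cross-entropy whose per-class
# weights are inversely proportional to class frequency (an assumption).
import torch
import torch.nn as nn

def make_weighted_loss(class_counts):
    counts = torch.tensor(class_counts, dtype=torch.float)
    weights = counts.sum() / (len(counts) * counts)  # inverse-frequency weights
    return nn.CrossEntropyLoss(weight=weights)

# Hypothetical counts for the four parties in the parliament dataset.
criterion = make_weighted_loss([12000, 9000, 6000, 3731])
```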
