Turkish entity discovery with word embeddings

Entity-linking systems link noun-phrase mentions in a text to their corresponding knowledge base entities in order to enrich the text with metadata. Wikipedia is a popular and comprehensive knowledge base that is widely used in entity-linking systems. However, long-tail entities are not popular enough to have their own Wikipedia articles, so a knowledge base built from Wikipedia entities is limited to popular entities. To overcome this coverage limitation of Wikipedia-based entity-linking systems, this paper presents an entity-discovery system that can detect the semantic types of entities that are not defined in Wikipedia. The effectiveness of the proposed system was validated empirically on generated data sets for the Turkish language. The experimental results show that, in terms of accuracy, our system performs competitively with previous methods in the literature. Its high performance is achieved through a method that learns word embeddings for candidate entities.
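
The abstract does not describe how embeddings are turned into semantic types, so the following is only a minimal sketch of one common approach, not the authors' method: assign a candidate entity the type whose centroid embedding is closest by cosine similarity. The function names, the type labels, and the 3-dimensional toy vectors are all hypothetical and were invented for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def predict_type(candidate_vec, typed_examples):
    """Return the type whose centroid is most similar to the candidate."""
    centroids = {t: centroid(vs) for t, vs in typed_examples.items()}
    return max(centroids, key=lambda t: cosine(candidate_vec, centroids[t]))

# Toy embeddings of already-typed entities (invented values, not real data).
typed_examples = {
    "person": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "location": [[0.1, 0.9, 0.0], [0.0, 0.8, 0.2]],
}

# Hypothetical embedding learned for a candidate entity absent from Wikipedia.
candidate = [0.85, 0.15, 0.05]
predicted = predict_type(candidate, typed_examples)
print(predicted)
```

In a real system the vectors would come from an embedding model trained on a large corpus, and a trained classifier would usually replace the nearest-centroid rule.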

___

Turkish Journal of Electrical Engineering and Computer Sciences
  • ISSN: 1300-0632
  • Publication frequency: 6 issues per year
  • Publisher: TÜBİTAK
  • Article authors: Emin Erkan KORKMAZ, Murat KALENDER