Big BiRD: A Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic Composition

Shima Asaadi, Saif M. Mohammad, Svetlana Kiritchenko
Big BiRD: A Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic Composition
Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL), June 2019
  • Abstract
    Bigrams (two-word sequences) hold a special place in semantic composition research since they are the smallest unit formed by composing words. A semantic relatedness dataset that includes bigrams will thus be useful in the development of automatic methods of semantic composition. However, existing relatedness datasets only include pairs of unigrams (single words). Further, existing datasets were created using rating scales and thus suffer from limitations such as inconsistent annotations and scale region bias. In this paper, we describe how we created a large, fine-grained, bigram relatedness dataset (BiRD), using a comparative annotation technique called Best–Worst Scaling. Each of BiRD’s 3,345 English term pairs involves at least one bigram. We show that the relatedness scores obtained are highly reliable (split-half reliability r = 0.937). We analyze the data to obtain insights into bigram semantic relatedness. Finally, we present benchmark experiments on using the relatedness dataset as a testbed to evaluate simple unsupervised measures of semantic composition. BiRD is made freely available to foster further research on how meaning can be represented and how meaning can be composed.
  • Further Information: Link
  • Research Group: Computational Logic
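
The abstract names Best–Worst Scaling but does not say how annotations become relatedness scores. A common way to score such annotations is simple counting: the proportion of times an item was chosen as best minus the proportion of times it was chosen as worst. The sketch below illustrates that counting procedure under this assumption; it is not taken from the paper, and the function name `bws_scores` and the item labels `p1` to `p4` are hypothetical placeholders, not entries from BiRD.

```python
from collections import defaultdict


def bws_scores(annotations):
    """Score items from Best-Worst Scaling annotations by counting.

    `annotations` is an iterable of (items, best, worst) tuples, where
    `items` is the tuple shown to an annotator and `best` / `worst` are
    the items that annotator picked. This input layout is an assumption
    made for illustration.
    """
    best = defaultdict(int)
    worst = defaultdict(int)
    seen = defaultdict(int)
    for items, chosen_best, chosen_worst in annotations:
        for item in items:
            seen[item] += 1
        best[chosen_best] += 1
        worst[chosen_worst] += 1
    # Score = fraction of appearances chosen as best
    #         minus fraction of appearances chosen as worst.
    return {item: (best[item] - worst[item]) / seen[item] for item in seen}


# Hypothetical annotations: two annotators judge the same 4-tuple.
example = [
    (("p1", "p2", "p3", "p4"), "p1", "p4"),
    (("p1", "p2", "p3", "p4"), "p1", "p3"),
]
scores = bws_scores(example)
```

Scores computed this way fall between -1 (always chosen as worst) and 1 (always chosen as best), and can be rescaled linearly if a different range is preferred.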
@inproceedings{bird-naacl2019,
  title={Big BiRD: A Large, Fine-Grained, Bigram Relatedness Dataset for Examining Semantic Composition},
  author={Asaadi, Shima and Mohammad, Saif M. and Kiritchenko, Svetlana},
  booktitle={Proceedings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL)},
  year={2019},
  address={Minneapolis, USA}
}