How Bad are PoS Taggers in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project

Abstract: The performance of Part-of-Speech tagging varies significantly across the treebanks of the Universal Dependencies project. This work points out that these variations may result from divergences between the annotations of the train and test sets. We show how the annotation variation principle, introduced by Dickinson and Meurers (2003) to automatically detect errors in gold standards, can be used to identify inconsistencies between annotations; we also evaluate their impact on prediction performance.
Document type: Conference paper

https://hal.archives-ouvertes.fr/hal-02055137
Contributor: Guillaume Wisniewski
Submitted on: Friday, June 14, 2019 - 10:35:13
Last modified on: Thursday, June 20, 2019 - 14:14:08

File

N19-1019.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-02055137, version 1

Citation

Guillaume Wisniewski, François Yvon. How Bad are PoS Taggers in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project. 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Jun 2019, Minneapolis, Minnesota, United States. pp. 218-227. ⟨hal-02055137⟩

Metrics

Record views

87

File downloads

10