- Description:
 
The shared task of CoNLL-2003 concerns language-independent named entity recognition and concentrates on four types of named entities: persons, locations, organizations and names of miscellaneous entities that do not belong to the previous three groups.
- Source code: tfds.text.conll2003.Conll2003
- Versions:
  - 1.0.0 (default): Initial release.
- Download size: 959.94 KiB
- Dataset size: 3.87 MiB
- Auto-cached (documentation): Yes
- Splits:
| Split | Examples |
|---|---|
| 'dev' | 3,251 |
| 'test' | 3,454 |
| 'train' | 14,042 |
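
The split names and sizes above can be checked against the dataset metadata. The following is a minimal sketch, not the catalog's own example: it assumes the dataset is registered under the name `conll2003` (inferred from the class name `Conll2003`), and the helper `load_split` and the constant `EXPECTED_EXAMPLES` are hypothetical names introduced here for illustration.

```python
# Split sizes copied from the table above.
EXPECTED_EXAMPLES = {"train": 14042, "dev": 3251, "test": 3454}


def load_split(split):
    """Load one CoNLL-2003 split and return (dataset, number_of_examples).

    `load_split` is a hypothetical helper; the dataset name "conll2003"
    is an assumption based on the class name Conll2003.
    """
    if split not in EXPECTED_EXAMPLES:
        raise ValueError(
            f"unknown split {split!r}; expected one of {sorted(EXPECTED_EXAMPLES)}"
        )
    # Imported lazily so the name check above works without TensorFlow installed.
    import tensorflow_datasets as tfds

    # with_info=True also returns the DatasetInfo object holding split metadata.
    dataset, info = tfds.load("conll2003", split=split, with_info=True)
    return dataset, info.splits[split].num_examples
```

Calling `load_split("dev")` downloads the data on first use (about 960 KiB according to the figures above) and should report a count that matches `EXPECTED_EXAMPLES["dev"]`. Note that the held-out split is named `'dev'` here, so a name such as `"validation"` is rejected before any download starts.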
- Feature structure:
 
FeaturesDict({
    'chunks': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=23)),
    'ner': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=9)),
    'pos': Sequence(ClassLabel(shape=(), dtype=int64, num_classes=47)),
    'tokens': Sequence(Text(shape=(), dtype=string)),
})
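
Each example stores `ner` as a sequence of integer class ids aligned with `tokens`. The nine classes are consistent with a BIO encoding of the four entity types plus an outside tag (4 × 2 + 1 = 9), but this page does not list the label names or their order. The sketch below therefore assumes a commonly used ordering in `NER_NAMES`; both that list and the helper `extract_entities` are illustrative, and the real names should be read from the dataset's feature metadata before relying on them. The helper works on plain Python lists of strings and integers.

```python
# ASSUMPTION: this label order is not stated in the document. It is a commonly
# used ordering for CoNLL-2003 and must be verified against the dataset metadata.
NER_NAMES = [
    "O",
    "B-PER", "I-PER",
    "B-ORG", "I-ORG",
    "B-LOC", "I-LOC",
    "B-MISC", "I-MISC",
]


def extract_entities(tokens, ner_ids, names=NER_NAMES):
    """Group tokens into (entity_type, text) spans from BIO-style tag ids."""
    entities = []
    current_type, current_tokens = None, []
    for token, tag_id in zip(tokens, ner_ids):
        prefix, _, entity_type = names[tag_id].partition("-")
        # Extend the open span only on an I- tag of the same entity type.
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)
            continue
        # Any other tag closes the span that is currently open, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        if prefix == "O":
            current_type, current_tokens = None, []
        else:
            # A B- tag, or an I- tag with no matching open span, starts a new span.
            current_type, current_tokens = entity_type, [token]
    # Flush a span that runs to the end of the sentence.
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities
```

Under the assumed ordering, `extract_entities(["Peter", "Blackburn", "visited", "New", "York"], [1, 2, 0, 5, 6])` returns `[("PER", "Peter Blackburn"), ("LOC", "New York")]`. Treating an unmatched `I-` tag as the start of a span is a design choice that lets the same helper accept data where entities begin with `I-` rather than `B-`.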
- Feature documentation:
 
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| chunks | Sequence(ClassLabel) | (None,) | int64 | |
| ner | Sequence(ClassLabel) | (None,) | int64 | |
| pos | Sequence(ClassLabel) | (None,) | int64 | |
| tokens | Sequence(Text) | (None,) | string | |
- Supervised keys (See as_supervised doc): None
- Figure (tfds.show_examples): Not supported.
- Citation:
 
@inproceedings{tjong-kim-sang-de-meulder-2003-introduction,
    title = "Introduction to the {C}o{NLL}-2003 Shared Task: Language-Independent Named Entity Recognition",
    author = "Tjong Kim Sang, Erik F.  and
      De Meulder, Fien",
    booktitle = "Proceedings of the Seventh Conference on Natural Language Learning at {HLT}-{NAACL} 2003",
    year = "2003",
    url = "https://www.aclweb.org/anthology/W03-0419",
    pages = "142--147",
}