turkish_ner

Use the following command to load this dataset in TFDS:

import tensorflow_datasets as tfds
ds = tfds.load('huggingface:turkish_ner')
  • Description:
The Turkish Wikipedia Named-Entity Recognition and Text Categorization
(TWNERTC) dataset is a collection of automatically categorized and annotated
sentences obtained from Wikipedia. The authors constructed large-scale
gazetteers by using a graph crawler algorithm to extract relevant entity and
domain information from a semantic knowledge base, Freebase. The constructed
gazetteers contain approximately 300K entities with thousands of fine-grained
entity types under 77 different domains.
  • License: No known license
  • Version: 0.0.0
  • Splits:
Split     Examples
'train'   532629
  • Features:
{
    "id": {
        "dtype": "string",
        "id": null,
        "_type": "Value"
    },
    "tokens": {
        "feature": {
            "dtype": "string",
            "id": null,
            "_type": "Value"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    },
    "domain": {
        "num_classes": 25,
        "names": [
            "architecture",
            "basketball",
            "book",
            "business",
            "education",
            "fictional_universe",
            "film",
            "food",
            "geography",
            "government",
            "law",
            "location",
            "military",
            "music",
            "opera",
            "organization",
            "people",
            "religion",
            "royalty",
            "soccer",
            "sports",
            "theater",
            "time",
            "travel",
            "tv"
        ],
        "names_file": null,
        "id": null,
        "_type": "ClassLabel"
    },
    "ner_tags": {
        "feature": {
            "num_classes": 9,
            "names": [
                "O",
                "B-PERSON",
                "I-PERSON",
                "B-ORGANIZATION",
                "I-ORGANIZATION",
                "B-LOCATION",
                "I-LOCATION",
                "B-MISC",
                "I-MISC"
            ],
            "names_file": null,
            "id": null,
            "_type": "ClassLabel"
        },
        "length": -1,
        "id": null,
        "_type": "Sequence"
    }
}
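
The feature specification above stores `domain` and `ner_tags` as ClassLabel values, so a record most likely carries integer class ids rather than strings. The following is a minimal sketch of turning those ids back into readable labels and grouping the BIO tags into entity spans. The two name lists are copied from the specification; the helper `extract_entities` and the example record are hypothetical and written for illustration only, and the sketch assumes the ids index the `names` lists in the order shown.

```python
# Label vocabularies copied from the feature specification above.
DOMAIN_NAMES = [
    "architecture", "basketball", "book", "business", "education",
    "fictional_universe", "film", "food", "geography", "government",
    "law", "location", "military", "music", "opera", "organization",
    "people", "religion", "royalty", "soccer", "sports", "theater",
    "time", "travel", "tv",
]
NER_TAG_NAMES = [
    "O", "B-PERSON", "I-PERSON", "B-ORGANIZATION", "I-ORGANIZATION",
    "B-LOCATION", "I-LOCATION", "B-MISC", "I-MISC",
]


def extract_entities(tokens, tag_ids):
    """Group BIO-tagged tokens into (entity_type, text) pairs.

    Hypothetical helper, not part of TFDS or the dataset itself.
    """
    entities = []
    current_type, current_tokens = None, []
    for token, tag_id in zip(tokens, tag_ids):
        tag = NER_TAG_NAMES[tag_id]
        prefix, _, entity_type = tag.partition("-")
        # An I- tag continues the open entity only if the type matches.
        if prefix == "I" and entity_type == current_type:
            current_tokens.append(token)
            continue
        # Anything else closes the open entity, if there is one.
        if current_type is not None:
            entities.append((current_type, " ".join(current_tokens)))
        if prefix == "O":
            current_type, current_tokens = None, []
        else:
            # B- starts a new entity; a stray I- is treated as a start too.
            current_type, current_tokens = entity_type, [token]
    if current_type is not None:
        entities.append((current_type, " ".join(current_tokens)))
    return entities


# Hypothetical record, invented for illustration (not taken from the dataset).
example = {
    "id": "0",
    "tokens": ["Mustafa", "Kemal", "Atatürk", "Selanik'te", "doğdu", "."],
    "domain": 16,
    "ner_tags": [1, 2, 2, 5, 0, 0],
}

domain_name = DOMAIN_NAMES[example["domain"]]
entities = extract_entities(example["tokens"], example["ner_tags"])
print(domain_name, entities)
```

Treating a stray `I-` tag as the start of a new entity is a design choice made here because the annotations are automatic and may not always be well-formed; a stricter decoder could drop such tokens instead.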