- Description:
Data sets derived from TED talk transcripts for comparing similar language pairs where one is high resource and the other is low resource.
Source code:
tfds.datasets.ted_hrlr_translate.Builder
Versions:
1.0.0 (default): New split API (https://tensorflow.org/datasets/splits)
Download size:
124.94 MiB
Auto-cached (documentation): Yes
Figure (tfds.show_examples): Not supported.
Citation:
@inproceedings{Ye2018WordEmbeddings,
author = {Qi, Ye and Sachan, Devendra and Felix, Matthieu and Padmanabhan, Sarguna and Neubig, Graham},
title = {When and Why are pre-trained word embeddings useful for Neural Machine Translation},
booktitle = {HLT-NAACL},
year = {2018},
}
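Each config listed below is addressed as ted_hrlr_translate/<config>. The following is a minimal sketch of loading one of them, assuming the tensorflow_datasets package is installed. The helper names `dataset_name` and `load_split` are illustrative and not part of TFDS; the import is deferred so the name helper works without TensorFlow.

```python
# Config names as listed on this page.
CONFIGS = (
    "az_to_en", "aztr_to_en", "be_to_en", "beru_to_en", "es_to_pt",
    "fr_to_pt", "gl_to_en", "glpt_to_en", "he_to_pt", "it_to_pt",
    "pt_to_en", "ru_to_en", "ru_to_pt", "tr_to_en",
)


def dataset_name(config: str = "az_to_en") -> str:
    """Return the 'dataset/config' string passed to tfds.load (illustrative helper)."""
    if config not in CONFIGS:
        raise ValueError(f"unknown config: {config!r}")
    return f"ted_hrlr_translate/{config}"


def load_split(config: str = "az_to_en", split: str = "train"):
    """Load one split as a tf.data.Dataset of feature dicts.

    Needs tensorflow_datasets; the first call downloads the source archive.
    """
    import tensorflow_datasets as tfds  # deferred: optional dependency

    return tfds.load(dataset_name(config), split=split)
```

For example, `load_split("pt_to_en", "validation")` should return the 1,193 validation examples listed for that config.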
ted_hrlr_translate/az_to_en (default config)
Config description: Translation dataset from az to en in plain text.
Dataset size:
1.61 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 903 |
| 'train' | 5,946 |
| 'validation' | 671 |
- Feature structure:
Translation({
'az': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| az | Text | | string | |
| en | Text | | string | |

Supervised keys (see as_supervised doc): ('az', 'en')
ted_hrlr_translate/aztr_to_en
Config description: Translation dataset from az_tr to en in plain text.
Dataset size:
42.54 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 903 |
| 'train' | 188,396 |
| 'validation' | 671 |
- Feature structure:
Translation({
'az_tr': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| az_tr | Text | | string | |
| en | Text | | string | |

Supervised keys (see as_supervised doc): ('az_tr', 'en')
ted_hrlr_translate/be_to_en
Config description: Translation dataset from be to en in plain text.
Dataset size:
1.47 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 664 |
| 'train' | 4,509 |
| 'validation' | 248 |
- Feature structure:
Translation({
'be': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| be | Text | | string | |
| en | Text | | string | |

Supervised keys (see as_supervised doc): ('be', 'en')
ted_hrlr_translate/beru_to_en
Config description: Translation dataset from be_ru to en in plain text.
Dataset size:
62.45 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 664 |
| 'train' | 212,614 |
| 'validation' | 248 |
- Feature structure:
Translation({
'be_ru': Text(shape=(), dtype=string),
'en': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| be_ru | Text | | string | |
| en | Text | | string | |

Supervised keys (see as_supervised doc): ('be_ru', 'en')
ted_hrlr_translate/es_to_pt
Config description: Translation dataset from es to pt in plain text.
Dataset size:
9.62 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 1,763 |
| 'train' | 44,938 |
| 'validation' | 1,016 |
- Feature structure:
Translation({
'es': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| es | Text | | string | |
| pt | Text | | string | |

Supervised keys (see as_supervised doc): ('es', 'pt')
ted_hrlr_translate/fr_to_pt
Config description: Translation dataset from fr to pt in plain text.
Dataset size:
9.74 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 1,494 |
| 'train' | 43,873 |
| 'validation' | 1,131 |
- Feature structure:
Translation({
'fr': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| fr | Text | | string | |
| pt | Text | | string | |

Supervised keys (see as_supervised doc): ('fr', 'pt')
ted_hrlr_translate/gl_to_en
Config description: Translation dataset from gl to en in plain text.
Dataset size:
2.41 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 1,007 |
| 'train' | 10,017 |
| 'validation' | 682 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'gl': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| gl | Text | | string | |

Supervised keys (see as_supervised doc): ('gl', 'en')
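The feature structure for this config lists its keys alphabetically ('en', 'gl'), while the supervised keys give (source, target) order ('gl', 'en'). With as_supervised=True, tfds.load yields 2-tuples in the supervised-key order rather than feature dicts. The plain-Python sketch below mirrors that mapping on a made-up record; `to_supervised` is an illustrative helper, not a TFDS function.

```python
# Supervised keys for the gl_to_en config, as listed above: (source, target).
SUPERVISED_KEYS = ("gl", "en")


def to_supervised(example: dict, keys: tuple = SUPERVISED_KEYS) -> tuple:
    """Turn a feature dict into a (source, target) pair (illustrative helper)."""
    source_key, target_key = keys
    return example[source_key], example[target_key]


# Made-up record for illustration only; not taken from the dataset.
record = {"en": "thank you .", "gl": "grazas ."}
pair = to_supervised(record)
print(pair)  # source sentence first, then target
```

The same pattern applies to every config on this page: the first supervised key is the source language and the second is the target, regardless of how the feature dict is ordered.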
ted_hrlr_translate/glpt_to_en
Config description: Translation dataset from gl_pt to en in plain text.
Dataset size:
12.90 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 1,007 |
| 'train' | 61,802 |
| 'validation' | 682 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'gl_pt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| gl_pt | Text | | string | |

Supervised keys (see as_supervised doc): ('gl_pt', 'en')
ted_hrlr_translate/he_to_pt
Config description: Translation dataset from he to pt in plain text.
Dataset size:
11.71 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 1,623 |
| 'train' | 48,511 |
| 'validation' | 1,145 |
- Feature structure:
Translation({
'he': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| he | Text | | string | |
| pt | Text | | string | |

Supervised keys (see as_supervised doc): ('he', 'pt')
ted_hrlr_translate/it_to_pt
Config description: Translation dataset from it to pt in plain text.
Dataset size:
9.94 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 1,669 |
| 'train' | 46,259 |
| 'validation' | 1,162 |
- Feature structure:
Translation({
'it': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| it | Text | | string | |
| pt | Text | | string | |

Supervised keys (see as_supervised doc): ('it', 'pt')
ted_hrlr_translate/pt_to_en
Config description: Translation dataset from pt to en in plain text.
Dataset size:
10.89 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 1,803 |
| 'train' | 51,785 |
| 'validation' | 1,193 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'pt': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| pt | Text | | string | |

Supervised keys (see as_supervised doc): ('pt', 'en')
ted_hrlr_translate/ru_to_en
Config description: Translation dataset from ru to en in plain text.
Dataset size:
63.22 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 5,476 |
| 'train' | 208,106 |
| 'validation' | 4,805 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'ru': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| ru | Text | | string | |

Supervised keys (see as_supervised doc): ('ru', 'en')
ted_hrlr_translate/ru_to_pt
Config description: Translation dataset from ru to pt in plain text.
Dataset size:
13.00 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 1,588 |
| 'train' | 47,278 |
| 'validation' | 1,184 |
- Feature structure:
Translation({
'pt': Text(shape=(), dtype=string),
'ru': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| pt | Text | | string | |
| ru | Text | | string | |

Supervised keys (see as_supervised doc): ('ru', 'pt')
ted_hrlr_translate/tr_to_en
Config description: Translation dataset from tr to en in plain text.
Dataset size:
42.33 MiB
Splits:
| Split | Examples |
|---|---|
| 'test' | 5,029 |
| 'train' | 182,450 |
| 'validation' | 4,045 |
- Feature structure:
Translation({
'en': Text(shape=(), dtype=string),
'tr': Text(shape=(), dtype=string),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | Translation | | | |
| en | Text | | string | |
| tr | Text | | string | |

Supervised keys (see as_supervised doc): ('tr', 'en')
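The train counts on this page hint at how the combined configs relate to their parts. For aztr_to_en and glpt_to_en the train size equals the low-resource config plus the related high-resource config, and beru_to_en is one example short of that sum. This is an observation from the listed numbers, not a documented guarantee. A quick check, with `summarize` as an illustrative helper:

```python
# Train-split sizes copied from the tables on this page.
TRAIN = {
    "az_to_en": 5_946, "tr_to_en": 182_450, "aztr_to_en": 188_396,
    "be_to_en": 4_509, "ru_to_en": 208_106, "beru_to_en": 212_614,
    "gl_to_en": 10_017, "pt_to_en": 51_785, "glpt_to_en": 61_802,
}

# (low-resource, high-resource, combined) config triples.
TRIPLES = [
    ("az_to_en", "tr_to_en", "aztr_to_en"),
    ("be_to_en", "ru_to_en", "beru_to_en"),
    ("gl_to_en", "pt_to_en", "glpt_to_en"),
]


def summarize(low: str, high: str, combined: str) -> tuple:
    """Return (low + high - combined, low-resource share of the combined train set)."""
    gap = TRAIN[low] + TRAIN[high] - TRAIN[combined]
    share = TRAIN[low] / TRAIN[combined]
    return gap, share


for low, high, combined in TRIPLES:
    gap, share = summarize(low, high, combined)
    print(f"{combined}: gap={gap}, low-resource share={share:.1%}")
```

The test and validation counts of each combined config match those of its low-resource config (903/671, 664/248, and 1,007/682), which is consistent with evaluating only on the low-resource language.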