scientific_papers

Description:

The scientific_papers dataset contains two sets of long, structured documents, obtained from the ArXiv and PubMed OpenAccess repositories.

Both "arxiv" and "pubmed" have three features:

article: the body of the document, paragraphs separated by "\n".
abstract: the abstract of the document, paragraphs separated by "\n".
section_names: titles of sections, separated by "\n".
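
As a minimal sketch of how these joined fields might be unpacked, the helper below splits one example into lists of paragraphs and section titles. The function name `split_example` and the sample record are hypothetical, and the sketch assumes the separator is a newline character and that values have already been decoded to Python `str`.

```python
from typing import Dict, List


def split_example(example: Dict[str, str]) -> Dict[str, List[str]]:
    """Split each newline-joined field into a list of non-empty items."""
    return {
        key: [part for part in value.split("\n") if part.strip()]
        for key, value in example.items()
    }


# Hypothetical record shaped like a scientific_papers example.
sample = {
    "article": "First paragraph.\nSecond paragraph.",
    "abstract": "A one-paragraph abstract.",
    "section_names": "Introduction\nMethods",
}

fields = split_example(sample)
print(fields["section_names"])
```

Records read through TensorFlow arrive as byte strings, so they would need `.decode("utf-8")` before being passed to a helper like this.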
Additional Documentation: Explore on Papers With Code
Homepage: https://github.com/armancohan/long-summarization
Source code: tfds.datasets.scientific_papers.Builder
Versions:
- 1.1.0: No release notes.
- 1.1.1 (default): No release notes.
Download size: 4.20 GiB
Auto-cached (documentation): No
Feature structure:

FeaturesDict({
    'abstract': Text(shape=(), dtype=string),
    'article': Text(shape=(), dtype=string),
    'section_names': Text(shape=(), dtype=string),
})

Feature documentation:

Feature	Class	Dtype
	FeaturesDict
abstract	Text	string
article	Text	string
section_names	Text	string

Supervised keys (see as_supervised doc): ('article', 'abstract')
Figure (tfds.show_examples): Not supported.
Citation:

@article{Cohan_2018,
   title={A Discourse-Aware Attention Model for Abstractive Summarization of
            Long Documents},
   url={http://dx.doi.org/10.18653/v1/n18-2097},
   DOI={10.18653/v1/n18-2097},
   journal={Proceedings of the 2018 Conference of the North American Chapter of
          the Association for Computational Linguistics: Human Language
          Technologies, Volume 2 (Short Papers)},
   publisher={Association for Computational Linguistics},
   author={Cohan, Arman and Dernoncourt, Franck and Kim, Doo Soon and Bui, Trung and Kim, Seokhwan and Chang, Walter and Goharian, Nazli},
   year={2018}
}
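
The supervised keys and version numbers listed above can be combined when loading the dataset. The following is a sketch rather than an official recipe: `dataset_name`, `to_supervised`, and `load_supervised` are hypothetical helper names, and only `tfds.load` with its `split` and `as_supervised` arguments comes from TensorFlow Datasets. The import is deferred because calling `load_supervised` starts the multi-GiB download.

```python
from typing import Dict, Tuple

SUPERVISED_KEYS = ("article", "abstract")  # as listed under "Supervised keys"


def dataset_name(config: str = "arxiv", version: str = "1.1.1") -> str:
    """Build a TFDS name string such as 'scientific_papers/arxiv:1.1.1'."""
    if config not in ("arxiv", "pubmed"):
        raise ValueError(f"unknown config: {config!r}")
    return f"scientific_papers/{config}:{version}"


def to_supervised(example: Dict[str, str]) -> Tuple[str, str]:
    """Mimic as_supervised=True: return (input, target) from a feature dict."""
    input_key, target_key = SUPERVISED_KEYS
    return example[input_key], example[target_key]


def load_supervised(config: str = "arxiv", split: str = "validation"):
    """Load one split as (article, abstract) pairs; downloads the data."""
    # Imported lazily; assumes the tensorflow-datasets package is installed.
    import tensorflow_datasets as tfds

    return tfds.load(dataset_name(config), split=split, as_supervised=True)
```

With `as_supervised=True` each element is an `(article, abstract)` tuple, so `section_names` is only available when loading the full feature dictionary.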

scientific_papers/arxiv (default config)

Config description: Documents from the ArXiv repository.
Dataset size: 7.07 GiB
Splits:

Split	Examples
`'test'`	6,440
`'train'`	203,037
`'validation'`	6,436

scientific_papers/pubmed

Config description: Documents from the PubMed repository.
Dataset size: 2.34 GiB
Splits:

Split	Examples
`'test'`	6,658
`'train'`	119,924
`'validation'`	6,633
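
To compare the two configs, the split counts from both tables can be totalled and turned into proportions. This is only an illustrative calculation on the numbers listed in this page; `SPLITS` and `split_summary` are hypothetical names.

```python
# Example counts copied from the split tables for each config.
SPLITS = {
    "arxiv": {"train": 203_037, "validation": 6_436, "test": 6_440},
    "pubmed": {"train": 119_924, "validation": 6_633, "test": 6_658},
}


def split_summary(counts):
    """Return the total example count and each split's share of it."""
    total = sum(counts.values())
    return total, {name: n / total for name, n in counts.items()}


for config, counts in SPLITS.items():
    total, fractions = split_summary(counts)
    print(config, total, {k: round(v, 4) for k, v in fractions.items()})
```

The totals come to 215,913 examples for arxiv and 133,215 for pubmed, with the training split making up the large majority in both cases.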

