big_patent
BIGPATENT consists of 1.3 million records of U.S. patent documents, each paired with a
human-written abstractive summary. Each U.S. patent application is filed under a
Cooperative Patent Classification (CPC) code. There are nine such classification
categories:
- A (Human Necessities),
- B (Performing Operations; Transporting),
- C (Chemistry; Metallurgy),
- D (Textiles; Paper),
- E (Fixed Constructions),
- F (Mechanical Engineering; Lighting; Heating; Weapons; Blasting),
- G (Physics),
- H (Electricity), and
- Y (General tagging of new or cross-sectional technology)
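The per-category configs listed further down are named after the lowercase CPC letter (`big_patent/a` through `big_patent/y`), and `big_patent/all` covers every category. A minimal sketch of that naming scheme; `config_name` is a hypothetical helper for illustration, not part of TFDS.

```python
from typing import Optional

# The nine CPC categories, keyed by their letter.
CPC_CATEGORIES = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
    "Y": "General tagging of new or cross-sectional technology",
}


def config_name(cpc_letter: Optional[str] = None) -> str:
    """Hypothetical helper: map a CPC letter to a big_patent config name.

    Passing None selects the default config, which covers all categories.
    """
    if cpc_letter is None:
        return "big_patent/all"
    letter = cpc_letter.upper()
    if letter not in CPC_CATEGORIES:
        raise ValueError(f"unknown CPC category: {cpc_letter!r}")
    return f"big_patent/{letter.lower()}"


print(config_name("G"))  # big_patent/g
print(config_name())     # big_patent/all
```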
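There are two features:
- description: detailed description of the patent.
- abstract: the patent abstract, used as the summary.

- Additional documentation: Explore on Papers With Code (https://paperswithcode.com/dataset/bigpatent)
- Homepage: https://evasharma.github.io/bigpatent/
- Source code: tfds.datasets.big_patent.Builder (https://github.com/tensorflow/datasets/tree/master/tensorflow_datasets/datasets/big_patent/big_patent_dataset_builder.py)
- Versions:
  - 1.0.0: lowercased, tokenized words.
  - 2.0.0: updated to use cased raw strings.
  - 2.1.2 (default): fix for the update to cased raw strings.
- Download size: 9.45 GiB
- Auto-cached: No

Feature structure:

    FeaturesDict({
        'abstract': Text(shape=(), dtype=string),
        'description': Text(shape=(), dtype=string),
    })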
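Each example is a dictionary with two string features. For summarization the description is the input and the abstract is the target; the TFDS catalog lists the supervised keys as `('description', 'abstract')`. The sketch below mimics that pairing on a plain Python dict; the sample text and the `to_supervised` helper are hypothetical and only illustrate the record shape.

```python
from typing import Dict, Tuple

# Supervised keys listed for this dataset: (input, target).
SUPERVISED_KEYS = ("description", "abstract")


def to_supervised(example: Dict[str, str]) -> Tuple[str, str]:
    """Hypothetical helper: turn a feature dict into an (input, target) pair."""
    source_key, target_key = SUPERVISED_KEYS
    return example[source_key], example[target_key]


# Made-up record with the same keys as the FeaturesDict.
record = {
    "description": "A hinge assembly comprising a first leaf, a second leaf and a pin joining them.",
    "abstract": "A hinge assembly with two leaves joined by a pin.",
}
document, summary = to_supervised(record)
print(summary)  # A hinge assembly with two leaves joined by a pin.
```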
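Feature documentation:

| Feature     | Class        | Shape | Dtype  | Description                         |
|-------------|--------------|-------|--------|-------------------------------------|
|             | FeaturesDict |       |        |                                     |
| abstract    | Text         |       | string | Patent abstract (the summary)       |
| description | Text         |       | string | Detailed description of the patent  |

Supervised keys (see the `as_supervised` argument of `tfds.load`): `('description', 'abstract')`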
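To read the data with TensorFlow Datasets, pass a config name to `tfds.load`; with `as_supervised=True` each element is a `(description, abstract)` pair of string tensors instead of a dict. A minimal sketch, assuming `tensorflow_datasets` is installed and there is room for the download (the catalog lists 9.45 GiB). The wrapper name `load_pairs` is hypothetical, and the import is deferred so that defining the function does not require TFDS.

```python
def load_pairs(config: str = "big_patent/d", split: str = "validation"):
    """Hypothetical wrapper: load one split as (description, abstract) pairs.

    big_patent/d is the smallest config; use "big_patent/all" for every
    category. A version can be pinned with "name:version", for example
    "big_patent/all:2.1.2".
    """
    import tensorflow_datasets as tfds  # deferred: only needed when called

    return tfds.load(config, split=split, as_supervised=True)


# Usage (downloads and prepares the data on first call):
# ds = load_pairs()
# for description, abstract in ds.take(1):
#     print(abstract.numpy().decode("utf-8"))
```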
Citation:

    @misc{sharma2019bigpatent,
      title={BIGPATENT: A Large-Scale Dataset for Abstractive and Coherent Summarization},
      author={Eva Sharma and Chen Li and Lu Wang},
      year={2019},
      eprint={1906.03741},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
    }
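big_patent/all (default config)

- Config description: Patents under all categories.
- Dataset size: 35.17 GiB
- Splits:

| Split          | Examples  |
|----------------|-----------|
| `'test'`       | 67,072    |
| `'train'`      | 1,207,222 |
| `'validation'` | 67,068    |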
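The default config's splits add up to the 1.3 million records mentioned in the description and work out to roughly 90% train and 5% each for validation and test. A quick check of that arithmetic, using only the split counts listed for `big_patent/all`:

```python
# Example counts for big_patent/all, copied from the split table.
splits = {"train": 1_207_222, "validation": 67_068, "test": 67_072}

total = sum(splits.values())
print(total)  # 1341362
for name, count in splits.items():
    print(f"{name}: {count / total:.1%}")
# train: 90.0%, validation: 5.0%, test: 5.0%
```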
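big_patent/a

- Config description: Patents under Cooperative Patent Classification (CPC) category A: Human Necessities
- Dataset size: 5.16 GiB
- Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 9,675    |
| `'train'`      | 174,134  |
| `'validation'` | 9,674    |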
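big_patent/b

- Config description: Patents under Cooperative Patent Classification (CPC) category B: Performing Operations; Transporting
- Dataset size: 4.06 GiB
- Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 8,974    |
| `'train'`      | 161,520  |
| `'validation'` | 8,973    |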
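big_patent/c

- Config description: Patents under Cooperative Patent Classification (CPC) category C: Chemistry; Metallurgy
- Dataset size: 3.63 GiB
- Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 5,614    |
| `'train'`      | 101,042  |
| `'validation'` | 5,613    |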
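big_patent/d

- Config description: Patents under Cooperative Patent Classification (CPC) category D: Textiles; Paper
- Dataset size: 255.56 MiB
- Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 565      |
| `'train'`      | 10,164   |
| `'validation'` | 565      |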
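big_patent/e

- Config description: Patents under Cooperative Patent Classification (CPC) category E: Fixed Constructions
- Dataset size: 871.40 MiB
- Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 1,914    |
| `'train'`      | 34,443   |
| `'validation'` | 1,914    |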
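big_patent/f

- Config description: Patents under Cooperative Patent Classification (CPC) category F: Mechanical Engineering; Lighting; Heating; Weapons; Blasting
- Dataset size: 2.06 GiB
- Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 4,754    |
| `'train'`      | 85,568   |
| `'validation'` | 4,754    |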
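big_patent/g

- Config description: Patents under Cooperative Patent Classification (CPC) category G: Physics
- Dataset size: 8.19 GiB
- Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 14,386   |
| `'train'`      | 258,935  |
| `'validation'` | 14,385   |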
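big_patent/h

- Config description: Patents under Cooperative Patent Classification (CPC) category H: Electricity
- Dataset size: 7.50 GiB
- Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 14,279   |
| `'train'`      | 257,019  |
| `'validation'` | 14,279   |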
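big_patent/y

- Config description: Patents under Cooperative Patent Classification (CPC) category Y: General tagging of new or cross-sectional technology
- Dataset size: 3.46 GiB
- Splits:

| Split          | Examples |
|----------------|----------|
| `'test'`       | 6,911    |
| `'train'`      | 124,397  |
| `'validation'` | 6,911    |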
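The nine per-category configs partition the default config: summing each split across `a` through `y` reproduces the `big_patent/all` counts exactly. The check below uses only the numbers in the split tables and also prints each category's share of the training split.

```python
# Split sizes per config, copied from the tables: (train, validation, test).
per_category = {
    "a": (174_134, 9_674, 9_675),
    "b": (161_520, 8_973, 8_974),
    "c": (101_042, 5_613, 5_614),
    "d": (10_164, 565, 565),
    "e": (34_443, 1_914, 1_914),
    "f": (85_568, 4_754, 4_754),
    "g": (258_935, 14_385, 14_386),
    "h": (257_019, 14_279, 14_279),
    "y": (124_397, 6_911, 6_911),
}
all_config = (1_207_222, 67_068, 67_072)  # big_patent/all

# Column-wise sums across the nine categories.
summed = tuple(sum(column) for column in zip(*per_category.values()))
print(summed == all_config)  # True

# Share of the training split contributed by each category.
for name, (train, _, _) in per_category.items():
    print(f"big_patent/{name}: {train / all_config[0]:.1%}")
```

G (Physics) and H (Electricity) are the largest categories and D (Textiles; Paper) the smallest, which is worth keeping in mind when training or evaluating on a single category.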
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2023-07-11 UTC.