Description:
Less Basic Python Programming (LBPP) is a collection of 161 programming problems with accompanying unit tests. The problems were created to be fresh (not leaked at the time of creation) and more difficult than those in similar datasets (e.g., HumanEval and MBPP). LBPP can serve as a drop-in replacement for, or an enrichment of, those datasets, since it is structured in an equivalent way.
Source code:
tfds.datasets.lbpp.Builder
Versions:
2.0.0 (default): No release notes.
Auto-cached (documentation): Yes
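A minimal loading sketch, assuming an installed `tensorflow_datasets` release that ships the `lbpp` builder. The helper names `lbpp_name` and `load_lbpp` are hypothetical and not part of TFDS; only `tfds.load` is a library call, and the import is deferred so the name helper can be used without TensorFlow.

```python
from typing import Optional

DATASET_NAME = "lbpp"
DEFAULT_VERSION = "2.0.0"  # the only version listed on this page


def lbpp_name(config: Optional[str] = None,
              version: Optional[str] = DEFAULT_VERSION) -> str:
    """Build a TFDS name string of the form 'lbpp[/config][:version]'."""
    name = DATASET_NAME
    if config:
        name += f"/{config}"
    if version:
        name += f":{version}"
    return name


def load_lbpp(config: str = "python"):
    """Load the 'test' split (the only split listed here) for one config.

    Returns the (dataset, info) pair produced by tfds.load(with_info=True).
    """
    # Imported lazily: requires the tensorflow-datasets package.
    import tensorflow_datasets as tfds

    return tfds.load(lbpp_name(config), split="test", with_info=True)
```

Pinning the version in the name string keeps an evaluation reproducible if a later dataset version is published.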
Feature structure:
FeaturesDict({
    'categories': Sequence(Text(shape=(), dtype=string)),
    'completion': Text(shape=(), dtype=string),
    'instruction': Text(shape=(), dtype=string),
    'language': Text(shape=(), dtype=string),
    'signature': Text(shape=(), dtype=string),
    'task_id': Text(shape=(), dtype=string),
    'test_file': Text(shape=(), dtype=string),
    'test_list': Sequence(Text(shape=(), dtype=string)),
    'test_setup': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})
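When examples are read as NumPy values (for instance through `tfds.as_numpy`), `Text` features typically arrive as UTF-8 `bytes` rather than `str`. The sketch below decodes one example according to the feature structure above; `decode_example` and the two field tuples are hypothetical helpers written for illustration, not part of TFDS.

```python
from typing import Any, Dict

# Field names copied from the feature structure above.
TEXT_FIELDS = ("completion", "instruction", "language", "signature",
               "task_id", "test_file", "test_setup", "title")
SEQUENCE_FIELDS = ("categories", "test_list")


def _to_str(value: Any) -> str:
    """Decode bytes as UTF-8; pass other values through str()."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)


def decode_example(example: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of one example with every Text value as a Python str."""
    decoded: Dict[str, Any] = {name: _to_str(example[name]) for name in TEXT_FIELDS}
    for name in SEQUENCE_FIELDS:
        # Sequence(Text) features are iterables of scalar strings.
        decoded[name] = [_to_str(item) for item in example[name]]
    return decoded
```

A missing field raises `KeyError`, which doubles as a cheap check that an example matches the listed schema.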
Feature documentation:
Feature | Class | Shape | Dtype | Description |
---|---|---|---|---|
FeaturesDict | | | | |
categories | Sequence(Text) | (None,) | string | |
completion | Text | | string | |
instruction | Text | | string | |
language | Text | | string | |
signature | Text | | string | |
task_id | Text | | string | |
test_file | Text | | string | |
test_list | Sequence(Text) | (None,) | string | |
test_setup | Text | | string | |
title | Text | | string | |
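Because `categories` is a variable-length sequence of strings, a common first step is to index problems by category. The following sketch assumes examples have already been converted to plain dictionaries with `str` values; the function names and the category labels used in the usage note are hypothetical.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List


def category_counts(examples: Iterable[Dict[str, Any]]) -> Counter:
    """Count how many examples carry each category label."""
    counts: Counter = Counter()
    for example in examples:
        # set() so a label repeated within one example is counted once.
        counts.update(set(example["categories"]))
    return counts


def with_category(examples: Iterable[Dict[str, Any]],
                  category: str) -> List[Dict[str, Any]]:
    """Return the examples whose `categories` sequence contains `category`."""
    return [example for example in examples if category in example["categories"]]
```

For instance, `with_category(examples, "graph")` would return only the problems tagged with a (hypothetical) `graph` label.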
Supervised keys (See as_supervised doc): None
Figure (tfds.show_examples): Not supported.
Citation:
@inproceedings{matton-etal-2024-leakage,
title = "On Leakage of Code Generation Evaluation Datasets",
author = "Matton, Alexandre and
Sherborne, Tom and
Aumiller, Dennis and
Tommasone, Elena and
Alizadeh, Milad and
He, Jingyi and
Ma, Raymond and
Voisin, Maxime and
Gilsenan-McMahon, Ellen and
Gall{\'e}, Matthias",
editor = "Al-Onaizan, Yaser and
Bansal, Mohit and
Chen, Yun-Nung",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
month = nov,
year = "2024",
address = "Miami, Florida, USA",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2024.findings-emnlp.772/",
doi = "10.18653/v1/2024.findings-emnlp.772",
pages = "13215--13223",
}
lbpp/all (default config)
Config description: Multilingual LBPP
Download size:
1.78 MiB
Dataset size:
4.30 MiB
Splits:
Split | Examples |
---|---|
'test' | 944 |
lbpp/multilingual
Config description: Multilingual LBPP
Download size:
1.78 MiB
Dataset size:
4.30 MiB
Splits:
Split | Examples |
---|---|
'test' | 944 |
lbpp/default
Config description: Python LBPP
Download size:
279.90 KiB
Dataset size:
627.04 KiB
Splits:
Split | Examples |
---|---|
'test' | 162 |
lbpp/python
Config description: Python LBPP
Download size:
279.90 KiB
Dataset size:
627.04 KiB
Splits:
Split | Examples |
---|---|
'test' | 162 |
lbpp/cpp
Config description: C++ LBPP
Download size:
314.45 KiB
Dataset size:
761.87 KiB
Splits:
Split | Examples |
---|---|
'test' | 161 |
lbpp/go
Config description: Go LBPP
Download size:
317.09 KiB
Dataset size:
687.23 KiB
Splits:
Split | Examples |
---|---|
'test' | 161 |
lbpp/java
Config description: Java LBPP
Download size:
337.90 KiB
Dataset size:
887.40 KiB
Splits:
Split | Examples |
---|---|
'test' | 158 |
lbpp/js
Config description: JavaScript LBPP
Download size:
303.40 KiB
Dataset size:
756.69 KiB
Splits:
Split | Examples |
---|---|
'test' | 153 |
lbpp/javascript
Config description: JavaScript LBPP
Download size:
303.40 KiB
Dataset size:
756.69 KiB
Splits:
Split | Examples |
---|---|
'test' | 153 |
lbpp/rust
Config description: Rust LBPP
Download size:
272.61 KiB
Dataset size:
684.31 KiB
Splits:
Split | Examples |
---|---|
'test' | 149 |
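The per-language `test` counts listed above add up to the 944 examples of the multilingual configs. The sketch below records the listed counts and checks that relationship. Treating `all`, `default`, and `js` as aliases of `multilingual`, `python`, and `javascript` is an assumption based only on their identical sizes and descriptions on this page, and the helper names are hypothetical.

```python
# 'test' split sizes copied from the config listings on this page.
TEST_EXAMPLES = {
    "python": 162,
    "cpp": 161,
    "go": 161,
    "java": 158,
    "javascript": 153,
    "rust": 149,
}
MULTILINGUAL_TOTAL = 944

# Assumed aliases: configs with identical description and sizes.
ALIASES = {"all": "multilingual", "default": "python", "js": "javascript"}


def resolve(config: str) -> str:
    """Map an alias to its canonical config name (identity otherwise)."""
    return ALIASES.get(config, config)


def expected_examples(config: str) -> int:
    """Number of 'test' examples this page lists for a config."""
    canonical = resolve(config)
    if canonical == "multilingual":
        return MULTILINGUAL_TOTAL
    return TEST_EXAMPLES[canonical]
```

Comparing `expected_examples(config)` with the number of records actually loaded is a quick sanity check before running an evaluation.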
- Examples (tfds.as_dataframe):