lbpp

  • Description:

Less Basic Python Programming (LBPP) is a collection of 161 programming problems with accompanying unit tests. The problems were written to be fresh (not leaked at the time of creation) and more difficult than those in similar datasets (e.g., HumanEval and MBPP). Because LBPP is structured in an equivalent way, it can serve as a drop-in replacement for those datasets or as an enrichment of them.
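A minimal loading sketch follows. It assumes the `tensorflow_datasets` package is installed and that the dataset is registered under the name `lbpp` with the config names listed on this page. The helper names `tfds_name` and `load_lbpp` are hypothetical and exist only for illustration.

```python
# Config names as listed on this catalog page.
KNOWN_CONFIGS = (
    "all", "multilingual", "default", "python",
    "cpp", "go", "java", "js", "javascript", "rust",
)


def tfds_name(config: str = "all") -> str:
    """Build the 'dataset/config' string accepted by tfds.load."""
    if config not in KNOWN_CONFIGS:
        raise ValueError(f"unknown LBPP config: {config!r}")
    return f"lbpp/{config}"


def load_lbpp(config: str = "all"):
    """Load the single 'test' split of the chosen config.

    The import is deferred so that tfds_name() works without TFDS installed.
    """
    import tensorflow_datasets as tfds  # assumption: installed separately

    return tfds.load(tfds_name(config), split="test")
```

Calling `load_lbpp("python")` would then return the `test` split of the Python config as a `tf.data.Dataset`, provided the download succeeds.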

  • Feature structure:

FeaturesDict({
    'categories': Sequence(Text(shape=(), dtype=string)),
    'completion': Text(shape=(), dtype=string),
    'instruction': Text(shape=(), dtype=string),
    'language': Text(shape=(), dtype=string),
    'signature': Text(shape=(), dtype=string),
    'task_id': Text(shape=(), dtype=string),
    'test_file': Text(shape=(), dtype=string),
    'test_list': Sequence(Text(shape=(), dtype=string)),
    'test_setup': Text(shape=(), dtype=string),
    'title': Text(shape=(), dtype=string),
})
  • Feature documentation:
Feature      Class           Shape    Dtype   Description
             FeaturesDict
categories   Sequence(Text)  (None,)  string
completion   Text                     string
instruction  Text                     string
language     Text                     string
signature    Text                     string
task_id      Text                     string
test_file    Text                     string
test_list    Sequence(Text)  (None,)  string
test_setup   Text                     string
title        Text                     string
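The feature structure above can be mirrored in a small typed record, which may be convenient once examples have been converted to NumPy and text fields arrive as bytes. This is a sketch only: `LbppRecord`, `to_record`, and the toy example are hypothetical, and the field values shown are invented placeholders rather than real dataset content.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class LbppRecord:
    # Field names and types follow the FeaturesDict listed on this page.
    categories: List[str]
    completion: str
    instruction: str
    language: str
    signature: str
    task_id: str
    test_file: str
    test_list: List[str]
    test_setup: str
    title: str


def _decode(value):
    """Turn bytes into str, recursing into sequences of bytes."""
    if isinstance(value, bytes):
        return value.decode("utf-8")
    if isinstance(value, str):
        return value
    return [_decode(item) for item in value]


def to_record(example: dict) -> LbppRecord:
    """Convert one raw example (a dict keyed by feature name) into a record."""
    return LbppRecord(**{f.name: _decode(example[f.name]) for f in fields(LbppRecord)})


# Hypothetical raw example; the values are placeholders, not real LBPP data.
toy_example = {
    "categories": [b"math"],
    "completion": b"def add(a, b):\n    return a + b\n",
    "instruction": b"Write a function that adds two numbers.",
    "language": b"python",
    "signature": b"def add(a, b):",
    "task_id": b"toy/0",
    "test_file": b"",
    "test_list": [b"assert add(1, 2) == 3"],
    "test_setup": b"",
    "title": b"Add two numbers",
}
toy_record = to_record(toy_example)
```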
  • Citation:

@inproceedings{matton-etal-2024-leakage,
    title = "On Leakage of Code Generation Evaluation Datasets",
    author = "Matton, Alexandre  and
      Sherborne, Tom  and
      Aumiller, Dennis  and
      Tommasone, Elena  and
      Alizadeh, Milad  and
      He, Jingyi  and
      Ma, Raymond  and
      Voisin, Maxime  and
      Gilsenan-McMahon, Ellen  and
      Gall{\'e}, Matthias",
    editor = "Al-Onaizan, Yaser  and
      Bansal, Mohit  and
      Chen, Yun-Nung",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2024",
    month = nov,
    year = "2024",
    address = "Miami, Florida, USA",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-emnlp.772/",
    doi = "10.18653/v1/2024.findings-emnlp.772",
    pages = "13215--13223",
}

lbpp/all (default config)

  • Config description: Multilingual LBPP

  • Download size: 1.78 MiB

  • Dataset size: 4.30 MiB

  • Splits:

Split Examples
'test' 944

lbpp/multilingual

  • Config description: Multilingual LBPP

  • Download size: 1.78 MiB

  • Dataset size: 4.30 MiB

  • Splits:

Split Examples
'test' 944

lbpp/default

  • Config description: Python LBPP

  • Download size: 279.90 KiB

  • Dataset size: 627.04 KiB

  • Splits:

Split Examples
'test' 162

lbpp/python

  • Config description: Python LBPP

  • Download size: 279.90 KiB

  • Dataset size: 627.04 KiB

  • Splits:

Split Examples
'test' 162

lbpp/cpp

  • Config description: C++ LBPP

  • Download size: 314.45 KiB

  • Dataset size: 761.87 KiB

  • Splits:

Split Examples
'test' 161

lbpp/go

  • Config description: Go LBPP

  • Download size: 317.09 KiB

  • Dataset size: 687.23 KiB

  • Splits:

Split Examples
'test' 161

lbpp/java

  • Config description: Java LBPP

  • Download size: 337.90 KiB

  • Dataset size: 887.40 KiB

  • Splits:

Split Examples
'test' 158

lbpp/js

  • Config description: JavaScript LBPP

  • Download size: 303.40 KiB

  • Dataset size: 756.69 KiB

  • Splits:

Split Examples
'test' 153

lbpp/javascript

  • Config description: JavaScript LBPP

  • Download size: 303.40 KiB

  • Dataset size: 756.69 KiB

  • Splits:

Split Examples
'test' 153

lbpp/rust

  • Config description: Rust LBPP

  • Download size: 272.61 KiB

  • Dataset size: 684.31 KiB

  • Splits:

Split Examples
'test' 149
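The split sizes listed for the configs are consistent with one another: the six per-language counts add up to the 944 examples of the multilingual config. The short check below reproduces that arithmetic. Treating `default`, `javascript`, and `all` as alternative names for `python`, `js`, and `multilingual` is an inference from their identical sizes and descriptions, not something this page states, so each such pair is counted once.

```python
# Example counts of the 'test' split, copied from the config listings on this page.
PER_LANGUAGE_EXAMPLES = {
    "python": 162,
    "cpp": 161,
    "go": 161,
    "java": 158,
    "js": 153,
    "rust": 149,
}
MULTILINGUAL_EXAMPLES = 944  # lbpp/all and lbpp/multilingual

total = sum(PER_LANGUAGE_EXAMPLES.values())
terms = " + ".join(str(n) for n in PER_LANGUAGE_EXAMPLES.values())
print(f"{terms} = {total}")  # prints "162 + 161 + 161 + 158 + 153 + 149 = 944"

assert total == MULTILINGUAL_EXAMPLES
```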