- Description:
The IRC Disentanglement dataset contains 77,563 messages from the Ubuntu IRC channel.
Features include the message id, message text, and timestamp. The target is the list of messages that the current message replies to. Each record contains the list of messages from one day of IRC chat.
- Additional Documentation: Explore on Papers With Code 
- Homepage: https://jkk.name/irc-disentanglement 
- Source code: tfds.datasets.irc_disentanglement.Builder
- Versions:
  - 2.0.0 (default): No release notes.
- Download size: 113.53 MiB
- Dataset size: 26.59 MiB
- Auto-cached (documentation): Yes 
- Splits: 
| Split | Examples | 
|---|---|
| 'test' | 10 | 
| 'train' | 153 | 
| 'validation' | 10 | 
- Feature structure:
FeaturesDict({
    'day': Sequence({
        'id': Text(shape=(), dtype=string),
        'parents': Sequence(Text(shape=(), dtype=string)),
        'text': Text(shape=(), dtype=string),
        'timestamp': Text(shape=(), dtype=string),
    }),
})
- Feature documentation:
| Feature | Class | Shape | Dtype | Description |
|---|---|---|---|---|
| | FeaturesDict | | | |
| day | Sequence | | | |
| day/id | Text | | string | |
| day/parents | Sequence(Text) | (None,) | string | |
| day/text | Text | | string | |
| day/timestamp | Text | | string | |
- Supervised keys (see as_supervised doc): None
- Figure (tfds.show_examples): Not supported. 
- Citation:
@InProceedings{acl19disentangle,
  author    = {Jonathan K. Kummerfeld and Sai R. Gouravajhala and Joseph Peper and Vignesh Athreya and Chulaka Gunasekara and Jatin Ganhotra and Siva Sankalp Patel and Lazaros Polymenakos and Walter S. Lasecki},
  title     = {A Large-Scale Corpus for Conversation Disentanglement},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  location  = {Florence, Italy},
  month     = {July},
  year      = {2019},
  doi       = {10.18653/v1/P19-1374},
  pages     = {3846--3856},
  url       = {https://aclweb.org/anthology/papers/P/P19/P19-1374/},
  arxiv     = {https://arxiv.org/abs/1810.11118},
  software  = {https://jkk.name/irc-disentanglement},
  data      = {https://jkk.name/irc-disentanglement},
}
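
The feature structure above stores each day column-wise: `id`, `text`, and `timestamp` each hold one entry per message, and `parents` holds one list of message ids per message. The following is a minimal sketch of how such a record might be turned into per-message dicts and then grouped into conversations. It assumes the record has already been converted to plain Python lists of `str` or `bytes` values, and it assumes that a conversation is a connected component of the reply links. The helper names `day_to_messages` and `group_conversations` are hypothetical and are not part of TFDS, and the sample data is synthetic.

```python
from collections import defaultdict


def _to_str(value):
    """Decode bytes to str; pass str values through unchanged."""
    return value.decode("utf-8") if isinstance(value, bytes) else value


def day_to_messages(day):
    """Turn the column-oriented `day` dict into one dict per message."""
    return [
        {
            "id": _to_str(msg_id),
            "parents": [_to_str(p) for p in parents],
            "text": _to_str(text),
            "timestamp": _to_str(timestamp),
        }
        for msg_id, parents, text, timestamp in zip(
            day["id"], day["parents"], day["text"], day["timestamp"]
        )
    ]


def group_conversations(messages):
    """Group message ids into connected components of the reply graph."""
    root_of = {m["id"]: m["id"] for m in messages}  # union-find forest

    def find(x):
        while root_of[x] != x:
            root_of[x] = root_of[root_of[x]]  # path halving
            x = root_of[x]
        return x

    for m in messages:
        for p in m["parents"]:
            # Assumption: links to ids outside this day are ignored.
            # A message listing itself as parent is a harmless no-op.
            if p in root_of:
                root_of[find(m["id"])] = find(p)

    groups = defaultdict(list)
    for m in messages:
        groups[find(m["id"])].append(m["id"])
    return list(groups.values())


# Synthetic record shaped like the `day` feature (not real dataset content).
sample_day = {
    "id": [b"a", b"b", b"c", b"d"],
    "parents": [[b"a"], [b"a"], [b"c"], [b"c"]],
    "text": [b"hi", b"hello", b"wifi broken?", b"try rebooting"],
    "timestamp": [b"10:00", b"10:01", b"10:02", b"10:03"],
}
conversations = group_conversations(day_to_messages(sample_day))
```

Union-find is used here only because it keeps the grouping close to linear in the number of reply links; a breadth-first search over the same links would give the same components.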