Reading Comprehension with Multiple Hops

Dataset Distribution under CC BY-SA 3.0

Two New Reading Comprehension Datasets

We have created two new Reading Comprehension datasets focussing on multi-hop (also known as multi-step) inference.

Several pieces of information often jointly imply another fact. In multi-hop inference, a new fact is derived by combining facts via a chain of multiple steps.
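As a minimal sketch of what chaining facts means, the toy example below follows one relation per hop across a small list of facts. The facts, the relation names, and the helper `follow_chain` are all made up for illustration and are not part of the datasets.

```python
# Toy facts as (subject, relation, object) triples; purely illustrative.
FACTS = [
    ("Hanging Gardens", "located_in", "Mumbai"),
    ("Mumbai", "located_in", "India"),
]


def follow_chain(start, relations, facts):
    """Follow one relation per hop; return the final entity, or None if a hop fails."""
    current = start
    for relation in relations:
        # Look for a fact that continues the chain from the current entity.
        nxt = next(
            (obj for subj, rel, obj in facts if subj == current and rel == relation),
            None,
        )
        if nxt is None:
            return None
        current = nxt
    return current


# Neither fact alone links the gardens to a country; two hops do.
result = follow_chain("Hanging Gardens", ["located_in", "located_in"], FACTS)
print(result)  # India
```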

Our aim is to build Reading Comprehension methods that perform multi-hop inference on text, where individual facts are spread out across different documents.

The two QAngaroo datasets provide a training and evaluation resource for such methods.

Task Overview

In our task, the goal is to answer text understanding queries by combining multiple facts that are spread across different documents.

In each sample, a query is given about a collection of documents. The goal is to identify the correct answer among a set of given type-consistent answer candidates. The candidates — including the correct answer — are mentioned in the documents.
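The sample layout described above could be represented roughly as follows. The class `Sample`, its field names, and the example content are assumptions made for illustration, not the released file format. The function `most_mentioned_candidate` is a deliberately naive baseline that ignores the query entirely; it is not a method from the paper and only shows how a prediction is chosen from the candidate set.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    # Field names are hypothetical; check the released files for the real schema.
    query: str
    documents: List[str]
    candidates: List[str]
    answer: str


def most_mentioned_candidate(sample: Sample) -> str:
    """Naive baseline: pick the candidate mentioned most often in the documents."""
    text = " ".join(sample.documents).lower()
    return max(sample.candidates, key=lambda c: text.count(c.lower()))


sample = Sample(
    query="country Hanging Gardens",
    documents=[
        "The Hanging Gardens are in Mumbai.",
        "Mumbai is a city in India. India is in Asia.",
    ],
    candidates=["India", "Iran", "Nepal"],
    answer="India",
)

prediction = most_mentioned_candidate(sample)
print(prediction == sample.answer)  # True
```

A baseline like this never combines facts across documents, which is exactly the shortcut the task is meant to go beyond.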

We also provide a masked version of both datasets, where candidates are replaced by random placeholder tokens. More details on the rationale behind this can be found in the paper.
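One way such masking could work is sketched below: every candidate is mapped to a randomly numbered placeholder, and that mapping is applied consistently to the documents, the candidate list, and the answer of a sample. The token format `___MASK<n>___`, the function `mask_sample`, and the choice to leave the query untouched are assumptions for illustration; the paper describes the procedure actually used.

```python
import random
import re


def mask_sample(documents, candidates, answer, rng):
    """Replace each candidate by a random placeholder token within one sample."""
    if not candidates:
        return documents, candidates, answer
    # Draw distinct random ids so that each candidate gets its own token.
    ids = rng.sample(range(1000), len(candidates))
    mapping = {c: f"___MASK{i}___" for c, i in zip(candidates, ids)}
    # Try longer candidates first so a short candidate that is a substring
    # of a longer one does not break the longer match.
    ordered = sorted(mapping, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(c) for c in ordered))

    def replace(text):
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    masked_docs = [replace(d) for d in documents]
    masked_candidates = [mapping[c] for c in candidates]
    return masked_docs, masked_candidates, mapping[answer]


docs = ["The Hanging Gardens are in Mumbai.", "Mumbai is a city in India."]
masked_docs, masked_cands, masked_answer = mask_sample(
    docs, ["India", "Iran", "Nepal"], "India", random.Random(0)
)
```

Because the placeholders are random, a model cannot rely on what it already knows about a candidate's name and has to use the document context instead.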


The first of the two datasets, WikiHop, is open-domain and based on Wikipedia articles; the goal is to recover Wikidata information by hopping through documents. The example on the right shows the relevant documents leading to the correct answer for the query shown at the bottom.


The second dataset has the same format as WikiHop but is based on research paper abstracts from PubMed, and its queries are about interactions between pairs of drugs. The correct answer has to be inferred by combining information from a chain of reactions between drugs and proteins.


Questions? Reach out to: j.welbl [ÄT]