The QAngaroo Leaderboards

There are two leaderboards, one for WikiHop and one for MedHop. Each compares the accuracy of different methods on the hidden test data of its respective dataset. All models are evaluated in the standard (unmasked) setting.

Planning to submit a model? Submit your code on CodaLab.
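Before submitting, it can help to sanity-check a model locally on the public dev split, since the leaderboard metric is plain accuracy over the hidden test answers. Below is a minimal sketch, assuming the public WikiHop/MedHop JSON format (a list of samples, each with "id", "query", "supports", "candidates", and, on train/dev, "answer") and a hypothetical local file path; it is an illustration, not the official CodaLab evaluation script.

```python
import json

def load_samples(path):
    """Load a WikiHop/MedHop split (assumed to be a JSON list of sample dicts)."""
    with open(path) as f:
        return json.load(f)

def accuracy(predictions, samples):
    """Accuracy in percent; predictions maps sample id -> predicted candidate string."""
    correct = sum(1 for s in samples if predictions.get(s["id"]) == s["answer"])
    return 100.0 * correct / len(samples)

if __name__ == "__main__":
    dev = load_samples("wikihop/dev.json")  # hypothetical local path
    # Trivial baseline for illustration: always predict the first listed candidate.
    preds = {s["id"]: s["candidates"][0] for s in dev}
    print(f"dev accuracy: {accuracy(preds, dev):.1f}%")
```

The leaderboard figures below are computed the same way, but on the hidden test data.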

WikiHop

 # | Model / Reference               | Affiliation                                        | Date           | Accuracy [%]
---|---------------------------------|----------------------------------------------------|----------------|-------------
 1 | [anonymized]                    | [anonymized]                                       | September 2018 | 70.6
 2 | [anonymized]                    | [anonymized]                                       | September 2018 | 67.6
 3 | Entity-GCN                      | University of Amsterdam & University of Edinburgh  | May 2018       | 67.6
 4 | SimpleMemNet                    | [anonymized]                                       | September 2018 | 66.9
 5 | MHQA-GRN                        | IBM & University of Rochester                      | August 2018    | 65.4
 6 | Jenga                           | Facebook AI Research                               | February 2018  | 65.3
 8 | Vanilla CoAttention Model       | Nanyang Technological University                   | December 2017  | 59.9
 9 | Coref-GRU                       | Carnegie Mellon University                         | April 2018     | 59.3
10 | BiDAF (Seo et al. '17)          | Initial Benchmarks                                 | September 2017 | 42.9
11 | Most Frequent Given Candidate   | Initial Benchmarks                                 | September 2017 | 38.8
12 | Document-cue                    | Initial Benchmarks                                 | September 2017 | 36.7
13 | FastQA (Weissenborn et al. '17) | Initial Benchmarks                                 | September 2017 | 25.7
14 | TF-IDF                          | Initial Benchmarks                                 | September 2017 | 25.6
15 | Random Candidate                | Initial Benchmarks                                 | September 2017 | 11.5

MedHop

 # | Model / Reference               | Affiliation                      | Date           | Accuracy [%]
---|---------------------------------|----------------------------------|----------------|-------------
 1 | Most Frequent Given Candidate   | Initial Benchmarks               | September 2017 | 58.4
 2 | Vanilla CoAttention Model       | Nanyang Technological University | December 2017  | 58.1
 3 | BiDAF (Seo et al. '17)          | Initial Benchmarks               | September 2017 | 47.8
 4 | Document-cue                    | Initial Benchmarks               | September 2017 | 44.9
 5 | FastQA (Weissenborn et al. '17) | Initial Benchmarks               | September 2017 | 23.1
 6 | Random Candidate                | Initial Benchmarks               | September 2017 | 13.9
 7 | TF-IDF                          | Initial Benchmarks               | September 2017 |  9.0