QAngaroo Leaderboards



There are two leaderboards, one for WikiHop and one for MedHop. Each compares the accuracy of different methods on the hidden test set of the respective dataset. All models are evaluated in the standard (unmasked) setting.
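The reported accuracy is simply the percentage of test questions for which the predicted candidate matches the gold answer. A minimal sketch of the metric (the toy predictions below are hypothetical, not taken from any leaderboard entry):

```python
def accuracy(predictions, gold):
    """Fraction of questions whose predicted candidate equals the gold answer."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical toy data for illustration only.
preds = ["paris", "kenya", "oxygen", "mozart"]
gold = ["paris", "kenya", "oxygen", "liszt"]
print(round(100 * accuracy(preds, gold), 1))  # accuracy in percent
```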

Planning to submit a model? Submit your code on CodaLab.

WikiHop

# | Model | Reference / Affiliation | Date | Accuracy [%]
1 | Jenga | Facebook AI Research | February 2018 | 65.3
2 | Vanilla CoAttention Model | Nanyang Technological University | December 2017 | 59.9
3 | Coref-GRU | Carnegie Mellon University | April 2018 | 59.3
4 | BiDAF (Seo et al. '17) | Initial Benchmarks | September 2017 | 42.9
5 | Most Frequent Given Candidate | Initial Benchmarks | September 2017 | 38.8
6 | Document-cue | Initial Benchmarks | September 2017 | 36.7
7 | FastQA (Weissenborn et al. '17) | Initial Benchmarks | September 2017 | 25.7
8 | TF-IDF | Initial Benchmarks | September 2017 | 25.6
9 | Random Candidate | Initial Benchmarks | September 2017 | 11.5

MedHop

# | Model | Reference / Affiliation | Date | Accuracy [%]
1 | Most Frequent Given Candidate | Initial Benchmarks | September 2017 | 58.4
2 | Vanilla CoAttention Model | Nanyang Technological University | December 2017 | 58.1
3 | BiDAF (Seo et al. '17) | Initial Benchmarks | September 2017 | 47.8
4 | Document-cue | Initial Benchmarks | September 2017 | 44.9
5 | FastQA (Weissenborn et al. '17) | Initial Benchmarks | September 2017 | 23.1
6 | Random Candidate | Initial Benchmarks | September 2017 | 13.9
7 | TF-IDF | Initial Benchmarks | September 2017 | 9.0