The QAngaroo Leaderboards

There are two leaderboards, one for WikiHop and one for MedHop. Each compares the accuracy of different methods on the hidden test set of the respective dataset. All models are evaluated in the standard (unmasked) setting.
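For reference, the metric is plain exact-match accuracy over the answer candidates: a prediction counts as correct only if it equals the gold answer for that query. Below is a minimal sketch in Python of how such a score could be computed, assuming a gold file in the QAngaroo JSON format (a list of examples with "id" and "answer" fields) and a predictions file mapping each example id to the predicted candidate string; the file names and the exact predictions format are illustrative assumptions, not the official scoring script.

import json

def qangaroo_accuracy(gold_path, pred_path):
    """Exact-match accuracy: fraction of queries whose predicted candidate
    equals the gold answer string."""
    # Gold file (assumed format): a JSON list of examples with "id" and "answer".
    with open(gold_path) as f:
        gold = {ex["id"]: ex["answer"] for ex in json.load(f)}
    # Predictions (assumed format): a JSON object mapping example id -> candidate.
    with open(pred_path) as f:
        preds = json.load(f)
    correct = sum(preds.get(qid) == answer for qid, answer in gold.items())
    return correct / len(gold)

if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    print(f"accuracy: {100 * qangaroo_accuracy('dev.json', 'predictions.json'):.1f}%")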

Planning to submit a model? Submit your code on CodaLab.

WikiHop

#  | Model / Reference | Affiliation | Date | Accuracy [%]
1  | [anonymized] | [anonymized] | September 2019 | 78.3
2  | [anonymized] | [anonymized] | September 2019 | 76.5
3  | ChainEx (single) | [anonymized] | May 2019 | 74.9
4  | JDReader (ensemble) | JD AI Research | March 2019 | 74.3
5  | DynSAN (ensemble) | Samsung Research (SRC-B) | March 2019 | 73.8
6  | GCN-Test (single) | Zhejiang University (ZJU) | July 2019 | 72.5
7  | DynSAN basic (single) | Samsung Research (SRC-B) | February 2019 | 71.4
8  | Entity-GCN v2 (ensemble) | University of Amsterdam & University of Edinburgh | November 2018 | 71.2
9  | HDEGraph | JD AI Research | February 2019 | 70.9
10 | CFC | Salesforce Research | September 2018 | 70.6
11 | [anonymized] | [anonymized] | November 2018 | 69.6
12 | [anonymized] | [anonymized] | February 2019 | 69.1
13 | BAG | University of Sydney | March 2019 | 69.0
14 | [anonymized] | [anonymized] | September 2018 | 67.6
15 | Entity-GCN v1 | University of Amsterdam & University of Edinburgh | May 2018 | 67.6
16 | SimpleMemNet | [anonymized] | September 2018 | 66.9
17 | [anonymized] | [anonymized] | November 2018 | 66.5
18 | MHQA-GRN | IBM & University of Rochester | August 2018 | 65.4
19 | Jenga | Facebook AI Research | February 2018 | 65.3
20 | Vanilla CoAttention Model | Nanyang Technological University | December 2017 | 59.9
21 | Coref-GRU | Carnegie Mellon University | April 2018 | 59.3
22 | BiDAF (Seo et al. '17) | Initial Benchmarks | September 2017 | 42.9
23 | Most Frequent Given Candidate | Initial Benchmarks | September 2017 | 38.8
24 | Document-cue | Initial Benchmarks | September 2017 | 36.7
25 | FastQA (Weissenborn et al. '17) | Initial Benchmarks | September 2017 | 25.7
26 | TF-IDF | Initial Benchmarks | September 2017 | 25.6
27 | Random Candidate | Initial Benchmarks | September 2017 | 11.5

MedHop

# | Model / Reference | Affiliation | Date | Accuracy [%]
1 | [anonymized] | [anonymized] | February 2019 | 60.3
2 | Most Frequent Given Candidate | Initial Benchmarks | September 2017 | 58.4
3 | Vanilla CoAttention Model | Nanyang Technological University | December 2017 | 58.1
4 | BiDAF (Seo et al. '17) | Initial Benchmarks | September 2017 | 47.8
5 | Document-cue | Initial Benchmarks | September 2017 | 44.9
6 | FastQA (Weissenborn et al. '17) | Initial Benchmarks | September 2017 | 23.1
7 | Random Candidate | Initial Benchmarks | September 2017 | 13.9
8 | TF-IDF | Initial Benchmarks | September 2017 | 9.0