Sorry for bothering you. I have reproduced your work several times, both by evaluating the pre-trained model you provide at https://github.com/igashov/RetroBridge/blob/main/configs/retrobridge.yaml and by retraining the model with the best hyperparameters you suggested (with the batch size reduced to 32 due to GPU memory limitations). The tables below show my experimental results. While the round-trip accuracy is on par with the published results, the exact match accuracy is significantly lower than reported.
| Round-Trip Accuracy on USPTO-50k (%) | Top-1 | Top-3 | Top-5 | Top-10 |
| --- | --- | --- | --- | --- |
| RetroBridge (direct evaluation of the provided checkpoint) | 83.96 | 72.75 | 70.47 | 70.22 |
| RetroBridge (re-trained from scratch, batch size 32) | 83.66 | 72.46 | 69.81 | 69.40 |
| Exact Match Accuracy on USPTO-50k (%) | Top-1 | Top-3 | Top-5 | Top-10 |
| --- | --- | --- | --- | --- |
| RetroBridge (direct evaluation of the provided checkpoint) | 47.79 | 67.01 | 71.28 | 73.74 |
| RetroBridge (re-trained from scratch, batch size 32) | 48.37 | 66.95 | 70.94 | 72.82 |
I wonder whether my hyperparameter settings differ from those of your best model, or whether there might be an issue in the evaluation code?
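For reference, here is the way I understand top-k exact match accuracy to be computed, in case the discrepancy comes from the evaluation step rather than the model. This is only a minimal sketch of my own (the function name and structure are mine, not from your repo), and it assumes predicted and ground-truth reactant SMILES have already been canonicalized (e.g. with RDKit) so that exact match reduces to string equality:

```python
# Minimal sketch of top-k exact match accuracy over ranked predictions.
# Assumes all SMILES strings are already canonicalized beforehand
# (e.g. via RDKit's canonical SMILES), so comparison is plain string equality.

def top_k_exact_match(predictions, targets, ks=(1, 3, 5, 10)):
    """predictions: one ranked list of candidate SMILES per reaction.
    targets: the ground-truth reactant SMILES for each reaction.
    Returns a dict mapping k -> fraction of reactions whose ground truth
    appears among the top-k candidates."""
    hits = {k: 0 for k in ks}
    for ranked, truth in zip(predictions, targets):
        for k in ks:
            if truth in ranked[:k]:
                hits[k] += 1
    n = len(targets)
    return {k: hits[k] / n for k in ks}

# Toy example with two reactions and two ranked candidates each:
preds = [["CCO", "CCN"], ["CC", "CCO"]]
truth = ["CCO", "CCO"]
print(top_k_exact_match(preds, truth, ks=(1, 2)))  # {1: 0.5, 2: 1.0}
```

If this matches what the evaluation script does, then the gap most likely comes from the sampling or ranking side rather than the metric itself.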