From 266789ef07efbd40e44264924a764c236bdecb06 Mon Sep 17 00:00:00 2001
From: Quentin Mace
Date: Mon, 16 Mar 2026 15:15:53 +0100
Subject: [PATCH 1/5] update readme

---
 README.md | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 769811d4..e7b5d034 100644
--- a/README.md
+++ b/README.md
@@ -19,7 +19,30 @@
 
 ---
 
-## What is Pipeline Evaluation?
+## Evaluating retrievers on ViDoRe v1-v3
+We shifted from in-house evals to be in line with the general eval framework of retrieval models by moving to [MTEB](https://github.com/embeddings-benchmark/mteb/tree/main)
+
+Here is a rough sketch of how submission for ViDoRe v1-v3 works, more details in the [MTEB official documentation](https://embeddings-benchmark.github.io/mteb/contributing/adding_a_model/)
+
+1. Create your model implementation file (if it does not exist already) [here](https://github.com/embeddings-benchmark/mteb/tree/main/mteb/models/model_implementations). And open a PR on the repo, examples for Colpali-like models can be found in [this file for example](https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/model_implementations/colpali_models.py).
+
+2. Evaluate your model:
+```python
+import mteb
+from mteb.models.model_implementations.my_custom_model import MyCustomModel
+
+my_model = MyCustomModel(my_args)
+tasks = mteb.get_tasks(["ViDoRe (v3)"], languages= ["en"])
+
+results = mteb.evaluate(my_model, tasks=tasks)
+```
+
+3. Open a PR on the [mteb_results_repo](https://github.com/embeddings-benchmark/results/tree/main) with the generated results file to submit your results to the leaderboard
+
+4. Optional : In order to eval on private sets
+ once all this is done, you can ask the MTEB team for evaluating your model on private ViDoRe v3 sets by opening a dedicated issue on [their repo](https://github.com/embeddings-benchmark/mteb/issues)
+
+## Evaluating a complex pipeline
 
 Pipeline evaluation allows you to evaluate **complete end-to-end retrieval systems** on the ViDoRe v3 benchmark datasets. Unlike traditional retriever evaluation that focuses on individual model components, pipeline evaluation lets you test:
 

From f4e378a2220600ec65ea8060aa91d0e6a559189c Mon Sep 17 00:00:00 2001
From: QuentinJGMace <95310069+QuentinJGMace@users.noreply.github.com>
Date: Tue, 17 Mar 2026 10:44:03 +0100
Subject: [PATCH 2/5] Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index e7b5d034..131ab5dd 100644
--- a/README.md
+++ b/README.md
@@ -19,10 +19,10 @@
 
 ---
 
-## Evaluating retrievers on ViDoRe v1-v3
-We shifted from in-house evals to be in line with the general eval framework of retrieval models by moving to [MTEB](https://github.com/embeddings-benchmark/mteb/tree/main)
+## Evaluating single-model retrievers on ViDoRe v1–v3 with MTEB
+We shifted from in-house evaluations to the general MTEB evaluation framework for retrieval models by moving to [MTEB](https://github.com/embeddings-benchmark/mteb/tree/main).
 
-Here is a rough sketch of how submission for ViDoRe v1-v3 works, more details in the [MTEB official documentation](https://embeddings-benchmark.github.io/mteb/contributing/adding_a_model/)
+Below is a high-level overview of how single-model retriever submissions for ViDoRe v1–v3 work via MTEB; see the [MTEB official documentation](https://embeddings-benchmark.github.io/mteb/contributing/adding_a_model/) for full details. This section covers single-model retriever evaluation only; for end-to-end pipeline evaluation, see the section below.
 
 1. Create your model implementation file (if it does not exist already) [here](https://github.com/embeddings-benchmark/mteb/tree/main/mteb/models/model_implementations). And open a PR on the repo, examples for Colpali-like models can be found in [this file for example](https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/model_implementations/colpali_models.py).
 

From 5c9c6f89866bb4302965a24b9c47de7db5e91a24 Mon Sep 17 00:00:00 2001
From: QuentinJGMace <95310069+QuentinJGMace@users.noreply.github.com>
Date: Tue, 17 Mar 2026 10:44:19 +0100
Subject: [PATCH 3/5] Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 131ab5dd..fb778ca5 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ We shifted from in-house evaluations to the general MTEB evaluation framework fo
 
 Below is a high-level overview of how single-model retriever submissions for ViDoRe v1–v3 work via MTEB; see the [MTEB official documentation](https://embeddings-benchmark.github.io/mteb/contributing/adding_a_model/) for full details. This section covers single-model retriever evaluation only; for end-to-end pipeline evaluation, see the section below.
 
-1. Create your model implementation file (if it does not exist already) [here](https://github.com/embeddings-benchmark/mteb/tree/main/mteb/models/model_implementations). And open a PR on the repo, examples for Colpali-like models can be found in [this file for example](https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/model_implementations/colpali_models.py).
+1. Create your model implementation file (if it does not exist already) [here](https://github.com/embeddings-benchmark/mteb/tree/main/mteb/models/model_implementations), then open a PR to the [MTEB repository](https://github.com/embeddings-benchmark/mteb) with your changes; examples for Colpali-like models can be found in [this file](https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/model_implementations/colpali_models.py).
 
 2. Evaluate your model:
 ```python

From 7a24499bec9345e377876b6165c45fd1d4c2bccf Mon Sep 17 00:00:00 2001
From: QuentinJGMace <95310069+QuentinJGMace@users.noreply.github.com>
Date: Tue, 17 Mar 2026 10:46:57 +0100
Subject: [PATCH 4/5] Potential fix for pull request finding

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index fb778ca5..a423e8d1 100644
--- a/README.md
+++ b/README.md
@@ -39,11 +39,11 @@ results = mteb.evaluate(my_model, tasks=tasks)
 
 3. Open a PR on the [mteb_results_repo](https://github.com/embeddings-benchmark/results/tree/main) with the generated results file to submit your results to the leaderboard
 
-4. Optional : In order to eval on private sets
- once all this is done, you can ask the MTEB team for evaluating your model on private ViDoRe v3 sets by opening a dedicated issue on [their repo](https://github.com/embeddings-benchmark/mteb/issues)
+4. Optional: To evaluate on private sets, once all this is done you can ask the MTEB team to evaluate your model on private ViDoRe v3 sets by opening a dedicated issue on [their repo](https://github.com/embeddings-benchmark/mteb/issues)
 
 ## Evaluating a complex pipeline
 
+
 Pipeline evaluation allows you to evaluate **complete end-to-end retrieval systems** on the ViDoRe v3 benchmark datasets. Unlike traditional retriever evaluation that focuses on individual model components, pipeline evaluation lets you test:
 
 - **Multi-stage retrieval systems** (e.g., retrieve + rerank)

From bfcb30d1dacd6249d54f31182cafd5b58017f36a Mon Sep 17 00:00:00 2001
From: QuentinJGMace <95310069+QuentinJGMace@users.noreply.github.com>
Date: Tue, 17 Mar 2026 10:52:24 +0100
Subject: [PATCH 5/5] Update README for MTEB evaluation process clarity

---
 README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index a423e8d1..eb66d7d1 100644
--- a/README.md
+++ b/README.md
@@ -22,7 +22,7 @@
 ## Evaluating single-model retrievers on ViDoRe v1–v3 with MTEB
 We shifted from in-house evaluations to the general MTEB evaluation framework for retrieval models by moving to [MTEB](https://github.com/embeddings-benchmark/mteb/tree/main).
 
-Below is a high-level overview of how single-model retriever submissions for ViDoRe v1–v3 work via MTEB; see the [MTEB official documentation](https://embeddings-benchmark.github.io/mteb/contributing/adding_a_model/) for full details. This section covers single-model retriever evaluation only; for end-to-end pipeline evaluation, see the section below.
+Here are the main steps to evaluate and submit your retriever to the ViDoRe V1-V3 leaderboards ; see the [MTEB official documentation](https://embeddings-benchmark.github.io/mteb/contributing/adding_a_model/) for full details. This section covers mteb leaderboards only; for our in-house pipeline leaderboard, see the section below.
 
 1. Create your model implementation file (if it does not exist already) [here](https://github.com/embeddings-benchmark/mteb/tree/main/mteb/models/model_implementations), then open a PR to the [MTEB repository](https://github.com/embeddings-benchmark/mteb) with your changes; examples for Colpali-like models can be found in [this file](https://github.com/embeddings-benchmark/mteb/blob/main/mteb/models/model_implementations/colpali_models.py).
 
@@ -32,14 +32,14 @@ import mteb
 from mteb.models.model_implementations.my_custom_model import MyCustomModel
 
 my_model = MyCustomModel(my_args)
-tasks = mteb.get_tasks(["ViDoRe (v3)"], languages= ["en"])
+tasks = mteb.get_tasks(["ViDoRe (v3)"])
 
 results = mteb.evaluate(my_model, tasks=tasks)
 ```
 
 3. Open a PR on the [mteb_results_repo](https://github.com/embeddings-benchmark/results/tree/main) with the generated results file to submit your results to the leaderboard
 
-4. Optional: To evaluate on private sets, once all this is done you can ask the MTEB team to evaluate your model on private ViDoRe v3 sets by opening a dedicated issue on [their repo](https://github.com/embeddings-benchmark/mteb/issues)
+4. To evaluate on private sets, once all this is done you can ask the MTEB team to evaluate your model on private ViDoRe v3 sets by opening a dedicated issue on [their repo](https://github.com/embeddings-benchmark/mteb/issues)
 
 ## Evaluating a complex pipeline
 
@@ -51,7 +51,7 @@ Pipeline evaluation allows you to evaluate **complete end-to-end retrieval syste
 - **Custom preprocessing pipelines** (e.g., OCR → chunking → embedding)
 - **Arbitrary retrieval logic** that goes beyond standard dense/sparse retrievers
 
-## 📊 Results Repository & Submission Guidelines
+### 📊 Results Repository & Submission Guidelines
 
 **This repository serves as the primary community results repository for visual document retrieval benchmarks using complex pipelines.** We encourage researchers and practitioners to submit their pipeline evaluation results to create a centralized location where the community can compare different approaches and track progress on ViDoRe v3 datasets.