
[MVEB] Add STARBench video-centric QA task #4746

Open
yaswanth169 wants to merge 2 commits into embeddings-benchmark:main from yaswanth169:mveb-starbench-vcqa-v2

Conversation

@yaswanth169

Contributor

Add STARBenchVideoCentricQA (vt2t) and STARBenchVideoAudioCentricQA (vat2t) tasks for the STAR (Situated Reasoning in Real-World Videos) benchmark. The dataset has 4 question types (feasibility, interaction, prediction, sequence) totaling ~7100 examples. Both classes follow the standard AbsTaskRetrieval + RetrievalSplitData pattern.

In response to the question from @Samoed: we combine all 4 configs into one task so that models are evaluated on the full STAR benchmark holistically, which is the same approach used by OmniVideoBench (16 subsets combined). Happy to split if preferred.
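To make the trade-off concrete, here is a minimal sketch of the two layouts under discussion. It is not the code in this PR and does not use the mteb API: the records are hypothetical placeholders rather than STAR data, the helper names are invented for illustration, and the nested `{query_id: {doc_id: score}}` relevance map is assumed only because it is a common retrieval convention.

```python
from collections import defaultdict

# The four STAR question types named in this PR.
QUESTION_TYPES = ["feasibility", "interaction", "prediction", "sequence"]


def make_toy_examples(per_type=2):
    """Build hypothetical records; placeholder rows, not real STAR data."""
    return [
        {"type": qt, "query_id": f"{qt}-q{i}", "doc_id": f"{qt}-d{i}"}
        for qt in QUESTION_TYPES
        for i in range(per_type)
    ]


def combined_qrels(examples):
    """Option A: a single task whose relevance map spans all question types."""
    return {ex["query_id"]: {ex["doc_id"]: 1} for ex in examples}


def split_qrels(examples):
    """Option B: one relevance map per question type, i.e. one task each."""
    per_type = defaultdict(dict)
    for ex in examples:
        per_type[ex["type"]][ex["query_id"]] = {ex["doc_id"]: 1}
    return dict(per_type)


examples = make_toy_examples()
combined = combined_qrels(examples)
split = split_qrels(examples)

print(len(combined))
print({qt: len(qrels) for qt, qrels in split.items()})
```

Both layouts cover the same queries. The split version only partitions them by question type, which is what would allow per-type scores or per-task settings.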

@Samoed

Samoed commented May 27, 2026

Member

> We combine all 4 configs into one task so models are evaluated on the full STAR benchmark holistically : the same approach used by OmniVideoBench (16 subsets combined). Happy to split if preferred.

I think it would be better to split them into separate tasks to allow more configurability, e.g. model-task-specific prompts.

@Samoed Samoed added the "new dataset" (Issues related to adding a new task or dataset) and "video" (video extension) labels May 27, 2026
@github-actions

Contributor

This pull request has been automatically marked as stale due to inactivity.

@github-actions github-actions Bot added the stale label Jun 11, 2026
@yaswanth169

Contributor Author

Hello @Samoed / @isaac-chung,
could you take a look at this?

@github-actions github-actions Bot removed the stale label Jun 14, 2026
