R1: Recall@1, R5: Recall@5, R10: Recall@10
By default, this leaderboard is sorted by the General Retrieval, Text to Video R1 score.
T2V: Text to Video, V2T: Video to Text

# | Model (organization) | Params | Date | General T2V R1 | General T2V R5 | General T2V R10 | General V2T R1 | General V2T R5 | General V2T R10 | Spatial T2V R1 | Spatial T2V R5 | Spatial T2V R10 | Spatial V2T R1 | Spatial V2T R5 | Spatial V2T R10 | Temporal T2V R1 | Temporal T2V R5 | Temporal T2V R10 | Temporal V2T R1 | Temporal V2T R5 | Temporal V2T R10
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
1 | CaRe (Ours) | 7B | 2025/3/15 | 77.0 | 95.6 | 98.7 | 79.0 | 96.8 | 99.1 | 76.8 | 96.3 | 98.7 | 78.1 | 95.8 | 99.3 | 50.7 | 85.3 | 94.4 | 53.4 | 86.3 | 94.0
2 | Qwen2-VL* (Alibaba) | 7B | 2024/8/30 | 76.6 | 95.3 | 98.7 | 77.4 | 95.6 | 98.7 | 78.2 | 95.5 | 98.5 | 75.4 | 95.0 | 98.1 | 51.9 | 84.8 | 94.9 | 52.7 | 85.4 | 95.2
3 | InternVideo2 stage2 (Shanghai AI Lab) | 1B | 2024/4/25 | 72.5 | 93.7 | 97.3 | 69.5 | 94.6 | 97.8 | 72.4 | 94.2 | 97.4 | 62.7 | 90.5 | 95.9 | 46.0 | 80.8 | 91.9 | 46.6 | 82.5 | 92.5
4 | InternVL2* (Shanghai AI Lab) | 8B | 2024/7/4 | 72.1 | 92.6 | 96.8 | 73.6 | 93.4 | 97.4 | 76.8 | 94.2 | 97.7 | 75.7 | 95.2 | 98.0 | 48.1 | 76.8 | 89.0 | 47.6 | 78.2 | 90.3
5 | Tarsier* (ByteDance Research) | 7B | 2024/7/4 | 71.0 | 93.8 | 97.8 | 70.6 | 94.2 | 98.0 | 70.2 | 94.0 | 98.2 | 67.4 | 93.5 | 97.4 | 50.1 | 84.1 | 92.8 | 50.0 | 84.7 | 94.9
6 | MiniCPM-V 2.6* (OpenBMB) | 8B | 2024/8/6 | 71.0 | 92.2 | 97.0 | 69.3 | 92.8 | 97.1 | 71.7 | 93.6 | 98.0 | 67.6 | 92.3 | 97.7 | 50.5 | 82.9 | 92.1 | 46.1 | 80.9 | 93.3
7 | LLaVA NeXT Video* (LLaVA NeXT Team) | 7B | 2024/5/10 | 66.9 | 89.4 | 96.0 | 62.7 | 89.2 | 95.4 | 68.0 | 92.0 | 96.2 | 65.0 | 90.0 | 95.9 | 43.3 | 76.9 | 88.9 | 40.1 | 75.4 | 88.7
8 | LanguageBind (Peking University) | 528M | 2023/10/7 | 64.3 | 91.0 | 96.3 | 59.5 | 88.0 | 95.0 | 64.7 | 90.8 | 96.8 | 61.0 | 87.2 | 94.5 | 39.8 | 77.3 | 90.5 | 42.2 | 77.6 | 91.7
9 | Long-CLIP L/14 (Shanghai AI Lab) | 428M | 2024/3/22 | 62.7 | 88.8 | 95.7 | 60.3 | 88.8 | 94.9 | 65.6 | 90.9 | 96.0 | 61.0 | 88.3 | 94.4 | 33.2 | 68.8 | 81.6 | 34.5 | 71.9 | 86.6
10 | Long-CLIP B/14 (Shanghai AI Lab) | 150M | 2024/3/22 | 59.2 | 85.3 | 92.1 | 55.8 | 84.7 | 92.9 | 62.5 | 86.0 | 92.7 | 53.8 | 84.1 | 92.7 | 32.0 | 65.4 | 79.3 | 29.7 | 67.3 | 84.1
11 | CLIP L/14 (OpenAI) | 428M | 2021/2/26 | 51.2 | 83.4 | 90.6 | 54.7 | 86.9 | 93.6 | 49.0 | 81.9 | 91.4 | 55.4 | 85.6 | 93.0 | 33.5 | 70.3 | 84.0 | 39.7 | 76.2 | 87.9
12 | CLIP B/16 (OpenAI) | 150M | 2021/2/26 | 45.7 | 79.6 | 89.1 | 48.4 | 82.4 | 90.8 | 45.6 | 79.0 | 89.2 | 47.6 | 80.9 | 90.8 | 30.3 | 65.1 | 79.8 | 35.8 | 71.0 | 85.8
13 | InternVL2 (Shanghai AI Lab) | 8B | 2024/7/4 | 34.6 | 67.1 | 80.2 | 35.1 | 68.5 | 82.0 | 40.4 | 72.9 | 83.8 | 40.3 | 73.0 | 85.7 | 29.3 | 62.5 | 77.4 | 27.1 | 59.8 | 75.9
14 | Qwen2-VL (Alibaba) | 7B | 2024/8/30 | 30.9 | 64.7 | 79.1 | 32.9 | 69.6 | 82.7 | 28.1 | 61.3 | 76.1 | 31.6 | 65.6 | 80.4 | 24.3 | 61.5 | 78.4 | 26.4 | 59.2 | 76.1
15 | Tarsier (ByteDance Research) | 7B | 2024/7/4 | 26.8 | 64.6 | 83.5 | 32.3 | 68.0 | 84.4 | 40.5 | 74.0 | 88.1 | 41.9 | 75.0 | 87.4 | 26.8 | 64.6 | 83.5 | 32.3 | 68.0 | 84.4
16 | LLaVA NeXT Video (LLaVA NeXT Team) | 7B | 2024/5/10 | 22.4 | 51.5 | 65.3 | 25.2 | 54.4 | 67.7 | 34.1 | 63.1 | 76.0 | 31.1 | 63.7 | 75.1 | 18.6 | 48.1 | 62.4 | 20.7 | 47.1 | 62.4
17 | MiniCPM-V 2.6 (OpenBMB) | 8B | 2024/8/6 | 8.2 | 26.9 | 38.4 | 16.7 | 39.9 | 55.8 | 6.6 | 25.2 | 35.7 | 13.3 | 38.2 | 53.5 | 11.8 | 35.8 | 52.2 | 16.6 | 47.4 | 64.4
Date indicates the release date of open-source models.
\* Contrastively trained MLLM.
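
For readers unfamiliar with the R1, R5, and R10 columns, the following is a minimal sketch of how Recall@K is commonly computed for cross-modal retrieval. It is not the leaderboard's evaluation code. It assumes a square similarity matrix in which the correct match for query `i` is candidate `i` (one caption per video); the helper names `recall_at_k` and `transpose` and the toy scores are hypothetical.

```python
from typing import List, Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Return the percentage of queries whose true match ranks in the top k.

    Assumption: similarity[i][j] scores query i against candidate j, and the
    ground-truth candidate for query i is candidate i (one-to-one pairs).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the true match: how many candidates score strictly higher.
        # Ties are therefore resolved in favor of the true match.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


def transpose(matrix: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap the query and candidate roles (text-to-video becomes video-to-text)."""
    return [list(column) for column in zip(*matrix)]


# Toy text-to-video scores: rows are captions, columns are videos (made-up numbers).
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.5, 0.6, 0.7],
]

t2v_r1 = recall_at_k(sim, k=1)             # captions retrieving videos
v2t_r1 = recall_at_k(transpose(sim), k=1)  # videos retrieving captions
print(f"T2V R1 = {t2v_r1:.1f}, V2T R1 = {v2t_r1:.1f}")
```

Transposing the matrix is what distinguishes the Text to Video columns from the Video to Text columns in this sketch: the same scores are reused, and only the direction of the ranking changes.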