Imagine All The Relevance:
Scenario-Profiled Indexing with Knowledge Expansion for Dense Retrieval

1Yonsei University 2Korea University
SPIKE

Existing retrieval methods fail to capture implicit relevance that requires intensive reasoning, as they encode each document into a single vector without any reasoning. In contrast, SPIKE introduces scenarios, which explicitly model how a document establishes relevance to potential information needs.

Why SPIKE?

Reasoning-Aware Retrieval

Unlike traditional dense retrievers, SPIKE explicitly models the implicit relevance between a document and potential information needs, not just surface similarity.

Cross-Format Retrieval

SPIKE effectively connects query-document pairs across different formats, such as code snippets, enabling semantic alignment despite format differences.

Enhanced User Experience

By providing scenario explanations, SPIKE makes retrieval results more comprehensible and useful for real-world users and LLMs.

Efficient

SPIKE uses an efficient 3B scenario generator, which is applied offline to build a scenario-profiled index.
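Because scenario generation happens offline, the index can be built once per corpus. The sketch below illustrates the idea under stated assumptions: `generate_scenarios` (standing in for the 3B scenario generator) and `embed` are hypothetical callables, and the dictionary layout of each index entry is illustrative, not the paper's actual data structure.

```python
def build_scenario_index(docs, generate_scenarios, embed):
    """Offline construction of a scenario-profiled index (illustrative sketch).

    `generate_scenarios` and `embed` are hypothetical stand-ins for the
    paper's scenario generator and embedding model.
    """
    index = []
    for doc_id, text in docs.items():
        index.append({
            "doc_id": doc_id,
            "doc_emb": embed(text),                  # document-level vector
            "scenario_embs": [embed(s)               # one vector per scenario
                              for s in generate_scenarios(text)],
        })
    return index
```

At query time, only the cheap embedding lookups remain; the expensive LLM reasoning has already been paid for during indexing.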

SPIKE Framework

SPIKE Framework Overview

Overview of the SPIKE framework. (1) SPIKE defines scenarios and generates them with a high-performing large LLM. (2) It then constructs a scenario-augmented training set and uses it to optimize an efficient student LLM. (3) During inference, SPIKE considers scenario-level relevance alongside document-level relevance to retrieve documents.
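Step (3), combining scenario-level and document-level relevance, can be sketched as follows. The interpolation weight `alpha` and the max-aggregation over scenarios are assumptions made for this sketch rather than the paper's exact formulation, and embeddings are assumed unit-normalized so that a dot product equals cosine similarity.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def spike_score(query_emb, doc_emb, scenario_embs, alpha=0.5):
    """Blend document-level and scenario-level relevance (illustrative sketch).

    `alpha` and taking the max over scenarios are assumptions; embeddings
    are assumed unit-normalized.
    """
    doc_sim = dot(query_emb, doc_emb)                       # document-level relevance
    scen_sim = max(dot(query_emb, s) for s in scenario_embs)  # best-matching scenario
    return alpha * doc_sim + (1 - alpha) * scen_sim
```

A document whose surface text matches the query poorly can still rank highly if one of its scenarios captures the query's underlying information need.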

🚀 Performance Highlights

SPIKE significantly enhances retrieval performance across diverse models and domains, all with a 3B scenario generator applied offline.

Retrieval Accuracy Gains

+20.7% (E5-Mistral)
+18.6% (SFR)

Consistent improvements on reasoning-intensive benchmarks like BRIGHT.

Human-Preferred Results

SPIKE was consistently preferred across various criteria such as usefulness.

SPIKE enhances retrieval experience for real-world users.

Boosts RAG Performance

Enhances answer generation for LLMs such as Claude-3.5 and LLaMA3-70B.

SPIKE provides valuable additional context for LLMs in RAG.