Skillquality 0.45

Hybrid Search Architect

Designs a hybrid retrieval pipeline combining dense vector search and BM25 sparse search with reciprocal rank fusion, and explains when to use each configuration.

Price
free
Protocol
skill
Verified
no

What it does

Hybrid Search Architect

What this skill does

This skill designs a hybrid search pipeline that combines dense vector search (semantic similarity) with BM25 sparse search (keyword matching). Hybrid search outperforms either method alone on most retrieval benchmarks because vector search handles semantic meaning while BM25 handles exact keyword matches, product names, codes, and rare terms. This skill picks the right combination and fusion strategy for your use case.

How to use

Claude Code / Cline

Copy this file to .agents/skills/hybrid-search-architect/SKILL.md in your project root.

Then ask:

  • "Use the Hybrid Search Architect to improve our RAG pipeline's retrieval."
  • "Design a hybrid search system for our product documentation."

Provide:

  • What you're searching (type of documents)
  • What queries look like (keywords, natural language, codes/IDs)
  • Your current search stack (Pinecone, Weaviate, Elasticsearch, pgvector, etc.)
  • Latency requirements

Cursor / Codex

Describe your current retrieval setup and query patterns alongside these instructions.

The Prompt / Instructions for the Agent

Step 1 — Determine if hybrid search is needed

Query patternPure vectorPure BM25Hybrid
Natural language questions
Exact product names / SKUs
Technical codes / IDs
Conceptual / semantic
Mixed (most real-world)

Use hybrid search when: queries are mixed (some keyword, some semantic), documents contain both prose and structured data, or pure vector search misses obvious keyword matches.

Step 2 — Choose a stack

Option A: Weaviate (easiest hybrid, built-in)

# pip install weaviate-client
import weaviate
from weaviate.classes.query import HybridFusion

client = weaviate.connect_to_local()
collection = client.collections.get("Documents")

results = collection.query.hybrid(
    query="payment processing error",
    fusion_type=HybridFusion.RELATIVE_SCORE,  # or RANKED
    alpha=0.5,   # 0 = pure BM25, 1 = pure vector, 0.5 = balanced
    limit=10,
    return_metadata=["score", "explain_score"]
)

Option B: Elasticsearch / OpenSearch (production-grade)

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query = {
    "query": {
        "bool": {
            "should": [
                # BM25 component
                {"match": {"content": {"query": user_query, "boost": 1.0}}},
                # Dense vector component (kNN)
                {"knn": {
                    "field": "embedding",
                    "query_vector": get_embedding(user_query),
                    "num_candidates": 100,
                    "boost": 1.0
                }}
            ]
        }
    },
    "size": 10
}
results = es.search(index="documents", body=query)

Option C: pgvector + custom BM25 (for PostgreSQL users)

# pip install pgvector psycopg2
# Run both queries, then fuse results

async def hybrid_search(query: str, k: int = 10) -> list[dict]:
    embedding = await get_embedding(query)

    # Dense search
    vector_results = await db.fetch("""
        SELECT id, content, 1 - (embedding <=> $1::vector) as score
        FROM documents
        ORDER BY embedding <=> $1::vector
        LIMIT $2
    """, embedding, k * 2)

    # Sparse search (tsvector full-text search)
    bm25_results = await db.fetch("""
        SELECT id, content, ts_rank(to_tsvector('english', content), plainto_tsquery($1)) as score
        FROM documents
        WHERE to_tsvector('english', content) @@ plainto_tsquery($1)
        ORDER BY score DESC
        LIMIT $2
    """, query, k * 2)

    return reciprocal_rank_fusion(vector_results, bm25_results, k=k)

Step 3 — Implement Reciprocal Rank Fusion (RRF)

RRF is the standard way to combine results from multiple ranked lists. It's simple, effective, and doesn't require tuning score scales:

def reciprocal_rank_fusion(
    *ranked_lists: list[dict],
    k: int = 60,
    top_n: int = 10
) -> list[dict]:
    """
    Combine multiple ranked result lists using Reciprocal Rank Fusion.
    k=60 is the standard constant (from the original RRF paper).
    """
    scores: dict[str, float] = {}
    docs: dict[str, dict] = {}

    for ranked_list in ranked_lists:
        for rank, doc in enumerate(ranked_list, start=1):
            doc_id = doc["id"]
            scores[doc_id] = scores.get(doc_id, 0) + 1 / (k + rank)
            docs[doc_id] = doc

    sorted_ids = sorted(scores, key=lambda x: scores[x], reverse=True)
    return [docs[doc_id] for doc_id in sorted_ids[:top_n]]

Step 4 — Tune the alpha parameter

If your backend supports an alpha/weight parameter:

Use caseAlpha (vector weight)
Technical docs with many exact terms0.3
General knowledge / FAQ0.5
Semantic / conceptual search0.7
Code search0.4
Mixed content (default)0.5

Test with your actual query distribution — sample 50 real queries and compare precision at k=5 across alpha values.

Step 5 — Add a reranker (optional but high-impact)

After hybrid retrieval, a cross-encoder reranker re-scores the top results with full query-document attention. This is the single highest-impact quality improvement after hybrid search:

# pip install sentence-transformers
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, results: list[dict], top_n: int = 5) -> list[dict]:
    """Re-rank results using a cross-encoder. Retrieve wide, rerank narrow."""
    pairs = [(query, r["content"]) for r in results]
    scores = reranker.predict(pairs)

    ranked = sorted(zip(results, scores), key=lambda x: x[1], reverse=True)
    return [doc for doc, _ in ranked[:top_n]]

# Usage: retrieve top 20 via hybrid, rerank to top 5
candidates = hybrid_search(query, k=20)
final_results = rerank(query, candidates, top_n=5)

Step 6 — Evaluation

Measure retrieval quality with Recall@K and Mean Reciprocal Rank:

def recall_at_k(relevant_ids: set, retrieved_ids: list[str], k: int) -> float:
    return len(relevant_ids & set(retrieved_ids[:k])) / len(relevant_ids)

def mrr(relevant_ids: set, retrieved_ids: list[str]) -> float:
    for rank, doc_id in enumerate(retrieved_ids, 1):
        if doc_id in relevant_ids:
            return 1 / rank
    return 0.0

Benchmark: run 50 labeled queries, compare Recall@5 for pure vector vs. pure BM25 vs. hybrid. Hybrid should outperform both by 10–25% on mixed query sets.

Capabilities

skillsource-notysotyskill-hybrid-search-architecttopic-agent-skillstopic-claudetopic-claude-codetopic-claude-skillstopic-clinetopic-cursortopic-llmtopic-llm-skillstopic-skills

Install

Quality

0.45/ 1.00

deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 8 github stars · SKILL.md body (6,651 chars)

Provenance

Indexed fromgithub
Enriched2026-05-18 19:13:22Z · deterministic:skill-github:v1 · v1
First seen2026-05-18
Last seen2026-05-18

Agent access