{"id":"ab6691f2-069c-48ca-bb4f-6f2620eb3817","shortId":"AB3zCD","kind":"skill","title":"embedding-strategies","tagline":"Guide to selecting and optimizing embedding models for vector search applications.","description":"# Embedding Strategies\n\nGuide to selecting and optimizing embedding models for vector search applications.\n\n## Do not use this skill when\n\n- The task is unrelated to embedding strategies\n- You need a different domain or tool outside this scope\n\n## Instructions\n\n- Clarify goals, constraints, and required inputs.\n- Apply relevant best practices and validate outcomes.\n- Provide actionable steps and verification.\n- If detailed examples are required, open `resources/implementation-playbook.md`.\n\n## Use this skill when\n\n- Choosing embedding models for RAG\n- Optimizing chunking strategies\n- Fine-tuning embeddings for domains\n- Comparing embedding model performance\n- Reducing embedding dimensions\n- Handling multilingual content\n\n## Core Concepts\n\n### 1. Embedding Model Comparison\n\n| Model | Dimensions | Max Tokens | Best For |\n|-------|------------|------------|----------|\n| **text-embedding-3-large** | 3072 | 8191 | High accuracy |\n| **text-embedding-3-small** | 1536 | 8191 | Cost-effective |\n| **voyage-2** | 1024 | 4000 | Code, legal |\n| **bge-large-en-v1.5** | 1024 | 512 | Open source |\n| **all-MiniLM-L6-v2** | 384 | 256 | Fast, lightweight |\n| **multilingual-e5-large** | 1024 | 512 | Multi-language |\n\n### 2. 
Embedding Pipeline\n\n```\nDocument → Chunking → Preprocessing → Embedding Model → Vector\n                ↓\n        [Overlap, Size]  [Clean, Normalize]  [API/Local]\n```\n\n## Templates\n\n### Template 1: OpenAI Embeddings\n\n```python\nfrom openai import OpenAI\nfrom typing import List\nimport numpy as np\n\nclient = OpenAI()\n\ndef get_embeddings(\n    texts: List[str],\n    model: str = \"text-embedding-3-small\",\n    dimensions: int = None\n) -> List[List[float]]:\n    \"\"\"Get embeddings from OpenAI.\"\"\"\n    # Handle batching for large lists\n    batch_size = 100\n    all_embeddings = []\n\n    for i in range(0, len(texts), batch_size):\n        batch = texts[i:i + batch_size]\n\n        kwargs = {\"input\": batch, \"model\": model}\n        if dimensions:\n            kwargs[\"dimensions\"] = dimensions\n\n        response = client.embeddings.create(**kwargs)\n        embeddings = [item.embedding for item in response.data]\n        all_embeddings.extend(embeddings)\n\n    return all_embeddings\n\n\ndef get_embedding(text: str, **kwargs) -> List[float]:\n    \"\"\"Get single embedding.\"\"\"\n    return get_embeddings([text], **kwargs)[0]\n\n\n# Dimension reduction with OpenAI\ndef get_reduced_embedding(text: str, dimensions: int = 512) -> List[float]:\n    \"\"\"Get embedding with reduced dimensions (Matryoshka).\"\"\"\n    return get_embedding(\n        text,\n        model=\"text-embedding-3-small\",\n        dimensions=dimensions\n    )\n```\n\n### Template 2: Local Embeddings with Sentence Transformers\n\n```python\nfrom sentence_transformers import SentenceTransformer\nfrom typing import List, Optional\nimport numpy as np\n\nclass LocalEmbedder:\n    \"\"\"Local embedding with sentence-transformers.\"\"\"\n\n    def __init__(\n        self,\n        model_name: str = \"BAAI/bge-large-en-v1.5\",\n        device: str = \"cuda\"\n    ):\n        self.model = SentenceTransformer(model_name, device=device)\n\n    def embed(\n        self,\n        
texts: List[str],\n        normalize: bool = True,\n        show_progress: bool = False\n    ) -> np.ndarray:\n        \"\"\"Embed texts with optional normalization.\"\"\"\n        embeddings = self.model.encode(\n            texts,\n            normalize_embeddings=normalize,\n            show_progress_bar=show_progress,\n            convert_to_numpy=True\n        )\n        return embeddings\n\n    def embed_query(self, query: str) -> np.ndarray:\n        \"\"\"Embed a query with BGE-style prefix.\"\"\"\n        # BGE retrieval models expect this query prefix; drop it for non-BGE models\n        query = f\"Represent this sentence for searching relevant passages: {query}\"\n        return self.embed([query])[0]\n\n    def embed_documents(self, documents: List[str]) -> np.ndarray:\n        \"\"\"Embed documents for indexing.\"\"\"\n        return self.embed(documents)\n\n\n# E5 model with instructions\nclass E5Embedder:\n    def __init__(self, model_name: str = \"intfloat/multilingual-e5-large\"):\n        self.model = SentenceTransformer(model_name)\n\n    def embed_query(self, query: str) -> np.ndarray:\n        return self.model.encode(f\"query: {query}\")\n\n    def embed_document(self, document: str) -> np.ndarray:\n        return self.model.encode(f\"passage: {document}\")\n```\n\n### Template 3: Chunking Strategies\n\n```python\nfrom typing import List, Tuple\nimport re\n\ndef chunk_by_tokens(\n    text: str,\n    chunk_size: int = 512,\n    chunk_overlap: int = 50,\n    tokenizer=None\n) -> List[str]:\n    \"\"\"Chunk text by token count.\"\"\"\n    import tiktoken\n    tokenizer = tokenizer or tiktoken.get_encoding(\"cl100k_base\")\n\n    tokens = tokenizer.encode(text)\n    chunks = []\n\n    start = 0\n    while start < len(tokens):\n        end = start + chunk_size\n        chunk_tokens = tokens[start:end]\n        chunk_text = tokenizer.decode(chunk_tokens)\n        chunks.append(chunk_text)\n        # Advance by chunk_size - chunk_overlap (overlap must be smaller than chunk_size)\n        start = end - 
chunk_overlap\n\n    return chunks\n\n\ndef chunk_by_sentences(\n    text: str,\n    max_chunk_size: int = 1000,\n    min_chunk_size: int = 100\n) -> List[str]:\n    \"\"\"Chunk text by sentences, respecting size limits.\"\"\"\n    import nltk  # requires the punkt tokenizer data: nltk.download(\"punkt\")\n    sentences = nltk.sent_tokenize(text)\n\n    chunks = []\n    current_chunk = []\n    current_size = 0\n\n    for sentence in sentences:\n        sentence_size = len(sentence)\n\n        if current_size + sentence_size > max_chunk_size and current_chunk:\n            chunks.append(\" \".join(current_chunk))\n            current_chunk = []\n            current_size = 0\n\n        current_chunk.append(sentence)\n        current_size += sentence_size\n\n    if current_chunk:\n        last = \" \".join(current_chunk)\n        # Merge an undersized trailing chunk into the previous one\n        if chunks and len(last) < min_chunk_size:\n            chunks[-1] += \" \" + last\n        else:\n            chunks.append(last)\n\n    return chunks\n\n\ndef chunk_by_semantic_sections(\n    text: str,\n    headers_pattern: str = r'^#{1,3}\\s+.+$'\n) -> List[Tuple[str, str]]:\n    \"\"\"Chunk markdown by headers, preserving hierarchy.\"\"\"\n    lines = text.split('\\n')\n    chunks = []\n    current_header = \"\"\n    current_content = []\n\n    for line in lines:\n        if re.match(headers_pattern, line, re.MULTILINE):\n            if current_content:\n                chunks.append((current_header, '\\n'.join(current_content)))\n            current_header = line\n            current_content = []\n        else:\n            current_content.append(line)\n\n    if current_content:\n        chunks.append((current_header, '\\n'.join(current_content)))\n\n    return chunks\n\n\ndef recursive_character_splitter(\n    text: str,\n    chunk_size: int = 1000,\n    chunk_overlap: int = 200,\n    separators: List[str] = None\n) -> List[str]:\n    \"\"\"LangChain-style recursive splitter.\"\"\"\n    separators = separators or [\"\\n\\n\", \"\\n\", \". 
\", \" \", \"\"]\n\n    def split_text(text: str, separators: List[str]) -> List[str]:\n        if not text:\n            return []\n\n        separator = separators[0]\n        remaining_separators = separators[1:]\n\n        if separator == \"\":\n            # Character-level split\n            return [text[i:i+chunk_size] for i in range(0, len(text), chunk_size - chunk_overlap)]\n\n        splits = text.split(separator)\n        chunks = []\n        current_chunk = []\n        current_length = 0\n\n        for split in splits:\n            split_length = len(split) + len(separator)\n\n            if current_length + split_length > chunk_size and current_chunk:\n                chunk_text = separator.join(current_chunk)\n\n                # Recursively split if still too large\n                if len(chunk_text) > chunk_size and remaining_separators:\n                    chunks.extend(split_text(chunk_text, remaining_separators))\n                else:\n                    chunks.append(chunk_text)\n\n                # Start new chunk with overlap\n                overlap_splits = []\n                overlap_length = 0\n                for s in reversed(current_chunk):\n                    if overlap_length + len(s) <= chunk_overlap:\n                        overlap_splits.insert(0, s)\n                        overlap_length += len(s)\n                    else:\n                        break\n                current_chunk = overlap_splits\n                current_length = overlap_length\n\n            current_chunk.append(split)\n            current_length += split_length\n\n        if current_chunk:\n            chunks.append(separator.join(current_chunk))\n\n        return chunks\n\n    return split_text(text, separators)\n```\n\n### Template 4: Domain-Specific Embedding Pipeline\n\n```python\nclass DomainEmbeddingPipeline:\n    \"\"\"Pipeline for domain-specific embeddings.\"\"\"\n\n    def __init__(\n        self,\n        embedding_model: str = 
\"text-embedding-3-small\",\n        chunk_size: int = 512,\n        chunk_overlap: int = 50,\n        preprocessing_fn=None\n    ):\n        self.embedding_model = embedding_model\n        self.chunk_size = chunk_size\n        self.chunk_overlap = chunk_overlap\n        self.preprocess = preprocessing_fn or self._default_preprocess\n\n    def _default_preprocess(self, text: str) -> str:\n        \"\"\"Default preprocessing.\"\"\"\n        # Remove excessive whitespace\n        text = re.sub(r'\\s+', ' ', text)\n        # Remove special characters\n        text = re.sub(r'[^\\w\\s.,!?-]', '', text)\n        return text.strip()\n\n    async def process_documents(\n        self,\n        documents: List[dict],\n        id_field: str = \"id\",\n        content_field: str = \"content\",\n        metadata_fields: List[str] = None\n    ) -> List[dict]:\n        \"\"\"Process documents for vector storage.\"\"\"\n        processed = []\n\n        for doc in documents:\n            content = doc[content_field]\n            doc_id = doc[id_field]\n\n            # Preprocess\n            cleaned = self.preprocess(content)\n\n            # Chunk\n            chunks = chunk_by_tokens(\n                cleaned,\n                self.chunk_size,\n                self.chunk_overlap\n            )\n\n            # Create embeddings\n            embeddings = get_embeddings(chunks, self.embedding_model)\n\n            # Create records\n            for i, (chunk, embedding) in enumerate(zip(chunks, embeddings)):\n                record = {\n                    \"id\": f\"{doc_id}_chunk_{i}\",\n                    \"document_id\": doc_id,\n                    \"chunk_index\": i,\n                    \"text\": chunk,\n                    \"embedding\": embedding\n                }\n\n                # Add metadata\n                if metadata_fields:\n                    for field in metadata_fields:\n                        if field in doc:\n                            record[field] = 
doc[field]\n\n                processed.append(record)\n\n        return processed\n\n\n# Code-specific pipeline\nclass CodeEmbeddingPipeline:\n    \"\"\"Specialized pipeline for code embeddings.\"\"\"\n\n    def __init__(self, model: str = \"voyage-code-2\"):\n        self.model = model\n\n    def chunk_code(self, code: str, language: str) -> List[dict]:\n        \"\"\"Chunk code by functions/classes (sketch; not implemented here).\"\"\"\n        # A full implementation would parse with tree-sitter, extract\n        # functions, classes, and methods, and return chunks with context.\n        raise NotImplementedError(\"parse with tree-sitter and chunk by definition\")\n\n    def embed_with_context(self, chunk: str, context: str) -> List[float]:\n        \"\"\"Embed code with surrounding context.\"\"\"\n        # Note: voyage-* models are served by the voyageai SDK, not the OpenAI\n        # client behind get_embedding (Template 1); swap clients accordingly.\n        combined = f\"Context: {context}\\n\\nCode:\\n{chunk}\"\n        return get_embedding(combined, model=self.model)\n```\n\n### Template 5: Embedding Quality Evaluation\n\n```python\nimport numpy as np\nfrom typing import List, Tuple\n\ndef evaluate_retrieval_quality(\n    queries: List[str],\n    relevant_docs: List[List[str]],  # List of relevant doc IDs per query\n    retrieved_docs: List[List[str]],  # List of retrieved doc IDs per query\n    k: int = 10\n) -> dict:\n    \"\"\"Evaluate embedding quality for retrieval.\"\"\"\n\n    def precision_at_k(relevant: set, retrieved: List[str], k: int) -> float:\n        retrieved_k = retrieved[:k]\n        relevant_retrieved = len(set(retrieved_k) & relevant)\n        return relevant_retrieved / k\n\n    def recall_at_k(relevant: set, retrieved: List[str], k: int) -> float:\n        retrieved_k = retrieved[:k]\n        relevant_retrieved = len(set(retrieved_k) & relevant)\n        return relevant_retrieved / len(relevant) if relevant else 0\n\n    def mrr(relevant: set, retrieved: List[str]) -> float:\n        for i, doc in enumerate(retrieved):\n            if doc in relevant:\n                return 1 / (i + 1)\n        return 0\n\n    def ndcg_at_k(relevant: set, retrieved: List[str], k: int) -> float:\n        
dcg = sum(\n            1 / np.log2(i + 2) if doc in relevant else 0\n            for i, doc in enumerate(retrieved[:k])\n        )\n        ideal_dcg = sum(1 / np.log2(i + 2) for i in range(min(len(relevant), k)))\n        return dcg / ideal_dcg if ideal_dcg > 0 else 0\n\n    metrics = {\n        f\"precision@{k}\": [],\n        f\"recall@{k}\": [],\n        \"mrr\": [],\n        f\"ndcg@{k}\": []\n    }\n\n    for relevant, retrieved in zip(relevant_docs, retrieved_docs):\n        relevant_set = set(relevant)\n        metrics[f\"precision@{k}\"].append(precision_at_k(relevant_set, retrieved, k))\n        metrics[f\"recall@{k}\"].append(recall_at_k(relevant_set, retrieved, k))\n        metrics[\"mrr\"].append(mrr(relevant_set, retrieved))\n        metrics[f\"ndcg@{k}\"].append(ndcg_at_k(relevant_set, retrieved, k))\n\n    return {name: np.mean(values) for name, values in metrics.items()}\n\n\ndef compute_embedding_similarity(\n    embeddings1: np.ndarray,\n    embeddings2: np.ndarray,\n    metric: str = \"cosine\"\n) -> np.ndarray:\n    \"\"\"Compute similarity matrix between embedding sets.\"\"\"\n    if metric == \"cosine\":\n        # Normalize\n        norm1 = embeddings1 / np.linalg.norm(embeddings1, axis=1, keepdims=True)\n        norm2 = embeddings2 / np.linalg.norm(embeddings2, axis=1, keepdims=True)\n        return norm1 @ norm2.T\n    elif metric == \"euclidean\":\n        from scipy.spatial.distance import cdist\n        return -cdist(embeddings1, embeddings2, metric='euclidean')\n    elif metric == \"dot\":\n        return embeddings1 @ embeddings2.T\n    else:\n        raise ValueError(f\"Unsupported metric: {metric}\")\n```\n\n## Best Practices\n\n### Do's\n- **Match model to use case** - Code vs prose vs multilingual\n- **Chunk thoughtfully** - Preserve semantic boundaries\n- **Normalize embeddings** - For cosine similarity\n- **Batch requests** - More efficient than one-by-one\n- **Cache embeddings** - Avoid recomputing\n\n### Don'ts\n- **Don't ignore token limits** - Truncation loses info\n- **Don't mix embedding 
models** - Incompatible spaces\n- **Don't skip preprocessing** - Garbage in, garbage out\n- **Don't over-chunk** - Lose context\n\n## Resources\n\n- [OpenAI Embeddings](https://platform.openai.com/docs/guides/embeddings)\n- [Sentence Transformers](https://www.sbert.net/)\n- [MTEB Benchmark](https://huggingface.co/spaces/mteb/leaderboard)\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.","tags":["embedding","strategies","antigravity","awesome","skills","sickn33","agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding","ai-workflows"],"capabilities":["skill","source-sickn33","skill-embedding-strategies","topic-agent-skills","topic-agentic-skills","topic-ai-agent-skills","topic-ai-agents","topic-ai-coding","topic-ai-workflows","topic-antigravity","topic-antigravity-skills","topic-claude-code","topic-claude-code-skills","topic-codex-cli","topic-codex-skills"],"categories":["antigravity-awesome-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/sickn33/antigravity-awesome-skills/embedding-strategies","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add sickn33/antigravity-awesome-skills","source_repo":"https://github.com/sickn33/antigravity-awesome-skills","install_from":"skills.sh"}},"qualityScore":"0.700","qualityRationale":"deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 34831 github stars · SKILL.md body (14,999 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-24T06:51:07.318Z","embedding":null,"createdAt":"2026-04-18T21:36:32.165Z","updatedAt":"2026-04-24T06:51:07.318Z","lastSeenAt":"2026-04-24T06:51:07.318Z","tsv":"'-2':137 '/)':1620 '/docs/guides/embeddings)':1615 '/spaces/mteb/leaderboard)':1625 '0':241,292,449,555,619,647,782,803,818,879,894,1324,1348,1372,1402,1404 '1':107,186,674,786,1344,1346,1363,1383,1508,1516 '10':1259 '100':234,598 '1000':593,744 '1024':138,148,165 '1536':131 '2':170,327,1147,1366,1386 '200':748 '256':158 '3':120,129,215,322,507,675,955 '3072':122 '384':157 '4':931 '4000':139 '5':147,1212 '50':531,964 '512':149,166,305,527,960 '8191':123,132 'accuraci':125 'action':66 'add':1106 'all-minilm-l6-v2':152 'all_embeddings.extend':271 'api/local':183 'append':1433,1445,1455,1464 'appli':58 'applic':14,27 'ask':1659 'async':1013 'avoid':1576 'axi':1507,1515 'baai/bge-large-en-v1.5':362 'bar':399 'base':549 'batch':228,232,244,246,250,254,1565 'benchmark':1622 'benefit':425 'best':60,115,1541 'bge':143,420,423,430 'bge-large-en-v1':142 'bge-styl':419 'bool':379,383 'boundari':1559,1667 'break':901 'cach':1574 'case':1549 'cdist':1528,1530 'charact':737,790,1004 'character-level':789 'choos':81 'chunk':87,174,508,519,524,528,536,553,562,564,569,572,575,579,582,584,590,595,601,614,616,634,638,642,644,656,660,662,664,681,690,734,741,745,797,806,808,813,815,834,838,839,843,852,854,862,868,872,885,891,903,918,922,924,957,961,974,978,1059,1060,1061,1074,1081,1086,1093,1099,1103,1151,1160,1177,1186,1204,1555,1607 'chunks.append':574,639,657,708,726,867,919 'chunks.extend':859 'cl100k':548 'clarif':1661 'clarifi':52 'class':348,469,938,1132,1174 'clean':181,1056,1064 'clear':1634 'client':202 
'client.embeddings.create':263 'code':140,1129,1137,1146,1152,1154,1161,1193,1550 'code-specif':1128 'codeembeddingpipelin':1133 'combin':1197,1208 'compar':95 'comparison':110 'comput':1482,1493 'concept':106 'constraint':54 'content':104,694,707,714,719,725,732,1025,1028,1046,1048,1058 'context':1179,1184,1188,1196,1199,1200,1609 'convert':402 'core':105 'cosin':1491,1501,1563 'cost':134 'cost-effect':133 'count':540 'creat':1069,1077 'criteria':1670 'cuda':365 'current':615,617,629,637,641,643,645,650,655,659,691,693,706,709,713,715,718,724,727,731,814,816,830,837,842,884,902,906,912,917,921 'current_chunk.append':648,910 'current_content.append':721 'dcg':1361,1381,1396,1398,1401 'def':204,276,297,356,372,408,450,471,482,494,518,583,663,735,766,946,985,1014,1139,1150,1181,1226,1266,1293,1325,1349,1481 'default':986,992 'describ':1638 'detail':71 'devic':363,370,371 'dict':1020,1035,1159,1260 'differ':44 'dimens':101,112,217,258,260,261,293,303,312,324,325,435 'doc':1043,1047,1050,1052,1091,1097,1119,1122,1234,1241,1246,1253,1335,1340,1368,1375,1422,1424 'document':173,452,454,459,464,496,498,505,1016,1018,1037,1045,1095 'domain':45,94,933,943 'domain-specif':932,942 'domainembeddingpipelin':939 'dot':1537 'e5':163,465 'e5embedder':470 'effect':135 'effici':1568 'elif':1522,1535 'els':720,866,900,1323,1371,1403 'emb':373,386,409,415,451,458,483,495,1182,1192 'embed':2,9,15,22,39,82,92,96,100,108,119,128,171,176,188,206,214,224,236,265,272,275,278,286,289,300,309,316,321,329,351,391,395,407,434,935,945,949,954,970,1070,1071,1073,1082,1087,1104,1105,1138,1207,1213,1262,1483,1497,1561,1575,1591,1612 'embedding-strategi':1 'embeddings1':1485,1504,1506,1531,1539 'embeddings2':1487,1512,1514,1532 'embeddings2.t':1540 'en':145 'encod':547 'end':560,568,578 'enumer':1084,1337,1377 'environ':1650 'environment-specif':1649 'euclidean':1524,1534 'evalu':1215,1227,1261 'exampl':72 'excess':995 'expert':1655 'extract':1172 
'f':437,491,503,1090,1198,1406,1409,1413,1430,1442,1461 'fals':384 'fast':159 'field':1022,1026,1030,1049,1054,1110,1112,1115,1117,1121,1123 'fine':90 'fine-tun':89 'float':222,283,307,1191,1277,1304,1332,1360 'fn':966,982 'function':1173 'functions/classes':1163 'garbag':1599,1601 'get':205,223,277,284,288,298,308,315,1072,1206 'goal':53 'guid':4,17 'handl':102,227 'header':670,684,692,701,710,716,728 'hierarchi':686 'high':124 'huggingface.co':1624 'huggingface.co/spaces/mteb/leaderboard)':1623 'id':1021,1024,1051,1053,1089,1092,1096,1098,1242,1254 'ideal':1380,1397,1400 'ignor':1582 'import':192,196,198,337,341,344,513,516,541,608,1164,1217,1223,1527 'incompat':1593 'index':461,1100 'info':1587 'init':357,472,947,1140 'input':57,253,1664 'instruct':51,468 'int':218,304,526,530,592,597,743,747,959,963,1258,1276,1303,1359 'intfloat/multilingual-e5-large':477 'item':268 'item.embedding':266 'join':640,658,712,730 'k':1257,1269,1275,1279,1281,1287,1292,1296,1302,1306,1308,1314,1352,1358,1379,1394,1408,1411,1415,1432,1436,1440,1444,1448,1452,1463,1467,1471 'keepdim':1509,1517 'kwarg':252,259,264,281,291 'l6':155 'langchain':756 'langchain-styl':755 'languag':169,1156 'larg':121,144,164,230,849 'legal':141 'len':242,558,626,804,825,827,851,889,898,1284,1311,1319,1392 'length':817,824,831,833,878,888,897,907,909,913,915 'level':791 'lightweight':160 'limit':607,1584,1626 'line':687,696,698,703,717,722 'list':197,208,220,221,231,282,306,342,376,455,514,534,599,677,750,753,772,774,1019,1031,1034,1158,1190,1224,1231,1235,1236,1238,1247,1248,1250,1273,1300,1330,1356 'local':328,350 'localembedd':349 'lose':1586,1608 'markdown':682 'match':1545,1635 'matrix':1495 'matryoshka':313 'max':113,589,633 'metadata':1029,1107,1109,1114 'method':1175 'metric':1405,1429,1441,1453,1460,1489,1500,1523,1533,1536 'metrics.items':1480 'min':594,1391 'minilm':154 'miss':1672 'mix':1590 
'model':10,23,83,97,109,111,177,210,255,256,318,359,368,424,466,474,480,950,969,971,1076,1142,1149,1209,1546,1592 'mrr':1326,1412,1454,1456 'mteb':1621 'multi':168 'multi-languag':167 'multilingu':103,162,1554 'multilingual-e5-large':161 'n':689,711,729,763,764,765,1201,1203 'name':360,369,475,481,1473,1477 'ncode':1202 'ndcg':1350,1414,1462,1465 'need':42 'new':871 'nltk':609 'nltk.sent':611 'none':219,533,752,967,1033 'norm1':1503,1520 'norm2':1511 'norm2.t':1521 'normal':182,378,390,394,396,1502,1560 'np':201,347,1220 'np.linalg.norm':1505,1513 'np.log2':1364,1384 'np.mean':1474 'np.ndarray':385,414,457,488,500,1486,1488,1492 'numpi':199,345,404,1218 'one':1571,1573 'one-by-on':1570 'open':75,150 'openai':187,191,193,203,226,296,1611 'optim':8,21,86 'option':343,389 'outcom':64 'output':1644 'outsid':48 'over-chunk':1605 'overlap':179,529,580,746,809,874,875,877,887,892,896,904,908,962,977,979,1068 'overlap_splits.insert':893 'pars':1167 'pass':1180 'passag':444,504 'pattern':671,702 'per':1243,1255 'perform':98 'permiss':1665 'pipelin':172,936,940,1131,1135 'platform.openai.com':1614 'platform.openai.com/docs/guides/embeddings)':1613 'practic':61,1542 'precis':1267,1407,1431,1434 'prefix':422,428 'preprocess':175,965,981,987,993,1055,1598 'preserv':685,1557 'process':1015,1036,1041,1127 'processed.append':1124 'progress':382,398,401 'prose':1552 'provid':65 'python':189,333,510,937,1216 'qualiti':1214,1229,1263 'queri':410,412,417,427,436,445,448,484,486,492,493,1230,1244,1256 'r':673,999,1007 'rag':85 'rang':240,802,1390 're':517 're.match':700 're.multiline':704 're.sub':998,1006 'recal':1294,1410,1443,1446 'recomput':1577 'record':1078,1088,1120,1125 'recurs':736,758,844 'reduc':99,299,311 'reduct':294 'relev':59,443,1233,1240,1270,1282,1288,1290,1297,1309,1315,1317,1320,1322,1327,1342,1353,1370,1393,1417,1421,1425,1428,1437,1449,1457,1468 'remain':783,857,864 'remov':994,1002 'repres':438 'request':1566 'requir':56,74,1663 'resourc':1610 
'resources/implementation-playbook.md':76 'respect':605 'respons':262 'response.data':270 'retriev':1228,1245,1252,1265,1272,1278,1280,1283,1286,1291,1299,1305,1307,1310,1313,1318,1329,1338,1355,1378,1418,1423,1439,1451,1459,1470 'return':273,287,314,406,446,462,489,501,581,661,733,779,793,923,925,1011,1126,1176,1205,1289,1316,1343,1347,1395,1472,1519,1529,1538 'revers':883 'review':1656 'safeti':1666 'scipy.spatial.distance':1526 'scope':50,1637 'search':13,26,442 'section':667 'select':6,19 'self':358,374,411,453,473,485,497,948,988,1017,1141,1153,1185 'self._default_preprocess':984 'self.chunk':972,976,1065,1067 'self.embed':447,463 'self.embedding':968,1075 'self.model':366,478,1148,1210 'self.model.encode':392,490,502 'self.model.get':432 'self.preprocess':980,1057 'semant':666,1558 'sentenc':331,335,354,433,440,586,604,610,621,623,624,627,631,649,652,1616 'sentence-transform':353 'sentencetransform':338,367,479 'separ':749,760,761,771,780,781,784,785,788,812,828,858,865,929 'separator.join':841,920 'set':1271,1285,1298,1312,1328,1354,1426,1427,1438,1450,1458,1469,1498 'show':381,397,400 'similar':1484,1494,1564 'singl':285 'sitter':1166,1171 'size':180,233,245,251,525,563,591,596,606,618,625,630,632,635,646,651,653,742,798,807,835,855,958,973,975,1066 'skill':32,79,1629 'skill-embedding-strategies' 'skip':1597 'small':130,216,323,956 'sourc':151 'source-sickn33' 'space':1594 'special':1003,1134 'specif':934,944,1130,1651 'split':767,792,810,820,822,823,826,832,845,860,876,905,911,914,926 'splitter':738,759 'start':554,557,561,567,577,870 'step':67 'still':847 'stop':1657 'storag':1040 'str':209,211,280,302,361,364,377,413,456,476,487,499,523,535,588,600,669,672,679,680,740,751,754,770,773,775,951,990,991,1023,1027,1032,1143,1155,1157,1187,1189,1232,1237,1249,1274,1301,1331,1357,1490 'strategi':3,16,40,88,509 'style':421,757 'substitut':1647 'success':1669 'sum':1362,1382 'surround':1195 'task':35,1633 'templat':184,185,326,506,930,1211 'test':1653 
'text':118,127,207,213,243,247,279,290,301,317,320,375,387,393,522,537,552,570,576,587,602,613,668,739,768,769,778,794,805,840,853,861,863,869,927,928,953,989,997,1001,1005,1010,1102 'text-embed':117,126,212,319,952 'text.split':688,811 'text.strip':1012 'thought':1556 'tiktoken':542 'tiktoken.get':546 'token':114,521,532,539,543,544,550,559,565,566,573,612,1063,1583 'tokenizer.decode':571 'tokenizer.encode':551 'tool':47 'topic-agent-skills' 'topic-agentic-skills' 'topic-ai-agent-skills' 'topic-ai-agents' 'topic-ai-coding' 'topic-ai-workflows' 'topic-antigravity' 'topic-antigravity-skills' 'topic-claude-code' 'topic-claude-code-skills' 'topic-codex-cli' 'topic-codex-skills' 'transform':332,336,355,1617 'treat':1642 'tree':1165,1170 'tree-sitt':1169 'true':380,405,1510,1518 'truncat':1585 'ts':1579 'tune':91 'tupl':515,678,1225 'type':195,340,512,1222 'unrel':37 'use':30,77,1548,1627 'v1':146 'v2':156 'valid':63,1652 'valu':1475,1478 'vector':12,25,178,1039 'verif':69 'voyag':136,1145 'voyage-cod':1144 'vs':1551,1553 'w':1008 'whitespac':996 'www.sbert.net':1619 'www.sbert.net/)':1618 
'zip':1085,1420","prices":[{"id":"5628ba7c-385b-439a-8649-9a97997d7983","listingId":"ab6691f2-069c-48ca-bb4f-6f2620eb3817","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"sickn33","category":"antigravity-awesome-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T21:36:32.165Z"}],"sources":[{"listingId":"ab6691f2-069c-48ca-bb4f-6f2620eb3817","source":"github","sourceId":"sickn33/antigravity-awesome-skills/embedding-strategies","sourceUrl":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/embedding-strategies","isPrimary":false,"firstSeenAt":"2026-04-18T21:36:32.165Z","lastSeenAt":"2026-04-24T06:51:07.318Z"}],"details":{"listingId":"ab6691f2-069c-48ca-bb4f-6f2620eb3817","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"sickn33","slug":"embedding-strategies","github":{"repo":"sickn33/antigravity-awesome-skills","stars":34831,"topics":["agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding","ai-workflows","antigravity","antigravity-skills","claude-code","claude-code-skills","codex-cli","codex-skills","cursor","cursor-skills","developer-tools","gemini-cli","gemini-skills","kiro","mcp","skill-library"],"license":"mit","html_url":"https://github.com/sickn33/antigravity-awesome-skills","pushed_at":"2026-04-24T06:41:17Z","description":"Installable GitHub library of 1,400+ agentic skills for Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity, and more. 
Includes installer CLI, bundles, workflows, and official/community skill collections.","skill_md_sha":"f33c09f3d9a4a4d0c1254e00ecf2d7ee226e2ac4","skill_md_path":"skills/embedding-strategies/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/embedding-strategies"},"layout":"multi","source":"github","category":"antigravity-awesome-skills","frontmatter":{"name":"embedding-strategies","description":"Guide to selecting and optimizing embedding models for vector search applications."},"skills_sh_url":"https://skills.sh/sickn33/antigravity-awesome-skills/embedding-strategies"},"updatedAt":"2026-04-24T06:51:07.318Z"}}