{"id":"7c1b6777-fc67-48f6-96ad-31016b2e5134","shortId":"CBwfJN","kind":"skill","title":"pinecone-full-text-search","tagline":"Create, ingest into, and query a Pinecone full-text-search (FTS) index using the preview API (2026-01.alpha, public preview). Use when the user or agent asks to build a text search index on Pinecone, add dense or sparse vector fields, ingest documents, construct score_by clauses ","description":"# pinecone-full-text-search\n\n> **Requires `pinecone` Python SDK ≥ 9.0** (`pip install \"pinecone>=9.0\"`). The FTS document-schema API lives under `pinecone.preview` and is incomplete or absent in earlier SDK builds. The packaged helper scripts pin `pinecone==9.0.0` via PEP 723 inline metadata; if you're writing your own code against this skill, pin v9 explicitly. The wire API version is `2026-01.alpha`.\n\n> **Authoritative reference (last resort).** If you hit a question this skill and its `references/*.md` files don't answer, the official Pinecone FTS docs are at <https://docs.pinecone.io/guides/search/full-text-search>. Prefer this skill's content for anything covered here — the docs may describe surfaces (e.g. classic vector API) that don't apply to the document-schema FTS path. Consult the link only when you're genuinely stuck.\n\n> **Tell the user up front:** \"This skill ships a helper at `scripts/ingest.py` that handles bulk ingestion safely (batched upsert, error inspection, readiness polling). When we get to the ingest step, I'll use it.\" Surface this at the start of the conversation so the user knows the helper exists. Query construction is hand-written `documents.search(...)` per the **Querying** section below — there is no query helper.\n\nA workflow skill for building a Pinecone full-text-search index with the preview API (`pinecone.preview`, API version `2026-01.alpha`, public preview as of April 2026). 
Covers schema design (text, dense vector, sparse vector, filterable metadata), ingestion (including async indexing and polling), and query construction (`text` / `query_string` / `dense_vector` / `sparse_vector` scoring; `$match_phrase` / `$match_all` / `$match_any` text-match filters; `$eq` / `$in` / `$gte` / `$exists` / `$and` / `$or` / `$not` metadata filters).\n\n## Scope — this skill is for the document-schema FTS API only\n\nThis skill covers `pc.preview.indexes.create(..., schema=...)`, `pc.preview.index(name)`, `idx.documents.upsert(...)` / `idx.documents.batch_upsert(...)` / `idx.documents.search(...)`. If you find yourself reaching for any of the following, **stop** — those are different Pinecone APIs and this skill's guidance and helpers won't apply:\n\n- **Classic vector / records API**: `pc.Index(name)`, `index.upsert(vectors=[...])` / `index.upsert_records(...)`, `index.query(vector=..., sparse_vector=...)`, `index.search_records(...)`, `pc.create_index(...)` with `ServerlessSpec`, the legacy `pinecone_text.sparse.BM25Encoder` for sparse-dense hybrid. For indexes WITHOUT a schema (raw vectors).\n- **Integrated-embedding indexes**: `pc.create_index_for_model(...)` with `embed={...}`. Pinecone vectorizes text server-side. Different upsert/search shapes. Cannot be combined with `full_text_search` fields in the same index.\n\nIf the user already has a non-document-schema index, they can stand up a separate document-schema index alongside it — the two are independent — but you can't add FTS fields to a classic index after the fact.\n\n## Querying — construct `documents.search(...)` calls\n\nFor any task that asks you to query an FTS index, you write a `documents.search(...)` call directly. The schema is authoritative — describe the index live before constructing the call so you know which fields are FTS-enabled, which are filterable, and which are vectors.\n\n**Workflow:**\n\n1. 
**Discover the schema.** Call `pc.preview.indexes.describe(<index>)` and read the `schema.fields` dict. Each field's class indicates its type (`PreviewStringField`, `PreviewIntegerField`, `PreviewDenseVectorField`, etc.); attributes tell you whether it's FTS-enabled (`full_text_search`), filterable, or carries a `dimension`. Skip this step only if you've already seen the schema in this conversation.\n2. **Construct the call** matching the rules below — one scoring type per request, hard requirements in `filter`, ranking signals in `score_by`, `include_fields` explicit on every call.\n3. **Execute** with `idx = pc.preview.index(name=<index>); resp = idx.documents.search(...)` and read `resp.matches`.\n\n**Canonical shapes:**\n\n```python\n# Pure BM25 keyword search\nresp = idx.documents.search(\n    namespace=\"__default__\",\n    top_k=10,\n    score_by=[{\"type\": \"text\", \"field\": \"body\", \"query\": \"machine learning\"}],\n    filter={\"year\": {\"$gt\": 2024}, \"category\": {\"$eq\": \"ai\"}},  # optional\n    include_fields=[\"*\"],   # always pass explicitly\n)\n\n# Hybrid: dense ranking with a lexical filter (one type in score_by + filter narrows)\nresp = idx.documents.search(\n    namespace=\"__default__\",\n    top_k=10,\n    score_by=[{\"type\": \"dense_vector\", \"field\": \"embedding\", \"values\": query_embedding}],\n    filter={\"body\": {\"$match_all\": \"TensorFlow\"}, \"year\": {\"$gt\": 2024}},\n    include_fields=[\"*\"],\n)\n```\n\n**Key rules** (the server enforces these; following them locally keeps the agent loop tight):\n\n- `score_by` is a list of clauses, but **exactly one scoring type per request** (server rejects mixed types). Multi-field BM25 is the one exception: multiple `text` clauses, or one `query_string` with `fields: [...]`. 
To combine BM25 + dense signals, restrict the dense search with a text-match filter (`$match_all` / `$match_phrase` / `$match_any`); do NOT mix scoring types in `score_by`.\n- `filter` keys are field names (must exist in schema and be filterable) OR logical operators (`$and`, `$or`, `$not`). Field values are operator dicts (`{\"$gt\": 5}`, NOT bare values).\n- `include_fields` is required on every call. Pass `[\"*\"]` for all stored fields, `[]` for ids+score only, or a list of names. Some SDK builds 400/422 if it's omitted.\n\n**Clause shapes** (for `score_by`):\n\n| `type` | Required keys | When to pick this |\n|---|---|---|\n| `text` | `field` (string FTS), `query` | Open-ended keyword search; BM25 ranking on one field |\n| `query_string` | `query` (Lucene), `fields` optional | Lucene boost (`^N`), proximity (`~N`), cross-field boolean, phrase prefix |\n| `dense_vector` | `field` (dense_vector), `values` (list of floats) | Semantic / mood / topic ranking |\n| `sparse_vector` | `field` (sparse_vector), `sparse_values` ({indices, values}) | Custom sparse-encoder ranking |\n\n`text` / `dense_vector` / `sparse_vector` use singular `field`. Only `query_string` accepts a `fields` array (and also accepts singular `field` as an alias). 
`sparse_vector` uses `sparse_values` (NOT `values`) — distinct from dense.\n\n**Filter operators by field type:**\n\n| Field type | Legal operators |\n|---|---|\n| `string` with FTS | `$match_phrase`, `$match_all`, `$match_any` |\n| `string` filterable | `$eq`, `$ne`, `$in`, `$nin`, `$exists` |\n| `string_list` filterable | `$in`, `$nin`, `$exists` |\n| `float` filterable | `$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$exists` |\n| `boolean` filterable | `$eq`, `$exists` |\n| logical wrappers | `$and: [filters]`, `$or: [filters]`, `$not: filter` |\n\n**Match shape on response:**\n\n```python\nfor m in resp.matches:\n    m._id        # document id\n    m._score     # match score (NOT `score`); some older SDK builds may also surface `score`\n    m.to_dict()  # full doc payload (when include_fields includes the field)\n```\n\nFor deeper coverage — multi-field BM25, Lucene patterns, hybrid composition, RRF merges, common error symptoms — see `references/querying.md`. For schema field types and what they enable on the query side, see `references/schema-design.md`.\n\n## Ingesting — use the packaged helper\n\nFor **any task that asks you to bulk-ingest a JSONL file into an existing FTS index**, the canonical path is to invoke the bundled helper, NOT to hand-write a Python script. **Do not read the script's source** — everything you need is in this section.\n\nThe script does three things bare-LLM ingest code reliably skips, each of which corresponds to a silent production failure:\n\n1. **Bulk-upserts in batches.** No per-doc `upsert` loops.\n2. **Inspects every batch result.** `batch_upsert` returns 202 even when individual documents fail; the failures live in `result.errors` / `result.has_errors`. Without inspection, \"100 docs ingested\" silently becomes \"73 docs ingested + 27 lost.\"\n3. **Polls until searchable.** After upsert, Pinecone is still building the inverted index. A `documents.search` call during that window returns empty. 
Without the poll, the user debugs their *query* code for an hour without finding the indexing race.\n\nYou provide a prepared, schema-conformant JSONL file and the index name; the script does the rest. Schema validation is an upstream concern (your prep pipeline, or `prepare_documents.py` when it lands) — `ingest.py` trusts what you hand it.\n\n**Invocation:**\n\n```bash\nuv run --script .claude/skills/pinecone-fts-index/scripts/ingest.py \\\n  --data processed.jsonl \\\n  --index <index_name> \\\n  --sentinel-field <fts_field>\n```\n\n**Flags:**\n\n| Flag | Short | Required | Purpose |\n|---|---|---|---|\n| `--data` | `-d` | yes | Path to JSONL file with prepared documents (one per line) |\n| `--index` | `-i` | yes | Pinecone index name (must already exist) |\n| `--sentinel-field` | `-f` | yes | An FTS-enabled field on the index, used for the readiness-poll query. Pick the longest free-text field on your schema. |\n| `--namespace` | `-n` | no | Default `__default__` |\n| `--batch-size` | `-b` | no | Default 100. **Reduce for large dense vectors.** A 50-doc batch with 3072-dim float vectors lands ~5-10 MB and can be rejected; drop to `--batch-size 50` (or lower) at high dimensions. |\n| `--poll-deadline` | — | no | Default 300 (seconds). Time to wait for documents to become searchable before giving up. |\n| `--sentinel` | `-s` | no | Token used for the readiness-poll query. Default: first whitespace-separated token of `doc[0][sentinel-field]`. |\n\n**What the script prints:**\n\n```\nLoading processed.jsonl ...\nLoaded 5000 document(s).\nSentinel: body='The'\n\nUpserting in batches of 100 ...\n  batch @     0:  100 docs in  0.42s  (total: 100/5000)\n  batch @   100:  100 docs in  0.39s  (total: 200/5000)\n  ...\n\nUpsert complete: 5000 doc(s) in 21.4s.\n\nPolling for searchability (deadline 300s) ...\nSearchable after 12.3s (3 probe(s)).\n\nDone — total 33.7s.\n```\n\nIf a batch fails, the script prints every error message and exits non-zero. 
If the poll deadline expires, the script prints a hint about why (sentinel field isn't FTS-enabled, deadline too tight, docs structurally upserted but rejected by the inverted-index builder) and exits non-zero. **Don't suppress these errors** — they're surfacing real problems with the data or the index.\n\n**When you should NOT use the script:**\n\n- The user is doing per-doc patch updates (single-doc `documents.upsert` calls with selective fields). The script is for bulk loads, not per-record operations.\n- The user is ingesting from a non-JSONL source (CSV, Parquet, Postgres dump). Convert to JSONL first; the script doesn't parse other formats.\n- The user explicitly asks you to write the ingestion code from scratch (teaching context). Honor the request and follow the canonical pattern: `documents.batch_upsert` + `result.has_errors` inspection + `documents.search` polling with sentinel and deadline.\n\nThe script lives at `.claude/skills/pinecone-fts-index/scripts/ingest.py`. PEP 723 inline-metadata script — `uv run --script` installs `typer` and `pinecone` automatically on first invocation. No setup needed.\n\n## Use cases\n\nThree concrete shapes to model your task on. Match the user's request to the closest one and follow its steps; improvise if the task is genuinely a hybrid.\n\n### UC-1: Index a new corpus end-to-end\n\n**Trigger.** \"Index this CSV / JSONL / folder for search,\" \"build a search backend over [my articles / products / tickets / transcripts],\" \"make my [dataset] searchable.\"\n\n**For unprocessed / messy data, load the onboarding walkthrough first.** If the user is showing up with raw data (unclear field types, possibly long text fields exceeding FTS limits, comma-separated tag strings, dates as strings, possibly duplicate IDs, etc.) 
and they haven't given you an explicit schema, **read `references/onboarding-walkthrough.md` and follow it stage-by-stage.** It's a conversational guide — meet the data, surface the processing decisions to the user, propose a schema, confirm before creating, then process+ingest+verify together. The walkthrough exists because schemas are immutable and \"onboarding a new corpus\" is a high-stakes flow that benefits from explicit user buy-in at each decision point.\n\nIf the user already gave you a clean JSONL + a schema spec, follow the abbreviated steps below.\n\n**Steps (when data is already prepared and the schema is decided):**\n1. Inspect the corpus shape — text fields, structured metadata, do you also need a vector? Match it to one of the canonical shapes in `references/schema-design.md` (articles, products, tickets, image library, code).\n2. Pick analyzer settings on each text field — `language`, `stemming`, `stop_words`. Stemming on for long prose, off for proper nouns / identifiers.\n3. Assemble the schema with `SchemaBuilder` and **confirm it with the user before calling `indexes.create`** — schemas are immutable in `2026-01.alpha`, so a wrong call costs a re-ingest.\n4. Create the index, poll `describe()` until `status.ready: true`.\n5. **Run `scripts/ingest.py --data <jsonl> --index <name> --sentinel-field <fts_field>`** — see the **Ingesting — use the packaged helper** section above. The script handles `batch_upsert` + per-batch error inspection + post-upsert readiness polling in one invocation. Don't hand-write the loop unless the user explicitly asks you to.\n6. (The script polls automatically — by the time it exits cleanly, the index is searchable. If you skip the script and roll your own, you must poll `documents.search` with a sentinel query and a deadline; `batch_upsert` returning ≠ searchable.)\n7. 
Validate with one or two probe queries against fields you know contain the sentinel content.\n\n**Result.** A working `documents.search` call against the user's data, returning ranked matches.\n\n### UC-2: Add a dense (or sparse) signal to a text-only corpus\n\n**Trigger.** \"Add semantic search,\" \"add embeddings,\" \"make this hybrid,\" or any prompt that describes a query pattern text alone can't serve (visual similarity, mood, cross-modal \"looks like\").\n\n**Steps.**\n1. Confirm the new signal represents a **modality or signal text can't express** — image / audio / external score, *or* a different corpus than the existing FTS field. Re-encoding the same text into a dense field is an anti-pattern (`references/schema-design.md` → \"When to add a dense field at all\").\n2. Because schemas are immutable, **plan a new index, not a migration**. Get user confirmation before recreating.\n3. Pick an embedding provider and pin its output dimension at schema time. Beware payload-size pitfalls at native dimensions — Gemini-3072 etc. need truncation (`references/ingestion.md` → \"Dense-vector payload size\").\n4. Schema → create → wait Ready → ingest with embeddings inline or pre-cached.\n5. Validate with a **hybrid query**: `dense_vector` score_by + text-match filter (`$match_phrase` / `$match_all`). That's the supported single-call cross-modal shape.\n\n**Result.** One index, two retrieval shapes — pure text *and* dense+filter hybrid — both runnable without further setup.\n\n### UC-3: Build a `documents.search` call from a natural-language user prompt (agent mode)\n\n**Trigger.** Agent receives a user prompt like \"find articles about machine learning that mention TensorFlow and were published after 2024\" or \"documents about climate policy ranked by similarity to this paragraph.\" The index already exists.\n\n**Steps.**\n1. **(Optional) Discover the schema** by calling `pc.preview.indexes.describe(<NAME>)` and reading `schema.fields`. 
Skip if you already know the field types from earlier in the conversation.\n2. **Decompose the user's prompt** into `score_by` / `filter` shapes using the agent-mode decomposition table below. (Hard requirements → `filter`. Ranking signals → `score_by`. Always include `include_fields` explicitly.)\n3. **Construct the `documents.search(...)` call** following the rules in the Querying section above — one scoring type per request, operator/field-type matching, `include_fields` always set.\n4. **Execute** the call. The response carries `resp.matches`; iterate to get `m._id`, `m._score`, and field values via `m.to_dict()`. Use the matches in whatever shape the user asked for.\n5. If results come back empty or wrong, walk the failure tree in `Common gotchas`.\n\n**Result.** Live search results matching the user's intent.\n\n**The four common UC-3 mistakes** to actively avoid:\n- Mixing scoring types in `score_by` (server rejects). Put hard requirements in `filter`; rank by one signal in `score_by`.\n- Putting hard requirements in `score_by` as BM25 terms instead of in `filter` as `$match_all` / `$match_phrase` (returns ranked results that don't *guarantee* the term is present).\n- Operator/field-type mismatches (e.g. `$match_all` on a float field, `$gt` on a string field). Consult the operator table in the Querying section.\n- Omitting `include_fields` (some SDK builds 400/422). Always pass it explicitly.\n\n## Agent-mode query decomposition\n\nMap user prompt cues to API shapes. 
Read top-down — identify the cue, copy the corresponding shape.\n\n| User prompt cue | API shape |\n|---|---|\n| Open-ended keywords (\"articles about machine learning\", search-bar query) | `score_by=[{\"type\": \"text\", \"field\": \"<field>\", \"query\": \"<terms>\"}]` — BM25 token-OR |\n| Exact phrase, drives ranking (\"rank by 'beautifully written'\") | `score_by=[{\"type\": \"query_string\", \"query\": '<field>:(\"phrase here\")'}]` |\n| Exact phrase, hard requirement (\"must contain 'machine learning'\") | `filter={\"<field>\": {\"$match_phrase\": \"machine learning\"}}` |\n| Required tokens, any order (\"must mention TensorFlow\", \"must be about Illinois\") | `filter={\"<field>\": {\"$match_all\": \"tokens space-separated\"}}` — preferred over `query_string` `+token` because it's a true hard filter, doesn't contribute to score |\n| At least one of these tokens (\"contains AI or ML or robotics\") | `filter={\"<field>\": {\"$match_any\": \"AI ML robotics\"}}` |\n| Excluded tokens (\"not about deprecated\", \"no opinion pieces\") | `filter={\"$not\": {\"<field>\": {\"$match_any\": \"deprecated opinion\"}}}` — or `-token` inside `query_string` |\n| Boolean / boost / slop / phrase-prefix (\"weight 'eagle' 3x\", \"within N words\") | `score_by=[{\"type\": \"query_string\", \"query\": '<expr with ^N / ~N / \"…\"*>'}]` — only Lucene supports these |\n| Cross-field boolean (\"title or body contains X\") | `score_by=[{\"type\": \"query_string\", \"query\": 'title:(X) OR body:(X)'}]` |\n| Numeric / date / range / boolean metadata (\"after 2024\", \"rating > 4\", \"in stock\") | `filter={\"<field>\": {\"$gt\": ..., \"$gte\": ..., \"$eq\": ..., \"$exists\": true}}` |\n| Category / tag / list membership (\"category = fiction\", \"tagged X\") | `filter={\"<field>\": {\"$in\": [...]}}` (works on `string` and `string_list` filterable fields) |\n| Semantic similarity / mood / topic (\"articles about ML\", \"documents that feel sombre\") | `score_by=[{\"type\": 
\"dense_vector\", \"field\": \"<embedding_field>\", \"values\": embed(<text>)}]` — requires a `dense_vector` field |\n| Visual appearance / cross-modal text query against an image corpus | Same dense_vector shape, with the embedding model that produced the stored image vectors. Multimodal embedders (Gemini-2 etc.) map a text query into the image space. |\n| Hybrid: lexical requirement + semantic ranking (\"articles about ML that mention TensorFlow\") | Lexical → `filter` (`$match_all` / `$match_phrase`); semantic → `score_by` (`dense_vector`). Single call. |\n\n**Two structural rules the agent must enforce, no exceptions:**\n\n- **One scoring type per request.** `score_by` accepts `text` / `query_string` / `dense_vector` / `sparse_vector`, but a request ranks by *one*. Don't mix dense + text in `score_by` — the server rejects it. Multi-field BM25 is the only \"list\" pattern that's allowed (multiple `text` clauses, or one cross-field `query_string`).\n- **Hybrid = filter + score_by, not two `score_by` clauses.** When a prompt has both a lexical requirement and a semantic ranking signal, lexical goes in `filter` (via `$match_*` operators) and semantic goes in `score_by`. If both signals genuinely need to drive *ranking*, run two searches and merge IDs client-side.\n\n## Workflow at a glance\n\nThree phases. Each has its own reference file — consult it before writing code for that phase.\n\n1. **Design the schema.** Decide which string fields are full-text-searchable, which are filterable metadata, whether you need a `dense_vector` field (and whether it earns its place), whether you also need a `sparse_vector` field, and which numeric / boolean / array filters to declare. Schemas are **fixed at index creation** in `2026-01.alpha` — plan carefully. → `references/schema-design.md`\n2. 
**Ingest documents.** For bulk loads from a prepared JSONL, run the bundled `scripts/ingest.py` helper (it does `batch_upsert` + error inspection + readiness polling correctly by construction — see the **Ingesting — use the packaged helper** section above). For per-doc patch updates, hand-call `documents.upsert`. Either way, documents are indexed asynchronously after the HTTP call returns; `batch_upsert` returning 202 ≠ searchable. → `references/ingestion.md` for the canonical pattern in detail.\n3. **Query the index.** A single search request ranks by **one** scoring type — pass exactly one of `text`, `query_string`, `dense_vector`, or `sparse_vector` in `score_by` (multi-field BM25 is supported via multiple `text` clauses or a cross-field `query_string`). Layer `filter={...}` for text-match (`$match_phrase` / `$match_all` / `$match_any`) and metadata filters (`$eq` / `$in` / `$gte` / `$exists` / `$and` / `$or` / `$not`). Control the response payload with `include_fields`. → `references/querying.md`\n\n## Quick template\n\nEnd-to-end skeleton for a minimal text + filterable-metadata index. Copy it and edit every spot marked `# TODO:`. The template deliberately omits external embedding calls so it stays generic; see `references/ingestion.md` for dense / sparse field patterns and embedding-provider integration, and `references/querying.md` for the four scoring shapes plus text-match and metadata filters.\n\n```python\nimport time\nfrom pinecone import Pinecone\nfrom pinecone.preview import SchemaBuilder\n\nINDEX_NAME = \"my-fts-index\"        # TODO: name your index (lowercase alphanumeric + hyphens, ≤45 chars)\nNAMESPACE = \"__default__\"          # TODO: pick a namespace; auto-created on first upsert\n\npc = Pinecone()                    # reads PINECONE_API_KEY\n# TODO: preprod backends require an x-environment header on the client:\n#   pc = Pinecone(additional_headers={\"x-environment\": \"preprod-aws-0\"})\n\n# 1. 
Schema — one FTS string field, one filterable string, one filterable float.\n#    Field names must NOT start with `_` (reserved for `_id` / `_score`) or `$`\n#    (reserved for filter operators), and are limited to 64 bytes.\nschema = (\n    SchemaBuilder()\n    .add_string_field(\"body\", full_text_search={\"language\": \"en\"})  # TODO: rename for your content\n    .add_string_field(\"category\", filterable=True)                   # TODO: any exact-match metadata\n    .add_integer_field(\"year\", filterable=True)                      # TODO: any numeric filter — emits `\"type\": \"float\"` on the wire\n    .build()\n)\n\n# 2. Create the index. read_capacity defaults to {\"mode\": \"OnDemand\"}; pass\n#    {\"mode\": \"Dedicated\", ...} only if you specifically want provisioned reads.\nif not pc.preview.indexes.exists(INDEX_NAME):\n    pc.preview.indexes.create(name=INDEX_NAME, schema=schema)\n\n# 3. Wait for the index itself to become Ready.\nwhile not pc.preview.indexes.describe(INDEX_NAME).status.ready:\n    time.sleep(5)\n\nidx = pc.preview.index(name=INDEX_NAME)\n\n# 4. Upsert a single document. `_id` is required, every other field is optional.\n#    upsert REPLACES the document on conflict — there is no per-field merge in 2026-01.alpha.\nidx.documents.upsert(\n    namespace=NAMESPACE,\n    documents=[{\n        \"_id\": \"doc-1\",\n        \"body\": \"Full-text search is great for keyword queries.\",\n        \"category\": \"intro\",\n        \"year\": 2025.0,\n    }],\n)\n\n# 5. 
Poll until the FTS side is searchable (upsert returns BEFORE docs are indexed).\ndeadline = time.time() + 300\nwhile time.time() < deadline:\n    resp = idx.documents.search(\n        namespace=NAMESPACE, top_k=1,\n        score_by=[{\"type\": \"text\", \"field\": \"body\", \"query\": \"search\"}],  # TODO: sentinel query likely to hit\n        include_fields=[],          # required on every search; [] = lightest payload (ids + _score only)\n    )\n    if resp.matches:\n        break\n    time.sleep(5)\n\n# 6. Search — text scoring composed with metadata filter.\nresp = idx.documents.search(\n    namespace=NAMESPACE,\n    top_k=5,\n    score_by=[{\"type\": \"text\", \"field\": \"body\", \"query\": \"keyword queries\"}],\n    filter={\"year\": {\"$gte\": 2024}},        # TODO: adjust filter or drop it\n    include_fields=[\"*\"],                    # \"*\" = all stored fields; [] = `_id` + `_score` only\n)\nfor m in resp.matches:\n    print(m._id, getattr(m, \"_score\", getattr(m, \"score\", None)), m.to_dict())\n```\n\n## Common gotchas\n\n- **One scoring type per search request.** `score_by` accepts `text`, `query_string`, `dense_vector`, or `sparse_vector` — but a request ranks by *one* type. Multi-field BM25 is fine (pass several `text` clauses, or a single cross-field `query_string`). To combine BM25 ranking with a `dense_vector` (or `sparse_vector`) signal, restrict the dense search with a text-match `filter` operator (`$match_phrase` / `$match_all` / `$match_any`) on the lexical field, *not* by mixing types in `score_by`. The \"blend a dense vector and a text clause in `score_by`\" pattern is rejected by the server.\n- **Text-match filter operators are the cross-modal hinge.** `$match_phrase` (exact phrase), `$match_all` (every token, any order), `$match_any` (at least one token) are filter-side operators on `full_text_search` fields. Each takes a single string (max 128 tokens). 
They reuse the field's tokenizer / stemmer, compose under `$and` / `$or` / `$not`, and are the supported way to compose lexical pre-filtering with dense or sparse ranking. **Phrase slop (`\"…\"~N`), term boost (`^N`), and phrase prefix (`\"… word\"*`) are scoring-only — they live in `query_string`, not in `filter`.**\n- **Preprod backends need `additional_headers={\"x-environment\": \"...\"}` on the `Pinecone()` client.** Missing the header lands you on prod and you'll see \"index not found\" / empty-result symptoms that look like code bugs but aren't.\n- **`include_fields` is required on every `documents.search(...)` call.** When omitted, defaults to `[]` (`_id` + `_score` only). Pass `[\"*\"]` for all stored fields or a list of names to project. Omitting it on some SDK builds yields `400` / `422` instead of the documented default; always pass it explicitly to avoid surprises.\n- **Match score is `_score`; doc id is `_id`.** Public-preview docs return the system match score on the `_score` field so a user metadata field literally named `score` can coexist. Always prefer `_score` on read; some older SDK builds may still surface plain `score`, so for defensive code use `getattr(m, \"_score\", getattr(m, \"score\", None))`.\n- **Reserved field names: leading `_` and `$`, max 64 bytes.** `_` is for system fields (`_id`, `_score`); `$` is for filter operators. Schema validation rejects names that violate either rule. Length cap is bytes, not characters — be careful with non-ASCII names.\n- **Vector-field cardinality: at most one `dense_vector` and at most one `sparse_vector` per index** in `2026-01.alpha`. Multiple text fields are fine.\n- **`batch_upsert` failures are silent by default.** The return value carries `has_errors`, `failed_batch_count`, and a list of `BatchError` objects with `error_message`. 
If you don't inspect them, you'll see \"Uploaded 0 / N\" and an indefinite \"not yet indexed\" poll — with the real cause (payload-too-large, schema mismatch, reserved field name) hidden. Always print `result.errors[*].error_message` before downstream steps.\n- **Dense-vector payload size matters at batch time.** A 50-doc batch with 3072-dim float vectors lands around 5–10 MB and can be rejected by the preview backend. If every batch fails, try reducing the embedding dimension via your provider's truncation knob (e.g. Gemini's `output_dimensionality=768`) before debugging schema.\n- **Async indexing: `batch_upsert` returning ≠ searchable.** The server builds inverted indexes in the background after the HTTP call returns. If you query immediately you'll see empty result sets. Always poll `documents.search` with a sentinel query and a deadline (pattern in `references/ingestion.md`).\n- **String FTS field shape is `full_text_search={...}` (dict).** Pass `{}` to enable with all server defaults. **User-settable sub-fields:** `language`, `stemming`, `stop_words`. **Server-applied** (visible in `describe()` responses but NOT settable at index creation): `lowercase` (default `true`) and `max_token_length` (default `40`). Stemming is opt-in (default `false`); `stop_words` is opt-in (default `false`, opposite of pre-public-preview docs). The earlier SDK shape `full_text_searchable=True, language=\"en\"` is legacy and should be avoided.\n- **Schemas are fixed at index creation in `2026-01.alpha`.** Adding, removing, or retyping fields after creation is not supported. Changing dimension or metric on an existing vector field requires a new index. Plan the schema once.\n- **No partial / per-field updates.** `documents.upsert` always replaces the entire document for a given `_id`. 
To update one field, fetch the doc, modify in client code, and upsert the full doc back under the same `_id`.\n- **Document operations: search supports `filter`, fetch and delete do not.** Fetch is **ID-only** (`POST /documents/fetch` with `ids: [...]`); delete accepts only `ids` or `delete_all: true`. To act on a metadata expression, search first to collect IDs, then fetch or delete those IDs.\n- **Namespaces auto-create on first upsert.** Pass any namespace string to `documents.upsert` / `batch_upsert` and the namespace is created on the fly; documents from different namespaces are fully isolated. Use `\"__default__\"` if you don't need partitioning. **Caveat:** the namespace management endpoints (`POST /namespaces`, `GET /namespaces`, `DELETE /namespaces/{namespace}`) and `describe_index_stats` are NOT yet supported on indexes with document schemas — you can write to a namespace, you just can't list / delete them via the API yet.\n- **Document and request size limits** (preview): per-document max **2 MB**; per-request max **2 MB and 1000 documents**; per FTS-enabled `string` field max **100 KB and 10,000 tokens** (tokens > 256 bytes are truncated by the analyzer); per-document filterable metadata (everything *not* in an FTS field) max **40 KB**. A schema can declare up to **100 FTS string fields**. 
For long-prose corpora, chunk before ingest — see `references/ingestion.md`.\n- **`score_by` clause shape — singular `field` is canonical for `text`/`dense_vector`/`sparse_vector`; only `query_string` takes a `fields` array.**\n    - `text`: `{\"type\":\"text\", \"field\":\"<fts_field>\", \"query\":\"<terms>\"}`.\n    - `query_string`: `{\"type\":\"query_string\", \"query\":\"<lucene>\", \"fields\":[\"<a>\",\"<b>\"]}` (the optional `fields` array; `query_string` also accepts a bare `\"fields\":\"body\"` string and the legacy `\"field\":\"body\"` as an alias).\n    - `dense_vector`: `{\"type\":\"dense_vector\", \"field\":\"<dense_field>\", \"values\":[/*floats*/]}`.\n    - `sparse_vector`: `{\"type\":\"sparse_vector\", \"field\":\"<sparse_field>\", \"sparse_values\":{\"indices\":[...],\"values\":[...]}}` — note `sparse_values` (NOT `values`) for sparse clauses.\n- **Single-term prefix wildcards aren't supported.** `auto*` doesn't work in `query_string`; use phrase prefix (`\"machine lea\"*` — phrase must contain at least two terms, last term is matched as prefix).\n- **Indexes can't be created in CMEK-enabled projects, no backup/restore, no fuzzy or regex search, no S3 bulk import** for document-shaped indexes in `2026-01.alpha`. If any of these are hard requirements, the public-preview FTS surface isn't yet ready.\n\n## Extension points\n\nCurrently shipped under `scripts/`:\n\n- `scripts/ingest.py` — bulk-ingest a prepared JSONL into an existing FTS index. Handles `batch_upsert` in safe-sized chunks, inspects every batch's `result.errors` and aborts loudly on failure, then polls `documents.search` with a sentinel + deadline until docs are searchable. Schema-agnostic: takes only `--data`, `--index`, `--sentinel-field`. 
Usage in **Ingesting — use the packaged helper** section above.\n\nQuery construction does NOT have a packaged helper — write `documents.search(...)` calls directly per the **Querying** section above.","tags":["pinecone","full","text","search","skills","pinecone-io","agent-skills","agents","semantic-search","skills-sh"],"capabilities":["skill","source-pinecone-io","skill-pinecone-full-text-search","topic-agent-skills","topic-agents","topic-pinecone","topic-semantic-search","topic-skills-sh"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/pinecone-io/skills/pinecone-full-text-search","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add pinecone-io/skills","source_repo":"https://github.com/pinecone-io/skills","install_from":"skills.sh"}},"qualityScore":"0.456","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 12 github stars · SKILL.md body (32,814 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:07:27.053Z","embedding":null,"createdAt":"2026-05-18T19:07:27.053Z","updatedAt":"2026-05-18T19:07:27.053Z","lastSeenAt":"2026-05-18T19:07:27.053Z","tsv":"'-1':1685,3490 '-10':1348 '-2':2067,2828 '-3':2271,2457 '-3072':2201 '/documents/fetch':4359 '/guides/search/full-text-search':144 '/namespaces':4431,4433,4435 '0':1402,1425,3324,4057 '0.39':1438 '0.42':1429 '000':4499 '1':528,1131,1858,2111,2321,2998,3325,3531 '10':633,676,4109,4498 '100':1166,1331,1423,1426,1434,1435,4495,4529 '100/5000':1432 '1000':4486 '12.3':1457 '128':3764 '2':581,1143,1889,2162,2345,3055,3403,4477,4483 '200/5000':1441 '202':1151,3114 '2024':646,694,2304,2747,3589 '2025.0':3504 '2026':274 
'2026-01.alpha':23,115,268,1930,3051,3483,4016,4278,4683 '21.4':1448 '256':4502 '27':1174 '3':609,1176,1459,1911,2179,2376,3123,3434 '300':1370,3521 '300s':1454 '3072':1342,4102 '33.7':1464 '3x':2703 '4':1940,2211,2400,2749,3456 '40':4232,4521 '400':3888 '400/422':827,2539 '422':3889 '45':3282 '5':799,1347,1949,2224,2429,3450,3505,3561,3576,4108 '50':1338,1359,4098 '5000':1413,1444 '6':1998,3562 '64':3356,3965 '7':2037 '723':94,1634 '73':1171 '768':4139 '9.0':62,66 '9.0.0':91 'abbrevi':1844 'abort':4733 'absent':80 'accept':914,920,2878,3629,4363,4583 'act':4371 'activ':2460 'ad':4279 'add':41,468,2068,2081,2084,2156,3360,3374,3386 'addit':3316,3819 'adjust':3591 'agent':31,708,2283,2286,2359,2545,2866 'agent-mod':2358,2544 'agnost':4750 'ai':649,2665,2673 'alia':925,4596 'allow':2915 'alon':2098 'alongsid':458 'alphanumer':3280 'alreadi':440,574,1288,1833,1851,2318,2335 'also':919,1010,1869,3030,4582 'alway':653,2371,2398,2540,3895,3933,4080,4172,4313 'analyz':1891,4508 'answer':134 'anti':2151 'anti-pattern':2150 'anyth':151 'api':22,72,112,162,264,266,331,359,373,2554,2570,3300,4465 'appear':2801 'appli':166,369,4213 'april':273 'aren':3852,4628 'around':4107 'array':917,3040,4563,4579 'articl':1708,1883,2293,2576,2780,2843 'ascii':3996 'ask':32,486,1065,1598,1995,2427 'assembl':1912 'async':287,4143 'asynchron':3105 'attribut':550 'audio':2126 'authorit':116,502 'auto':3291,4389,4631 'auto-cr':3290,4388 'automat':1646,2002 'avoid':2461,3900,4270 'aw':3323 'b':1328 'back':2433,4338 'backend':1705,3304,3817,4118 'background':4156 'backup/restore':4667 'bar':2582 'bare':801,1116,4585 'bare-llm':1115 'bash':1252 'batch':200,1136,1146,1148,1326,1340,1357,1421,1424,1433,1468,1969,1973,2033,3072,3111,4022,4036,4095,4100,4121,4145,4400,4720,4729 'batch-siz':1325,1356 'batcherror':4042 'beauti':2600 'becom':1170,1378,3441 'benefit':1819 'bewar':2192 'blend':3704 'bm25':624,732,748,854,1030,2489,2590,2907,3154,3648,3665 'bm25encoder':393 
'bodi':639,688,1417,2727,2739,3363,3491,3537,3582,4587,4593 'boolean':873,976,2695,2724,2744,3039 'boost':866,2696,3798 'break':3559 'bug':3850 'build':34,84,253,826,1008,1185,1702,2272,2538,3402,3886,3941,4151 'builder':1513 'bulk':197,1069,1133,1563,3059,4675,4709 'bulk-ingest':1068,4708 'bulk-upsert':1132 'bundl':1086,3067 'buy':1824 'buy-in':1823 'byte':3357,3966,3988,4503 'cach':2223 'call':481,497,510,532,584,608,809,1191,1555,1924,1934,2057,2248,2275,2327,2380,2403,2861,3098,3109,3227,3861,4160,4777 'cannot':425 'canon':620,1080,1615,1879,3119,4550 'cap':3986 'capac':3408 'cardin':4001 'care':3053,3992 'carri':564,2406,4032 'case':1654 'categori':647,2758,2762,3377,3501 'caus':4069 'caveat':4425 'chang':4289 'char':3283 'charact':3990 'chunk':4538,4726 'class':542 'classic':160,370,473 'claude/skills/pinecone-fts-index/scripts/ingest.py':1256,1632 'claus':52,717,739,832,2918,2934,3160,3654,3711,4545,4622 'clean':1837,2008 'client':2976,3313,3827,4331 'client-sid':2975 'climat':2308 'closest':1670 'cmek':4663 'cmek-en':4662 'code':103,1119,1205,1604,1888,2994,3849,3950,4332 'coexist':3932 'collect':4379 'combin':427,747,3664 'come':2432 'comma':1745 'comma-separ':1744 'common':1037,2442,2455,3619 'complet':1443 'compos':3566,3773,3784 'composit':1034 'concern':1236 'concret':1656 'confirm':1792,1918,2112,2176 'conflict':3474 'conform':1220 'construct':49,233,293,479,508,582,2377,3080,4768 'consult':174,2525,2990 'contain':2049,2615,2664,2728,4645 'content':149,2052,3373 'context':1608 'contribut':2655 'control':3190 'convers':224,580,1777,2344 'convert':1584 'copi':2563,3213 'corpora':4537 'corpus':1689,1811,1861,2079,2132,2810 'correct':3078 'correspond':1125,2565 'cost':1935 'count':4037 'cover':152,275,335 'coverag':1026 'creat':6,1794,1941,2213,3292,3404,4390,4406,4660 'creation':3049,4223,4276,4285 'cross':871,2106,2250,2722,2803,2922,3164,3659,3729 'cross-field':870,2721,2921,3163,3658 'cross-mod':2105,2249,2802,3728 'csv':1580,1697 'cue':2552,2562,2569 
'current':4703 'custom':898 'd':1269 'data':1257,1268,1531,1719,1733,1781,1849,1952,2062,4753 'dataset':1714 'date':1749,2742 'deadlin':1367,1453,1484,1500,1627,2032,3519,3524,4181,4743 'debug':1202,4141 'decid':1857,3002 'decis':1785,1828 'declar':3043,4526 'decompos':2346 'decomposit':2361,2548 'dedic':3415 'deeper':1025 'default':630,673,1323,1324,1330,1369,1394,3285,3409,3864,3894,4028,4200,4225,4231,4238,4246,4418 'defens':3949 'delet':4350,4362,4367,4384,4434,4461 'deliber':3223 'dens':42,279,297,397,657,680,749,753,876,879,904,935,1335,2070,2146,2158,2207,2230,2262,2790,2797,2812,2858,2882,2895,3019,3143,3235,3633,3669,3677,3706,3790,4005,4089,4553,4597,4600 'dense-vector':2206,4088 'deprec':2680,2688 'describ':157,503,1945,2093,4216,4438 'design':277,2999 'detail':3122 'dict':538,797,1014,2418,3618,4193 'differ':357,422,2131,4412 'dim':1343,4103 'dimens':566,1364,2188,2199,4127,4290 'dimension':4138 'direct':498,4778 'discov':529,2323 'distinct':933 'doc':139,155,1016,1140,1167,1172,1339,1401,1427,1436,1445,1503,1548,1553,3093,3489,3516,3906,3913,4099,4254,4328,4337,4745 'docs.pinecone.io':143 'docs.pinecone.io/guides/search/full-text-search':142 'document':48,70,170,328,445,455,998,1155,1277,1376,1414,2306,2783,3057,3102,3460,3472,3487,3893,4317,4343,4410,4448,4467,4475,4487,4511,4679 'document-schema':69,169,327,454 'document-shap':4678 'documents.batch':1617 'documents.search':238,480,496,1190,1622,2025,2056,2274,2379,3860,4174,4739,4776 'documents.upsert':1554,3099,4312,4399 'doesn':1590,2653,4632 'done':1462 'downstream':4086 'drive':2596,2967 'drop':1354,3594 'dump':1583 'duplic':1753 'e.g':159,2513,4134 'eagl':2702 'earlier':82,2341,4256 'earn':3025 'edit':3216 'either':3100,3983 'emb':415,2794 'embed':408,683,686,2085,2182,2218,2817,3226,3241,4126 'embedd':2826 'embedding-provid':3240 'emit':3396 'empti':1196,2434,3843,4169 'empty-result':3842 'en':3368,4264 'enabl':519,558,1049,1298,1499,4196,4491,4664 'encod':901,2140 
'end':851,1691,1693,2574,3201,3203 'end-to-end':1690,3200 'endpoint':4429 'enforc':701,2868 'entir':4316 'environ':3309,3320,3823 'eq':312,648,956,969,978,2755,3183 'error':202,1038,1163,1474,1523,1620,1974,3074,4034,4045,4083 'etc':549,1755,2202,2829 'even':1152 'everi':607,808,1145,1473,3217,3464,3550,3738,3859,4120,4728 'everyth':1103,4514 'exact':719,2594,2610,3137,3383,3734 'exact-match':3382 'exceed':1741 'except':736,2870 'exclud':2676 'execut':610,2401 'exist':231,315,781,960,966,975,979,1076,1289,1802,2135,2319,2756,3186,4295,4716 'exit':1477,1515,2007 'expir':1485 'explicit':109,605,655,1597,1763,1821,1994,2375,2543,3898 'expr':2713 'express':2124,4375 'extens':4701 'extern':2127,3225 'f':1293 'fact':477 'fail':1156,1469,4035,4122 'failur':1130,1158,2439,4024,4736 'fals':4239,4247 'feel':2785 'fetch':4326,4348,4353,4382 'fiction':2763 'field':46,432,470,515,540,604,638,652,682,696,731,745,778,793,804,814,845,858,863,872,878,891,910,916,922,939,941,1020,1023,1029,1044,1262,1292,1299,1316,1405,1494,1558,1735,1740,1864,1896,1956,2046,2137,2147,2159,2338,2374,2397,2414,2519,2524,2535,2588,2723,2775,2792,2799,2906,2923,3005,3021,3035,3153,3165,3196,3237,3330,3337,3362,3376,3388,3466,3480,3536,3547,3581,3597,3600,3647,3660,3695,3757,3769,3855,3873,3922,3927,3960,3970,4000,4019,4077,4187,4206,4283,4297,4310,4325,4493,4519,4532,4548,4562,4567,4575,4578,4586,4592,4602,4610,4757 'file':131,1073,1222,1274,2989 'filter':283,311,320,522,562,597,643,662,668,687,760,775,786,936,955,963,968,977,983,985,987,2237,2263,2354,2366,2474,2494,2618,2634,2652,2670,2684,2752,2766,2774,2850,2927,2951,3013,3041,3169,3182,3210,3257,3332,3335,3350,3378,3390,3395,3569,3586,3592,3684,3724,3750,3788,3815,3975,4347,4512 'filter-sid':3749 'filterable-metadata':3209 'find':346,1210,2292 'fine':3650,4021 'first':1395,1587,1648,1724,3294,4377,4392 'fix':3046,4273 'flag':1263,1264 'fli':4409 'float':884,967,1344,2518,3336,3398,4104,4604 'flow':1817 'folder':1699 
'follow':353,703,1613,1673,1768,1842,2381 'format':1594 'found':3841 'four':2454,3248 'free':1314 'free-text':1313 'front':187 'fts':17,68,138,172,330,469,491,518,557,847,947,1077,1297,1498,1742,2136,3273,3328,3509,4186,4490,4518,4530,4695,4717 'fts-enabl':517,556,1296,1497,4489 'full':3,14,55,257,429,559,1015,3008,3364,3493,3754,4190,4259,4336 'full-text':3492 'full-text-search':13,256,3007 'fulli':4415 'fuzzi':4669 'gave':1834 'gemini':2200,2827,4135 'generic':3231 'genuin':181,1681,2964 'get':208,2174,2410,4432 'getattr':3610,3613,3952,3955 'give':1381 'given':1760,4320 'glanc':2981 'goe':2949,2957 'gotcha':2443,3620 'great':3497 'gt':645,693,798,971,2520,2753 'gte':314,972,2754,3185,3588 'guarante':2506 'guid':1778 'guidanc':364 'hand':236,1091,1249,1987,3097 'hand-cal':3096 'hand-writ':1090,1986 'hand-written':235 'handl':196,1968,4719 'hard':594,2364,2471,2483,2612,2651,4689 'haven':1758 'header':3310,3317,3820,3830 'helper':87,192,230,248,366,1060,1087,1963,3069,3087,4764,4774 'hidden':4079 'high':1363,1815 'high-stak':1814 'hing':3731 'hint':1490 'hit':122,3545 'honor':1609 'hour':1208 'http':3108,4159 'hybrid':398,656,1033,1683,2088,2228,2264,2838,2926 'hyphen':3281 'id':816,999,1754,2974,3345,3461,3488,3554,3601,3866,3907,3909,3971,4321,4342,4356,4361,4365,4380,4386 'id-on':4355 'identifi':1910,2560 'idx':612,3451 'idx.documents.batch':341 'idx.documents.search':343,616,628,671,3526,3571 'idx.documents.upsert':340,3484 'illinoi':2633 'imag':1886,2125,2809,2823,2836 'immedi':4165 'immut':1806,1928,2166 'import':3259,3263,3267,4676 'improvis':1676 'includ':286,603,651,695,803,1019,1021,2372,2373,2396,2534,3195,3546,3596,3854 'incomplet':78 'indefinit':4061 'independ':463 'index':18,38,260,288,387,400,409,411,436,447,457,474,492,505,1078,1188,1212,1225,1259,1281,1285,1302,1512,1534,1686,1695,1943,1953,2010,2170,2255,2317,3048,3104,3126,3212,3269,3274,3278,3406,3426,3430,3438,3446,3454,3518,3839,4014,4064,4144,4153,4222,4275,4301,4439,4446,4656,4681,4718,4754 
'index.query':380 'index.search':384 'index.upsert':376,378 'indexes.create':1925 'indic':543,896,4613 'individu':1154 'ingest':7,47,198,211,285,1056,1070,1118,1168,1173,1573,1603,1797,1939,1959,2216,3056,3083,4540,4710,4760 'ingest.py':1245 'inlin':95,1636,2219 'inline-metadata':1635 'insid':2692 'inspect':203,1144,1165,1621,1859,1975,3075,4051,4727 'instal':64,1642 'instead':2491,3890 'integ':3387 'integr':407,3243 'integrated-embed':406 'intent':2452 'intro':3502 'invert':1187,1511,4152 'inverted-index':1510 'invoc':1251,1649,1983 'invok':1084 'isn':1495,4697 'isol':4416 'iter':2408 'jsonl':1072,1221,1273,1578,1586,1698,1838,3064,4713 'k':632,675,3530,3575 'kb':4496,4522 'keep':706 'key':697,776,839,3301 'keyword':625,852,2575,3499,3584 'knob':4133 'know':228,513,2048,2336 'land':1244,1346,3831,4106 'languag':1897,2280,3367,4207,4263 'larg':1334,4073 'last':118,4650 'layer':3168 'lea':4642 'lead':3962 'learn':642,2296,2579,2617,2622 'least':2659,3745,4647 'legaci':391,4266,4591 'legal':943 'length':3985,4230 'lexic':661,2839,2849,2941,2948,3694,3785 'librari':1887 'lightest':3552 'like':2109,2291,3543,3848 'limit':1743,3354,4471 'line':1280 'link':176 'list':715,821,882,962,2760,2773,2911,3876,4040,4460 'liter':3928 'live':73,506,1159,1630,2445,3809 'll':214,3837,4054,4167 'llm':1117 'load':1410,1412,1564,1720,3060 'local':705 'logic':788,980 'long':1738,1904,4535 'long-pros':4534 'longest':1312 'look':2108,3847 'loop':709,1142,1990 'lost':1175 'loud':4734 'lower':1361 'lowercas':3279,4224 'lt':973 'lte':974 'lucen':862,865,1031,2718 'm':994,3605,3611,3614,3953,3956 'm._id':997,2411,3609 'm._score':1000,2412 'm.to':1013,2417,3617 'machin':641,2295,2578,2616,2621,4641 'make':1712,2086 'manag':4428 'map':2549,2830 'mark':3219 
'match':302,304,306,310,585,689,759,761,763,765,948,950,952,988,1001,1663,1873,2065,2236,2238,2240,2395,2421,2448,2496,2498,2514,2619,2635,2671,2686,2851,2853,2953,3173,3174,3176,3178,3254,3384,3683,3686,3688,3690,3723,3732,3736,3742,3902,3917,4653 'matter':4093 'max':3763,3964,4228,4476,4482,4494,4520 'may':156,1009,3942 'mb':1349,4110,4478,4484 'md':130 'meet':1779 'membership':2761 'mention':2298,2628,2847 'merg':1036,2973,3481 'messag':1475,4046,4084 'messi':1718 'metadata':96,284,319,1637,1866,2745,3014,3181,3211,3256,3385,3568,3926,4374,4513 'metric':4292 'migrat':2173 'minim':3207 'mismatch':2512,4075 'miss':3828 'mistak':2458 'mix':727,769,2462,2894,3698 'ml':2667,2674,2782,2845 'modal':2107,2118,2251,2804,3730 'mode':2284,2360,2546,3411,3414 'model':413,1659,2818 'modifi':4329 'mood':886,2104,2778 'multi':730,1028,2905,3152,3646 'multi-field':729,1027,2904,3151,3645 'multimod':2825 'multipl':737,2916,3158,4017 'must':780,1287,2023,2614,2627,2630,2867,3339,4644 'my-fts-index':3271 'n':867,869,1321,2705,2715,2716,3796,3799,4058 'name':339,375,614,779,823,1226,1286,3270,3276,3338,3427,3429,3431,3447,3453,3455,3878,3929,3961,3980,3997,4078 'namespac':629,672,1320,3284,3289,3485,3486,3527,3528,3572,3573,4387,4396,4404,4413,4427,4436,4455 'narrow':669 'nativ':2198 'natur':2279 'natural-languag':2278 'ne':957,970 'need':1105,1652,1870,2203,2965,3017,3031,3818,4423 'new':1688,1810,2114,2169,4300 'nin':959,965 'non':444,1479,1517,1577,3995 'non-ascii':3994 'non-document-schema':443 'non-jsonl':1576 'non-zero':1478,1516 'none':3616,3958 'note':4615 'noun':1909 'numer':2741,3038,3394 'object':4043 'offici':136 'older':1006,3939 'omit':831,2533,3224,3863,3881 'onboard':1722,1808 'ondemand':3412 'one':589,663,720,735,741,857,1278,1671,1876,1982,2040,2254,2389,2477,2660,2871,2891,2920,3133,3138,3327,3331,3334,3621,3643,3746,4004,4010,4324 'open':850,2573 'open-end':849,2572 'oper':789,796,937,944,1569,2527,2954,3351,3685,3725,3752,3976,4344 
'operator/field-type':2394,2511 'opinion':2682,2689 'opposit':4248 'opt':4236,4244 'opt-in':4235,4243 'option':650,864,2322,3468,4577 'order':2626,3741 'output':2187,4137 'packag':86,1059,1962,3086,4763,4773 'paragraph':2315 'parquet':1581 'pars':1592 'partial':4307 'partit':4424 'pass':654,810,2541,3136,3413,3651,3869,3896,4194,4394 'patch':1549,3094 'path':173,1081,1271 'pattern':1032,1616,2096,2152,2912,3120,3238,3715,4182 'payload':1017,2194,2209,3193,3553,4071,4091 'payload-s':2193 'payload-too-larg':4070 'pc':3296,3314 'pc.create':386,410 'pc.index':374 'pc.preview.index':338,613,3452 'pc.preview.indexes.create':336,3428 'pc.preview.indexes.describe':533,2328,3445 'pc.preview.indexes.exists':3425 'pep':93,1633 'per':239,592,723,1139,1279,1547,1567,1972,2392,2874,3092,3479,3624,4013,4309,4474,4480,4488,4510,4779 'per-batch':1971 'per-doc':1138,1546,3091 'per-docu':4473,4509 'per-field':3478,4308 'per-record':1566 'per-request':4479 'phase':2983,2997 'phrase':303,764,874,949,2239,2499,2595,2608,2611,2620,2699,2854,3175,3687,3733,3735,3794,3801,4639,4643 'phrase-prefix':2698 'pick':842,1310,1890,2180,3287 'piec':2683 'pin':89,107,2185 'pinecon':2,12,40,54,59,65,90,137,255,358,416,1182,1284,1645,3262,3264,3297,3299,3315,3826 'pinecone-full-text-search':1,53 'pinecone.preview':75,265,3266 'pinecone_text.sparse':392 'pip':63 'pipelin':1239 'pitfal':2196 'place':3027 'plain':3945 'plan':2167,3052,4302 'plus':3251 'point':1829,4702 'polici':2309 'poll':205,290,1177,1199,1308,1366,1392,1450,1483,1623,1944,1980,2001,2024,3077,3506,4065,4173,4738 'poll-deadlin':1365 'possibl':1737,1752 'post':1977,4358,4430 'post-upsert':1976 'postgr':1582 'pre':2222,3787,4251 'pre-cach':2221 'pre-filt':3786 'pre-public-preview':4250 'prefer':145,2641,3934 'prefix':875,2700,3802,4626,4640,4655 'prep':1238 'prepar':1217,1276,1852,3063,4712 'prepare_documents.py':1241 'preprod':3303,3322,3816 'preprod-aw':3321 'present':2510 'preview':21,25,263,270,3912,4117,4253,4472,4694 
'previewdensevectorfield':548 'previewintegerfield':547 'previewstringfield':546 'print':1409,1472,1488,3608,4081 'probe':1460,2043 'problem':1528 'process':1784,1796 'processed.jsonl':1258,1411 'prod':3834 'produc':2820 'product':1129,1709,1884 'project':3880,4665 'prompt':2091,2282,2290,2350,2551,2568,2937 'proper':1908 'propos':1789 'prose':1905,4536 'provid':1215,2183,3242,4130 'provis':3421 'proxim':868 'public':24,269,3911,4252,4693 'public-preview':3910,4692 'publish':2302 'pure':623,2259 'purpos':1267 'put':2470,2482 'python':60,622,992,1094,3258 'queri':10,232,241,247,292,295,478,489,640,685,742,848,859,861,912,1052,1204,1309,1393,2029,2044,2095,2229,2386,2531,2547,2583,2589,2605,2607,2643,2693,2710,2712,2733,2735,2806,2833,2880,2924,3124,3141,3166,3500,3538,3542,3583,3585,3631,3661,3811,4164,4178,4558,4568,4569,4572,4574,4580,4636,4767,4781 'question':124 'quick':3198 'race':1213 'rang':2743 'rank':598,658,855,888,902,2064,2310,2367,2475,2501,2597,2598,2842,2889,2946,2968,3131,3641,3666,3793 'rate':2748 'raw':404,1732 're':99,180,1525,1938,2139 're-encod':2138 're-ingest':1937 'reach':348 'read':535,618,1098,1765,2330,2556,3298,3407,3422,3937 'readi':204,1307,1391,1979,2215,3076,3442,4700 'readiness-pol':1306,1390 'real':1527,4068 'receiv':2287 'record':372,379,385,1568 'recreat':2178 'reduc':1332,4124 'refer':117,129,2988 'references/ingestion.md':2205,3116,3233,4184,4542 'references/onboarding-walkthrough.md':1766 'references/querying.md':1041,3197,3245 'references/schema-design.md':1055,1882,2153,3054 'regex':4671 'reject':726,1353,1507,2469,2902,3717,3979,4114 'reliabl':1120 'remov':4280 'renam':3370 'replac':3470,4314 'repres':2116 'request':593,724,1611,1667,2393,2875,2888,3130,3626,3640,4469,4481 'requir':58,595,806,838,1266,2365,2472,2484,2613,2623,2795,2840,2942,3305,3463,3548,3857,4298,4690 'reserv':3343,3348,3959,4076 'resort':119 'resp':615,627,670,3525,3570 'resp.matches':619,996,2407,3558,3607 'respons':991,2405,3192,4217 'rest':1231 
'restrict':751,3675 'result':1147,2053,2253,2431,2444,2447,2502,3844,4170 'result.errors':1161,4082,4731 'result.has':1162,1619 'retriev':2257 'return':1150,1195,2035,2063,2500,3110,3113,3514,3914,4030,4147,4161 'retyp':4282 'reus':3767 'robot':2669,2675 'roll':2019 'rrf':1035 'rule':587,698,2383,2864,3984 'run':1254,1640,1950,2969,3065 'runnabl':2266 's3':4674 'safe':199,4724 'safe-s':4723 'schema':71,171,276,329,337,403,446,456,500,531,577,783,1043,1219,1232,1319,1764,1791,1804,1840,1855,1914,1926,2164,2190,2212,2325,3001,3044,3326,3358,3432,3433,3977,4074,4142,4271,4304,4449,4524,4749 'schema-agnost':4748 'schema-conform':1218 'schema.fields':537,2331 'schemabuild':1916,3268,3359 'scope':321 'score':50,301,590,601,634,666,677,711,721,770,773,817,835,1002,1004,1012,2128,2232,2352,2369,2390,2463,2466,2480,2486,2584,2602,2657,2707,2730,2787,2856,2872,2876,2898,2928,2932,2959,3134,3149,3249,3346,3532,3555,3565,3577,3602,3612,3615,3622,3627,3701,3713,3806,3867,3903,3905,3918,3921,3930,3935,3946,3954,3957,3972,4543 'scoring-on':3805 'scratch':1606 'script':88,1095,1100,1111,1228,1255,1408,1471,1487,1541,1560,1589,1629,1638,1641,1967,2000,2017,4706 'scripts/ingest.py':194,1951,3068,4707 'sdk':61,83,825,1007,2537,3885,3940,4257 'search':5,16,37,57,259,431,561,626,754,853,1701,1704,2083,2446,2581,2971,3129,3366,3495,3539,3551,3563,3625,3678,3756,4192,4345,4376,4672 'search-bar':2580 'searchabl':1179,1379,1452,1455,1715,2012,2036,3010,3115,3512,4148,4261,4747 'second':1371 'section':242,1109,1964,2387,2532,3088,4765,4782 'see':1040,1054,1957,3081,3232,3838,4055,4168,4541 'seen':575 'select':1557 'semant':885,2082,2776,2841,2855,2945,2956 'sentinel':1261,1291,1383,1404,1416,1493,1625,1955,2028,2051,3541,4177,4742,4756 'sentinel-field':1260,1290,1403,1954,4755 'separ':453,1398,1746,2640 'serv':2101 'server':420,700,725,2468,2901,3720,4150,4199,4212 'server-appli':4211 'server-sid':419 'serverlessspec':389 'set':1892,2399,4171 'settabl':4203,4220 'setup':1651,2269 
'sever':3652 'shape':424,621,833,989,1657,1862,1880,2252,2258,2355,2424,2555,2566,2571,2814,3250,4188,4258,4546,4680 'ship':190,4704 'short':1265 'show':1729 'side':421,1053,2977,3510,3751 'signal':599,750,2073,2115,2120,2368,2478,2947,2963,3674 'silent':1128,1169,4026 'similar':2103,2312,2777 'singl':1552,2247,2860,3128,3459,3657,3761,4624 'single-cal':2246 'single-doc':1551 'single-term':4623 'singular':909,921,4547 'size':1327,1358,2195,2210,4092,4470,4725 'skeleton':3204 'skill':106,126,147,189,251,323,334,362 'skill-pinecone-full-text-search' 'skip':567,1121,2015,2332 'slop':2697,3795 'sombr':2786 'sourc':1102,1579 'source-pinecone-io' 'space':2639,2837 'space-separ':2638 'spars':44,281,299,382,396,889,892,894,900,906,926,929,2072,2884,3033,3146,3236,3636,3672,3792,4011,4555,4605,4608,4611,4616,4621 'sparse-dens':395 'sparse-encod':899 'spec':1841 'specif':3419 'spot':3218 'stage':1771,1773 'stage-by-stag':1770 'stake':1816 'stand':450 'start':221,3341 'stat':4440 'status.ready':1947,3448 'stay':3230 'stem':1898,1901,4208,4233 'stemmer':3772 'step':212,569,1675,1845,1847,2110,2320,4087 'still':1184,3943 'stock':2751 'stop':354,1899,4209,4240 'store':813,2822,3599,3872 'string':296,743,846,860,913,945,954,961,1748,1751,2523,2606,2644,2694,2711,2734,2770,2772,2881,2925,3004,3142,3167,3329,3333,3361,3375,3632,3662,3762,3812,4185,4397,4492,4531,4559,4570,4573,4581,4588,4637 'structur':1504,1865,2863 'stuck':182 'sub':4205 'sub-field':4204 'support':2245,2719,3156,3781,4288,4346,4444,4630 'suppress':1521 'surfac':158,217,1011,1526,1782,3944,4696 'surpris':3901 'symptom':1039,3845 'system':3916,3969 'tabl':2362,2528 'tag':1747,2759,2764 'take':3759,4560,4751 'task':484,1063,1661,1679 'teach':1607 'tell':183,551 'templat':3199,3222 'tensorflow':691,2299,2629,2848 'term':2490,2508,3797,4625,4649,4651 
'text':4,15,36,56,258,278,294,309,418,430,560,637,738,758,844,903,1315,1739,1863,1895,2077,2097,2121,2143,2235,2260,2587,2805,2832,2879,2896,2917,3009,3140,3159,3172,3208,3253,3365,3494,3535,3564,3580,3630,3653,3682,3710,3722,3755,4018,4191,4260,4552,4564,4566 'text-match':308,757,2234,3171,3252,3681,3721 'text-on':2076 'thing':1114 'three':1113,1655,2982 'ticket':1710,1885 'tight':710,1502 'time':1372,2005,2191,3260,4096 'time.sleep':3449,3560 'time.time':3520,3523 'titl':2725,2736 'todo':3220,3275,3286,3302,3369,3380,3392,3540,3590 'togeth':1799 'token':1386,1399,2592,2624,2637,2645,2663,2677,2691,3739,3747,3765,3771,4229,4500,4501 'token-or':2591 'top':631,674,2558,3529,3574 'top-down':2557 'topic':887,2779 'topic-agent-skills' 'topic-agents' 'topic-pinecone' 'topic-semantic-search' 'topic-skills-sh' 'total':1431,1440,1463 'transcript':1711 'tree':2440 'tri':4123 'trigger':1694,2080,2285 'true':1948,2650,2757,3379,3391,4226,4262,4369 'truncat':2204,4132,4505 'trust':1246 'two':461,2042,2256,2862,2931,2970,4648 'type':545,591,636,664,679,722,728,771,837,940,942,1045,1736,2339,2391,2464,2586,2604,2709,2732,2789,2873,3135,3397,3534,3579,3623,3644,3699,4565,4571,4599,4607 'typer':1643 'uc':1684,2066,2270,2456 'unclear':1734 'unless':1991 'unprocess':1717 'updat':1550,3095,4311,4323 'upload':4056 'upsert':201,342,1134,1141,1149,1181,1419,1442,1505,1618,1970,1978,2034,3073,3112,3295,3457,3469,3513,4023,4146,4334,4393,4401,4721 'upsert/search':423 'upstream':1235 'usag':4758 'use':19,26,215,908,928,1057,1303,1387,1539,1653,1960,2356,2419,3084,3951,4417,4638,4761 'user':29,185,227,439,1201,1543,1571,1596,1665,1727,1788,1822,1832,1922,1993,2060,2175,2281,2289,2348,2426,2450,2550,2567,3925,4202 'user-sett':4201 'uv':1253,1639 'v9':108 'valid':1233,2038,2225,3978 'valu':684,794,802,881,895,897,930,932,2415,2793,4031,4603,4612,4614,4617,4619 've':573 
'vector':45,161,280,282,298,300,371,377,381,383,405,417,526,681,877,880,890,893,905,907,927,1336,1345,1872,2208,2231,2791,2798,2813,2824,2859,2883,2885,3020,3034,3144,3147,3634,3637,3670,3673,3707,3999,4006,4012,4090,4105,4296,4554,4556,4598,4601,4606,4609 'vector-field':3998 'verifi':1798 'version':113,267 'via':92,2416,2952,3157,4128,4463 'violat':3982 'visibl':4214 'visual':2102,2800 'wait':1374,2214,3435 'walk':2437 'walkthrough':1723,1801 'want':3420 'way':3101,3782 'weight':2701 'whatev':2423 'whether':553,3015,3023,3028 'whitespac':1397 'whitespace-separ':1396 'wildcard':4627 'window':1194 'wire':111,3401 'within':2704 'without':401,1164,1197,1209,2267 'won':367 'word':1900,2706,3803,4210,4241 'work':2055,2768,4634 'workflow':250,527,2978 'wrapper':981 'write':100,494,1092,1601,1988,2993,4452,4775 'written':237,2601 'wrong':1933,2436 'x':2729,2737,2740,2765,3308,3319,3822 'x-environ':3307,3318,3821 'year':644,692,3389,3503,3587 'yes':1270,1283,1294 'yet':4063,4443,4466,4699 'yield':3887 
'zero':1480,1518","prices":[{"id":"c2e1cf69-a90d-46c7-be10-b420e24ec3a2","listingId":"7c1b6777-fc67-48f6-96ad-31016b2e5134","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"pinecone-io","category":"skills","install_from":"skills.sh"},"createdAt":"2026-05-18T19:07:27.053Z"}],"sources":[{"listingId":"7c1b6777-fc67-48f6-96ad-31016b2e5134","source":"github","sourceId":"pinecone-io/skills/pinecone-full-text-search","sourceUrl":"https://github.com/pinecone-io/skills/tree/main/skills/pinecone-full-text-search","isPrimary":false,"firstSeenAt":"2026-05-18T19:07:27.053Z","lastSeenAt":"2026-05-18T19:07:27.053Z"}],"details":{"listingId":"7c1b6777-fc67-48f6-96ad-31016b2e5134","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"pinecone-io","slug":"pinecone-full-text-search","github":{"repo":"pinecone-io/skills","stars":12,"topics":["agent-skills","agents","pinecone","retrieval-augmented-generation","semantic-search","skills-sh"],"license":"mit","html_url":"https://github.com/pinecone-io/skills","pushed_at":"2026-05-07T04:32:27Z","description":"Pinecone's official Agent Skills library, for use with agentic IDEs such as Cursor, Github Copilot, Antigravity, Gemini CLI and more.","skill_md_sha":"2d4b9428e4d63cb51627351583c1da73ff3ee138","skill_md_path":"skills/pinecone-full-text-search/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/pinecone-io/skills/tree/main/skills/pinecone-full-text-search"},"layout":"multi","source":"github","category":"skills","frontmatter":{"name":"pinecone-full-text-search","description":"Create, ingest into, and query a Pinecone full-text-search (FTS) index using the preview API (2026-01.alpha, public preview). 
Use when the user or agent asks to build a text search index on Pinecone, add dense or sparse vector fields, ingest documents, construct score_by clauses (text / query_string / dense_vector / sparse_vector), or compose with text-match filters ($match_phrase / $match_all / $match_any). Ships `scripts/ingest.py` for safe bulk ingestion (batch_upsert + error inspection + readiness polling); query construction is documented inline in this skill — write `documents.search(...)` calls directly, validated against `pc.preview.indexes.describe(...)` output."},"skills_sh_url":"https://skills.sh/pinecone-io/skills/pinecone-full-text-search"},"updatedAt":"2026-05-18T19:07:27.053Z"}}