Skillquality 0.46

chatgpt-search

Search ChatGPT conversation exports using SQLite FTS5 (SQLite full-text search). BM25-ranked full-text search (relevance scoring) with TF-IDF keywords (term-weighted key phrases), date/role/model/language filtering, and conversation browsing. Use when agent needs to search past C

Price
free
Protocol
skill
Verified
no

What it does

chatgpt-search

SQLite FTS5 (SQLite full-text search) engine for ChatGPT conversation exports. BM25-ranked full-text search (relevance scoring) with title boosting, code separation, TF-IDF (term-frequency/inverse-document-frequency) keyword extraction, and filtering by date, role, model, and language.

Setup

cd /path/to/skills/chatgpt-search
./scripts/setup.sh /path/to/your/conversations.json
export PYTHONPATH=/path/to/skills/chatgpt-search/src
  • Claude Code: copy this skill folder into .claude/skills/chatgpt-search/
  • Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, verification, troubleshooting), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the chatgpt-search skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.

Repo: ./ Data: <your-export-path>/conversations.json Default DB: ~/.chatgpt-search/index.db

Quick Start

cd . && ./scripts/setup.sh <your-export-path>/conversations.json
export PYTHONPATH=./src
python -m chatgpt_search.cli "your topic query" --limit 10

Decision Tree

Need to search past ChatGPT conversations?
  |
  +-- Know a topic/keyword? --> Full-text search: "query"
  |     +-- Want only user messages? --> add --role user
  |     +-- Want a specific model's responses? --> add --model gpt-5
  |     +-- Want a date range? --> add --since 2025-01 --until 2025-06
  |     +-- Want a specific language? --> add --lang ru
  |
  +-- Know a conversation ID? --> --conversation <id> (or partial ID)
  |
  +-- Want to explore keywords?
  |     +-- Top corpus keywords --> --keywords
  |     +-- Keywords for a conversation --> --keywords --keywords-conversation <id>
  |
  +-- Want corpus overview? --> --stats
  |
  +-- Need to search non-ChatGPT docs? --> Use your project's document search skill
  +-- Need to search Apple Notes/Obsidian? --> Use a dedicated document search tool
  +-- Need web search? --> Use web-search skill (optional companion, not required)

Setup

cd . && ./scripts/setup.sh <your-export-path>/conversations.json

This installs dependencies (scikit-learn, langdetect) and builds the index from the provided conversations.json location. Rebuild takes ~26 seconds on the full corpus (1,514 conversations, 16,689 messages).

CLI Reference

# Set PYTHONPATH (or install the package)
export PYTHONPATH=./src

# --- Search ---

# Full-text search
python -m chatgpt_search.cli "transformer attention"

# Date filtering
python -m chatgpt_search.cli "kubernetes" --since 2025-01
python -m chatgpt_search.cli "pytorch" --since 2025-06 --until 2025-12

# Role filtering (search only user messages or assistant responses)
python -m chatgpt_search.cli "pricing strategy" --role user

# Model filtering (partial match)
python -m chatgpt_search.cli "code review" --model gpt-5
python -m chatgpt_search.cli "reasoning" --model o3

# Language filtering
python -m chatgpt_search.cli "machine learning" --lang en
python -m chatgpt_search.cli "обучение" --lang ru

# Phrase queries (exact match)
python -m chatgpt_search.cli '"attention is all you need"'

# Prefix queries
python -m chatgpt_search.cli "transfor*"

# Limit results
python -m chatgpt_search.cli "topic" --limit 5
python -m chatgpt_search.cli "topic" -n 50

# --- Browse ---

# Browse a full conversation
python -m chatgpt_search.cli --conversation <conversation-id>
python -m chatgpt_search.cli -c <partial-id>

# --- Keyword Exploration ---

# Top keywords across the corpus (by total TF-IDF score)
python -m chatgpt_search.cli --keywords

# Keywords for a specific conversation
python -m chatgpt_search.cli --keywords --keywords-conversation <conversation-id>

# --- Corpus Info ---

# Corpus statistics (conversations, messages, keywords, models, dates)
python -m chatgpt_search.cli --stats

# --- Index Management ---

# Rebuild index (includes TF-IDF enrichment)
python -m chatgpt_search.cli --rebuild --export /path/to/conversations.json

# Custom database location
python -m chatgpt_search.cli --db /path/to/index.db "query"

Search Syntax

FTS5 query syntax (SQLite full-text query operators) is supported:

SyntaxExampleMeaning
Simple termstransformer attentionImplicit AND
Phrase"attention is all"Exact phrase match
Prefixtransfor*Words starting with "transfor"
ORpytorch OR tensorflowEither term
NOTpython NOT javaExclude term

Architecture

  • Engine: SQLite FTS5 (SQLite full-text search) with BM25 ranking (relevance scoring)
  • Indexing: Message-level rows, conversation metadata joined at query time
  • Boosting: Title at 10x weight, content at 1x, code at 0.5x
  • Tokenizer: Porter stemmer + Unicode61 (handles diacritics)
  • TF-IDF: scikit-learn TfidfVectorizer (term-weighting), unigrams + bigrams, code blocks stripped, top-10 keywords per conversation, min_df=2 for larger language groups and min_df=1 for small groups, max_df=0.8
  • Language Detection: langdetect per message, 15 languages supported
  • Parser: Canonical thread extraction via current_node backward traversal
  • Code separation: Fenced code blocks extracted to separate field
  • PUA cleanup: Unicode Private Use Area (PUA) citation markers stripped
  • Citeturn cleanup: ChatGPT citation markup (citeturn0search1, etc.) stripped

Performance

Tested on 149MB export (1,514 conversations, 16,689 messages):

MetricValue
Full index build (with TF-IDF)~26 seconds
TF-IDF extraction alone~3 seconds
Database size~89 MB
Keywords extracted15,085
Search latency<50ms

Anti-Patterns

Do NOTDo instead
Use for non-ChatGPT document searchUse your project's document search skill
Use for Apple Notes or ObsidianUse a dedicated document search tool
Expect semantic searchThis is lexical BM25 -- use exact terms, expand synonyms manually
Search single common words ("the", "is")Use qualifying terms to narrow results
Forget to rebuild after new exportRun --rebuild after importing new conversations.json
Expect TF-IDF keywords on fresh/tiny corporaSmall groups use min_df=1, but tiny exports can still yield sparse keywords

Error Handling

SymptomCauseFix
"Database not found"Index not builtRun --rebuild --export /path/to/conversations.json
No keyword resultsCorpus too small or low textual signalNormal for small exports; rebuild with more data
"Invalid search query"FTS5 syntax errorCheck query syntax; avoid unmatched quotes
scikit-learn warning during buildscikit-learn not installedRun python3 -m pip install scikit-learn

Bundled Resources Index

PathWhatWhen to load
./UPDATES.mdStructured changelog for AI agentsWhen checking for new features or updates
./UPDATE-GUIDE.mdInstructions for AI agents performing updatesWhen updating this skill
./references/installation-guide.mdDetailed install walkthrough for Claude Code and Codex CLIFirst-time setup or environment repair
./README.mdLocal package and development notesWhen debugging setup or extending the CLI
./scripts/setup.shOne-command dependency setup and index bootstrapDuring first-time setup or rebuild reset
./src/chatgpt_search/Search/index implementation modulesWhen patching ranking, parsing, or filters
./tests/Coverage for parser/index/search behaviorBefore refactors and when validating fixes

Capabilities

skillsource-buildoakskill-chatgpt-searchtopic-agent-skillstopic-ai-agentstopic-ai-toolstopic-automationtopic-browser-automationtopic-claude-codetopic-claude-skillstopic-codex

Install

Installnpx skills add buildoak/fieldwork-skills
Transportskills-sh
Protocolskill

Quality

0.46/ 1.00

deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 15 github stars · SKILL.md body (8,096 chars)

Provenance

Indexed fromgithub
Enriched2026-04-22 19:06:32Z · deterministic:skill-github:v1 · v1
First seen2026-04-18
Last seen2026-04-22

Agent access