Skillquality 0.46

chatgpt-search

Search ChatGPT conversation exports using SQLite FTS5 (SQLite full-text search). BM25-ranked full-text search (relevance scoring) with TF-IDF keywords (term-weighted key phrases), date/role/model/language filtering, and conversation browsing. Use when agent needs to search past C

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/buildoak/fieldwork-skills/chatgpt-search

What it does

chatgpt-search

SQLite FTS5 (SQLite full-text search) engine for ChatGPT conversation exports. BM25-ranked full-text search (relevance scoring) with title boosting, code separation, TF-IDF (term-frequency/inverse-document-frequency) keyword extraction, and filtering by date, role, model, and language.

Setup

cd /path/to/skills/chatgpt-search
./scripts/setup.sh /path/to/your/conversations.json
export PYTHONPATH=/path/to/skills/chatgpt-search/src

Claude Code: copy this skill folder into .claude/skills/chatgpt-search/
Codex CLI: append this SKILL.md content to your project's root AGENTS.md

For the full installation walkthrough (prerequisites, verification, troubleshooting), see references/installation-guide.md.

Staying Updated

This skill ships with an UPDATES.md changelog and UPDATE-GUIDE.md for your AI agent.

After installing, tell your agent: "Check UPDATES.md in the chatgpt-search skill for any new features or changes."

When updating, tell your agent: "Read UPDATE-GUIDE.md and apply the latest changes from UPDATES.md."

Follow UPDATE-GUIDE.md so customized local files are diffed before any overwrite.

Repo: ./ Data: <your-export-path>/conversations.json Default DB: ~/.chatgpt-search/index.db

Quick Start

cd . && ./scripts/setup.sh <your-export-path>/conversations.json
export PYTHONPATH=./src
python -m chatgpt_search.cli "your topic query" --limit 10

Decision Tree

Need to search past ChatGPT conversations?
  |
  +-- Know a topic/keyword? --> Full-text search: "query"
  |     +-- Want only user messages? --> add --role user
  |     +-- Want a specific model's responses? --> add --model gpt-5
  |     +-- Want a date range? --> add --since 2025-01 --until 2025-06
  |     +-- Want a specific language? --> add --lang ru
  |
  +-- Know a conversation ID? --> --conversation <id> (or partial ID)
  |
  +-- Want to explore keywords?
  |     +-- Top corpus keywords --> --keywords
  |     +-- Keywords for a conversation --> --keywords --keywords-conversation <id>
  |
  +-- Want corpus overview? --> --stats
  |
  +-- Need to search non-ChatGPT docs? --> Use your project's document search skill
  +-- Need to search Apple Notes/Obsidian? --> Use a dedicated document search tool
  +-- Need web search? --> Use web-search skill (optional companion, not required)

Setup

cd . && ./scripts/setup.sh <your-export-path>/conversations.json

This installs dependencies (scikit-learn, langdetect) and builds the index from the provided conversations.json location. Rebuild takes ~26 seconds on the full corpus (1,514 conversations, 16,689 messages).

CLI Reference

# Set PYTHONPATH (or install the package)
export PYTHONPATH=./src

# --- Search ---

# Full-text search
python -m chatgpt_search.cli "transformer attention"

# Date filtering
python -m chatgpt_search.cli "kubernetes" --since 2025-01
python -m chatgpt_search.cli "pytorch" --since 2025-06 --until 2025-12

# Role filtering (search only user messages or assistant responses)
python -m chatgpt_search.cli "pricing strategy" --role user

# Model filtering (partial match)
python -m chatgpt_search.cli "code review" --model gpt-5
python -m chatgpt_search.cli "reasoning" --model o3

# Language filtering
python -m chatgpt_search.cli "machine learning" --lang en
python -m chatgpt_search.cli "обучение" --lang ru

# Phrase queries (exact match)
python -m chatgpt_search.cli '"attention is all you need"'

# Prefix queries
python -m chatgpt_search.cli "transfor*"

# Limit results
python -m chatgpt_search.cli "topic" --limit 5
python -m chatgpt_search.cli "topic" -n 50

# --- Browse ---

# Browse a full conversation
python -m chatgpt_search.cli --conversation <conversation-id>
python -m chatgpt_search.cli -c <partial-id>

# --- Keyword Exploration ---

# Top keywords across the corpus (by total TF-IDF score)
python -m chatgpt_search.cli --keywords

# Keywords for a specific conversation
python -m chatgpt_search.cli --keywords --keywords-conversation <conversation-id>

# --- Corpus Info ---

# Corpus statistics (conversations, messages, keywords, models, dates)
python -m chatgpt_search.cli --stats

# --- Index Management ---

# Rebuild index (includes TF-IDF enrichment)
python -m chatgpt_search.cli --rebuild --export /path/to/conversations.json

# Custom database location
python -m chatgpt_search.cli --db /path/to/index.db "query"

Search Syntax

FTS5 query syntax (SQLite full-text query operators) is supported:

Syntax	Example	Meaning
Simple terms	`transformer attention`	Implicit AND
Phrase	`"attention is all"`	Exact phrase match
Prefix	`transfor*`	Words starting with "transfor"
OR	`pytorch OR tensorflow`	Either term
NOT	`python NOT java`	Exclude term

Architecture

Engine: SQLite FTS5 (SQLite full-text search) with BM25 ranking (relevance scoring)
Indexing: Message-level rows, conversation metadata joined at query time
Boosting: Title at 10x weight, content at 1x, code at 0.5x
Tokenizer: Porter stemmer + Unicode61 (handles diacritics)
TF-IDF: scikit-learn TfidfVectorizer (term-weighting), unigrams + bigrams, code blocks stripped, top-10 keywords per conversation, min_df=2 for larger language groups and min_df=1 for small groups, max_df=0.8
Language Detection: langdetect per message, 15 languages supported
Parser: Canonical thread extraction via current_node backward traversal
Code separation: Fenced code blocks extracted to separate field
PUA cleanup: Unicode Private Use Area (PUA) citation markers stripped
Citeturn cleanup: ChatGPT citation markup (citeturn0search1, etc.) stripped

Performance

Tested on 149MB export (1,514 conversations, 16,689 messages):

Metric	Value
Full index build (with TF-IDF)	~26 seconds
TF-IDF extraction alone	~3 seconds
Database size	~89 MB
Keywords extracted	15,085
Search latency	<50ms

Anti-Patterns

Do NOT	Do instead
Use for non-ChatGPT document search	Use your project's document search skill
Use for Apple Notes or Obsidian	Use a dedicated document search tool
Expect semantic search	This is lexical BM25 -- use exact terms, expand synonyms manually
Search single common words ("the", "is")	Use qualifying terms to narrow results
Forget to rebuild after new export	Run --rebuild after importing new conversations.json
Expect TF-IDF keywords on fresh/tiny corpora	Small groups use min_df=1, but tiny exports can still yield sparse keywords

Error Handling

Symptom	Cause	Fix
"Database not found"	Index not built	Run `--rebuild --export /path/to/conversations.json`
No keyword results	Corpus too small or low textual signal	Normal for small exports; rebuild with more data
"Invalid search query"	FTS5 syntax error	Check query syntax; avoid unmatched quotes
scikit-learn warning during build	scikit-learn not installed	Run `python3 -m pip install scikit-learn`

Bundled Resources Index

Path	What	When to load
`./UPDATES.md`	Structured changelog for AI agents	When checking for new features or updates
`./UPDATE-GUIDE.md`	Instructions for AI agents performing updates	When updating this skill
`./references/installation-guide.md`	Detailed install walkthrough for Claude Code and Codex CLI	First-time setup or environment repair
`./README.md`	Local package and development notes	When debugging setup or extending the CLI
`./scripts/setup.sh`	One-command dependency setup and index bootstrap	During first-time setup or rebuild reset
`./src/chatgpt_search/`	Search/index implementation modules	When patching ranking, parsing, or filters
`./tests/`	Coverage for parser/index/search behavior	Before refactors and when validating fixes

Capabilities

skillsource-buildoakskill-chatgpt-searchtopic-agent-skillstopic-ai-agentstopic-ai-toolstopic-automationtopic-browser-automationtopic-claude-codetopic-claude-skillstopic-codex

Install

Installnpx skills add buildoak/fieldwork-skills

Sourcehttps://github.com/buildoak/fieldwork-skills/tree/main/skills/chatgpt-search

skills.shhttps://skills.sh/buildoak/fieldwork-skills/chatgpt-search

Transportskills-sh

Protocolskill

Quality

0.46/ 1.00

deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 15 github stars · SKILL.md body (8,096 chars)

Provenance

Indexed fromgithub

Enriched2026-04-22 19:06:32Z · deterministic:skill-github:v1 · v1

First seen2026-04-18

Last seen2026-04-22

Agent access

JSONhttps://clawmart.sh/api/listings/FemrvX