{"id":"4616e80e-3c26-4538-b146-ca36eb7379dc","shortId":"FemrvX","kind":"skill","title":"chatgpt-search","tagline":"Search ChatGPT conversation exports using SQLite FTS5 (SQLite full-text search).\nBM25-ranked full-text search (relevance scoring) with TF-IDF keywords\n(term-weighted key phrases),\ndate/role/model/language filtering, and conversation browsing.\nUse when agent needs to search past C","description":"# chatgpt-search\n\nSQLite FTS5 (SQLite full-text search) engine for ChatGPT conversation exports.\nBM25-ranked full-text search (relevance scoring)\nwith title boosting, code separation, TF-IDF (term-frequency/inverse-document-frequency)\nkeyword extraction,\nand filtering by date, role, model, and language.\n\n## Setup\n\n```bash\ncd /path/to/skills/chatgpt-search\n./scripts/setup.sh /path/to/your/conversations.json\nexport PYTHONPATH=/path/to/skills/chatgpt-search/src\n```\n\n- **Claude Code:** copy this skill folder into `.claude/skills/chatgpt-search/`\n- **Codex CLI:** append this SKILL.md content to your project's root `AGENTS.md`\n\nFor the full installation walkthrough (prerequisites, verification, troubleshooting), see [references/installation-guide.md](references/installation-guide.md).\n\n## Staying Updated\n\nThis skill ships with an `UPDATES.md` changelog and `UPDATE-GUIDE.md` for your AI agent.\n\nAfter installing, tell your agent: \"Check `UPDATES.md` in the chatgpt-search skill for any new features or changes.\"\n\nWhen updating, tell your agent: \"Read `UPDATE-GUIDE.md` and apply the latest changes from `UPDATES.md`.\"\n\nFollow `UPDATE-GUIDE.md` so customized local files are diffed before any overwrite.\n\n**Repo:** `./`\n**Data:** `<your-export-path>/conversations.json`\n**Default DB:** `~/.chatgpt-search/index.db`\n\n## Quick Start\n\n```bash\ncd . && ./scripts/setup.sh <your-export-path>/conversations.json\nexport PYTHONPATH=./src\npython -m chatgpt_search.cli \"your topic query\" --limit 10\n```\n\n## Decision Tree\n\n```text\nNeed to search past ChatGPT conversations?\n  |\n  +-- Know a topic/keyword? --> Full-text search: \"query\"\n  |     +-- Want only user messages? --> add --role user\n  |     +-- Want a specific model's responses? --> add --model gpt-5\n  |     +-- Want a date range? --> add --since 2025-01 --until 2025-06\n  |     +-- Want a specific language? --> add --lang ru\n  |\n  +-- Know a conversation ID? --> --conversation <id> (or partial ID)\n  |\n  +-- Want to explore keywords?\n  |     +-- Top corpus keywords --> --keywords\n  |     +-- Keywords for a conversation --> --keywords --keywords-conversation <id>\n  |\n  +-- Want corpus overview? --> --stats\n  |\n  +-- Need to search non-ChatGPT docs? --> Use your project's document search skill\n  +-- Need to search Apple Notes/Obsidian? --> Use a dedicated document search tool\n  +-- Need web search? --> Use web-search skill (optional companion, not required)\n```\n\n## Setup\n\n```bash\ncd . && ./scripts/setup.sh <your-export-path>/conversations.json\n```\n\nThis installs dependencies (scikit-learn, langdetect) and builds the index\nfrom the provided conversations.json location. Rebuild takes ~26 seconds\non the full corpus (1,514 conversations, 16,689 messages).\n\n## CLI Reference\n\n```bash\n# Set PYTHONPATH (or install the package)\nexport PYTHONPATH=./src\n\n# --- Search ---\n\n# Full-text search\npython -m chatgpt_search.cli \"transformer attention\"\n\n# Date filtering\npython -m chatgpt_search.cli \"kubernetes\" --since 2025-01\npython -m chatgpt_search.cli \"pytorch\" --since 2025-06 --until 2025-12\n\n# Role filtering (search only user messages or assistant responses)\npython -m chatgpt_search.cli \"pricing strategy\" --role user\n\n# Model filtering (partial match)\npython -m chatgpt_search.cli \"code review\" --model gpt-5\npython -m chatgpt_search.cli \"reasoning\" --model o3\n\n# Language filtering\npython -m chatgpt_search.cli \"machine learning\" --lang en\npython -m chatgpt_search.cli \"обучение\" --lang ru\n\n# Phrase queries (exact match)\npython -m chatgpt_search.cli '\"attention is all you need\"'\n\n# Prefix queries\npython -m chatgpt_search.cli \"transfor*\"\n\n# Limit results\npython -m chatgpt_search.cli \"topic\" --limit 5\npython -m chatgpt_search.cli \"topic\" -n 50\n\n# --- Browse ---\n\n# Browse a full conversation\npython -m chatgpt_search.cli --conversation <conversation-id>\npython -m chatgpt_search.cli -c <partial-id>\n\n# --- Keyword Exploration ---\n\n# Top keywords across the corpus (by total TF-IDF score)\npython -m chatgpt_search.cli --keywords\n\n# Keywords for a specific conversation\npython -m chatgpt_search.cli --keywords --keywords-conversation <conversation-id>\n\n# --- Corpus Info ---\n\n# Corpus statistics (conversations, messages, keywords, models, dates)\npython -m chatgpt_search.cli --stats\n\n# --- Index Management ---\n\n# Rebuild index (includes TF-IDF enrichment)\npython -m chatgpt_search.cli --rebuild --export /path/to/conversations.json\n\n# Custom database location\npython -m chatgpt_search.cli --db /path/to/index.db \"query\"\n```\n\n## Search Syntax\n\nFTS5 query syntax (SQLite full-text query operators) is supported:\n\n| Syntax | Example | Meaning |\n|--------|---------|---------|\n| Simple terms | `transformer attention` | Implicit AND |\n| Phrase | `\"attention is all\"` | Exact phrase match |\n| Prefix | `transfor*` | Words starting with \"transfor\" |\n| OR | `pytorch OR tensorflow` | Either term |\n| NOT | `python NOT java` | Exclude term |\n\n## Architecture\n\n- **Engine:** SQLite FTS5 (SQLite full-text search) with BM25 ranking (relevance scoring)\n- **Indexing:** Message-level rows, conversation metadata joined at query time\n- **Boosting:** Title at 10x weight, content at 1x, code at 0.5x\n- **Tokenizer:** Porter stemmer + Unicode61 (handles diacritics)\n- **TF-IDF:** scikit-learn TfidfVectorizer (term-weighting), unigrams + bigrams, code blocks stripped,\n  top-10 keywords per conversation, min_df=2 for larger language groups and min_df=1\n  for small groups, max_df=0.8\n- **Language Detection:** langdetect per message, 15 languages supported\n- **Parser:** Canonical thread extraction via `current_node` backward traversal\n- **Code separation:** Fenced code blocks extracted to separate field\n- **PUA cleanup:** Unicode Private Use Area (PUA) citation markers stripped\n- **Citeturn cleanup:** ChatGPT citation markup (citeturn0search1, etc.) stripped\n\n## Performance\n\nTested on 149MB export (1,514 conversations, 16,689 messages):\n\n| Metric | Value |\n|--------|-------|\n| Full index build (with TF-IDF) | ~26 seconds |\n| TF-IDF extraction alone | ~3 seconds |\n| Database size | ~89 MB |\n| Keywords extracted | 15,085 |\n| Search latency | <50ms |\n\n## Anti-Patterns\n\n| Do NOT | Do instead |\n|--------|------------|\n| Use for non-ChatGPT document search | Use your project's document search skill |\n| Use for Apple Notes or Obsidian | Use a dedicated document search tool |\n| Expect semantic search | This is lexical BM25 -- use exact terms, expand synonyms manually |\n| Search single common words (\"the\", \"is\") | Use qualifying terms to narrow results |\n| Forget to rebuild after new export | Run --rebuild after importing new conversations.json |\n| Expect TF-IDF keywords on fresh/tiny corpora | Small groups use min_df=1, but tiny exports can still yield sparse keywords |\n\n## Error Handling\n\n| Symptom | Cause | Fix |\n|---------|-------|-----|\n| \"Database not found\" | Index not built | Run `--rebuild --export /path/to/conversations.json` |\n| No keyword results | Corpus too small or low textual signal | Normal for small exports; rebuild with more data |\n| \"Invalid search query\" | FTS5 syntax error | Check query syntax; avoid unmatched quotes |\n| scikit-learn warning during build | scikit-learn not installed | Run `python3 -m pip install scikit-learn` |\n\n## Bundled Resources Index\n\n| Path | What | When to load |\n|------|------|--------------|\n| `./UPDATES.md` | Structured changelog for AI agents | When checking for new features or updates |\n| `./UPDATE-GUIDE.md` | Instructions for AI agents performing updates | When updating this skill |\n| `./references/installation-guide.md` | Detailed install walkthrough for Claude Code and Codex CLI | First-time setup or environment repair |\n| `./README.md` | Local package and development notes | When debugging setup or extending the CLI |\n| `./scripts/setup.sh` | One-command dependency setup and index bootstrap | During first-time setup or rebuild reset |\n| `./src/chatgpt_search/` | Search/index implementation modules | When patching ranking, parsing, or filters |\n| `./tests/` | Coverage for parser/index/search behavior | Before refactors and when validating fixes |","tags":["chatgpt","search","fieldwork","skills","buildoak","agent-skills","ai-agents","ai-tools","automation","browser-automation","claude-code","claude-skills"],"capabilities":["skill","source-buildoak","skill-chatgpt-search","topic-agent-skills","topic-ai-agents","topic-ai-tools","topic-automation","topic-browser-automation","topic-claude-code","topic-claude-skills","topic-codex"],"categories":["fieldwork-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/buildoak/fieldwork-skills/chatgpt-search","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add buildoak/fieldwork-skills","source_repo":"https://github.com/buildoak/fieldwork-skills","install_from":"skills.sh"}},"qualityScore":"0.457","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 15 github stars · SKILL.md body (8,096 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-22T19:06:32.606Z","embedding":null,"createdAt":"2026-04-18T23:07:12.289Z","updatedAt":"2026-04-22T19:06:32.606Z","lastSeenAt":"2026-04-22T19:06:32.606Z","tsv":"'-01':257,398 '-06':260,405 '-10':675 '-12':408 '-5':249,436 '/.chatgpt-search/index.db':198 '/conversations.json':195,204,337 '/inverse-document-frequency':83 '/path/to/conversations.json':559,886 '/path/to/index.db':567 '/path/to/skills/chatgpt-search':97 '/path/to/skills/chatgpt-search/src':102 '/path/to/your/conversations.json':99 '/readme.md':985 '/references/installation-guide.md':968 '/scripts/setup.sh':98,203,336,998 '/src':207,379 '/src/chatgpt_search':1015 '/tests':1025 '/update-guide.md':957 '/updates.md':944 '0.5':651 '0.8':695 '085':776 '1':362,689,745,863 '10':215 '10x':644 '149mb':743 '15':701,775 '16':365,748 '1x':648 '2':681 '2025':256,259,397,404,407 '26':356,760 '3':767 '5':483 '50':489 '50ms':779 '514':363,746 '689':366,749 '89':771 'across':507 'add':237,246,254,265 'agent':42,148,153,172,949,961 'agents.md':122 'ai':147,948,960 'alon':766 'anti':781 'anti-pattern':780 'append':113 'appl':313,803 'appli':176 'architectur':616 'area':727 'assist':416 'attent':389,465,588,592 'avoid':914 'backward':711 'bash':95,201,334,370 'behavior':1029 'bigram':670 'block':672,717 'bm25':17,64,626,819 'bm25-ranked':16,63 'boost':74,641 'bootstrap':1006 'brows':39,490,491 'build':346,755,922 'built':882 'bundl':936 'c':47,502 'canon':705 'caus':875 'cd':96,202,335 'chang':167,179 'changelog':142,946 'chatgpt':2,5,49,60,159,223,301,734,791 'chatgpt-search':1,48,158 'chatgpt_search.cli':210,387,394,401,420,431,439,447,454,464,474,480,486,497,501,518,527,543,556,565 'check':154,911,951 'citat':729,735 'citeturn':732 'citeturn0search1':737 'claud':103,973 'claude/skills/chatgpt-search':110 'cleanup':723,733 'cli':112,368,977,997 'code':75,104,432,649,671,713,716,974 'codex':111,976 'command':1001 'common':828 'companion':330 'content':116,646 'convers':6,38,61,224,270,272,287,291,364,494,498,524,531,536,635,678,747 'conversations.json':352,849 'copi':105 'corpora':857 'corpus':281,293,361,509,532,534,890 'coverag':1026 'current':709 'custom':185,560 'data':194,904 'databas':561,769,877 'date':89,252,390,540 'date/role/model/language':35 'db':197,566 'debug':992 'decis':216 'dedic':317,809 'default':196 'depend':340,1002 'detail':969 'detect':697 'develop':989 'df':680,688,694,862 'diacrit':658 'dif':189 'doc':302 'document':307,318,792,798,810 'either':608 'en':451 'engin':58,617 'enrich':553 'environ':983 'error':872,910 'etc':738 'exact':460,595,821 'exampl':583 'exclud':614 'expand':823 'expect':813,850 'explor':278,504 'export':7,62,100,205,377,558,744,843,866,885,900 'extend':995 'extract':85,707,718,765,774 'featur':165,954 'fenc':715 'field':721 'file':187 'filter':36,87,391,410,426,444,1024 'first':979,1009 'first-tim':978,1008 'fix':876,1035 'folder':108 'follow':182 'forget':838 'found':879 'frequenc':82 'fresh/tiny':856 'fts5':10,52,571,619,908 'full':13,20,55,67,125,229,360,382,493,576,622,753 'full-text':12,19,54,66,228,381,575,621 'gpt':248,435 'group':685,692,859 'handl':657,873 'id':271,275 'idf':28,79,514,552,661,759,764,853 'implement':1017 'implicit':589 'import':847 'includ':549 'index':348,545,548,630,754,880,938,1005 'info':533 'instal':126,150,339,374,927,932,970 'instead':786 'instruct':958 'invalid':905 'java':613 'join':637 'key':33 'keyword':29,84,279,282,283,284,288,290,503,506,519,520,528,530,538,676,773,854,871,888 'keywords-convers':289,529 'know':225,268 'kubernet':395 'lang':266,450,456 'langdetect':344,698 'languag':93,264,443,684,696,702 'larger':683 'latenc':778 'latest':178 'learn':343,449,664,919,925,935 'level':633 'lexic':818 'limit':214,476,482 'load':943 'local':186,986 'locat':353,562 'low':894 'm':209,386,393,400,419,430,438,446,453,463,473,479,485,496,500,517,526,542,555,564,930 'machin':448 'manag':546 'manual':825 'marker':730 'markup':736 'match':428,461,597 'max':693 'mb':772 'mean':584 'messag':236,367,414,537,632,700,750 'message-level':631 'metadata':636 'metric':751 'min':679,687,861 'model':91,243,247,425,434,441,539 'modul':1018 'n':488 'narrow':836 'need':43,219,296,310,321,469 'new':164,842,848,953 'node':710 'non':300,790 'non-chatgpt':299,789 'normal':897 'note':804,990 'notes/obsidian':314 'o3':442 'obsidian':806 'one':1000 'one-command':999 'oper':579 'option':329 'overview':294 'overwrit':192 'packag':376,987 'pars':1022 'parser':704 'parser/index/search':1028 'partial':274,427 'past':46,222 'patch':1020 'path':939 'pattern':782 'per':677,699 'perform':740,962 'phrase':34,458,591,596 'pip':931 'porter':654 'prefix':470,598 'prerequisit':128 'price':421 'privat':725 'project':119,305,796 'provid':351 'pua':722,728 'python':208,385,392,399,418,429,437,445,452,462,472,478,484,495,499,516,525,541,554,563,611 'python3':929 'pythonpath':101,206,372,378 'pytorch':402,605 'qualifi':833 'queri':213,232,459,471,568,572,578,639,907,912 'quick':199 'quot':916 'rang':253 'rank':18,65,627,1021 'read':173 'reason':440 'rebuild':354,547,557,840,845,884,901,1013 'refactor':1031 'refer':369 'references/installation-guide.md':132,133 'relev':23,70,628 'repair':984 'repo':193 'requir':332 'reset':1014 'resourc':937 'respons':245,417 'result':477,837,889 'review':433 'role':90,238,409,423 'root':121 'row':634 'ru':267,457 'run':844,883,928 'scikit':342,663,918,924,934 'scikit-learn':341,662,917,923,933 'score':24,71,515,629 'search':3,4,15,22,45,50,57,69,160,221,231,298,308,312,319,323,327,380,384,411,569,624,777,793,799,811,815,826,906 'search/index':1016 'second':357,761,768 'see':131 'semant':814 'separ':76,714,720 'set':371 'setup':94,333,981,993,1003,1011 'ship':138 'signal':896 'simpl':585 'sinc':255,396,403 'singl':827 'size':770 'skill':107,137,161,309,328,800,967 'skill-chatgpt-search' 'skill.md':115 'small':691,858,892,899 'source-buildoak' 'spars':870 'specif':242,263,523 'sqlite':9,11,51,53,574,618,620 'start':200,601 'stat':295,544 'statist':535 'stay':134 'stemmer':655 'still':868 'strategi':422 'strip':673,731,739 'structur':945 'support':581,703 'symptom':874 'synonym':824 'syntax':570,573,582,909,913 'take':355 'tell':151,170 'tensorflow':607 'term':31,81,586,609,615,667,822,834 'term-frequ':80 'term-weight':30,666 'test':741 'text':14,21,56,68,218,230,383,577,623 'textual':895 'tf':27,78,513,551,660,758,763,852 'tf-idf':26,77,512,550,659,757,762,851 'tfidfvector':665 'thread':706 'time':640,980,1010 'tini':865 'titl':73,642 'token':653 'tool':320,812 'top':280,505,674 'topic':212,481,487 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-tools' 'topic-automation' 'topic-browser-automation' 'topic-claude-code' 'topic-claude-skills' 'topic-codex' 'topic/keyword':227 'total':511 'transfor':475,599,603 'transform':388,587 'travers':712 'tree':217 'troubleshoot':130 'unicod':724 'unicode61':656 'unigram':669 'unmatch':915 'updat':135,169,956,963,965 'update-guide.md':144,174,183 'updates.md':141,155,181 'use':8,40,303,315,324,726,787,794,801,807,820,832,860 'user':235,239,413,424 'valid':1034 'valu':752 'verif':129 'via':708 'walkthrough':127,971 'want':233,240,250,261,276,292 'warn':920 'web':322,326 'web-search':325 'weight':32,645,668 'word':600,829 'x':652 'yield':869 'обучение':455","prices":[{"id":"b7393733-1a06-4a4b-82c0-de140b8447e1","listingId":"4616e80e-3c26-4538-b146-ca36eb7379dc","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"buildoak","category":"fieldwork-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T23:07:12.289Z"}],"sources":[{"listingId":"4616e80e-3c26-4538-b146-ca36eb7379dc","source":"github","sourceId":"buildoak/fieldwork-skills/chatgpt-search","sourceUrl":"https://github.com/buildoak/fieldwork-skills/tree/main/skills/chatgpt-search","isPrimary":false,"firstSeenAt":"2026-04-18T23:07:12.289Z","lastSeenAt":"2026-04-22T19:06:32.606Z"}],"details":{"listingId":"4616e80e-3c26-4538-b146-ca36eb7379dc","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"buildoak","slug":"chatgpt-search","github":{"repo":"buildoak/fieldwork-skills","stars":15,"topics":["agent-skills","ai-agents","ai-tools","automation","browser-automation","claude-code","claude-skills","codex"],"license":"apache-2.0","html_url":"https://github.com/buildoak/fieldwork-skills","pushed_at":"2026-03-18T08:36:25Z","description":"Battle-tested skills for AI agents that do real work","skill_md_sha":"74164db5b9acc823b4adf2bcf4104f7695f3b846","skill_md_path":"skills/chatgpt-search/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/buildoak/fieldwork-skills/tree/main/skills/chatgpt-search"},"layout":"multi","source":"github","category":"fieldwork-skills","frontmatter":{"name":"chatgpt-search","description":"Search ChatGPT conversation exports using SQLite FTS5 (SQLite full-text search).\nBM25-ranked full-text search (relevance scoring) with TF-IDF keywords\n(term-weighted key phrases),\ndate/role/model/language filtering, and conversation browsing.\nUse when agent needs to search past ChatGPT conversations by topic,\nfind specific discussions, browse conversation history,\nor find conversations by extracted keywords.\nDo NOT use for non-ChatGPT knowledge bases — use a dedicated document search tool.\nDo NOT use for Apple Notes or Obsidian — use a dedicated document search tool."},"skills_sh_url":"https://skills.sh/buildoak/fieldwork-skills/chatgpt-search"},"updatedAt":"2026-04-22T19:06:32.606Z"}}