{"id":"bcb49f55-cc5d-4e1e-be1b-cc03660495f4","shortId":"t7zc9R","kind":"skill","title":"baoyu-youtube-transcript","tagline":"Downloads YouTube video transcripts/subtitles and cover images by URL or video ID. Supports multiple languages, translation, chapters, and speaker identification. Caches raw data for fast re-formatting. Use when user asks to \"get YouTube transcript\", \"download subtitles\", \"get ca","description":"# YouTube Transcript\n\nDownloads transcripts (subtitles/captions) from YouTube videos. Works with both manually created and auto-generated transcripts. No API key or browser required — uses YouTube's InnerTube API directly and automatically falls back to `yt-dlp` when YouTube blocks the direct API path.\n\nFetches video metadata and cover image on first run, caches raw data for fast re-formatting.\n\n## Script Directory\n\nScripts in `scripts/` subdirectory. `{baseDir}` = this SKILL.md's directory path. Resolve `${BUN_X}` runtime: if `bun` installed → `bun`; if `npx` available → `npx -y bun`; else suggest installing bun. 
Replace `{baseDir}` and `${BUN_X}` with actual values.\n\n| Script | Purpose |\n|--------|---------|\n| `scripts/main.ts` | Transcript download CLI |\n\n## Usage\n\n```bash\n# Default: markdown with timestamps (English)\n${BUN_X} {baseDir}/scripts/main.ts <youtube-url-or-id>\n\n# Specify languages (priority order)\n${BUN_X} {baseDir}/scripts/main.ts <url> --languages zh,en,ja\n\n# Without timestamps\n${BUN_X} {baseDir}/scripts/main.ts <url> --no-timestamps\n\n# With chapter segmentation\n${BUN_X} {baseDir}/scripts/main.ts <url> --chapters\n\n# With speaker identification (requires AI post-processing)\n${BUN_X} {baseDir}/scripts/main.ts <url> --speakers\n\n# SRT subtitle file\n${BUN_X} {baseDir}/scripts/main.ts <url> --format srt\n\n# Translate transcript\n${BUN_X} {baseDir}/scripts/main.ts <url> --translate zh-Hans\n\n# List available transcripts\n${BUN_X} {baseDir}/scripts/main.ts <url> --list\n\n# Force re-fetch (ignore cache)\n${BUN_X} {baseDir}/scripts/main.ts <url> --refresh\n```\n\n## Options\n\n| Option | Description | Default |\n|--------|-------------|---------|\n| `<url-or-id>` | YouTube URL or video ID (multiple allowed) | Required |\n| `--languages <codes>` | Language codes, comma-separated, in priority order | `en` |\n| `--format <fmt>` | Output format: `text`, `srt` | `text` |\n| `--translate <code>` | Translate to specified language code | |\n| `--list` | List available transcripts instead of fetching | |\n| `--timestamps` | Include `[HH:MM:SS → HH:MM:SS]` timestamps per paragraph | on |\n| `--no-timestamps` | Disable timestamps | |\n| `--chapters` | Chapter segmentation from video description | |\n| `--speakers` | Raw transcript with metadata for speaker identification | |\n| `--exclude-generated` | Skip auto-generated transcripts | |\n| `--exclude-manually-created` | Skip manually created transcripts | |\n| `--refresh` | Force re-fetch, ignore cached data | |\n| `-o, --output <path>` | Save to specific file path | auto-generated |\n| `--output-dir 
<dir>` | Base output directory | `youtube-transcript` |\n\n## Optional Environment Variables\n\n| Variable | Description |\n|----------|-------------|\n| `YOUTUBE_TRANSCRIPT_COOKIES_FROM_BROWSER` | Passed to `yt-dlp --cookies-from-browser` during fallback, e.g. `chrome`, `safari`, `firefox`, or `chrome:Profile 1` |\n\n## Input Formats\n\nAccepts any of these as video input:\n- Full URL: `https://www.youtube.com/watch?v=dQw4w9WgXcQ`\n- Short URL: `https://youtu.be/dQw4w9WgXcQ`\n- Embed URL: `https://www.youtube.com/embed/dQw4w9WgXcQ`\n- Shorts URL: `https://www.youtube.com/shorts/dQw4w9WgXcQ`\n- Video ID: `dQw4w9WgXcQ`\n\n## Output Formats\n\n| Format | Extension | Description |\n|--------|-----------|-------------|\n| `text` | `.md` | Markdown with frontmatter (incl. `description`), title heading, summary, optional TOC/cover/timestamps/chapters/speakers |\n| `srt` | `.srt` | SubRip subtitle format for video players |\n\n## Output Directory\n\n```\nyoutube-transcript/\n├── .index.json                          # Video ID → directory path mapping (for cache lookup)\n└── {channel-slug}/{title-full-slug}/\n    ├── meta.json                        # Video metadata (title, channel, description, duration, chapters, etc.)\n    ├── transcript-raw.json              # Raw transcript snippets from YouTube API (cached)\n    ├── transcript-sentences.json        # Sentence-segmented transcript (split by punctuation, merged across snippets)\n    ├── imgs/\n    │   └── cover.jpg                    # Video thumbnail\n    ├── transcript.md                    # Markdown transcript (generated from sentences)\n    └── transcript.srt                   # SRT subtitle (generated from raw snippets, if --format srt)\n```\n\n- `{channel-slug}`: Channel name in kebab-case\n- `{title-full-slug}`: Full video title in kebab-case\n\nThe `--list` mode outputs to stdout only (no file saved).\n\n## Caching\n\nOn first fetch, the script saves:\n- `meta.json` — video metadata, chapters, cover image 
path, language info\n- `transcript-raw.json` — raw transcript snippets from YouTube API (`{ text, start, duration }[]`)\n- `transcript-sentences.json` — sentence-segmented transcript (`{ text, start: \"HH:mm:ss\", end: \"HH:mm:ss\" }[]`), split by sentence-ending punctuation (`.?!…。？！` etc.), timestamps proportionally allocated by character length, CJK-aware text merging\n- `imgs/cover.jpg` — video thumbnail\n\nSubsequent runs for the same video use cached data (no network calls). Use `--refresh` to force re-fetch. If a different language is requested, the cache is automatically refreshed.\n\nWhen YouTube returns anti-bot / blocked responses on the direct InnerTube path, the script retries with alternate client identities and then falls back to `yt-dlp` if available. If fallback is needed but `yt-dlp` is unavailable, the agent should decide how to make `yt-dlp` available and continue rather than pushing the installation decision to the user.\n\nSRT output (`--format srt`) is generated from `transcript-raw.json`. Text/markdown output uses `transcript-sentences.json` for natural sentence boundaries.\n\n## Workflow\n\nWhen the user provides a YouTube URL and wants the transcript:\n\n1. Run with `--list` first if the user hasn't specified a language, to show available options\n2. **Always single-quote the URL** when running the script — zsh treats `?` as a glob wildcard, so an unquoted YouTube URL causes \"no matches found\": use `'https://www.youtube.com/watch?v=ID'`\n3. Default: run with `--chapters --speakers` for the richest output (chapters + speaker identification)\n4. The script auto-saves cached data + output file and prints the file path\n5. 
For `--speakers` mode: after the script saves the raw file, follow the speaker identification workflow below to post-process with speaker labels\n\nWhen user only wants a cover image or metadata, running the script with any option will also cache `meta.json` and `imgs/cover.jpg`.\n\nWhen re-formatting the same video (e.g., first text then SRT), the cached data is reused — no re-fetch needed.\n\n## Chapter & Speaker Workflow\n\n### Chapters (`--chapters`)\n\nThe script parses chapter timestamps from the video description (e.g., `0:00 Introduction`), segments the transcript by chapter boundaries, groups snippets into readable paragraphs, and saves as `.md` with a Table of Contents. No further processing needed.\n\nIf no chapter timestamps exist in the description, the transcript is output as grouped paragraphs without chapter headings.\n\n### Speaker Identification (`--speakers`)\n\nSpeaker identification requires AI processing. The script outputs a raw `.md` file containing:\n- YAML frontmatter with video metadata (title, channel, date, cover, description, language)\n- Video description (for speaker name extraction)\n- Chapter list from description (if available)\n- Raw transcript in SRT format (pre-computed start/end timestamps, token-efficient)\n\nAfter the script saves the raw file, spawn a sub-agent (use a cheaper model like Sonnet for cost efficiency) to process speaker identification:\n\n1. Read the saved `.md` file\n2. Read the prompt template at `{baseDir}/prompts/speaker-transcript.md`\n3. Process the raw transcript following the prompt:\n   - Identify speakers using video metadata (title → guest, channel → host, description → names)\n   - Detect speaker turns from conversation flow, question-answer patterns, and contextual cues\n   - Segment into chapters (use description chapters if available, else create from topic shifts)\n   - Format with `**Speaker Name:**` labels, paragraph grouping (2-4 sentences), and `[HH:MM:SS → HH:MM:SS]` timestamps\n4. 
Overwrite the `.md` file with the processed transcript (keep the YAML frontmatter)\n\nWhen `--speakers` is used, `--chapters` is implied — the processed output always includes chapter segmentation.\n\n## Error Cases\n\n| Error | Meaning |\n|-------|---------|\n| Transcripts disabled | Video has no captions at all |\n| No transcript found | Requested language not available |\n| Video unavailable | Video deleted, private, or region-locked |\n| IP blocked | Too many requests, try again later |\n| Age restricted | Video requires login for age verification |\n| Bot detected | The script retries alternate clients and then `yt-dlp`. If fallback tooling is missing, the agent should resolve that itself. If the fetch still fails, try `YOUTUBE_TRANSCRIPT_COOKIES_FROM_BROWSER=safari` (or your browser) |","tags":["baoyu","youtube","transcript","skills","jimliu","agent-skills","claude-skills","codex-skills","openclaw-skills"],"capabilities":["skill","source-jimliu","skill-baoyu-youtube-transcript","topic-agent-skills","topic-claude-skills","topic-codex-skills","topic-openclaw-skills"],"categories":["baoyu-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/JimLiu/baoyu-skills/baoyu-youtube-transcript","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add JimLiu/baoyu-skills","source_repo":"https://github.com/JimLiu/baoyu-skills","install_from":"skills.sh"}},"qualityScore":"0.700","qualityRationale":"deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 16958 github stars · SKILL.md body (8,912 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-03T00:52:28.120Z","embedding":null,"createdAt":"2026-04-18T21:53:39.660Z","updatedAt":"2026-05-03T00:52:28.120Z","lastSeenAt":"2026-05-03T00:52:28.120Z","tsv":"'-4':1068 '/dqw4w9wgxcq':404 '/embed/dqw4w9wgxcq':409 '/prompts/speaker-transcript.md':1014 '/scripts/main.ts':161,169,179,189,202,210,218,229,240 '/shorts/dqw4w9wgxcq':414 '/watch?v=dqw4w9wgxcq':399 '/watch?v=id''':768 '0':879 '00':880 '1':385,722,1001 '2':739,1007,1067 '3':769,782,1015 '4':797,1078 'accept':388 'across':490 'actual':143 'age':1141,1147 'agent':674,987,1167 'ai':195,930 'alloc':591 'allow':252 'also':837 'altern':650,1154 'alway':740,1101 'answer':1042 'anti':637 'anti-bot':636 'api':64,73,88,479,564 'ask':36 'auto':60,319,346,786 'auto-gener':59,318,345 'auto-sav':785 'automat':76,631 'avail':129,224,278,662,683,737,962,1054,1123 'awar':597 'back':78,656 'baoyu':2 'baoyu-youtube-transcript':1 'base':351 'basedir':113,138,160,168,178,188,201,209,217,228,239,1013 'bash':152 'block':85,639,1134 'bot':638,1149 'boundari':710,887 'browser':67,366,375,1182,1186 'bun':120,124,126,132,136,140,158,166,176,186,199,207,215,226,237 'ca':44 'cach':25,99,236,336,455,480,542,610,629,788,838,855 'call':614 'caption':1114 'case':520,531,1106 'caus':761 'channel':458,468,513,515,946,1030 'channel-slug':457,512 'chapter':21,184,190,300,301,471,552,773,779,864,867,868,872,886,908,922,957,1049,1052,1095,1103 'charact':593 'cheaper':990 'chrome':379,383 'cjk':596 'cjk-awar':595 'cli':150 'client':651,1155 'code':256,275 'comma':258 'comma-separ':257 'comput':970 'contain':939 'content':901 'contextu':1045 'continu':685 'convers':1038 'cooki':364,373,1180 'cookies-from-brows':372 'cost':995 
'cover':10,94,553,826,948 'cover.jpg':493 'creat':57,325,328,1056 'cue':1046 'data':27,101,337,611,789,856 'date':947 'decid':676 'decis':691 'default':153,245,770 'delet':1127 'descript':244,305,361,422,429,469,877,913,949,952,960,1032,1051 'detect':1034,1150 'differ':624 'dir':350 'direct':74,87,643 'directori':108,117,353,444,451 'disabl':298,1110 'dlp':82,371,660,670,682,1160 'download':5,41,47,149 'dqw4w9wgxcq':417 'durat':470,567 'e.g':378,849,878 'effici':975,996 'els':133,1055 'emb':405 'en':172,263 'end':578,586 'english':157 'environ':358 'error':1105,1107 'etc':472,588 'exclud':315,323 'exclude-gener':314 'exclude-manually-cr':322 'exist':910 'extens':421 'extract':956 'fail':1176 'fall':77,655 'fallback':377,664,1162 'fast':29,103 'fetch':90,234,282,334,545,621,862 'file':206,343,540,791,795,807,938,982,1006,1082 'firefox':381 'first':97,544,726,850 'flow':1039 'follow':808,1020 'forc':231,331,618 'format':32,106,211,264,266,387,419,420,439,510,697,845,967,1060 'found':764,1119 'frontmatt':427,941,1090 'full':395,462,523,525 'generat':61,316,320,347,499,505,700 'get':38,43 'glob':754 'group':888,919,1066 'guest':1029 'han':222 'hasn':730 'head':431,923 'hh':285,288,575,579,1071,1074 'host':1031 'id':16,250,416,450 'ident':652 'identif':24,193,313,781,811,925,928,1000 'identifi':1023 'ignor':235,335 'imag':11,95,554,827 'img':492 'imgs/cover.jpg':600,841 'impli':1097 'incl':428 'includ':284,1102 'index.json':448 'info':557 'innertub':72,644 'input':386,394 'instal':125,135,690 'instead':280 'introduct':881 'ip':1133 'ja':173 'kebab':519,530 'kebab-cas':518,529 'keep':1087 'key':65 'label':820,1064 'languag':19,163,170,254,255,274,556,625,734,950,1121 'later':1140 'length':594 'like':992 'list':223,230,276,277,533,725,958 'lock':1132 'login':1145 'lookup':456 'make':679 'mani':1136 'manual':56,324,327 'map':453 'markdown':154,425,497 'match':763 'md':424,896,937,1005,1081 'mean':1108 'merg':489,599 'meta.json':464,549,839 
'metadata':92,310,466,551,829,944,1027 'miss':1165 'mm':286,289,576,580,1072,1075 'mode':534,800 'model':991 'multipl':18,251 'name':516,955,1033,1063 'natur':708 'need':666,863,905 'network':613 'no-timestamp':180,295 'npx':128,130 'o':338 'option':242,243,357,433,738,835 'order':165,262 'otherwis':1172 'output':265,339,349,352,418,443,535,696,704,778,790,917,934,1100 'output-dir':348 'overwrit':1079 'paragraph':293,892,920,1065 'pars':871 'pass':367 'path':89,118,344,452,555,645,796 'pattern':1043 'per':292 'player':442 'post':197,816 'post-process':196,815 'pre':969 'pre-comput':968 'print':793 'prioriti':164,261 'privat':1128 'process':198,817,904,931,998,1016,1085,1099 'profil':384 'prompt':1010,1022 'proport':590 'provid':714 'punctuat':488,587 'purpos':146 'push':688 'question':1041 'question-answ':1040 'quot':743 'rather':686 'raw':26,100,307,474,507,559,806,936,963,981,1018 're':31,105,233,333,620,844,861 're-fetch':232,332,619,860 're-format':30,104,843 'read':1002,1008 'readabl':891 'refresh':241,330,616,632 'region':1131 'region-lock':1130 'replac':137 'request':627,1120,1137 'requir':68,194,253,929,1144 'resolv':119,1169 'respons':640 'restrict':1142 'retri':648,1153 'return':635 'reus':858 'richest':777 'run':98,604,723,747,771,830 'runtim':122 'safari':380,1183 'save':340,541,548,787,804,894,979,1004 'script':107,109,111,145,547,647,749,784,803,832,870,933,978,1152 'scripts/main.ts':147 'segment':185,302,484,571,882,1047,1104 'sentenc':483,501,570,585,709,1069 'sentence-end':584 'sentence-seg':482,569 'separ':259 'shift':1059 'short':400,410 'show':736 'singl':742 'single-quot':741 'skill' 'skill-baoyu-youtube-transcript' 'skill.md':115 'skip':317,326 'slug':459,463,514,524 'snippet':476,491,508,561,889 'sonnet':993 'source-jimliu' 'spawn':983 'speaker':23,192,203,306,312,774,780,799,810,819,865,924,926,927,954,999,1024,1035,1062,1092 'specif':342 'specifi':162,273,732 'split':486,582 'srt':204,212,268,435,436,503,511,695,698,853,966 
'ss':287,290,577,581,1073,1076 'start':566,574 'start/end':971 'stdout':537 'still':1175 'sub':986 'sub-ag':985 'subdirectori':112 'subrip':437 'subsequ':603 'subtitl':42,205,438,504 'subtitles/captions':49 'suggest':134 'summari':432 'support':17 'tabl':899 'templat':1011 'text':267,269,423,565,573,598,851 'text/markdown':703 'thumbnail':495,602 'timestamp':156,175,182,283,291,297,299,589,873,909,972,1077 'titl':430,461,467,522,527,945,1028 'title-full-slug':460,521 'toc/cover/timestamps/chapters/speakers':434 'token':974 'token-effici':973 'tool':1163 'topic':1058 'topic-agent-skills' 'topic-claude-skills' 'topic-codex-skills' 'topic-openclaw-skills' 'transcript':4,40,46,48,62,148,214,225,279,308,321,329,356,363,447,475,485,498,560,572,721,884,915,964,1019,1086,1109,1118,1179 'transcript-raw.json':473,558,702 'transcript-sentences.json':481,568,706 'transcript.md':496 'transcript.srt':502 'transcripts/subtitles':8 'translat':20,213,219,270,271 'treat':751 'tri':1138,1177 'turn':1036 'unavail':672,1125 'unquot':758 'url':13,247,396,401,406,411,717,745,760 'usag':151 'use':33,69,609,615,705,765,988,1025,1050,1094 'user':35,694,713,729,822 'valu':144 'variabl':359,360 'verif':1148 'video':7,15,52,91,249,304,393,415,441,449,465,494,526,550,601,608,848,876,943,951,1026,1111,1124,1126,1143 'want':719,824 'wildcard':755 'without':174,921 'work':53 'workflow':711,812,866 'www.youtube.com':398,408,413,767 'www.youtube.com/embed/dqw4w9wgxcq':407 'www.youtube.com/shorts/dqw4w9wgxcq':412 'www.youtube.com/watch?v=dqw4w9wgxcq':397 'www.youtube.com/watch?v=id''':766 'x':121,141,159,167,177,187,200,208,216,227,238 'y':131 'yaml':940,1089 'youtu.be':403 'youtu.be/dqw4w9wgxcq':402 'youtub':3,6,39,45,51,70,84,246,355,362,446,478,563,634,716,759,1178 'youtube-transcript':354,445 'yt':81,370,659,669,681,1159 'yt-dlp':80,369,658,668,680,1158 'zh':171,221 'zh-han':220 
'zsh':750","prices":[{"id":"eb1c153c-5673-4f2b-a1da-0482a091eaa4","listingId":"bcb49f55-cc5d-4e1e-be1b-cc03660495f4","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"JimLiu","category":"baoyu-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T21:53:39.660Z"}],"sources":[{"listingId":"bcb49f55-cc5d-4e1e-be1b-cc03660495f4","source":"github","sourceId":"JimLiu/baoyu-skills/baoyu-youtube-transcript","sourceUrl":"https://github.com/JimLiu/baoyu-skills/tree/main/skills/baoyu-youtube-transcript","isPrimary":false,"firstSeenAt":"2026-04-18T21:53:39.660Z","lastSeenAt":"2026-05-03T00:52:28.120Z"}],"details":{"listingId":"bcb49f55-cc5d-4e1e-be1b-cc03660495f4","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"JimLiu","slug":"baoyu-youtube-transcript","github":{"repo":"JimLiu/baoyu-skills","stars":16958,"topics":["agent-skills","claude-skills","codex-skills","openclaw-skills"],"license":null,"html_url":"https://github.com/JimLiu/baoyu-skills","pushed_at":"2026-04-25T20:03:31Z","description":null,"skill_md_sha":"35a05640d56622839e2dd748afffd3dc3531849f","skill_md_path":"skills/baoyu-youtube-transcript/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/JimLiu/baoyu-skills/tree/main/skills/baoyu-youtube-transcript"},"layout":"multi","source":"github","category":"baoyu-skills","frontmatter":{"name":"baoyu-youtube-transcript","description":"Downloads YouTube video transcripts/subtitles and cover images by URL or video ID. Supports multiple languages, translation, chapters, and speaker identification. Caches raw data for fast re-formatting. 
Use when user asks to \"get YouTube transcript\", \"download subtitles\", \"get captions\", \"YouTube字幕\", \"YouTube封面\", \"视频封面\", \"video thumbnail\", \"video cover image\", or provides a YouTube URL and wants the transcript/subtitle text or cover image extracted."},"skills_sh_url":"https://skills.sh/JimLiu/baoyu-skills/baoyu-youtube-transcript"},"updatedAt":"2026-05-03T00:52:28.120Z"}}