{"id":"5ef88150-50fb-4368-9704-9e347f52aec7","shortId":"Eqssea","kind":"skill","title":"hugging-face-dataset-viewer","tagline":"Query Hugging Face datasets through the Dataset Viewer API for splits, rows, search, filters, and parquet links.","description":"# Hugging Face Dataset Viewer\n\n## When to Use\nUse this skill when you need read-only exploration of a Hugging Face dataset through the Dataset Viewer API.\n\nUse this skill to execute read-only Dataset Viewer API calls for dataset exploration and extraction.\n\n## Core workflow\n\n1. Optionally validate dataset availability with `/is-valid`.\n2. Resolve `config` + `split` with `/splits`.\n3. Preview with `/first-rows`.\n4. Paginate content with `/rows` using `offset` and `length` (max 100).\n5. Use `/search` for text matching and `/filter` for row predicates.\n6. Retrieve parquet links via `/parquet` and totals/metadata via `/size` and `/statistics`.\n\n## Defaults\n\n- Base URL: `https://datasets-server.huggingface.co`\n- Default API method: `GET`\n- Query params should be URL-encoded.\n- `offset` is 0-based.\n- `length` max is usually `100` for row-like endpoints.\n- Gated/private datasets require `Authorization: Bearer <HF_TOKEN>`.\n\n## Dataset Viewer\n\n- `Validate dataset`: `/is-valid?dataset=<namespace/repo>`\n- `List subsets and splits`: `/splits?dataset=<namespace/repo>`\n- `Preview first rows`: `/first-rows?dataset=<namespace/repo>&config=<config>&split=<split>`\n- `Paginate rows`: `/rows?dataset=<namespace/repo>&config=<config>&split=<split>&offset=<int>&length=<int>`\n- `Search text`: `/search?dataset=<namespace/repo>&config=<config>&split=<split>&query=<text>&offset=<int>&length=<int>`\n- `Filter with predicates`: `/filter?dataset=<namespace/repo>&config=<config>&split=<split>&where=<predicate>&orderby=<sort>&offset=<int>&length=<int>`\n- `List parquet shards`: `/parquet?dataset=<namespace/repo>`\n- `Get size totals`: `/size?dataset=<namespace/repo>`\n- `Get column statistics`: `/statistics?dataset=<namespace/repo>&config=<config>&split=<split>`\n- `Get Croissant metadata (if available)`: `/croissant?dataset=<namespace/repo>`\n\nPagination pattern:\n\n```bash\ncurl \"https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=0&length=100\"\ncurl \"https://datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=100&length=100\"\n```\n\nWhen pagination is partial, use response fields such as `num_rows_total`, `num_rows_per_page`, and `partial` to drive continuation logic.\n\nSearch/filter notes:\n\n- `/search` matches string columns (full-text style behavior is internal to the API).\n- `/filter` requires predicate syntax in `where` and optional sort in `orderby`.\n- Keep filtering and searches read-only and side-effect free.\n\n## Querying Datasets\n\nUse `npx parquetlens` with Hub parquet alias paths for SQL querying.\n\nParquet alias shape:\n\n```text\nhf://datasets/<namespace>/<repo>@~parquet/<config>/<split>/<shard>.parquet\n```\n\nDerive `<config>`, `<split>`, and `<shard>` from Dataset Viewer `/parquet`:\n\n```bash\ncurl -s \"https://datasets-server.huggingface.co/parquet?dataset=cfahlgren1/hub-stats\" \\\n  | jq -r '.parquet_files[] | \"hf://datasets/\\(.dataset)@~parquet/\\(.config)/\\(.split)/\\(.filename)\"'\n```\n\nRun SQL query:\n\n```bash\nnpx -y -p parquetlens -p @parquetlens/sql parquetlens \\\n  \"hf://datasets/<namespace>/<repo>@~parquet/<config>/<split>/<shard>.parquet\" \\\n  --sql \"SELECT * FROM data LIMIT 20\"\n```\n\n### SQL export\n\n- CSV: `--sql \"COPY (SELECT * FROM data LIMIT 1000) TO 'export.csv' (FORMAT CSV, HEADER, DELIMITER ',')\"`\n- JSON: `--sql \"COPY (SELECT * FROM data LIMIT 1000) TO 'export.json' (FORMAT JSON)\"`\n- Parquet: `--sql \"COPY (SELECT * FROM data LIMIT 1000) TO 'export.parquet' (FORMAT PARQUET)\"`\n\n## Creating and Uploading Datasets\n\nUse one of these flows depending on dependency constraints.\n\nZero local dependencies (Hub UI):\n\n- Create dataset repo in browser: `https://huggingface.co/new-dataset`\n- Upload parquet files in the repo \"Files and versions\" page.\n- Verify shards appear in Dataset Viewer:\n\n```bash\ncurl -s \"https://datasets-server.huggingface.co/parquet?dataset=<namespace>/<repo>\"\n```\n\nLow dependency CLI flow (`npx @huggingface/hub` / `hfjs`):\n\n- Set auth token:\n\n```bash\nexport HF_TOKEN=<your_hf_token>\n```\n\n- Upload parquet folder to a dataset repo (auto-creates repo if missing):\n\n```bash\nnpx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data\n```\n\n- Upload as private repo on creation:\n\n```bash\nnpx -y @huggingface/hub upload datasets/<namespace>/<repo> ./local/parquet-folder data --private\n```\n\nAfter upload, call `/parquet` to discover `<config>/<split>/<shard>` values for querying with `@~parquet`.\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.","tags":["hugging","face","dataset","viewer","antigravity","awesome","skills","sickn33","agent-skills","agentic-skills","ai-agent-skills","ai-agents"],"capabilities":["skill","source-sickn33","skill-hugging-face-dataset-viewer","topic-agent-skills","topic-agentic-skills","topic-ai-agent-skills","topic-ai-agents","topic-ai-coding","topic-ai-workflows","topic-antigravity","topic-antigravity-skills","topic-claude-code","topic-claude-code-skills","topic-codex-cli","topic-codex-skills"],"categories":["antigravity-awesome-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/sickn33/antigravity-awesome-skills/hugging-face-dataset-viewer","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add sickn33/antigravity-awesome-skills","source_repo":"https://github.com/sickn33/antigravity-awesome-skills","install_from":"skills.sh"}},"qualityScore":"0.700","qualityRationale":"deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 37911 github stars · SKILL.md body (4,786 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T18:51:11.156Z","embedding":null,"createdAt":"2026-04-18T21:38:43.671Z","updatedAt":"2026-05-18T18:51:11.156Z","lastSeenAt":"2026-05-18T18:51:11.156Z","tsv":"'/croissant':232 '/filter':104,198,284 '/first-rows':85,171 '/is-valid':75,158 '/local/parquet-folder':490,504 '/new-dataset':434 '/parquet':113,210,332,510 '/parquet?dataset=':456 '/parquet?dataset=cfahlgren1/hub-stats':338 '/rows':90,178 '/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=0&length=100':241 '/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=100&length=100':245 '/search':99,187,270 '/size':117,216 '/splits':81,165 '/statistics':119,222 '0':137 '1':69 '100':96,143 '1000':378,392,404 '2':76 '20':368 '3':82 '4':86 '5':97 '6':108 'alia':315,321 'api':14,49,60,125,283 'appear':447 'ask':551 'auth':465 'author':152 'auto':479 'auto-cr':478 'avail':73,231 'base':121,138 'bash':237,333,352,451,467,484,498 'bearer':153 'behavior':278 'boundari':559 'browser':431 'call':61,509 'clarif':553 'clear':526 'cli':459 'column':220,273 'config':78,174,181,190,201,225,346 'constraint':421 'content':88 'continu':266 'copi':373,387,399 'core':67 'creat':409,427,480 'creation':497 'criteria':562 'croissant':228 'csv':371,382 'curl':238,242,334,452 'data':366,376,390,402,491,505 'dataset':4,9,12,25,44,47,58,63,72,150,154,157,159,166,172,179,188,199,211,217,223,233,308,324,330,343,344,360,412,428,449,476,489,503 'datasets-server.huggingface.co':123,240,244,337,455 'datasets-server.huggingface.co/parquet?dataset=':454 'datasets-server.huggingface.co/parquet?dataset=cfahlgren1/hub-stats':336 'datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=0&length=100':239 'datasets-server.huggingface.co/rows?dataset=stanfordnlp/imdb&config=plain_text&split=train&offset=100&length=100':243 'default':120,124 'delimit':384 'depend':418,420,424,458 'deriv':327 'describ':530 'discov':512 'drive':265 'effect':305 'encod':134 'endpoint':148 'environ':542 'environment-specif':541 'execut':54 'expert':547 'explor':39,64 'export':370,468 'export.csv':380 'export.json':394 'export.parquet':406 'extract':66 'face':3,8,24,43 'field':252 'file':342,437,441 'filenam':348 'filter':19,195,296 'first':169 'flow':417,460 'folder':473 'format':381,395,407 'free':306 'full':275 'full-text':274 'gated/private':149 'get':127,213,219,227 'header':383 'hf':469 'hfjs':463 'hub':313,425 'hug':2,7,23,42 'hugging-face-dataset-view':1 'huggingface.co':433 'huggingface.co/new-dataset':432 'huggingface/hub':462,487,501 'input':556 'intern':280 'jq':339 'json':385,396 'keep':295 'length':94,139,184,194,206 'like':147 'limit':367,377,391,403,518 'link':22,111 'list':161,207 'local':423 'logic':267 'low':457 'match':102,271,527 'max':95,140 'metadata':229 'method':126 'miss':483,564 'namespace/repo':160,167,173,180,189,200,212,218,224,234 'need':35 'note':269 'npx':310,353,461,485,499 'num':255,258 'offset':92,135,183,193,205 'one':414 'option':70,291 'orderbi':204,294 'output':536 'p':355,357 'page':261,444 'pagin':87,176,235,247 'param':129 'parquet':21,110,208,314,320,325,326,341,345,361,362,397,408,436,472,517 'parquetlen':311,356,359 'parquetlens/sql':358 'partial':249,263 'path':316 'pattern':236 'per':260 'permiss':557 'predic':107,197,286 'preview':83,168 'privat':494,506 'queri':6,128,192,307,319,351,515 'r':340 'read':37,56,300 'read-on':36,55,299 'repo':429,440,477,481,495 'requir':151,285,555 'resolv':77 'respons':251 'retriev':109 'review':548 'row':17,106,146,170,177,256,259 'row-lik':145 'run':349 'safeti':558 'scope':529 'search':18,185,298 'search/filter':268 'select':364,374,388,400 'set':464 'shape':322 'shard':209,446 'side':304 'side-effect':303 'size':214 'skill':32,52,521 'skill-hugging-face-dataset-viewer' 'sort':292 'source-sickn33' 'specif':543 'split':16,79,164,175,182,191,202,226,347 'sql':318,350,363,369,372,386,398 'statist':221 'stop':549 'string':272 'style':277 'subset':162 'substitut':539 'success':561 'syntax':287 'task':525 'test':545 'text':101,186,276,323 'token':466,470 'topic-agent-skills' 'topic-agentic-skills' 'topic-ai-agent-skills' 'topic-ai-agents' 'topic-ai-coding' 'topic-ai-workflows' 'topic-antigravity' 'topic-antigravity-skills' 'topic-claude-code' 'topic-claude-code-skills' 'topic-codex-cli' 'topic-codex-skills' 'total':215,257 'totals/metadata':115 'treat':534 'ui':426 'upload':411,435,471,488,492,502,508 'url':122,133 'url-encod':132 'use':29,30,50,91,98,250,309,413,519 'usual':142 'valid':71,156,544 'valu':513 'verifi':445 'version':443 'via':112,116 'viewer':5,13,26,48,59,155,331,450 'workflow':68 'y':354,486,500 'zero':422","prices":[{"id":"7f94817d-7865-40b4-8775-2644f03a9df7","listingId":"5ef88150-50fb-4368-9704-9e347f52aec7","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"sickn33","category":"antigravity-awesome-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T21:38:43.671Z"}],"sources":[{"listingId":"5ef88150-50fb-4368-9704-9e347f52aec7","source":"github","sourceId":"sickn33/antigravity-awesome-skills/hugging-face-dataset-viewer","sourceUrl":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/hugging-face-dataset-viewer","isPrimary":false,"firstSeenAt":"2026-04-18T21:38:43.671Z","lastSeenAt":"2026-05-18T18:51:11.156Z"}],"details":{"listingId":"5ef88150-50fb-4368-9704-9e347f52aec7","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"sickn33","slug":"hugging-face-dataset-viewer","github":{"repo":"sickn33/antigravity-awesome-skills","stars":37911,"topics":["agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding","ai-workflows","antigravity","antigravity-skills","claude-code","claude-code-skills","codex-cli","codex-skills","cursor","cursor-skills","developer-tools","gemini-cli","gemini-skills","kiro","mcp","skill-library"],"license":"mit","html_url":"https://github.com/sickn33/antigravity-awesome-skills","pushed_at":"2026-05-18T08:24:49Z","description":"Installable GitHub library of 1,400+ agentic skills for Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity, and more. Includes installer CLI, bundles, workflows, and official/community skill collections.","skill_md_sha":"46c4ec740704555ce07399cd523d8347eb004f01","skill_md_path":"skills/hugging-face-dataset-viewer/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/hugging-face-dataset-viewer"},"layout":"multi","source":"github","category":"antigravity-awesome-skills","frontmatter":{"name":"hugging-face-dataset-viewer","description":"Query Hugging Face datasets through the Dataset Viewer API for splits, rows, search, filters, and parquet links."},"skills_sh_url":"https://skills.sh/sickn33/antigravity-awesome-skills/hugging-face-dataset-viewer"},"updatedAt":"2026-05-18T18:51:11.156Z"}}