{"id":"10cbabb1-46a2-4bb6-b218-a84ae788f48b","shortId":"N9yqqD","kind":"skill","title":"pilot-data-labeling-pipeline-setup","tagline":"Deploy a data labeling pipeline with 4 agents for ingestion, auto-labeling, quality review, and dataset export.  Use this skill when: 1. User wants to set up a data labeling or annotation pipeline 2. User is configuring an agent as part of a labeling workflow 3. User asks about M","description":"# Data Labeling Pipeline Setup\n\nDeploy 4 agents that ingest raw data, apply ML labels, review quality, and export training-ready datasets.\n\n## Roles\n\n| Role | Hostname | Skills | Purpose |\n|------|----------|--------|---------|\n| ingester | `<prefix>-ingester` | pilot-s3-bridge, pilot-stream-data, pilot-task-parallel | Accepts raw data batches, splits into work items |\n| labeler | `<prefix>-labeler` | pilot-task-router, pilot-dataset, pilot-metrics | Applies ML-based labels to work items |\n| reviewer | `<prefix>-reviewer` | pilot-review, pilot-event-filter, pilot-alert | Samples labeled items, checks accuracy, flags disagreements |\n| exporter | `<prefix>-exporter` | pilot-dataset, pilot-share, pilot-webhook-bridge | Packages approved labels into training-ready datasets |\n\n## Setup Procedure\n\n**Step 1:** Ask the user which role this agent should play and what prefix to use.\n\n**Step 2:** Install the skills for the chosen role:\n```bash\n# ingester:\nclawhub install pilot-s3-bridge pilot-stream-data pilot-task-parallel\n# labeler:\nclawhub install pilot-task-router pilot-dataset pilot-metrics\n# reviewer:\nclawhub install pilot-review pilot-event-filter pilot-alert\n# exporter:\nclawhub install pilot-dataset pilot-share pilot-webhook-bridge\n```\n\n**Step 3:** Set the hostname:\n```bash\npilotctl --json set-hostname <prefix>-<role>\n```\n\n**Step 4:** Write the setup manifest:\n```bash\nmkdir -p ~/.pilot/setups\ncat > ~/.pilot/setups/data-labeling-pipeline.json << 'MANIFEST'\n{\n  \"setup\": \"data-labeling-pipeline\",\n  \"setup_name\": \"Data Labeling Pipeline\",\n  \"role\": \"<ROLE_ID>\",\n  \"role_name\": \"<ROLE_NAME>\",\n  \"hostname\": \"<prefix>-<role>\",\n  \"description\": \"<ROLE_DESCRIPTION>\",\n  \"skills\": { \"<skill>\": \"<contextual description>\" },\n  \"peers\": [ { \"role\": \"...\", \"hostname\": \"...\", \"description\": \"...\" } ],\n  \"data_flows\": [ { \"direction\": \"send|receive\", \"peer\": \"...\", \"port\": 1002, \"topic\": \"...\", \"description\": \"...\" } ],\n  \"handshakes_needed\": [ \"<peer-hostname>\" ]\n}\nMANIFEST\n```\n\n**Step 5:** Tell the user to initiate handshakes with direct communication peers.\n\n## Manifest Templates Per Role\n\n### ingester\n```json\n{\"setup\":\"data-labeling-pipeline\",\"setup_name\":\"Data Labeling Pipeline\",\"role\":\"ingester\",\"role_name\":\"Data Ingester\",\"hostname\":\"<prefix>-ingester\",\"description\":\"Accepts raw data batches from S3 or webhooks. Splits into work items and distributes.\",\"skills\":{\"pilot-s3-bridge\":\"Pull raw data batches from S3 buckets on schedule or webhook trigger.\",\"pilot-stream-data\":\"Stream work items to labeler as they are split from batches.\",\"pilot-task-parallel\":\"Parallelize batch splitting across available workers.\"},\"peers\":[{\"role\":\"labeler\",\"hostname\":\"<prefix>-labeler\",\"description\":\"Receives work items for labeling\"}],\"data_flows\":[{\"direction\":\"send\",\"peer\":\"<prefix>-labeler\",\"port\":1002,\"topic\":\"work-item\",\"description\":\"Work items with raw data references\"}],\"handshakes_needed\":[\"<prefix>-labeler\"]}\n```\n\n### labeler\n```json\n{\"setup\":\"data-labeling-pipeline\",\"setup_name\":\"Data Labeling Pipeline\",\"role\":\"labeler\",\"role_name\":\"Auto Labeler\",\"hostname\":\"<prefix>-labeler\",\"description\":\"Applies ML-based labels, classifications, bounding boxes, or entity tags to work items.\",\"skills\":{\"pilot-task-router\":\"Route work items to appropriate ML models by data type.\",\"pilot-dataset\":\"Store and retrieve labeled data records.\",\"pilot-metrics\":\"Track labeling throughput, model confidence distributions.\"},\"peers\":[{\"role\":\"ingester\",\"hostname\":\"<prefix>-ingester\",\"description\":\"Sends work items for labeling\"},{\"role\":\"reviewer\",\"hostname\":\"<prefix>-reviewer\",\"description\":\"Receives labeled items for quality review\"}],\"data_flows\":[{\"direction\":\"receive\",\"peer\":\"<prefix>-ingester\",\"port\":1002,\"topic\":\"work-item\",\"description\":\"Work items with raw data references\"},{\"direction\":\"send\",\"peer\":\"<prefix>-reviewer\",\"port\":1002,\"topic\":\"labeled-item\",\"description\":\"Labeled items for quality review\"},{\"direction\":\"receive\",\"peer\":\"<prefix>-reviewer\",\"port\":1002,\"topic\":\"review-feedback\",\"description\":\"Feedback on rejected labels for re-labeling\"}],\"handshakes_needed\":[\"<prefix>-ingester\",\"<prefix>-reviewer\"]}\n```\n\n### reviewer\n```json\n{\"setup\":\"data-labeling-pipeline\",\"setup_name\":\"Data Labeling Pipeline\",\"role\":\"reviewer\",\"role_name\":\"Quality Reviewer\",\"hostname\":\"<prefix>-reviewer\",\"description\":\"Samples labeled items, checks accuracy, flags disagreements, computes inter-annotator agreement.\",\"skills\":{\"pilot-review\":\"Score labeled items against quality criteria and flag disagreements.\",\"pilot-event-filter\":\"Filter low-confidence labels for priority review.\",\"pilot-alert\":\"Alert on quality drops or inter-annotator agreement below threshold.\"},\"peers\":[{\"role\":\"labeler\",\"hostname\":\"<prefix>-labeler\",\"description\":\"Sends labeled items for review\"},{\"role\":\"exporter\",\"hostname\":\"<prefix>-exporter\",\"description\":\"Receives approved labels for export\"}],\"data_flows\":[{\"direction\":\"receive\",\"peer\":\"<prefix>-labeler\",\"port\":1002,\"topic\":\"labeled-item\",\"description\":\"Labeled items for quality review\"},{\"direction\":\"send\",\"peer\":\"<prefix>-labeler\",\"port\":1002,\"topic\":\"review-feedback\",\"description\":\"Feedback for re-labeling rejected items\"},{\"direction\":\"send\",\"peer\":\"<prefix>-exporter\",\"port\":1002,\"topic\":\"approved-label\",\"description\":\"Approved labels ready for packaging\"}],\"handshakes_needed\":[\"<prefix>-labeler\",\"<prefix>-exporter\"]}\n```\n\n### exporter\n```json\n{\"setup\":\"data-labeling-pipeline\",\"setup_name\":\"Data Labeling Pipeline\",\"role\":\"exporter\",\"role_name\":\"Dataset Exporter\",\"hostname\":\"<prefix>-exporter\",\"description\":\"Packages reviewed labels into training-ready datasets (COCO, VOC, JSONL). Publishes to storage.\",\"skills\":{\"pilot-dataset\":\"Assemble labeled items into structured dataset formats.\",\"pilot-share\":\"Upload packaged datasets to S3 or shared storage.\",\"pilot-webhook-bridge\":\"Notify downstream consumers when datasets are published.\"},\"peers\":[{\"role\":\"reviewer\",\"hostname\":\"<prefix>-reviewer\",\"description\":\"Sends approved labels for packaging\"}],\"data_flows\":[{\"direction\":\"receive\",\"peer\":\"<prefix>-reviewer\",\"port\":1002,\"topic\":\"approved-label\",\"description\":\"Approved labels ready for packaging\"},{\"direction\":\"send\",\"peer\":\"external\",\"port\":443,\"topic\":\"dataset-published\",\"description\":\"Notification that a new dataset is available\"}],\"handshakes_needed\":[\"<prefix>-reviewer\"]}\n```\n\n## Data Flows\n\n- `ingester -> labeler` : work-item events (port 1002)\n- `labeler -> reviewer` : labeled-item events (port 1002)\n- `reviewer -> labeler` : review-feedback events (port 1002)\n- `reviewer -> exporter` : approved-label events (port 1002)\n- `exporter -> external` : dataset-published notifications (port 443)\n\n## Handshakes\n\n```bash\n# ingester <-> labeler:\npilotctl --json handshake <prefix>-labeler \"setup: data-labeling-pipeline\"\npilotctl --json handshake <prefix>-ingester \"setup: data-labeling-pipeline\"\n\n# labeler <-> reviewer:\npilotctl --json handshake <prefix>-reviewer \"setup: data-labeling-pipeline\"\npilotctl --json handshake <prefix>-labeler \"setup: data-labeling-pipeline\"\n\n# reviewer <-> exporter:\npilotctl --json handshake <prefix>-exporter \"setup: data-labeling-pipeline\"\npilotctl --json handshake <prefix>-reviewer \"setup: data-labeling-pipeline\"\n```\n\n## Workflow Example\n\n```bash\n# On labeler — subscribe to work items:\npilotctl --json subscribe <prefix>-ingester work-item\n\n# On ingester — publish a work item:\npilotctl --json publish <prefix>-labeler work-item '{\"batch_id\":\"batch-042\",\"item_id\":\"img-0017\",\"type\":\"image\",\"s3_uri\":\"s3://raw-data/batch-042/img-0017.jpg\"}'\n\n# On reviewer — subscribe to labeled items:\npilotctl --json subscribe <prefix>-labeler labeled-item\n\n# On exporter — subscribe to approved labels:\npilotctl --json subscribe <prefix>-reviewer approved-label\n```\n\n## Dependencies\n\nRequires `pilot-protocol` skill, `pilotctl` binary, `clawhub` binary, and a running daemon.","tags":["pilot","data","labeling","pipeline","setup","skills","teoslayer","agent-skills","ai-agents","clawhub","networking","openclaw"],"capabilities":["skill","source-teoslayer","skill-pilot-data-labeling-pipeline-setup","topic-agent-skills","topic-ai-agents","topic-clawhub","topic-networking","topic-openclaw","topic-overlay-network","topic-p2p","topic-pilot-protocol"],"categories":["pilot-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/TeoSlayer/pilot-skills/pilot-data-labeling-pipeline-setup","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add TeoSlayer/pilot-skills","source_repo":"https://github.com/TeoSlayer/pilot-skills","install_from":"skills.sh"}},"qualityScore":"0.453","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 6 github stars · SKILL.md body (7,892 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:14:53.668Z","embedding":null,"createdAt":"2026-05-18T13:22:37.547Z","updatedAt":"2026-05-18T19:14:53.668Z","lastSeenAt":"2026-05-18T19:14:53.668Z","tsv":"'-0017':986 '-042':982 '/.pilot/setups':268 '/.pilot/setups/data-labeling-pipeline.json':270 '/raw-data/batch-042/img-0017.jpg':992 '1':29,169 '1002':299,416,528,545,561,679,695,713,814,855,863,871,879 '2':41,185 '3':53,249 '4':13,63,260 '443':830,887 '5':306 'accept':99,342 'accuraci':143,604 'across':395 'agent':14,46,64,176 'agreement':611,648 'alert':138,234,639,640 'annot':39,610,647 'appli':69,119,452 'appropri':475 'approv':159,668,716,719,803,817,820,875,1010,1017 'approved-label':715,816,874,1016 'ask':55,170 'assembl':767 'auto':18,447 'auto-label':17 'avail':396,842 'base':122,455 'bash':193,253,265,889,952 'batch':102,345,364,387,393,979,981 'binari':1026,1028 'bound':458 'box':459 'bridg':90,157,200,247,360,788 'bucket':367 'cat':269 'check':142,603 'chosen':191 'classif':457 'clawhub':195,210,223,236,1027 'coco':757 'communic':315 'comput':607 'confid':497,632 'configur':44 'consum':791 'criteria':621 'daemon':1032 'data':3,9,36,58,68,94,101,204,274,279,292,325,330,337,344,363,376,409,426,435,440,479,488,521,538,583,588,672,732,737,807,846,898,907,918,927,938,947 'data-labeling-pipelin':273,324,434,582,731,897,906,917,926,937,946 'dataset':23,79,115,150,165,218,240,483,744,756,766,772,779,793,833,840,883 'dataset-publish':832,882 'depend':1019 'deploy':7,62 'descript':286,291,301,341,403,421,451,504,514,533,550,566,599,656,666,684,700,718,748,801,819,835 'direct':294,314,411,523,540,556,674,690,708,809,825 'disagr':145,606,624 'distribut':355,498 'downstream':790 'drop':643 'entiti':461 'event':134,230,627,853,861,869,877 'exampl':951 'export':24,75,146,147,235,663,665,671,711,727,728,741,745,747,873,880,931,935,1007 'extern':828,881 'feedback':565,567,699,701,868 'filter':135,231,628,629 'flag':144,605,623 'flow':293,410,522,673,808,847 'format':773 'handshak':302,312,428,575,724,843,888,894,903,914,923,934,943 'hostnam':82,252,258,285,290,339,401,449,502,512,597,654,664,746,799 'id':980,984 'imag':988 'img':985 'ingest':16,66,85,86,194,321,334,338,340,501,503,526,577,848,890,904,962,967 'initi':311 'instal':186,196,211,224,237 'inter':609,646 'inter-annot':608,645 'item':106,126,141,353,379,406,420,423,465,473,507,517,532,535,549,552,602,618,659,683,686,707,769,852,860,958,965,971,978,983,998,1005 'json':255,322,432,580,729,893,902,913,922,933,942,960,973,1000,1013 'jsonl':759 'label':4,10,19,37,51,59,71,107,108,123,140,160,209,275,280,326,331,381,400,402,408,414,430,431,436,441,444,448,450,456,487,494,509,516,548,551,570,574,584,589,601,617,633,653,655,658,669,677,682,685,693,705,717,720,726,733,738,751,768,804,818,821,849,856,859,865,876,891,895,899,908,910,919,924,928,939,948,954,975,997,1002,1004,1011,1018 'labeled-item':547,681,858,1003 'low':631 'low-confid':630 'm':57 'manifest':264,271,304,317 'metric':118,221,492 'mkdir':266 'ml':70,121,454,476 'ml-base':120,453 'model':477,496 'name':278,284,329,336,439,446,587,594,736,743 'need':303,429,576,725,844 'new':839 'notif':836,885 'notifi':789 'p':267 'packag':158,723,749,778,806,824 'parallel':98,208,391,392 'part':48 'peer':288,297,316,398,413,499,525,542,558,651,676,692,710,796,811,827 'per':319 'pilot':2,88,92,96,110,114,117,130,133,137,149,152,155,198,202,206,213,217,220,226,229,233,239,242,245,358,374,389,468,482,491,614,626,638,765,775,786,1022 'pilot-alert':136,232,637 'pilot-data-labeling-pipeline-setup':1 'pilot-dataset':113,148,216,238,481,764 'pilot-event-filt':132,228,625 'pilot-metr':116,219,490 'pilot-protocol':1021 'pilot-review':129,225,613 'pilot-s3-bridge':87,197,357 'pilot-shar':151,241,774 'pilot-stream-data':91,201,373 'pilot-task-parallel':95,205,388 'pilot-task-rout':109,212,467 'pilot-webhook-bridg':154,244,785 'pilotctl':254,892,901,912,921,932,941,959,972,999,1012,1025 'pipelin':5,11,40,60,276,281,327,332,437,442,585,590,734,739,900,909,920,929,940,949 'play':178 'port':298,415,527,544,560,678,694,712,813,829,854,862,870,878,886 'prefix':181 'prioriti':635 'procedur':167 'protocol':1023 'publish':760,795,834,884,968,974 'pull':361 'purpos':84 'qualiti':20,73,519,554,595,620,642,688 'raw':67,100,343,362,425,537 're':573,704 're-label':572,703 'readi':78,164,721,755,822 'receiv':296,404,515,524,557,667,675,810 'record':489 'refer':427,539 'reject':569,706 'requir':1020 'retriev':486 'review':21,72,127,128,131,222,227,511,513,520,543,555,559,564,578,579,592,596,598,615,636,661,689,698,750,798,800,812,845,857,864,867,872,911,915,930,944,994,1015 'review-feedback':563,697,866 'role':80,81,174,192,282,283,289,320,333,335,399,443,445,500,510,591,593,652,662,740,742,797 'rout':471 'router':112,215,470 'run':1031 's3':89,199,347,359,366,781,989,991 'sampl':139,600 'schedul':369 'score':616 'send':295,412,505,541,657,691,709,802,826 'set':33,250,257 'set-hostnam':256 'setup':6,61,166,263,272,277,323,328,433,438,581,586,730,735,896,905,916,925,936,945 'share':153,243,776,783 'skill':27,83,188,287,356,466,612,763,1024 'skill-pilot-data-labeling-pipeline-setup' 'source-teoslayer' 'split':103,350,385,394 'step':168,184,248,259,305 'storag':762,784 'store':484 'stream':93,203,375,377 'structur':771 'subscrib':955,961,995,1001,1008,1014 'tag':462 'task':97,111,207,214,390,469 'tell':307 'templat':318 'threshold':650 'throughput':495 'topic':300,417,529,546,562,680,696,714,815,831 'topic-agent-skills' 'topic-ai-agents' 'topic-clawhub' 'topic-networking' 'topic-openclaw' 'topic-overlay-network' 'topic-p2p' 'topic-pilot-protocol' 'track':493 'train':77,163,754 'training-readi':76,162,753 'trigger':372 'type':480,987 'upload':777 'uri':990 'use':25,183 'user':30,42,54,172,309 'voc':758 'want':31 'webhook':156,246,349,371,787 'work':105,125,352,378,405,419,422,464,472,506,531,534,851,957,964,970,977 'work-item':418,530,850,963,976 'worker':397 'workflow':52,950 'write':261","prices":[{"id":"c8ea7048-312e-4f63-a12e-7cce256ec787","listingId":"10cbabb1-46a2-4bb6-b218-a84ae788f48b","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"TeoSlayer","category":"pilot-skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:22:37.547Z"}],"sources":[{"listingId":"10cbabb1-46a2-4bb6-b218-a84ae788f48b","source":"github","sourceId":"TeoSlayer/pilot-skills/pilot-data-labeling-pipeline-setup","sourceUrl":"https://github.com/TeoSlayer/pilot-skills/tree/main/skills/pilot-data-labeling-pipeline-setup","isPrimary":false,"firstSeenAt":"2026-05-18T13:22:37.547Z","lastSeenAt":"2026-05-18T19:14:53.668Z"}],"details":{"listingId":"10cbabb1-46a2-4bb6-b218-a84ae788f48b","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"TeoSlayer","slug":"pilot-data-labeling-pipeline-setup","github":{"repo":"TeoSlayer/pilot-skills","stars":6,"topics":["agent-skills","ai-agents","clawhub","networking","openclaw","overlay-network","p2p","pilot-protocol"],"license":"agpl-3.0","html_url":"https://github.com/TeoSlayer/pilot-skills","pushed_at":"2026-05-13T06:08:49Z","description":"80+ agent skills for Pilot Protocol — communication, file transfer, trust, task routing, swarm coordination, and more","skill_md_sha":"305bd39b6fcb50c66dc687f084773917b886a2e4","skill_md_path":"skills/pilot-data-labeling-pipeline-setup/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/TeoSlayer/pilot-skills/tree/main/skills/pilot-data-labeling-pipeline-setup"},"layout":"multi","source":"github","category":"pilot-skills","frontmatter":{"name":"pilot-data-labeling-pipeline-setup","license":"AGPL-3.0","description":"Deploy a data labeling pipeline with 4 agents for ingestion, auto-labeling, quality review, and dataset export.  Use this skill when: 1. User wants to set up a data labeling or annotation pipeline 2. User is configuring an agent as part of a labeling workflow 3. User asks about ML data preparation, annotation, or training dataset generation  Do NOT use this skill when: - User wants to share a single dataset (use pilot-dataset instead) - User wants to stream raw data without labeling (use pilot-stream-data instead)"},"skills_sh_url":"https://skills.sh/TeoSlayer/pilot-skills/pilot-data-labeling-pipeline-setup"},"updatedAt":"2026-05-18T19:14:53.668Z"}}