{"id":"51ee9521-9e8f-48e8-8b78-d20ccb105055","shortId":"kfqhn3","kind":"skill","title":"pilot-document-processing-setup","tagline":"Deploy a document processing pipeline with 3 agents that automate ingestion, data extraction, and search indexing.  Use this skill when: 1. User wants to set up document processing or data extraction pipeline 2. User is configuring an agent as part of a document ingestion or inde","description":"# Document Processing Setup\n\nDeploy 3 agents that automate document ingestion, data extraction, and search indexing.\n\n## Roles\n\n| Role | Hostname | Skills | Purpose |\n|------|----------|--------|---------|\n| ingester | `<prefix>-ingester` | pilot-stream-data, pilot-share, pilot-archive | Accepts documents, converts to processable format |\n| extractor | `<prefix>-extractor` | pilot-task-router, pilot-dataset, pilot-receipt | Extracts structured data — tables, entities, amounts |\n| indexer | `<prefix>-indexer` | pilot-webhook-bridge, pilot-announce, pilot-metrics | Indexes data for search, publishes to downstream systems |\n\n## Setup Procedure\n\n**Step 1:** Ask the user which role this agent should play and what prefix to use.\n\n**Step 2:** Install the skills for the chosen role:\n```bash\n# ingester:\nclawhub install pilot-stream-data pilot-share pilot-archive\n# extractor:\nclawhub install pilot-task-router pilot-dataset pilot-receipt\n# indexer:\nclawhub install pilot-webhook-bridge pilot-announce pilot-metrics\n```\n\n**Step 3:** Set the hostname:\n```bash\npilotctl --json set-hostname <prefix>-<role>\n```\n\n**Step 4:** Write the setup manifest:\n```bash\nmkdir -p ~/.pilot/setups\ncat > ~/.pilot/setups/document-processing.json << 'MANIFEST'\n<USE ROLE TEMPLATE BELOW>\nMANIFEST\n```\n\n**Step 5:** Tell the user to initiate handshakes with direct communication peers.\n\n## Manifest Templates Per Role\n\n### ingester\n```json\n{\"setup\":\"document-processing\",\"setup_name\":\"Document Processing\",\"role\":\"ingester\",\"role_name\":\"Document Ingester\",\"hostname\":\"<prefix>-ingester\",\"description\":\"Accepts documents (PDF, DOCX, images) via upload or webhook, converts to processable format.\",\"skills\":{\"pilot-stream-data\":\"Stream raw document bytes to extractor for processing.\",\"pilot-share\":\"Share converted document files with extractor.\",\"pilot-archive\":\"Archive original documents for audit and reprocessing.\"},\"peers\":[{\"role\":\"extractor\",\"hostname\":\"<prefix>-extractor\",\"description\":\"Receives raw documents for data extraction\"}],\"data_flows\":[{\"direction\":\"send\",\"peer\":\"<prefix>-extractor\",\"port\":1002,\"topic\":\"raw-document\",\"description\":\"Raw documents in processable format\"}],\"handshakes_needed\":[\"<prefix>-extractor\"]}\n```\n\n### extractor\n```json\n{\"setup\":\"document-processing\",\"setup_name\":\"Document Processing\",\"role\":\"extractor\",\"role_name\":\"Data Extractor\",\"hostname\":\"<prefix>-extractor\",\"description\":\"Pulls structured data from documents — tables, key-value pairs, entities, dates, amounts.\",\"skills\":{\"pilot-task-router\":\"Route documents to specialized extractors by type (invoice, contract, form).\",\"pilot-dataset\":\"Store extraction results and training data for accuracy improvement.\",\"pilot-receipt\":\"Confirm document receipt and report extraction status.\"},\"peers\":[{\"role\":\"ingester\",\"hostname\":\"<prefix>-ingester\",\"description\":\"Sends raw documents\"},{\"role\":\"indexer\",\"hostname\":\"<prefix>-indexer\",\"description\":\"Receives extracted structured data\"}],\"data_flows\":[{\"direction\":\"receive\",\"peer\":\"<prefix>-ingester\",\"port\":1002,\"topic\":\"raw-document\",\"description\":\"Raw documents in processable format\"},{\"direction\":\"send\",\"peer\":\"<prefix>-indexer\",\"port\":1002,\"topic\":\"extracted-data\",\"description\":\"Extracted structured data as JSON\"}],\"handshakes_needed\":[\"<prefix>-ingester\",\"<prefix>-indexer\"]}\n```\n\n### indexer\n```json\n{\"setup\":\"document-processing\",\"setup_name\":\"Document Processing\",\"role\":\"indexer\",\"role_name\":\"Search Indexer\",\"hostname\":\"<prefix>-indexer\",\"description\":\"Indexes extracted data for search, builds document summaries, publishes to downstream systems.\",\"skills\":{\"pilot-webhook-bridge\":\"Push index events and summaries to downstream APIs and search engines.\",\"pilot-announce\":\"Broadcast new document availability to interested subscribers.\",\"pilot-metrics\":\"Track indexing throughput, search latency, and document counts.\"},\"peers\":[{\"role\":\"extractor\",\"hostname\":\"<prefix>-extractor\",\"description\":\"Sends extracted structured data\"}],\"data_flows\":[{\"direction\":\"receive\",\"peer\":\"<prefix>-extractor\",\"port\":1002,\"topic\":\"extracted-data\",\"description\":\"Extracted structured data as JSON\"},{\"direction\":\"send\",\"peer\":\"external\",\"port\":443,\"topic\":\"index-notification\",\"description\":\"Index notifications to downstream systems\"}],\"handshakes_needed\":[\"<prefix>-extractor\"]}\n```\n\n## Data Flows\n\n- `ingester -> extractor` : raw-document events (port 1002)\n- `extractor -> indexer` : extracted-data events (port 1002)\n- `indexer -> downstream` : index notifications via webhook (port 443)\n\n## Handshakes\n\n```bash\n# ingester <-> extractor:\npilotctl --json handshake <prefix>-extractor \"setup: document-processing\"\npilotctl --json handshake <prefix>-ingester \"setup: document-processing\"\n# extractor <-> indexer:\npilotctl --json handshake <prefix>-indexer \"setup: document-processing\"\npilotctl --json handshake <prefix>-extractor \"setup: document-processing\"\n```\n\n## Workflow Example\n\n```bash\n# On extractor — subscribe to raw documents:\npilotctl --json subscribe <prefix>-ingester raw-document\n# On indexer — subscribe to extracted data:\npilotctl --json subscribe <prefix>-extractor extracted-data\n# On ingester — publish a document:\npilotctl --json publish <prefix>-extractor raw-document '{\"filename\":\"invoice-2024-003.pdf\",\"type\":\"pdf\",\"pages\":2}'\n# On extractor — publish extracted data:\npilotctl --json publish <prefix>-indexer extracted-data '{\"filename\":\"invoice-2024-003.pdf\",\"vendor\":\"Acme Corp\",\"amount\":12500.00}'\n```\n\n## Dependencies\n\nRequires `pilot-protocol` skill, `pilotctl` binary, `clawhub` binary, and a running daemon.","tags":["pilot","document","processing","setup","skills","teoslayer","agent-skills","ai-agents","clawhub","networking","openclaw","overlay-network"],"capabilities":["skill","source-teoslayer","skill-pilot-document-processing-setup","topic-agent-skills","topic-ai-agents","topic-clawhub","topic-networking","topic-openclaw","topic-overlay-network","topic-p2p","topic-pilot-protocol"],"categories":["pilot-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/TeoSlayer/pilot-skills/pilot-document-processing-setup","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add TeoSlayer/pilot-skills","source_repo":"https://github.com/TeoSlayer/pilot-skills","install_from":"skills.sh"}},"qualityScore":"0.453","qualityRationale":"deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 6 github stars · SKILL.md body (5,566 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:14:54.593Z","embedding":null,"createdAt":"2026-05-18T13:22:38.883Z","updatedAt":"2026-05-18T19:14:54.593Z","lastSeenAt":"2026-05-18T19:14:54.593Z","tsv":"'/.pilot/setups':215 '/.pilot/setups/document-processing.json':217 '1':26,131 '1002':319,427,443,543,582,590 '12500.00':702 '2':38,147,683 '3':12,56,196 '4':207 '443':559,598 '5':221 'accept':84,255 'accuraci':390 'acm':699 'agent':13,43,57,138 'amount':107,364,701 'announc':116,191,507 'api':501 'archiv':83,168,292,293 'ask':132 'audit':297 'autom':15,59 'avail':511 'bash':155,200,212,600,639 'binari':710,712 'bridg':113,188,493 'broadcast':508 'build':482 'byte':276 'cat':216 'chosen':153 'clawhub':157,170,183,711 'communic':230 'configur':41 'confirm':395 'contract':378 'convert':86,264,285 'corp':700 'count':525 'daemon':716 'data':17,35,62,77,104,121,162,272,310,312,347,354,388,419,420,447,451,479,535,536,547,551,573,587,658,665,688,695 'dataset':98,178,382 'date':363 'depend':703 'deploy':6,55 'descript':254,305,324,351,407,415,432,448,476,531,548,564 'direct':229,314,422,438,538,554 'document':3,8,32,48,52,60,85,240,244,250,256,275,286,295,308,323,326,337,341,356,371,396,410,431,434,462,466,483,510,524,579,609,617,627,635,645,652,670,677 'document-process':239,336,461,608,616,626,634 'docx':258 'downstream':126,487,500,568,592 'engin':504 'entiti':106,362 'event':496,580,588 'exampl':638 'extern':557 'extract':18,36,63,102,311,384,400,417,446,449,478,533,546,549,586,657,664,687,694 'extracted-data':445,545,585,663,693 'extractor':90,91,169,278,289,302,304,317,332,333,344,348,350,374,528,530,541,572,576,583,602,606,619,632,641,662,674,685 'file':287 'filenam':678,696 'flow':313,421,537,574 'form':379 'format':89,267,329,437 'handshak':227,330,454,570,599,605,613,623,631 'hostnam':69,199,205,252,303,349,405,413,474,529 'imag':259 'improv':391 'ind':51 'index':21,66,108,109,120,182,412,414,441,457,458,469,473,475,477,495,519,562,565,584,591,593,620,624,654,692 'index-notif':561 'ingest':16,49,61,72,73,156,236,247,251,253,404,406,425,456,575,601,614,649,667 'initi':226 'instal':148,158,171,184 'interest':513 'invoic':377 'invoice-2024-003.pdf':679,697 'json':202,237,334,453,459,553,604,612,622,630,647,660,672,690 'key':359 'key-valu':358 'latenc':522 'manifest':211,218,219,232 'metric':119,194,517 'mkdir':213 'name':243,249,340,346,465,471 'need':331,455,571 'new':509 'notif':563,566,594 'origin':294 'p':214 'page':682 'pair':361 'part':45 'pdf':257,681 'peer':231,300,316,402,424,440,526,540,556 'per':234 'pilot':2,75,79,82,93,97,100,111,115,118,160,164,167,173,177,180,186,190,193,270,282,291,367,381,393,491,506,516,706 'pilot-announc':114,189,505 'pilot-arch':81,166,290 'pilot-dataset':96,176,380 'pilot-document-processing-setup':1 'pilot-metr':117,192,515 'pilot-protocol':705 'pilot-receipt':99,179,392 'pilot-shar':78,163,281 'pilot-stream-data':74,159,269 'pilot-task-rout':92,172,366 'pilot-webhook-bridg':110,185,490 'pilotctl':201,603,611,621,629,646,659,671,689,709 'pipelin':10,37 'play':140 'port':318,426,442,542,558,581,589,597 'prefix':143 'procedur':129 'process':4,9,33,53,88,241,245,266,280,328,338,342,436,463,467,610,618,628,636 'protocol':707 'publish':124,485,668,673,686,691 'pull':352 'purpos':71 'push':494 'raw':274,307,322,325,409,430,433,578,644,651,676 'raw-docu':321,429,577,650,675 'receipt':101,181,394,397 'receiv':306,416,423,539 'report':399 'reprocess':299 'requir':704 'result':385 'role':67,68,136,154,235,246,248,301,343,345,403,411,468,470,527 'rout':370 'router':95,175,369 'run':715 'search':20,65,123,472,481,503,521 'send':315,408,439,532,555 'set':30,197,204 'set-hostnam':203 'setup':5,54,128,210,238,242,335,339,460,464,607,615,625,633 'share':80,165,283,284 'skill':24,70,150,268,365,489,708 'skill-pilot-document-processing-setup' 'source-teoslayer' 'special':373 'status':401 'step':130,146,195,206,220 'store':383 'stream':76,161,271,273 'structur':103,353,418,450,534,550 'subscrib':514,642,648,655,661 'summari':484,498 'system':127,488,569 'tabl':105,357 'task':94,174,368 'tell':222 'templat':233 'throughput':520 'topic':320,428,444,544,560 'topic-agent-skills' 'topic-ai-agents' 'topic-clawhub' 'topic-networking' 'topic-openclaw' 'topic-overlay-network' 'topic-p2p' 'topic-pilot-protocol' 'track':518 'train':387 'type':376,680 'upload':261 'use':22,145 'user':27,39,134,224 'valu':360 'vendor':698 'via':260,595 'want':28 'webhook':112,187,263,492,596 'workflow':637 'write':208","prices":[{"id":"ff9610f8-103f-4f77-99bf-e126d62aabc7","listingId":"51ee9521-9e8f-48e8-8b78-d20ccb105055","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"TeoSlayer","category":"pilot-skills","install_from":"skills.sh"},"createdAt":"2026-05-18T13:22:38.883Z"}],"sources":[{"listingId":"51ee9521-9e8f-48e8-8b78-d20ccb105055","source":"github","sourceId":"TeoSlayer/pilot-skills/pilot-document-processing-setup","sourceUrl":"https://github.com/TeoSlayer/pilot-skills/tree/main/skills/pilot-document-processing-setup","isPrimary":false,"firstSeenAt":"2026-05-18T13:22:38.883Z","lastSeenAt":"2026-05-18T19:14:54.593Z"}],"details":{"listingId":"51ee9521-9e8f-48e8-8b78-d20ccb105055","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"TeoSlayer","slug":"pilot-document-processing-setup","github":{"repo":"TeoSlayer/pilot-skills","stars":6,"topics":["agent-skills","ai-agents","clawhub","networking","openclaw","overlay-network","p2p","pilot-protocol"],"license":"agpl-3.0","html_url":"https://github.com/TeoSlayer/pilot-skills","pushed_at":"2026-05-13T06:08:49Z","description":"80+ agent skills for Pilot Protocol — communication, file transfer, trust, task routing, swarm coordination, and more","skill_md_sha":"f7a7fa853027ec61e1f9bfe998a955a66d5b5053","skill_md_path":"skills/pilot-document-processing-setup/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/TeoSlayer/pilot-skills/tree/main/skills/pilot-document-processing-setup"},"layout":"multi","source":"github","category":"pilot-skills","frontmatter":{"name":"pilot-document-processing-setup","license":"AGPL-3.0","description":"Deploy a document processing pipeline with 3 agents that automate ingestion, data extraction, and search indexing.  Use this skill when: 1. User wants to set up document processing or data extraction pipeline 2. User is configuring an agent as part of a document ingestion or indexing workflow 3. User asks about OCR, PDF processing, or structured data extraction across agents  Do NOT use this skill when: - User wants to share a single file (use pilot-share instead) - User wants to stream raw data once (use pilot-stream-data instead)"},"skills_sh_url":"https://skills.sh/TeoSlayer/pilot-skills/pilot-document-processing-setup"},"updatedAt":"2026-05-18T19:14:54.593Z"}}