{"id":"a6ee5830-e09c-457f-b782-273f1a1b05cd","shortId":"NEJcpK","kind":"skill","title":"pdf","tagline":"Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/","description":"# PDF Processing Guide\n\n## Overview\n\nThis guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions.\n\n## Quick Start\n\n```python\nfrom pypdf import PdfReader, PdfWriter\n\n# Read a PDF\nreader = PdfReader(\"document.pdf\")\nprint(f\"Pages: {len(reader.pages)}\")\n\n# Extract text\ntext = \"\"\nfor page in reader.pages:\n    text += page.extract_text()\n```\n\n## Python Libraries\n\n### pypdf - Basic Operations\n\n#### Merge PDFs\n```python\nfrom pypdf import PdfWriter, PdfReader\n\nwriter = PdfWriter()\nfor pdf_file in [\"doc1.pdf\", \"doc2.pdf\", \"doc3.pdf\"]:\n    reader = PdfReader(pdf_file)\n    for page in reader.pages:\n        writer.add_page(page)\n\nwith open(\"merged.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n#### Split PDF\n```python\nreader = PdfReader(\"input.pdf\")\nfor i, page in enumerate(reader.pages):\n    writer = PdfWriter()\n    writer.add_page(page)\n    with open(f\"page_{i+1}.pdf\", \"wb\") as output:\n        writer.write(output)\n```\n\n#### Extract Metadata\n```python\nreader = PdfReader(\"document.pdf\")\nmeta = reader.metadata\nprint(f\"Title: {meta.title}\")\nprint(f\"Author: {meta.author}\")\nprint(f\"Subject: {meta.subject}\")\nprint(f\"Creator: {meta.creator}\")\n```\n\n#### Rotate Pages\n```python\nreader = PdfReader(\"input.pdf\")\nwriter = PdfWriter()\n\npage = reader.pages[0]\npage.rotate(90)  # Rotate 90 degrees 
clockwise\nwriter.add_page(page)\n\nwith open(\"rotated.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n### pdfplumber - Text and Table Extraction\n\n#### Extract Text with Layout\n```python\nimport pdfplumber\n\nwith pdfplumber.open(\"document.pdf\") as pdf:\n    for page in pdf.pages:\n        text = page.extract_text()\n        print(text)\n```\n\n#### Extract Tables\n```python\nwith pdfplumber.open(\"document.pdf\") as pdf:\n    for i, page in enumerate(pdf.pages):\n        tables = page.extract_tables()\n        for j, table in enumerate(tables):\n            print(f\"Table {j+1} on page {i+1}:\")\n            for row in table:\n                print(row)\n```\n\n#### Advanced Table Extraction\n```python\nimport pandas as pd\nimport pdfplumber\n\nwith pdfplumber.open(\"document.pdf\") as pdf:\n    all_tables = []\n    for page in pdf.pages:\n        tables = page.extract_tables()\n        for table in tables:\n            if table:  # Check if table is not empty\n                df = pd.DataFrame(table[1:], columns=table[0])\n                all_tables.append(df)\n\n# Combine all tables\nif all_tables:\n    combined_df = pd.concat(all_tables, ignore_index=True)\n    combined_df.to_excel(\"extracted_tables.xlsx\", index=False)\n```\n\n### reportlab - Create PDFs\n\n#### Basic PDF Creation\n```python\nfrom reportlab.lib.pagesizes import letter\nfrom reportlab.pdfgen import canvas\n\nc = canvas.Canvas(\"hello.pdf\", pagesize=letter)\nwidth, height = letter\n\n# Add text\nc.drawString(100, height - 100, \"Hello World!\")\nc.drawString(100, height - 120, \"This is a PDF created with reportlab\")\n\n# Add a line\nc.line(100, height - 140, 400, height - 140)\n\n# Save\nc.save()\n```\n\n#### Create PDF with Multiple Pages\n```python\nfrom reportlab.lib.pagesizes import letter\nfrom reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak\nfrom reportlab.lib.styles import getSampleStyleSheet\n\ndoc = SimpleDocTemplate(\"report.pdf\", pagesize=letter)\nstyles 
= getSampleStyleSheet()\nstory = []\n\n# Add content\ntitle = Paragraph(\"Report Title\", styles['Title'])\nstory.append(title)\nstory.append(Spacer(1, 12))\n\nbody = Paragraph(\"This is the body of the report. \" * 20, styles['Normal'])\nstory.append(body)\nstory.append(PageBreak())\n\n# Page 2\nstory.append(Paragraph(\"Page 2\", styles['Heading1']))\nstory.append(Paragraph(\"Content for page 2\", styles['Normal']))\n\n# Build PDF\ndoc.build(story)\n```\n\n#### Subscripts and Superscripts\n\n**IMPORTANT**: Never use Unicode subscript/superscript characters (₀₁₂₃₄₅₆₇₈₉, ⁰¹²³⁴⁵⁶⁷⁸⁹) in ReportLab PDFs. The built-in fonts do not include these glyphs, causing them to render as solid black boxes.\n\nInstead, use ReportLab's XML markup tags in Paragraph objects:\n```python\nfrom reportlab.platypus import Paragraph\nfrom reportlab.lib.styles import getSampleStyleSheet\n\nstyles = getSampleStyleSheet()\n\n# Subscripts: use <sub> tag\nchemical = Paragraph(\"H<sub>2</sub>O\", styles['Normal'])\n\n# Superscripts: use <super> tag\nsquared = Paragraph(\"x<super>2</super> + y<super>2</super>\", styles['Normal'])\n```\n\nFor canvas-drawn text (not Paragraph objects), manually adjust the font size and position rather than using Unicode subscripts/superscripts.\n\n## Command-Line Tools\n\n### pdftotext (poppler-utils)\n```bash\n# Extract text\npdftotext input.pdf output.txt\n\n# Extract text preserving layout\npdftotext -layout input.pdf output.txt\n\n# Extract specific pages\npdftotext -f 1 -l 5 input.pdf output.txt  # Pages 1-5\n```\n\n### qpdf\n```bash\n# Merge PDFs\nqpdf --empty --pages file1.pdf file2.pdf -- merged.pdf\n\n# Split pages\nqpdf input.pdf --pages . 1-5 -- pages1-5.pdf\nqpdf input.pdf --pages . 
6-10 -- pages6-10.pdf\n\n# Rotate pages\nqpdf input.pdf output.pdf --rotate=+90:1  # Rotate page 1 by 90 degrees\n\n# Remove password\nqpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf\n```\n\n### pdftk (if available)\n```bash\n# Merge\npdftk file1.pdf file2.pdf cat output merged.pdf\n\n# Split\npdftk input.pdf burst\n\n# Rotate\npdftk input.pdf rotate 1east output rotated.pdf\n```\n\n## Common Tasks\n\n### Extract Text from Scanned PDFs\n```python\n# Requires: pip install pytesseract pdf2image\nimport pytesseract\nfrom pdf2image import convert_from_path\n\n# Convert PDF to images\nimages = convert_from_path('scanned.pdf')\n\n# OCR each page\ntext = \"\"\nfor i, image in enumerate(images):\n    text += f\"Page {i+1}:\\n\"\n    text += pytesseract.image_to_string(image)\n    text += \"\\n\\n\"\n\nprint(text)\n```\n\n### Add Watermark\n```python\nfrom pypdf import PdfReader, PdfWriter\n\n# Create watermark (or load existing)\nwatermark = PdfReader(\"watermark.pdf\").pages[0]\n\n# Apply to all pages\nreader = PdfReader(\"document.pdf\")\nwriter = PdfWriter()\n\nfor page in reader.pages:\n    page.merge_page(watermark)\n    writer.add_page(page)\n\nwith open(\"watermarked.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n### Extract Images\n```bash\n# Using pdfimages (poppler-utils)\npdfimages -j input.pdf output_prefix\n\n# This extracts all images as output_prefix-000.jpg, output_prefix-001.jpg, etc.\n```\n\n### Password Protection\n```python\nfrom pypdf import PdfReader, PdfWriter\n\nreader = PdfReader(\"input.pdf\")\nwriter = PdfWriter()\n\nfor page in reader.pages:\n    writer.add_page(page)\n\n# Add password\nwriter.encrypt(\"userpassword\", \"ownerpassword\")\n\nwith open(\"encrypted.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n## Quick Reference\n\n| Task | Best Tool | Command/Code |\n|------|-----------|--------------|\n| Merge PDFs | pypdf | `writer.add_page(page)` |\n| Split PDFs | pypdf | One page per file |\n| 
Extract text | pdfplumber | `page.extract_text()` |\n| Extract tables | pdfplumber | `page.extract_tables()` |\n| Create PDFs | reportlab | Canvas or Platypus |\n| Command line merge | qpdf | `qpdf --empty --pages ...` |\n| OCR scanned PDFs | pytesseract | Convert to image first |\n| Fill PDF forms | pdf-lib or pypdf (see FORMS.md) | See FORMS.md |\n\n## Next Steps\n\n- For advanced pypdfium2 usage, see REFERENCE.md\n- For JavaScript libraries (pdf-lib), see REFERENCE.md\n- If you need to fill out a PDF form, follow the instructions in FORMS.md\n- For troubleshooting guides, see REFERENCE.md","tags":["pdf","atlasclaw","providers","cloudchef","agent-skills","agentic-workflow","ai-integration","openclaw"],"capabilities":["skill","source-cloudchef","skill-pdf","topic-agent-skills","topic-agentic-workflow","topic-ai-integration","topic-openclaw"],"categories":["atlasclaw-providers"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/CloudChef/atlasclaw-providers/pdf","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add CloudChef/atlasclaw-providers","source_repo":"https://github.com/CloudChef/atlasclaw-providers","install_from":"skills.sh"}},"qualityScore":"0.455","qualityRationale":"deterministic score 0.46 from registry signals: · indexed on github topic:agent-skills · 10 github stars · SKILL.md body (7,511 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T19:08:23.291Z","embedding":null,"createdAt":"2026-05-09T01:05:33.333Z","updatedAt":"2026-05-18T19:08:23.291Z","lastSeenAt":"2026-05-18T19:08:23.291Z","tsv":"'+1':180,292,296,737 '+90':655 '-10':647 '-5':624,641 '0':221,343,766 '1':340,460,617,623,640,656,659 '100':391,393,397,411 '12':461 
'120':399 '140':413,416 '1east':690 '2':479,483,491,555,565,567 '20':471 '400':414 '5':619 '6':646 '90':223,225,661 'ad':35 'add':388,407,448,749,835 'adjust':579 'advanc':64,303,913 'all_tables.append':344 'anyth':11 'apart':32 'appli':767 'author':201 'avail':673 'bash':598,626,674,796 'basic':120,368 'best':851 'black':526 'bodi':462,467,475 'box':527 'build':494 'built':512 'built-in':511 'burst':685 'c':380 'c.drawstring':390,396 'c.line':410 'c.save':418 'canva':379,572,880 'canvas-drawn':571 'canvas.canvas':381 'cat':679 'caus':520 'charact':506 'check':331 'chemic':552 'clockwis':227 'column':341 'combin':23,346,352 'combined_df.to':360 'command':60,591,883 'command-lin':59,590 'command/code':853 'common':693 'content':449,488 'convert':711,714,719,894 'cover':50 'creat':37,366,404,419,757,877 'creation':370 'creator':209 'decrypt':668 'decrypted.pdf':670 'degre':226,662 'detail':69 'df':337,345,353 'doc':440 'doc.build':496 'doc1.pdf':136 'doc2.pdf':137 'doc3.pdf':138 'document.pdf':101,192,253,270,313,773 'drawn':573 'empti':336,630,888 'encrypt':43 'encrypted.pdf':669,842 'enumer':168,277,286,731 'essenti':51 'etc':814 'exampl':70 'excel':361 'exist':761 'extract':19,107,187,243,244,265,305,599,604,612,695,794,808,867,872 'extracted_tables.xlsx':362 'f':103,177,196,200,204,208,289,616,734 'fals':364 'featur':65 'file':14,134,142,866 'file1.pdf':632,677 'file2.pdf':633,678 'fill':40,77,898,930 'first':897 'follow':85,935 'font':514,580 'form':42,81,900,934 'forms.md':83,907,909,939 'getsamplestylesheet':439,446,546,548 'glyph':519 'guid':46,49,942 'h':554 'heading1':485 'height':386,392,398,412,415 'hello':394 'hello.pdf':382 'ignor':357 'imag':717,718,729,732,743,795,810,896 'import':93,127,249,307,374,378,427,431,438,501,541,545,706,710,754,820 'includ':16,517 'index':358,363 'input.pdf':163,216,602,610,620,638,644,652,684,688,804,825 'instal':703 'instead':528 'instruct':87,937 'j':283,291,803 'javascript':66,919 'l':618 'layout':247,607,609 'len':105 
'letter':375,384,387,428,444 'lib':903,923 'librari':57,67,118,920 'line':61,409,592,884 'load':760 'manual':578 'markup':533 'merg':25,122,627,675,854,885 'merged.pdf':152,634,681 'meta':193 'meta.author':202 'meta.creator':210 'meta.subject':206 'meta.title':198 'metadata':188 'multipl':26,422 'mypassword':667 'n':738,745,746 'need':75,928 'never':502 'new':38 'next':910 'normal':473,493,558,569 'o':556 'object':537,577 'ocr':723,890 'one':29,863 'open':151,176,232,787,841 'oper':54,121 'output':155,157,184,186,236,238,680,691,791,793,805,845,847 'output.pdf':653 'output.txt':603,611,621 'output_prefix-000.jpg':812 'output_prefix-001.jpg':813 'overview':47 'ownerpassword':839 'page':34,104,111,144,148,149,166,173,174,178,212,219,229,230,257,275,294,319,423,478,482,490,614,622,631,636,639,645,650,658,725,735,765,770,777,781,784,785,829,833,834,858,859,864,889 'page.extract':115,261,280,323,870,875 'page.merge':780 'page.rotate':222 'pagebreak':435,477 'pages':383,443 'pages1-5.pdf':642 'pages6-10.pdf':648 'panda':308 'paragraph':433,451,463,481,487,536,542,553,563,576 'password':664,666,815,836 'path':713,721 'pd':310 'pd.concat':354 'pd.dataframe':338 'pdf':1,13,41,44,52,80,98,133,141,159,181,255,272,315,369,403,420,495,715,899,902,922,933 'pdf-lib':901,921 'pdf.pages':259,278,321 'pdf2image':705,709 'pdfimag':798,802 'pdfplumber':239,250,869,874 'pdfplumber.open':252,269,312 'pdfreader':94,100,129,140,162,191,215,755,763,772,821,824 'pdfs':22,27,31,39,123,367,509,628,699,855,861,878,892 'pdftk':671,676,683,687 'pdftotext':594,601,608,615 'pdfwriter':95,128,131,171,218,756,775,822,827 'per':865 'pip':702 'platypus':882 'poppler':596,800 'poppler-util':595,799 'posit':584 'prefix':806 'preserv':606 'print':102,195,199,203,207,263,288,301,747 'process':45,53 'protect':816 'pypdf':92,119,126,753,819,856,862,905 'pypdfium2':914 'pytesseract':704,707,893 'pytesseract.image':740 'python':56,90,117,124,160,189,213,248,267,306,371,424,538,700,751,817 
'qpdf':625,629,637,643,651,665,886,887 'quick':88,848 'rather':585 'read':17,82,96 'reader':99,139,161,190,214,771,823 'reader.metadata':194 'reader.pages':106,113,146,169,220,779,831 'refer':849 'reference.md':72,917,925,944 'remov':663 'render':523 'report':452,470 'report.pdf':442 'reportlab':365,406,508,530,879 'reportlab.lib.pagesizes':373,426 'reportlab.lib.styles':437,544 'reportlab.pdfgen':377 'reportlab.platypus':430,540 'requir':701 'rotat':33,211,224,649,654,657,686,689 'rotated.pdf':233,692 'row':298,302 'save':417 'scan':698,891 'scanned.pdf':722 'see':71,906,908,916,924,943 'simpledoctempl':432,441 'size':582 'skill':4 'skill-pdf' 'solid':525 'source-cloudchef' 'spacer':434,459 'specif':613 'split':30,158,635,682,860 'squar':562 'start':89 'step':911 'stori':447,497 'story.append':456,458,474,476,480,486 'string':742 'style':445,454,472,484,492,547,557,568 'subject':205 'subscript':498,549 'subscript/superscript':505 'subscripts/superscripts':589 'superscript':500,559 'tabl':242,266,279,281,284,287,290,300,304,317,322,324,326,328,330,333,339,342,348,351,356,873,876 'tag':534,551,561 'task':694,850 'text':108,109,114,116,240,245,260,262,264,389,574,600,605,696,726,733,739,744,748,868,871 'text/tables':20 'titl':197,450,453,455,457 'tool':62,593,852 'topic-agent-skills' 'topic-agentic-workflow' 'topic-ai-integration' 'topic-openclaw' 'troubleshoot':941 'true':359 'unicod':504,588 'usag':915 'use':2,55,503,529,550,560,587,797 'user':7 'userpassword':838 'util':597,801 'want':8 'watermark':36,750,758,762,782 'watermark.pdf':764 'watermarked.pdf':788 'wb':153,182,234,789,843 'whenev':5 'width':385 'world':395 'writer':130,170,217,774,826 'writer.add':147,172,228,783,832,857 'writer.encrypt':837 'writer.write':156,185,237,792,846 'x':564 'xml':532 
'y':566","prices":[{"id":"968c537c-8a6b-4a71-9100-46b4ee4dd7a4","listingId":"a6ee5830-e09c-457f-b782-273f1a1b05cd","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"CloudChef","category":"atlasclaw-providers","install_from":"skills.sh"},"createdAt":"2026-05-09T01:05:33.333Z"}],"sources":[{"listingId":"a6ee5830-e09c-457f-b782-273f1a1b05cd","source":"github","sourceId":"CloudChef/atlasclaw-providers/pdf","sourceUrl":"https://github.com/CloudChef/atlasclaw-providers/tree/main/skills/pdf","isPrimary":false,"firstSeenAt":"2026-05-09T01:05:33.333Z","lastSeenAt":"2026-05-18T19:08:23.291Z"}],"details":{"listingId":"a6ee5830-e09c-457f-b782-273f1a1b05cd","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"CloudChef","slug":"pdf","github":{"repo":"CloudChef/atlasclaw-providers","stars":10,"topics":["agent-skills","agentic-workflow","ai-integration","openclaw"],"license":"apache-2.0","html_url":"https://github.com/CloudChef/atlasclaw-providers","pushed_at":"2026-05-18T03:15:37Z","description":"atlasclaw-providers are the integration with enterprise systems through skills and webhook.","skill_md_sha":"d3e046a5ae107a6cb23cfb16c219837094ab35d3","skill_md_path":"skills/pdf/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/CloudChef/atlasclaw-providers/tree/main/skills/pdf"},"layout":"multi","source":"github","category":"atlasclaw-providers","frontmatter":{"name":"pdf","license":"Proprietary. LICENSE.txt has complete terms","description":"Use this skill whenever the user wants to do anything with PDF files. 
This includes reading or extracting text/tables from PDFs, combining or merging multiple PDFs into one, splitting PDFs apart, rotating pages, adding watermarks, creating new PDFs, filling PDF forms, encrypting/decrypting PDFs, extracting images, and OCR on scanned PDFs to make them searchable. If the user mentions a .pdf file or asks to produce one, use this skill."},"skills_sh_url":"https://skills.sh/CloudChef/atlasclaw-providers/pdf"},"updatedAt":"2026-05-18T19:08:23.291Z"}}