{"id":"bde676c7-8337-43d8-9224-60d3b270e828","shortId":"hrsZgg","kind":"skill","title":"processing-pdfs","tagline":"Processes PDF files. Extracts text and tables, fills forms, merges and splits documents, batch-processes files, converts to images, and generates PDFs programmatically. Use when working with .pdf files. Do NOT use for Word documents, spreadsheets, or presentations.","description":"# PDF Processing Guide\n\n## Overview\n\nThis guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions.\n\n## Quick Start\n\n```python\nfrom pypdf import PdfReader, PdfWriter\n\n# Read a PDF\nreader = PdfReader(\"document.pdf\")\nprint(f\"Pages: {len(reader.pages)}\")\n\n# Extract text\ntext = \"\"\nfor page in reader.pages:\n    text += page.extract_text()\n```\n\n## Python Libraries\n\n### pypdf - Basic Operations\n\n#### Merge PDFs\n```python\nfrom pypdf import PdfWriter, PdfReader\n\nwriter = PdfWriter()\nfor pdf_file in [\"doc1.pdf\", \"doc2.pdf\", \"doc3.pdf\"]:\n    reader = PdfReader(pdf_file)\n    for page in reader.pages:\n        writer.add_page(page)\n\nwith open(\"merged.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n#### Split PDF\n```python\nfrom pypdf import PdfReader, PdfWriter\n\nreader = PdfReader(\"input.pdf\")\nfor i, page in enumerate(reader.pages):\n    writer = PdfWriter()\n    writer.add_page(page)\n    with open(f\"page_{i+1}.pdf\", \"wb\") as output:\n        writer.write(output)\n```\n\n#### Extract Metadata\n```python\nfrom pypdf import PdfReader\n\nreader = PdfReader(\"document.pdf\")\nmeta = reader.metadata\nprint(f\"Title: {meta.title}\")\nprint(f\"Author: {meta.author}\")\nprint(f\"Subject: {meta.subject}\")\nprint(f\"Creator: {meta.creator}\")\n```\n\n#### Rotate Pages\n```python\nfrom pypdf import PdfReader, PdfWriter\n\nreader = PdfReader(\"input.pdf\")\nwriter = PdfWriter()\n\npage = reader.pages[0]\npage.rotate(90)  # Rotate 90 degrees 
clockwise\nwriter.add_page(page)\n\nwith open(\"rotated.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n### pdfplumber - Text and Table Extraction\n\n#### Extract Text with Layout\n```python\nimport pdfplumber\n\nwith pdfplumber.open(\"document.pdf\") as pdf:\n    for page in pdf.pages:\n        text = page.extract_text()\n        print(text)\n```\n\n#### Extract Tables\n```python\nimport pdfplumber\n\nwith pdfplumber.open(\"document.pdf\") as pdf:\n    for i, page in enumerate(pdf.pages):\n        tables = page.extract_tables()\n        for j, table in enumerate(tables):\n            print(f\"Table {j+1} on page {i+1}:\")\n            for row in table:\n                print(row)\n```\n\n#### Advanced Table Extraction\n```python\nimport pandas as pd\nimport pdfplumber\n\nwith pdfplumber.open(\"document.pdf\") as pdf:\n    all_tables = []\n    for page in pdf.pages:\n        tables = page.extract_tables()\n        for table in tables:\n            if table:  # Check if table is not empty\n                df = pd.DataFrame(table[1:], columns=table[0])\n                all_tables.append(df)\n\n# Combine all tables\nif all_tables:\n    combined_df = pd.concat(all_tables, ignore_index=True)\n    combined_df.to_excel(\"extracted_tables.xlsx\", index=False)\n```\n\n### reportlab - Create PDFs\n\n#### Basic PDF Creation\n```python\nfrom reportlab.lib.pagesizes import letter\nfrom reportlab.pdfgen import canvas\n\nc = canvas.Canvas(\"hello.pdf\", pagesize=letter)\nwidth, height = letter\n\n# Add text\nc.drawString(100, height - 100, \"Hello World!\")\nc.drawString(100, height - 120, \"This is a PDF created with reportlab\")\n\n# Add a line\nc.line(100, height - 140, 400, height - 140)\n\n# Save\nc.save()\n```\n\n#### Create PDF with Multiple Pages\n```python\nfrom reportlab.lib.pagesizes import letter\nfrom reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak\nfrom reportlab.lib.styles import getSampleStyleSheet\n\ndoc = SimpleDocTemplate(\"report.pdf\", pagesize=letter)\nstyles 
= getSampleStyleSheet()\nstory = []\n\n# Add content\ntitle = Paragraph(\"Report Title\", styles['Title'])\nstory.append(title)\nstory.append(Spacer(1, 12))\n\nbody = Paragraph(\"This is the body of the report. \" * 20, styles['Normal'])\nstory.append(body)\nstory.append(PageBreak())\n\n# Page 2\nstory.append(Paragraph(\"Page 2\", styles['Heading1']))\nstory.append(Paragraph(\"Content for page 2\", styles['Normal']))\n\n# Build PDF\ndoc.build(story)\n```\n\n## Command-Line Tools\n\n### pdftotext (poppler-utils)\n```bash\n# Extract text\npdftotext input.pdf output.txt\n\n# Extract text preserving layout\npdftotext -layout input.pdf output.txt\n\n# Extract specific pages\npdftotext -f 1 -l 5 input.pdf output.txt  # Pages 1-5\n```\n\n### qpdf\n```bash\n# Merge PDFs\nqpdf --empty --pages file1.pdf file2.pdf -- merged.pdf\n\n# Split pages\nqpdf input.pdf --pages . 1-5 -- pages1-5.pdf\nqpdf input.pdf --pages . 6-10 -- pages6-10.pdf\n\n# Rotate pages\nqpdf input.pdf output.pdf --rotate=+90:1  # Rotate page 1 by 90 degrees\n\n# Remove password\nqpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf\n```\n\n### pdftk (if available)\n```bash\n# Merge\npdftk file1.pdf file2.pdf cat output merged.pdf\n\n# Split\npdftk input.pdf burst\n\n# Rotate\npdftk input.pdf rotate 1east output rotated.pdf\n```\n\n## Common Tasks\n\n### Extract Text from Scanned PDFs\n```python\n# Requires: pip install pytesseract pdf2image\nimport pytesseract\nfrom pdf2image import convert_from_path\n\n# Convert PDF to images\nimages = convert_from_path('scanned.pdf')\n\n# OCR each page\ntext = \"\"\nfor i, image in enumerate(images):\n    text += f\"Page {i+1}:\\n\"\n    text += pytesseract.image_to_string(image)\n    text += \"\\n\\n\"\n\nprint(text)\n```\n\n### Add Watermark\n```python\nfrom pypdf import PdfReader, PdfWriter\n\n# Create watermark (or load existing)\nwatermark = PdfReader(\"watermark.pdf\").pages[0]\n\n# Apply to all pages\nreader = PdfReader(\"document.pdf\")\nwriter = 
PdfWriter()\n\nfor page in reader.pages:\n    page.merge_page(watermark)\n    writer.add_page(page)\n\nwith open(\"watermarked.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n### Extract Images\n```bash\n# Using pdfimages (poppler-utils)\npdfimages -j input.pdf output_prefix\n\n# This extracts all images as output_prefix-000.jpg, output_prefix-001.jpg, etc.\n```\n\n### Password Protection\n```python\nfrom pypdf import PdfReader, PdfWriter\n\nreader = PdfReader(\"input.pdf\")\nwriter = PdfWriter()\n\nfor page in reader.pages:\n    writer.add_page(page)\n\n# Add password\nwriter.encrypt(\"userpassword\", \"ownerpassword\")\n\nwith open(\"encrypted.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n## Quick Reference\n\n| Task | Best Tool | Command/Code |\n|------|-----------|--------------|\n| Merge PDFs | pypdf | `writer.add_page(page)` |\n| Split PDFs | pypdf | One page per file |\n| Extract text | pdfplumber | `page.extract_text()` |\n| Extract tables | pdfplumber | `page.extract_tables()` |\n| Create PDFs | reportlab | Canvas or Platypus |\n| Command line merge | qpdf | `qpdf --empty --pages ...` |\n| OCR scanned PDFs | pytesseract | Convert to image first |\n| Fill PDF forms | pdf-lib or pypdf (see FORMS.md) | See FORMS.md |\n\n## Next Steps\n\n- For advanced pypdfium2 usage, see REFERENCE.md\n- For JavaScript libraries (pdf-lib), see REFERENCE.md\n- If you need to fill out a PDF form, follow the instructions in FORMS.md\n- For troubleshooting guides, see 
REFERENCE.md","tags":["processing","pdfs","code","abyss","telagod","agent-skills","ai-agent","ai-assistant","ai-personality","blue-team","character-card","claude-code"],"capabilities":["skill","source-telagod","skill-processing-pdfs","topic-agent-skills","topic-ai-agent","topic-ai-assistant","topic-ai-personality","topic-blue-team","topic-character-card","topic-claude-code","topic-cli","topic-codex","topic-codex-cli","topic-configuration","topic-developer-tools"],"categories":["code-abyss"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/telagod/code-abyss/processing-pdfs","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add telagod/code-abyss","source_repo":"https://github.com/telagod/code-abyss","install_from":"skills.sh"}},"qualityScore":"0.555","qualityRationale":"deterministic score 0.56 from registry signals: · indexed on github topic:agent-skills · 211 github stars · SKILL.md body (6,729 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T18:55:06.658Z","embedding":null,"createdAt":"2026-05-16T12:54:50.705Z","updatedAt":"2026-05-18T18:55:06.658Z","lastSeenAt":"2026-05-18T18:55:06.658Z","tsv":"'+1':179,291,295,644 '+90':562 '-10':554 '-5':531,548 '0':220,342,673 '1':339,459,524,530,547,563,566 '100':390,392,396,410 '12':460 '120':398 '140':412,415 '1east':597 '2':478,482,490 '20':470 '400':413 '5':526 '6':553 '90':222,224,568 'add':387,406,447,656,742 'advanc':63,302,820 'all_tables.append':343 'appli':674 'author':200 'avail':580 'bash':505,533,581,703 'basic':119,367 'batch':18 'batch-process':17 'best':758 'bodi':461,466,474 'build':493 'burst':592 'c':379 'c.drawstring':389,395 'c.line':409 'c.save':417 'canva':378,787 
'canvas.canvas':380 'cat':586 'check':330 'clockwis':226 'column':340 'combin':345,351 'combined_df.to':359 'command':59,498,790 'command-lin':58,497 'command/code':760 'common':600 'content':448,487 'convert':21,618,621,626,801 'cover':49 'creat':365,403,418,664,784 'creation':369 'creator':208 'decrypt':575 'decrypted.pdf':577 'degre':225,569 'detail':68 'df':336,344,352 'doc':439 'doc.build':495 'doc1.pdf':135 'doc2.pdf':136 'doc3.pdf':137 'document':16,39 'document.pdf':100,191,252,269,312,680 'empti':335,537,795 'encrypted.pdf':576,749 'enumer':167,276,285,638 'essenti':50 'etc':721 'exampl':69 'excel':360 'exist':668 'extract':7,106,186,242,243,264,304,506,511,519,602,701,715,774,779 'extracted_tables.xlsx':361 'f':102,176,195,199,203,207,288,523,641 'fals':363 'featur':64 'file':6,20,33,133,141,773 'file1.pdf':539,584 'file2.pdf':540,585 'fill':11,76,805,837 'first':804 'follow':84,842 'form':12,80,807,841 'forms.md':82,814,816,846 'generat':25 'getsamplestylesheet':438,445 'guid':45,48,849 'heading1':484 'height':385,391,397,411,414 'hello':393 'hello.pdf':381 'ignor':356 'imag':23,624,625,636,639,650,702,717,803 'import':92,126,248,306,373,377,426,430,437,613,617,661,727 'index':357,362 'input.pdf':162,215,509,517,527,545,551,559,591,595,711,732 'instal':610 'instruct':86,844 'j':282,290,710 'javascript':65,826 'l':525 'layout':246,514,516 'len':104 'letter':374,383,386,427,443 'lib':810,830 'librari':56,66,117,827 'line':60,408,499,791 'load':667 'merg':13,121,534,582,761,792 'merged.pdf':151,541,588 'meta':192 'meta.author':201 'meta.creator':209 'meta.subject':205 'meta.title':197 'metadata':187 'multipl':421 'mypassword':574 'n':645,652,653 'need':74,835 'next':817 'normal':472,492 'ocr':630,797 'one':770 'open':150,175,231,694,748 'oper':53,120 'output':154,156,183,185,235,237,587,598,698,700,712,752,754 'output.pdf':560 'output.txt':510,518,528 'output_prefix-000.jpg':719 'output_prefix-001.jpg':720 'overview':46 'ownerpassword':746 
'page':103,110,143,147,148,165,172,173,177,211,218,228,229,256,274,293,318,422,477,481,489,521,529,538,543,546,552,557,565,632,642,672,677,684,688,691,692,736,740,741,765,766,771,796 'page.extract':114,260,279,322,777,782 'page.merge':687 'page.rotate':221 'pagebreak':434,476 'pages':382,442 'pages1-5.pdf':549 'pages6-10.pdf':555 'panda':307 'paragraph':432,450,462,480,486 'password':571,573,722,743 'path':620,628 'pd':309 'pd.concat':353 'pd.dataframe':337 'pdf':5,32,43,51,79,97,132,140,158,180,254,271,314,368,402,419,494,622,806,809,829,840 'pdf-lib':808,828 'pdf.pages':258,277,320 'pdf2image':612,616 'pdfimag':705,709 'pdfplumber':238,249,776,781 'pdfplumber.open':251,268,311 'pdfreader':93,99,128,139,161,190,214,662,670,679,728,731 'pdfs':3,26,122,366,535,606,762,768,785,799 'pdftk':578,583,590,594 'pdftotext':501,508,515,522 'pdfwriter':94,127,130,170,217,663,682,729,734 'per':772 'pip':609 'platypus':789 'poppler':503,707 'poppler-util':502,706 'prefix':713 'present':42 'preserv':513 'print':101,194,198,202,206,262,287,300,654 'process':2,4,19,44,52 'processing-pdf':1 'programmat':27 'protect':723 'pypdf':91,118,125,660,726,763,769,812 'pypdfium2':821 'pytesseract':611,614,800 'pytesseract.image':647 'python':55,89,116,123,159,188,212,247,266,305,370,423,607,658,724 'qpdf':532,536,544,550,558,572,793,794 'quick':87,755 'read':81,95 'reader':98,138,160,189,213,678,730 'reader.metadata':193 'reader.pages':105,112,145,168,219,686,738 'refer':756 'reference.md':71,824,832,851 'remov':570 'report':451,469 'report.pdf':441 'reportlab':364,405,786 'reportlab.lib.pagesizes':372,425 'reportlab.lib.styles':436 'reportlab.pdfgen':376 'reportlab.platypus':429 'requir':608 'rotat':210,223,556,561,564,593,596 'rotated.pdf':232,599 'row':297,301 'save':416 'scan':605,798 'scanned.pdf':629 'see':70,813,815,823,831,850 'simpledoctempl':431,440 'skill' 'skill-processing-pdfs' 'source-telagod' 'spacer':433,458 'specif':520 'split':15,157,542,589,767 'spreadsheet':40 'start':88 
'step':818 'stori':446,496 'story.append':455,457,473,475,479,485 'string':649 'style':444,453,471,483,491 'subject':204 'tabl':10,241,265,278,280,283,286,289,299,303,316,321,323,325,327,329,332,338,341,347,350,355,780,783 'task':601,757 'text':8,107,108,113,115,239,244,259,261,263,388,507,512,603,633,640,646,651,655,775,778 'titl':196,449,452,454,456 'tool':61,500,759 'topic-agent-skills' 'topic-ai-agent' 'topic-ai-assistant' 'topic-ai-personality' 'topic-blue-team' 'topic-character-card' 'topic-claude-code' 'topic-cli' 'topic-codex' 'topic-codex-cli' 'topic-configuration' 'topic-developer-tools' 'troubleshoot':848 'true':358 'usag':822 'use':28,36,54,704 'userpassword':745 'util':504,708 'watermark':657,665,669,689 'watermark.pdf':671 'watermarked.pdf':695 'wb':152,181,233,696,750 'width':384 'word':38 'work':30 'world':394 'writer':129,169,216,681,733 'writer.add':146,171,227,690,739,764 'writer.encrypt':744 'writer.write':155,184,236,699,753","prices":[{"id":"bc0b2cce-a557-4094-9aa6-a074a20bda9b","listingId":"bde676c7-8337-43d8-9224-60d3b270e828","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"telagod","category":"code-abyss","install_from":"skills.sh"},"createdAt":"2026-05-16T12:54:50.705Z"}],"sources":[{"listingId":"bde676c7-8337-43d8-9224-60d3b270e828","source":"github","sourceId":"telagod/code-abyss/processing-pdfs","sourceUrl":"https://github.com/telagod/code-abyss/tree/main/skills/processing-pdfs","isPrimary":false,"firstSeenAt":"2026-05-16T12:54:50.705Z","lastSeenAt":"2026-05-18T18:55:06.658Z"}],"details":{"listingId":"bde676c7-8337-43d8-9224-60d3b270e828","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"telagod","slug":"processing-pdfs","github":{"repo":"telagod/code-abyss","stars":211,"topics":[
"agent-skills","ai-agent","ai-assistant","ai-personality","blue-team","character-card","claude-code","cli","codex","codex-cli","configuration","developer-tools","devops","gemini-cli","persona","prompt-engineering","red-team","security","skills"],"license":"mit","html_url":"https://github.com/telagod/code-abyss","pushed_at":"2026-05-16T10:42:04Z","description":"Give your AI coding agent a personality. Composable persona + style + skills for Claude Code, Codex, Gemini CLI & OpenClaw. Ships Tech Persona Card v1.0 spec.","skill_md_sha":"9779f1cbd26af04c2ce68265262d083e84f0f189","skill_md_path":"skills/processing-pdfs/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/telagod/code-abyss/tree/main/skills/processing-pdfs"},"layout":"multi","source":"github","category":"code-abyss","frontmatter":{"name":"processing-pdfs","description":"Processes PDF files. Extracts text and tables, fills forms, merges and splits documents, batch-processes files, converts to images, and generates PDFs programmatically. Use when working with .pdf files. Do NOT use for Word documents, spreadsheets, or presentations."},"skills_sh_url":"https://skills.sh/telagod/code-abyss/processing-pdfs"},"updatedAt":"2026-05-18T18:55:06.658Z"}}