{"id":"7fe6f7b4-8b16-4242-ae13-996a0105aecc","shortId":"6L4TUX","kind":"skill","title":"Pdf","tagline":"Skills skill by Anthropics","description":"# PDF Processing Guide\n\n## Overview\n\nThis guide covers essential PDF processing operations using Python libraries and command-line tools. For advanced features, JavaScript libraries, and detailed examples, see REFERENCE.md. If you need to fill out a PDF form, read FORMS.md and follow its instructions.\n\n## Quick Start\n\n```python\nfrom pypdf import PdfReader, PdfWriter\n\n# Read a PDF\nreader = PdfReader(\"document.pdf\")\nprint(f\"Pages: {len(reader.pages)}\")\n\n# Extract text\ntext = \"\"\nfor page in reader.pages:\n    text += page.extract_text()\n```\n\n## Python Libraries\n\n### pypdf - Basic Operations\n\n#### Merge PDFs\n```python\nfrom pypdf import PdfWriter, PdfReader\n\nwriter = PdfWriter()\nfor pdf_file in [\"doc1.pdf\", \"doc2.pdf\", \"doc3.pdf\"]:\n    reader = PdfReader(pdf_file)\n    for page in reader.pages:\n        writer.add_page(page)\n\nwith open(\"merged.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n#### Split PDF\n```python\nreader = PdfReader(\"input.pdf\")\nfor i, page in enumerate(reader.pages):\n    writer = PdfWriter()\n    writer.add_page(page)\n    with open(f\"page_{i+1}.pdf\", \"wb\") as output:\n        writer.write(output)\n```\n\n#### Extract Metadata\n```python\nreader = PdfReader(\"document.pdf\")\nmeta = reader.metadata\nprint(f\"Title: {meta.title}\")\nprint(f\"Author: {meta.author}\")\nprint(f\"Subject: {meta.subject}\")\nprint(f\"Creator: {meta.creator}\")\n```\n\n#### Rotate Pages\n```python\nreader = PdfReader(\"input.pdf\")\nwriter = PdfWriter()\n\npage = reader.pages[0]\npage.rotate(90)  # Rotate 90 degrees clockwise\nwriter.add_page(page)\n\nwith open(\"rotated.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n### pdfplumber - Text and Table Extraction\n\n#### Extract Text with Layout\n```python\nimport pdfplumber\n\nwith 
pdfplumber.open(\"document.pdf\") as pdf:\n    for page in pdf.pages:\n        text = page.extract_text()\n        print(text)\n```\n\n#### Extract Tables\n```python\nwith pdfplumber.open(\"document.pdf\") as pdf:\n    for i, page in enumerate(pdf.pages):\n        tables = page.extract_tables()\n        for j, table in enumerate(tables):\n            print(f\"Table {j+1} on page {i+1}:\")\n            for row in table:\n                print(row)\n```\n\n#### Advanced Table Extraction\n```python\nimport pandas as pd\n\nwith pdfplumber.open(\"document.pdf\") as pdf:\n    all_tables = []\n    for page in pdf.pages:\n        tables = page.extract_tables()\n        for table in tables:\n            if table:  # Check if table is not empty\n                df = pd.DataFrame(table[1:], columns=table[0])\n                all_tables.append(df)\n\n# Combine all tables\nif all_tables:\n    combined_df = pd.concat(all_tables, ignore_index=True)\n    combined_df.to_excel(\"extracted_tables.xlsx\", index=False)\n```\n\n### reportlab - Create PDFs\n\n#### Basic PDF Creation\n```python\nfrom reportlab.lib.pagesizes import letter\nfrom reportlab.pdfgen import canvas\n\nc = canvas.Canvas(\"hello.pdf\", pagesize=letter)\nwidth, height = letter\n\n# Add text\nc.drawString(100, height - 100, \"Hello World!\")\nc.drawString(100, height - 120, \"This is a PDF created with reportlab\")\n\n# Add a line\nc.line(100, height - 140, 400, height - 140)\n\n# Save\nc.save()\n```\n\n#### Create PDF with Multiple Pages\n```python\nfrom reportlab.lib.pagesizes import letter\nfrom reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, PageBreak\nfrom reportlab.lib.styles import getSampleStyleSheet\n\ndoc = SimpleDocTemplate(\"report.pdf\", pagesize=letter)\nstyles = getSampleStyleSheet()\nstory = []\n\n# Add content\ntitle = Paragraph(\"Report Title\", styles['Title'])\nstory.append(title)\nstory.append(Spacer(1, 12))\n\nbody = Paragraph(\"This is the body of the report. 
\" * 20, styles['Normal'])\nstory.append(body)\nstory.append(PageBreak())\n\n# Page 2\nstory.append(Paragraph(\"Page 2\", styles['Heading1']))\nstory.append(Paragraph(\"Content for page 2\", styles['Normal']))\n\n# Build PDF\ndoc.build(story)\n```\n\n#### Subscripts and Superscripts\n\n**IMPORTANT**: Never use Unicode subscript/superscript characters (₀₁₂₃₄₅₆₇₈₉, ⁰¹²³⁴⁵⁶⁷⁸⁹) in ReportLab PDFs. The built-in fonts do not include these glyphs, causing them to render as solid black boxes.\n\nInstead, use ReportLab's XML markup tags in Paragraph objects:\n```python\nfrom reportlab.platypus import Paragraph\nfrom reportlab.lib.styles import getSampleStyleSheet\n\nstyles = getSampleStyleSheet()\n\n# Subscripts: use <sub> tag\nchemical = Paragraph(\"H<sub>2</sub>O\", styles['Normal'])\n\n# Superscripts: use <super> tag\nsquared = Paragraph(\"x<super>2</super> + y<super>2</super>\", styles['Normal'])\n```\n\nFor canvas-drawn text (not Paragraph objects), manually adjust the font size and position rather than using Unicode subscripts/superscripts.\n\n## Command-Line Tools\n\n### pdftotext (poppler-utils)\n```bash\n# Extract text\npdftotext input.pdf output.txt\n\n# Extract text preserving layout\npdftotext -layout input.pdf output.txt\n\n# Extract specific pages\npdftotext -f 1 -l 5 input.pdf output.txt  # Pages 1-5\n```\n\n### qpdf\n```bash\n# Merge PDFs\nqpdf --empty --pages file1.pdf file2.pdf -- merged.pdf\n\n# Split pages\nqpdf input.pdf --pages . 1-5 -- pages1-5.pdf\nqpdf input.pdf --pages . 
6-10 -- pages6-10.pdf\n\n# Rotate pages\nqpdf input.pdf output.pdf --rotate=+90:1  # Rotate page 1 by 90 degrees\n\n# Remove password\nqpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf\n```\n\n### pdftk (if available)\n```bash\n# Merge\npdftk file1.pdf file2.pdf cat output merged.pdf\n\n# Split\npdftk input.pdf burst\n\n# Rotate\npdftk input.pdf rotate 1east output rotated.pdf\n```\n\n## Common Tasks\n\n### Extract Text from Scanned PDFs\n```python\n# Requires: pip install pytesseract pdf2image\nimport pytesseract\nfrom pdf2image import convert_from_path\n\n# Convert PDF to images\nimages = convert_from_path('scanned.pdf')\n\n# OCR each page\ntext = \"\"\nfor i, image in enumerate(images):\n    text += f\"Page {i+1}:\\n\"\n    text += pytesseract.image_to_string(image)\n    text += \"\\n\\n\"\n\nprint(text)\n```\n\n### Add Watermark\n```python\nfrom pypdf import PdfReader, PdfWriter\n\n# Create watermark (or load existing)\nwatermark = PdfReader(\"watermark.pdf\").pages[0]\n\n# Apply to all pages\nreader = PdfReader(\"document.pdf\")\nwriter = PdfWriter()\n\nfor page in reader.pages:\n    page.merge_page(watermark)\n    writer.add_page(page)\n\nwith open(\"watermarked.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n### Extract Images\n```bash\n# Using pdfimages (poppler-utils)\npdfimages -j input.pdf output_prefix\n\n# This extracts all images as output_prefix-000.jpg, output_prefix-001.jpg, etc.\n```\n\n### Password Protection\n```python\nfrom pypdf import PdfReader, PdfWriter\n\nreader = PdfReader(\"input.pdf\")\nwriter = PdfWriter()\n\nfor page in reader.pages:\n    writer.add_page(page)\n\n# Add password\nwriter.encrypt(\"userpassword\", \"ownerpassword\")\n\nwith open(\"encrypted.pdf\", \"wb\") as output:\n    writer.write(output)\n```\n\n## Quick Reference\n\n| Task | Best Tool | Command/Code |\n|------|-----------|--------------|\n| Merge PDFs | pypdf | `writer.add_page(page)` |\n| Split PDFs | pypdf | One page per file |\n| 
Extract text | pdfplumber | `page.extract_text()` |\n| Extract tables | pdfplumber | `page.extract_tables()` |\n| Create PDFs | reportlab | Canvas or Platypus |\n| Command line merge | qpdf | `qpdf --empty --pages ...` |\n| OCR scanned PDFs | pytesseract | Convert to image first |\n| Fill PDF forms | pdf-lib or pypdf (see FORMS.md) | See FORMS.md |\n\n## Next Steps\n\n- For advanced pypdfium2 usage, see REFERENCE.md\n- For JavaScript libraries (pdf-lib), see REFERENCE.md\n- If you need to fill out a PDF form, follow the instructions in FORMS.md\n- For troubleshooting guides, see REFERENCE.md","tags":["pdf","skills","anthropics"],"capabilities":["skill","source-anthropics","category-skills"],"categories":["skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/anthropics/skills/pdf","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"install_from":"skills.sh"}},"qualityScore":"0.500","qualityRationale":"deterministic score 0.50 from registry signals: · indexed on skills.sh · published under anthropics/skills","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill:v1","enrichmentVersion":1,"enrichedAt":"2026-04-24T02:40:11.717Z","embedding":null,"createdAt":"2026-04-18T20:23:31.766Z","updatedAt":"2026-04-24T02:40:11.717Z","lastSeenAt":"2026-04-24T02:40:11.717Z","tsv":"'+1':142,254,258,699 '+90':617 '-10':609 '-5':586,603 '0':183,305,728 '1':302,422,579,585,602,618,621 '100':353,355,359,373 '12':423 '120':361 '140':375,378 '1east':652 '2':441,445,453,517,527,529 '20':433 '400':376 '5':581 '6':608 '90':185,187,623 'add':350,369,410,711,797 'adjust':541 'advanc':26,265,875 'all_tables.append':306 'anthrop':5 'appli':729 'author':163 'avail':635 'bash':560,588,636,758 'basic':82,330 'best':813 'black':488 'bodi':424,429,437 
'box':489 'build':456 'built':474 'built-in':473 'burst':647 'c':342 'c.drawstring':352,358 'c.line':372 'c.save':380 'canva':341,534,842 'canvas-drawn':533 'canvas.canvas':343 'cat':641 'category-skills' 'caus':482 'charact':468 'check':293 'chemic':514 'clockwis':189 'column':303 'combin':308,314 'combined_df.to':322 'command':22,553,845 'command-lin':21,552 'command/code':815 'common':655 'content':411,450 'convert':673,676,681,856 'cover':12 'creat':328,366,381,719,839 'creation':332 'creator':171 'decrypt':630 'decrypted.pdf':632 'degre':188,624 'detail':31 'df':299,307,315 'doc':402 'doc.build':458 'doc1.pdf':98 'doc2.pdf':99 'doc3.pdf':100 'document.pdf':63,154,215,232,275,735 'drawn':535 'empti':298,592,850 'encrypted.pdf':631,804 'enumer':130,239,248,693 'essenti':13 'etc':776 'exampl':32 'excel':323 'exist':723 'extract':69,149,205,206,227,267,561,566,574,657,756,770,829,834 'extracted_tables.xlsx':324 'f':65,139,158,162,166,170,251,578,696 'fals':326 'featur':27 'file':96,104,828 'file1.pdf':594,639 'file2.pdf':595,640 'fill':39,860,892 'first':859 'follow':47,897 'font':476,542 'form':43,862,896 'forms.md':45,869,871,901 'getsamplestylesheet':401,408,508,510 'glyph':481 'guid':8,11,904 'h':516 'heading1':447 'height':348,354,360,374,377 'hello':356 'hello.pdf':344 'ignor':319 'imag':679,680,691,694,705,757,772,858 'import':55,89,211,269,336,340,389,393,400,463,503,507,668,672,716,782 'includ':479 'index':320,325 'input.pdf':125,178,564,572,582,600,606,614,646,650,766,787 'instal':665 'instead':490 'instruct':49,899 'j':245,253,765 'javascript':28,881 'l':580 'layout':209,569,571 'len':67 'letter':337,346,349,390,406 'lib':865,885 'librari':19,29,80,882 'line':23,371,554,846 'load':722 'manual':540 'markup':495 'merg':84,589,637,816,847 'merged.pdf':114,596,643 'meta':155 'meta.author':164 'meta.creator':172 'meta.subject':168 'meta.title':160 'metadata':150 'multipl':384 'mypassword':629 'n':700,707,708 'need':37,890 'never':464 'next':872 
'normal':435,455,520,531 'o':518 'object':499,539 'ocr':685,852 'one':825 'open':113,138,194,749,803 'oper':16,83 'output':117,119,146,148,198,200,642,653,753,755,767,807,809 'output.pdf':615 'output.txt':565,573,583 'output_prefix-000.jpg':774 'output_prefix-001.jpg':775 'overview':9 'ownerpassword':801 'page':66,73,106,110,111,128,135,136,140,174,181,191,192,219,237,256,281,385,440,444,452,576,584,593,598,601,607,612,620,687,697,727,732,739,743,746,747,791,795,796,820,821,826,851 'page.extract':77,223,242,285,832,837 'page.merge':742 'page.rotate':184 'pagebreak':397,439 'pages':345,405 'pages1-5.pdf':604 'pages6-10.pdf':610 'panda':270 'paragraph':395,413,425,443,449,498,504,515,525,538 'password':626,628,777,798 'path':675,683 'pd':272 'pd.concat':316 'pd.dataframe':300 'pdf':1,6,14,42,60,95,103,121,143,217,234,277,331,365,382,457,677,861,864,884,895 'pdf-lib':863,883 'pdf.pages':221,240,283 'pdf2image':667,671 'pdfimag':760,764 'pdfplumber':201,212,831,836 'pdfplumber.open':214,231,274 'pdfreader':56,62,91,102,124,153,177,717,725,734,783,786 'pdfs':85,329,471,590,661,817,823,840,854 'pdftk':633,638,645,649 'pdftotext':556,563,570,577 'pdfwriter':57,90,93,133,180,718,737,784,789 'per':827 'pip':664 'platypus':844 'poppler':558,762 'poppler-util':557,761 'posit':546 'prefix':768 'preserv':568 'print':64,157,161,165,169,225,250,263,709 'process':7,15 'protect':778 'pypdf':54,81,88,715,781,818,824,867 'pypdfium2':876 'pytesseract':666,669,855 'pytesseract.image':702 'python':18,52,79,86,122,151,175,210,229,268,333,386,500,662,713,779 'qpdf':587,591,599,605,613,627,848,849 'quick':50,810 'rather':547 'read':44,58 'reader':61,101,123,152,176,733,785 'reader.metadata':156 'reader.pages':68,75,108,131,182,741,793 'refer':811 'reference.md':34,879,887,906 'remov':625 'render':485 'report':414,432 'report.pdf':404 'reportlab':327,368,470,492,841 'reportlab.lib.pagesizes':335,388 'reportlab.lib.styles':399,506 'reportlab.pdfgen':339 'reportlab.platypus':392,502 
'requir':663 'rotat':173,186,611,616,619,648,651 'rotated.pdf':195,654 'row':260,264 'save':379 'scan':660,853 'scanned.pdf':684 'see':33,868,870,878,886,905 'simpledoctempl':394,403 'size':544 'skill':2,3 'solid':487 'source-anthropics' 'spacer':396,421 'specif':575 'split':120,597,644,822 'squar':524 'start':51 'step':873 'stori':409,459 'story.append':418,420,436,438,442,448 'string':704 'style':407,416,434,446,454,509,519,530 'subject':167 'subscript':460,511 'subscript/superscript':467 'subscripts/superscripts':551 'superscript':462,521 'tabl':204,228,241,243,246,249,252,262,266,279,284,286,288,290,292,295,301,304,310,313,318,835,838 'tag':496,513,523 'task':656,812 'text':70,71,76,78,202,207,222,224,226,351,536,562,567,658,688,695,701,706,710,830,833 'titl':159,412,415,417,419 'tool':24,555,814 'troubleshoot':903 'true':321 'unicod':466,550 'usag':877 'use':17,465,491,512,522,549,759 'userpassword':800 'util':559,763 'watermark':712,720,724,744 'watermark.pdf':726 'watermarked.pdf':750 'wb':115,144,196,751,805 'width':347 'world':357 'writer':92,132,179,736,788 'writer.add':109,134,190,745,794,819 'writer.encrypt':799 'writer.write':118,147,199,754,808 'x':526 'xml':494 
'y':528","prices":[{"id":"e75627e3-6c8c-47c0-b577-3c743a7dac6b","listingId":"7fe6f7b4-8b16-4242-ae13-996a0105aecc","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"anthropics","category":"skills","install_from":"skills.sh"},"createdAt":"2026-04-18T20:23:31.766Z"}],"sources":[{"listingId":"7fe6f7b4-8b16-4242-ae13-996a0105aecc","source":"github","sourceId":"anthropics/skills/pdf","sourceUrl":"https://github.com/anthropics/skills/tree/main/skills/pdf","isPrimary":false,"firstSeenAt":"2026-04-18T21:24:27.583Z","lastSeenAt":"2026-04-24T00:50:10.493Z"},{"listingId":"7fe6f7b4-8b16-4242-ae13-996a0105aecc","source":"skills_sh","sourceId":"anthropics/skills/pdf","sourceUrl":"https://skills.sh/anthropics/skills/pdf","isPrimary":true,"firstSeenAt":"2026-04-18T20:23:31.766Z","lastSeenAt":"2026-04-24T02:40:11.717Z"}],"details":{"listingId":"7fe6f7b4-8b16-4242-ae13-996a0105aecc","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"anthropics","slug":"pdf","source":"skills_sh","category":"skills","skills_sh_url":"https://skills.sh/anthropics/skills/pdf"},"updatedAt":"2026-04-24T02:40:11.717Z"}}