Skill quality: 0.70

sn-da-image-caption

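Image understanding and data extraction skill. Use it when an image file (.png/.jpg/.jpeg/.gif/.webp/.bmp) is the primary input and the user needs to understand it, extract data from it, or analyze its content. Provides a preconfigured caption script (scripts/caption.py) that converts images to text descriptions via a vision model, with no additional API key configuration required. Covers: (1) captioning charts, tables, screenshots, and flowcharts with scripts/caption.py, (2) parsing the caption text into a structured DataFrame, (3) regenerating visualizations from the extracted data, (4) exporting to Excel/CSV.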

Price: free
Protocol: skill
Verified: no

What it does

Image Caption Analysis: Image Description and Data Extraction

Overview

Analyze, extract data from, or understand image files (.png, .jpg, .jpeg, .gif, .webp, .bmp). The core workflow:

  1. Run scripts/caption.py to get a text description of the image
  2. Parse the description into structured data (DataFrame, etc.)
  3. Analyze, visualize, or export

scripts/caption.py — Image Caption

The script converts images to text descriptions via a vision model. Configure via SN_API_KEY (minimum required), or use SN_VISION_API_KEY / SN_VISION_BASE_URL / SN_VISION_MODEL for fine-grained control. See the project environment variable spec for the full fallback chain.

Usage

# Basic — get text description
python3 scripts/caption.py /mnt/data/image.png

# Custom prompt, guiding what to extract (here: every numeric value, as a Markdown table)
python3 scripts/caption.py /mnt/data/chart.png --prompt "提取所有数值,Markdown 表格格式"

# JSON output — includes detected type, usage stats, cache info
python3 scripts/caption.py /mnt/data/image.png --json

# Batch — process all images in a directory
python3 scripts/caption.py /mnt/data/images/ --batch --output /mnt/data/captions.json

# Override model (optional)
python3 scripts/caption.py /mnt/data/image.png --model gemini-3.1-flash-lite-preview

Options

Option              Description
--prompt, -p        Custom prompt (overrides auto-detection)
--model, -m         Vision model (default: sensenova-6.7-flash-lite)
--json              Output structured JSON instead of plain text
--batch             Process all images in a directory
--output, -o        Output file for batch results
--no-cache          Skip the MD5 cache

What it does automatically

  • Type detection: Detects image type from filename (chart/table/UI/diagram/general) and picks the best prompt
  • Compression: Images >5MB or >2048px are compressed before sending
  • Caching: Same image + same prompt → instant cached result, no API cost
  • Error handling: Retries on failure, returns error message on permanent failure
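The script's internals are not shown here, but the automatic behaviors above can be sketched with the standard library. This is a minimal sketch under assumptions, not caption.py's actual code: the filename keywords in `TYPE_KEYWORDS` are hypothetical, `needs_compression` only encodes the stated 5 MB / 2048 px thresholds, and `cache_key` hashes the image bytes plus the prompt with MD5 (matching the "MD5 cache" wording of --no-cache) so that the same image with the same prompt maps to the same key.

```python
import hashlib
import os
import re

# Hypothetical filename keywords per image type; the real detection rules may differ.
TYPE_KEYWORDS = {
    "chart": ("chart", "graph", "plot"),
    "table": ("table", "sheet"),
    "UI": ("ui", "screen", "screenshot"),
    "diagram": ("diagram", "flowchart", "flow", "architecture"),
}

MAX_BYTES = 5 * 1024 * 1024  # "Images >5MB ..."
MAX_SIDE = 2048              # "... or >2048px"


def detect_type(path: str) -> str:
    """Guess the image type from filename tokens, falling back to 'general'."""
    tokens = set(re.split(r"[^a-z0-9]+", os.path.basename(path).lower()))
    for image_type, keywords in TYPE_KEYWORDS.items():
        if tokens.intersection(keywords):
            return image_type
    return "general"


def needs_compression(size_bytes: int, width: int, height: int) -> bool:
    """True when the image exceeds either the file-size or the pixel threshold."""
    return size_bytes > MAX_BYTES or max(width, height) > MAX_SIDE


def cache_key(image_bytes: bytes, prompt: str) -> str:
    """Same image + same prompt -> same key, so a repeat call can reuse the result."""
    h = hashlib.md5()
    h.update(image_bytes)
    h.update(b"\0")  # separator between image bytes and prompt text
    h.update(prompt.encode("utf-8"))
    return h.hexdigest()
```

Matching whole filename tokens rather than substrings avoids false hits such as "ui" inside "builder".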

JSON output format

{
  "file": "/mnt/data/image.png",
  "type": "chart",
  "description": "这是一张柱状图...",
  "usage": {"prompt_tokens": 1100, "completion_tokens": 400},
  "cached": false
}
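Because each JSON record carries `usage` and `cached`, token spend across many calls is easy to tally. A minimal sketch, assuming a list of records shaped like the example above (the batch output file's layout is not documented here, so `summarize_usage` and the sample records are illustrative; the second record is made up and omits `usage` to show the fallback):

```python
import json


def summarize_usage(records: list) -> dict:
    """Total tokens and cache hits across caption records in the documented JSON shape."""
    summary = {"images": 0, "cached": 0, "prompt_tokens": 0, "completion_tokens": 0}
    for rec in records:
        summary["images"] += 1
        if rec.get("cached"):
            summary["cached"] += 1
        usage = rec.get("usage") or {}  # tolerate a missing usage block
        summary["prompt_tokens"] += usage.get("prompt_tokens", 0)
        summary["completion_tokens"] += usage.get("completion_tokens", 0)
    return summary


# Illustrative records only, not real script output
records = json.loads("""[
  {"file": "/mnt/data/image.png", "type": "chart", "description": "...",
   "usage": {"prompt_tokens": 1100, "completion_tokens": 400}, "cached": false},
  {"file": "/mnt/data/image.png", "type": "chart", "description": "...",
   "cached": true}
]""")
summary = summarize_usage(records)
print(summary)
```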

Calling from Python

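import json
import subprocess

# Adjust to wherever the skill is installed
CAPTION = "/path/to/skills/sn-da-image-caption/scripts/caption.py"

# Single image: --json prints one record in the format shown above
# (prompt: "extract the chart data, output as a Markdown table")
result = subprocess.run(
    ["python3", CAPTION, "/mnt/data/chart.png", "--json",
     "--prompt", "提取图表数据,Markdown 表格输出"],
    capture_output=True, text=True, timeout=60,
    check=True,  # fail loudly if the script exits non-zero
)
data = json.loads(result.stdout)
description = data["description"]

# Batch: results go to the --output file, so read that file rather than stdout
result = subprocess.run(
    ["python3", CAPTION, "/mnt/data/images/", "--batch",
     "--output", "/mnt/data/captions.json"],
    capture_output=True, text=True, timeout=300, check=True,
)
with open("/mnt/data/captions.json", encoding="utf-8") as f:
    all_captions = json.load(f)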
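When the script is called from several places, a small helper keeps the argument list consistent. A sketch under assumptions: `build_caption_cmd` is a hypothetical helper that only emits the flags listed under Options, and it swaps in `sys.executable` for `python3` so the child process runs under the same interpreter.

```python
import sys


def build_caption_cmd(script, target, *, prompt=None, model=None,
                      as_json=False, batch=False, output=None, no_cache=False):
    """Assemble argv for scripts/caption.py using only its documented flags."""
    cmd = [sys.executable, script, target]
    if prompt is not None:
        cmd += ["--prompt", prompt]
    if model is not None:
        cmd += ["--model", model]
    if as_json:
        cmd.append("--json")
    if batch:
        cmd.append("--batch")
    if output is not None:
        cmd += ["--output", output]
    if no_cache:
        cmd.append("--no-cache")
    return cmd


example = build_caption_cmd("scripts/caption.py", "/mnt/data/chart.png", as_json=True)
print(example)
```

The resulting list can be passed straight to `subprocess.run(..., capture_output=True, text=True, timeout=60, check=True)`.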

Prompt Strategy

Different image types need different prompts. The script auto-detects the type from the filename, but passing --prompt explicitly gives better results.

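  • Data chart (bar, line, or pie charts): "提取图表标题、坐标轴、每个数据点数值、图例。Markdown 表格输出。" (Extract the chart title, axes, the value of every data point, and the legend; output a Markdown table.)
  • Table screenshot (screenshots of tables): "提取表格所有内容,Markdown 表格格式,保持行列结构,数值不四舍五入。" (Extract all table content as a Markdown table, keep the row/column structure, do not round numbers.)
  • UI screenshot (interface screenshots): "以前端开发者视角描述:布局、组件、文字、颜色。" (Describe from a front-end developer's perspective: layout, components, text, colors.)
  • Diagram (flowcharts, architecture diagrams): "描述所有节点、连接关系(A→B)、分支条件。" (Describe all nodes, connections (A→B), and branch conditions.)
  • General (photos and anything else): omit --prompt and use the default.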
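The recommendations above can also live in code, so a pipeline picks the prompt by type and leaves general images to the script's default. A minimal sketch; `RECOMMENDED_PROMPTS` and `prompt_args` are illustrative names, and the prompt strings are copied verbatim from the list:

```python
# Prompts copied verbatim from the recommendations; "general" is deliberately absent.
RECOMMENDED_PROMPTS = {
    "chart": "提取图表标题、坐标轴、每个数据点数值、图例。Markdown 表格输出。",
    "table": "提取表格所有内容,Markdown 表格格式,保持行列结构,数值不四舍五入。",
    "UI": "以前端开发者视角描述:布局、组件、文字、颜色。",
    "diagram": "描述所有节点、连接关系(A→B)、分支条件。",
}


def prompt_args(image_type: str) -> list:
    """Return ['--prompt', text] for known types, or [] to keep the default prompt."""
    prompt = RECOMMENDED_PROMPTS.get(image_type)
    return ["--prompt", prompt] if prompt else []
```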

Parsing Caption Results

When the prompt asks for a table, the caption usually comes back as a Markdown table; parse it into a DataFrame:

import pandas as pd

def parse_markdown_table(text):
    """Parse the first Markdown table in text into a DataFrame, or return None."""
    # Collect the first contiguous run of lines that contain '|'
    table_lines = []
    for line in text.strip().split('\n'):
        stripped = line.strip()
        if '|' in stripped:
            table_lines.append(stripped)
        elif table_lines:
            break  # the table has ended

    data_lines = []
    for l in table_lines:
        # Strip the outer pipes, then split; empty cells are kept so columns stay aligned
        cells = [c.strip() for c in l.strip('|').split('|')]
        # Skip the header separator row, e.g. |---|:---:|
        is_separator = (all(set(c) <= set('-: ') for c in cells)
                        and any('-' in c for c in cells))
        if not is_separator:
            data_lines.append(cells)

    if len(data_lines) < 2:
        return None

    header = data_lines[0]
    # Rows whose cell count differs from the header are dropped
    rows = [r for r in data_lines[1:] if len(r) == len(header)]
    df = pd.DataFrame(rows, columns=header)

    # Auto numeric conversion: remove thousands separators and a trailing %,
    # then convert only columns where most cells parse as numbers
    for col in df.columns:
        try:
            cleaned = (df[col].str.replace(',', '', regex=False)
                       .str.strip().str.rstrip('%').str.strip())
            converted = pd.to_numeric(cleaned, errors='coerce')
            if converted.notna().sum() > len(df) * 0.5:
                df[col] = converted
        except AttributeError:
            pass  # e.g. duplicate header names make df[col] a DataFrame
    return df
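Captions sometimes carry numbers in forms the numeric conversion above will not accept, such as currency symbols, accounting-style negatives, or full-width digits. A hedged sketch of an optional per-cell cleaner; `to_number` is a hypothetical helper, and the formats it accepts (thousands separators, `%`, a leading `¥`/`$`/`€`, `(1,234)` for negatives, full-width characters folded by NFKC) are assumptions about what captions may contain, not something the script guarantees.

```python
import re
import unicodedata

_NUMBER = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def to_number(cell):
    """Return a float for numeric-looking caption cells, else None."""
    if cell is None:
        return None
    # NFKC folds full-width forms, e.g. '１２３' -> '123', '％' -> '%', '（' -> '('
    s = unicodedata.normalize("NFKC", str(cell)).strip()
    negative = s.startswith("(") and s.endswith(")")  # accounting-style negative
    if negative:
        s = s[1:-1].strip()
    s = s.replace(",", "").lstrip("¥$€").rstrip("%").strip()
    if not _NUMBER.fullmatch(s):
        return None
    value = float(s)
    return -value if negative else value
```

To apply it to a parsed column, one option is `pd.to_numeric(df[col].map(to_number), errors='coerce')`.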

Visualization

Chinese Font Setup (MANDATORY)

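import os
import matplotlib
import matplotlib.pyplot as plt
from matplotlib import font_manager

font_path = '/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc'
if os.path.exists(font_path):
    # Register the file explicitly so the family name resolves even with a stale font cache
    font_manager.fontManager.addfont(font_path)
    matplotlib.rcParams['font.family'] = 'WenQuanYi Zen Hei'
else:
    print(f"CJK font not found at {font_path}; Chinese labels will not render")
# Use the ASCII hyphen for minus signs; CJK fonts may lack the Unicode minus glyph
matplotlib.rcParams['axes.unicode_minus'] = False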

Color Palette

COLORS = ['#4C72B0', '#55A868', '#C44E52', '#8172B2', '#CCB974', '#64B5CD']

Save & Display

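# Save to a new file so a source image at /mnt/data/chart.png is not overwritten
plt.savefig('/mnt/data/chart_regenerated.png', dpi=150, bbox_inches='tight')
plt.show()
# Markdown image link for the chat UI (alt text "图表" = "chart")
print("![图表](sandbox:/mnt/data/chart_regenerated.png)")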

Export to Excel

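import pandas as pd
from openpyxl.styles import Font, PatternFill, Alignment
from openpyxl.utils import get_column_letter

output_path = "/mnt/data/result.xlsx"
with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
    # Sheet name "提取数据" = "Extracted Data"
    df.to_excel(writer, index=False, sheet_name='提取数据')
    ws = writer.sheets['提取数据']
    # Style the header row: bold white text on a blue fill, centered
    fill = PatternFill(start_color='4472C4', end_color='4472C4', fill_type='solid')
    for cell in ws[1]:
        cell.font = Font(bold=True, color='FFFFFF')
        cell.fill = fill
        cell.alignment = Alignment(horizontal='center')
    # Size each column to its longest value (capped at 40);
    # get_column_letter also handles columns beyond Z
    for i, col in enumerate(df.columns, 1):
        longest = df.iloc[:, i - 1].astype(str).str.len().max() if len(df) else 0
        w = max(longest, len(str(col))) + 2
        ws.column_dimensions[get_column_letter(i)].width = min(w * 1.2, 40)

# Markdown download link for the chat UI ("下载" = "Download")
print(f"[下载](sandbox:{output_path})")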
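The skill also lists CSV as an export target. With pandas this is `df.to_csv(path, index=False, encoding='utf-8-sig')`; the BOM written by `utf-8-sig` lets Excel detect UTF-8, so Chinese text opens without mojibake. A standard-library sketch of the same idea for header-plus-rows data; `write_csv` is an illustrative name and the demo rows are made up:

```python
import csv
import os
import tempfile


def write_csv(path, header, rows):
    """Write a header and rows as CSV that Excel opens as UTF-8."""
    # utf-8-sig prepends a BOM so Excel recognises the encoding
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return path


# Demo in a temp dir; in the sandbox you would write to /mnt/data/result.csv.
# Chinese headers (quarter, revenue) exercise the BOM handling; values are made up.
demo_path = os.path.join(tempfile.mkdtemp(), "result.csv")
write_csv(demo_path, ["季度", "收入"], [["Q1", 120], ["Q2", 135.5]])
```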

Multi-Image Processing

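import glob
import json
import os
import subprocess

import pandas as pd

# CAPTION and parse_markdown_table come from the sections above.
# Match every extension the skill accepts, not just .png
EXTS = ('.png', '.jpg', '.jpeg', '.gif', '.webp', '.bmp')
image_files = sorted(p for p in glob.glob("/mnt/data/*") if p.lower().endswith(EXTS))
all_dfs = []

for img in image_files:
    # prompt: "extract the data, as a Markdown table"
    r = subprocess.run(
        ["python3", CAPTION, img, "--json", "--prompt", "提取数据,Markdown 表格"],
        capture_output=True, text=True, timeout=60
    )
    if r.returncode != 0:
        print(f"caption failed for {img}: {r.stderr.strip()}")
        continue
    desc = json.loads(r.stdout)["description"]
    df = parse_markdown_table(desc)
    if df is not None:
        df["source"] = os.path.basename(img)  # record which image each row came from
        all_dfs.append(df)

combined = pd.concat(all_dfs, ignore_index=True) if all_dfs else None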

Or batch mode:

python3 scripts/caption.py /mnt/data/images/ --batch --output /mnt/data/captions.json

Common Pitfalls

  • Always caption first: don't guess image content from filenames
  • Use --prompt for precision: auto-detection works, but an explicit prompt is better
  • Verify extracted data: check sums, percentages, and row counts after parsing
  • Large tables get truncated: caption in two passes, "提取前半部分" (extract the first half) + "提取后半部分" (extract the second half)
  • Chinese font: configure it before creating any figure, or Chinese text renders as empty boxes
  • Timeout: a single image takes roughly 10-30 s; scale the batch timeout with the number of images
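Three of the pitfalls above lend themselves to quick programmatic checks. A minimal sketch with illustrative helper names: `percentages_add_up` flags pie-chart shares that drift from 100 (the 1-point tolerance is an assumption to absorb rounding), `merge_halves` joins the two passes of a large table while dropping rows repeated at the seam, and `batch_timeout` scales the subprocess timeout using the upper end of the quoted 10-30 s per image plus a fixed margin (the 60 s margin is an assumption).

```python
def percentages_add_up(values, expected=100.0, tolerance=1.0):
    """True when extracted shares sum to roughly 100, allowing for rounding."""
    return abs(sum(values) - expected) <= tolerance


def merge_halves(first, second):
    """Join two partial extractions, dropping rows that overlap at the seam."""
    # Find the longest suffix of `first` that equals a prefix of `second`
    for k in range(min(len(first), len(second)), 0, -1):
        if first[-k:] == second[:k]:
            return first + second[k:]
    return first + second


def batch_timeout(n_images, per_image=30, margin=60):
    """Seconds to allow a batch run: per-image worst case plus a fixed margin."""
    return n_images * per_image + margin
```

Only the overlap at the seam is removed, so legitimately repeated rows elsewhere in the table survive the merge.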

Capabilities

Tags: skillsource-opensensenova, skill-sn-da-image-caption, topic-agent, topic-agent-skills, topic-ai-agents, topic-ai-assistant, topic-data-analysis, topic-document-processing, topic-office-automation, topic-presentation-slides


Quality

0.70 / 1.00

Deterministic score 0.70 from registry signals: indexed on GitHub topic:agent-skills · 1627 GitHub stars · SKILL.md body (7,125 chars)

Provenance

Indexed from: GitHub
Enriched: 2026-05-18 18:53:04Z · deterministic:skill-github:v1 · v1
First seen: 2026-05-15
Last seen: 2026-05-18
