Skillquality 0.70

sn-da-image-caption

图片理解与数据提取 skill。当图片文件（.png/.jpg/.jpeg/.gif/.webp/.bmp）是主要输入且用户需要理解、提取数据或分析图片内容时使用。提供预配置的 caption 脚本（scripts/caption.py），通过 vision 模型将图片转为文本描述，无需额外配置 API Key。覆盖：(1) 通过 scripts/caption.py 对图表/表格/截图/流程图进行 caption，(2) 将 caption 文本解析为结构化 DataFrame，(3) 基于提取数据重新生成可视化图表，(4) 导出为 Excel/CSV

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/OpenSenseNova/SenseNova-Skills/sn-da-image-caption

What it does

Image Caption Analysis — 图片描述与数据提取

Overview

Analyze, extract data from, or understand image files (.png, .jpg, .jpeg, .gif, .webp, .bmp). The core workflow:

Run scripts/caption.py to get a text description of the image
Parse the description into structured data (DataFrame, etc.)
Analyze, visualize, or export

scripts/caption.py — Image Caption

The script converts images to text descriptions via a vision model. Configure via SN_API_KEY (minimum required), or use SN_VISION_API_KEY / SN_VISION_BASE_URL / SN_VISION_MODEL for fine-grained control. See the project environment variable spec for the full fallback chain.

Usage

# Basic — get text description
python3 scripts/caption.py /mnt/data/image.png

# Custom prompt — guide what to extract
python3 scripts/caption.py /mnt/data/chart.png --prompt "提取所有数值，Markdown 表格格式"

# JSON output — includes detected type, usage stats, cache info
python3 scripts/caption.py /mnt/data/image.png --json

# Batch — process all images in a directory
python3 scripts/caption.py /mnt/data/images/ --batch --output /mnt/data/captions.json

# Override model (optional)
python3 scripts/caption.py /mnt/data/image.png --model gemini-3.1-flash-lite-preview

Options

Option	Description
`--prompt, -p`	Custom prompt (overrides auto-detection)
`--model, -m`	Vision model (default: sensenova-6.7-flash-lite)
`--json`	Output structured JSON instead of plain text
`--batch`	Process all images in a directory
`--output, -o`	Output file for batch results
`--no-cache`	Skip MD5 cache

What it does automatically

Type detection: Detects image type from filename (chart/table/UI/diagram/general) and picks the best prompt
Compression: Images >5MB or >2048px are compressed before sending
Caching: Same image + same prompt → instant cached result, no API cost
Error handling: Retries on failure, returns error message on permanent failure

JSON output format

{
  "file": "/mnt/data/image.png",
  "type": "chart",
  "description": "这是一张柱状图...",
  "usage": {"prompt_tokens": 1100, "completion_tokens": 400},
  "cached": false
}

Calling from Python

import subprocess, json

CAPTION = "/path/to/skills/sn-da-image-caption/scripts/caption.py"

# Single image
result = subprocess.run(
    ["python3", CAPTION, "/mnt/data/chart.png", "--json",
     "--prompt", "提取图表数据，Markdown 表格输出"],
    capture_output=True, text=True, timeout=60
)
data = json.loads(result.stdout)
description = data["description"]

# Batch
result = subprocess.run(
    ["python3", CAPTION, "/mnt/data/images/", "--batch",
     "--output", "/mnt/data/captions.json"],
    capture_output=True, text=True, timeout=300
)
with open("/mnt/data/captions.json") as f:
    all_captions = json.load(f)

Prompt Strategy

Different image types need different prompts. The script auto-detects, but specifying --prompt gives better results.

Image Type	When	Recommended --prompt
Data chart	柱状图/折线图/饼图	`"提取图表标题、坐标轴、每个数据点数值、图例。Markdown 表格输出。"`
Table screenshot	表格截图	`"提取表格所有内容，Markdown 表格格式，保持行列结构，数值不四舍五入。"`
UI screenshot	界面截图	`"以前端开发者视角描述：布局、组件、文字、颜色。"`
Diagram	流程图/架构图	`"描述所有节点、连接关系（A→B）、分支条件。"`
General	照片、其他	不传 --prompt，用默认

Parsing Caption Results

Caption 通常返回 Markdown 表格，解析为 DataFrame：

import pandas as pd

def parse_markdown_table(text):
    lines = text.strip().split('\n')
    table_lines = []
    in_table = False
    for line in lines:
        stripped = line.strip()
        if '|' in stripped:
            in_table = True
            table_lines.append(stripped)
        elif in_table:
            break

    data_lines = []
    for l in table_lines:
        cells = [c.strip() for c in l.split('|') if c.strip()]
        if cells and not all(set(c) <= set('-: ') for c in cells):
            data_lines.append(cells)

    if len(data_lines) < 2:
        return None

    header = data_lines[0]
    rows = [r for r in data_lines[1:] if len(r) == len(header)]
    df = pd.DataFrame(rows, columns=header)

    # Auto numeric conversion
    for col in df.columns:
        try:
            cleaned = df[col].str.replace(',', '').str.strip()
            if cleaned.str.endswith('%').any():
                df[col] = pd.to_numeric(cleaned.str.rstrip('%'), errors='coerce')
            else:
                converted = pd.to_numeric(cleaned, errors='coerce')
                if converted.notna().sum() > len(df) * 0.5:
                    df[col] = converted
        except Exception:
            pass
    return df

Visualization

Chinese Font Setup (MANDATORY)

import matplotlib.pyplot as plt
import matplotlib
import os

font_path = '/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc'
if os.path.exists(font_path):
    matplotlib.rcParams['font.family'] = 'WenQuanYi Zen Hei'
matplotlib.rcParams['axes.unicode_minus'] = False

Color Palette

COLORS = ['#4C72B0', '#55A868', '#C44E52', '#8172B2', '#CCB974', '#64B5CD']

Save & Display

plt.savefig('/mnt/data/chart.png', dpi=150, bbox_inches='tight')
plt.show()
print("![图表](sandbox:/mnt/data/chart.png)")

Export to Excel

from openpyxl.styles import Font, PatternFill, Alignment

output_path = "/mnt/data/result.xlsx"
with pd.ExcelWriter(output_path, engine='openpyxl') as writer:
    df.to_excel(writer, index=False, sheet_name='提取数据')
    ws = writer.sheets['提取数据']
    fill = PatternFill(start_color='4472C4', end_color='4472C4', fill_type='solid')
    for cell in ws[1]:
        cell.font = Font(bold=True, color='FFFFFF')
        cell.fill = fill
        cell.alignment = Alignment(horizontal='center')
    for i, col in enumerate(df.columns, 1):
        w = max(df[col].astype(str).str.len().max(), len(str(col))) + 2
        ws.column_dimensions[chr(64 + i)].width = min(w * 1.2, 40)

print(f"[下载](sandbox:{output_path})")

Multi-Image Processing

import glob

image_files = sorted(glob.glob("/mnt/data/*.png"))
all_dfs = []

for img in image_files:
    r = subprocess.run(
        ["python3", CAPTION, img, "--json", "--prompt", "提取数据，Markdown 表格"],
        capture_output=True, text=True, timeout=60
    )
    desc = json.loads(r.stdout)["description"]
    df = parse_markdown_table(desc)
    if df is not None:
        all_dfs.append(df)

combined = pd.concat(all_dfs, ignore_index=True) if all_dfs else None

Or batch mode:

python3 scripts/caption.py /mnt/data/images/ --batch --output /mnt/data/captions.json

Common Pitfalls

Always caption first — don't guess image content from filenames
Use --prompt for precision — auto-detect is OK, explicit prompt is better
Verify extracted data — check sums, percentages, row counts after parsing
Large tables truncate — caption in two passes: "提取前半部分" + "提取后半部分"
Chinese font — must set before any matplotlib call, or output is garbled
Timeout — single image ~10-30s, batch set timeout accordingly

Capabilities

skillsource-opensensenovaskill-sn-da-image-captiontopic-agenttopic-agent-skillstopic-ai-agentstopic-ai-assistanttopic-data-analysistopic-document-processingtopic-office-automationtopic-presentation-slides

Install

Installnpx skills add OpenSenseNova/SenseNova-Skills

Sourcehttps://github.com/OpenSenseNova/SenseNova-Skills/tree/main/skills/sn-da-image-caption

skills.shhttps://skills.sh/OpenSenseNova/SenseNova-Skills/sn-da-image-caption

Transportskills-sh

Protocolskill

Quality

0.70/ 1.00

deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 1627 github stars · SKILL.md body (7,125 chars)

Provenance

Indexed fromgithub

Enriched2026-05-18 18:53:04Z · deterministic:skill-github:v1 · v1

First seen2026-05-15

Last seen2026-05-18

Agent access

JSONhttps://clawmart.sh/api/listings/eC9LGA