{"id":"ba555121-2cef-4ac7-a8d3-015f3cf76bdd","shortId":"RcWPBA","kind":"skill","title":"sn-da-excel-workflow","tagline":"Excel 数据分析多步编排器。覆盖：(1) 读取多 Sheet Excel 文件并统计行数，(2) 大文件检测（≥10k 行自动 Parquet 优化），(3) 数据清洗（缺失值、文本标准化、无效字符），(4) 条件筛选与分类提取，(5) 跨 Sheet 统计聚合，(6) 导出 Excel/CSV 并提供下载链接。覆盖从数据读取到报告生成全流程，按步骤编排 capability 子 skill。**遇到以下任一情况就主动使用本 skill，不要自行写几行 pandas 就回答**：①用户出现触发词：Excel 分析 / 表格分析 / 数据分析 / 数据","description":"# Excel Data Analysis Workflow\n\nEnd-to-end workflow for structured Excel analysis. Each step maps to a\ncapability sub-skill that can be loaded for detailed patterns.\n\n## Workflow\n\n### Step 1 — Count rows across all sheets (lightweight, no full load)\n\nCount rows per sheet **without loading data into memory**. Use openpyxl\n`read_only` mode — this works for any file size.\n\n```python\nimport openpyxl, gc\n\nwb = openpyxl.load_workbook(file_path, read_only=True, data_only=True)\ntotal_rows = 0\nsheet_info = {}\nfor name in wb.sheetnames:\n    ws = wb[name]\n    row_count = sum(1 for _ in ws.iter_rows(min_row=2, values_only=True))\n    total_rows += row_count\n    sheet_info[name] = row_count\n    print(f\"Sheet '{name}': {row_count} rows\")\nwb.close()\nprint(f\"总行数={total_rows}\")\n```\n\n⚠️ **Do NOT use `pd.read_excel()` to count rows** — it loads all data into\nmemory, which will OOM on large files.\n\n→ capability: `excel-reading/multi-sheet-reading`\n\n### Step 2 — Large file gate (CRITICAL — choose strategy by row count)\n\n| total_rows | Strategy | What to do |\n|-----------|----------|------------|\n| < 10k | Direct read | `df = pd.read_excel(file_path, sheet_name=target_sheet)` |\n| 10k – 100k | Parquet cache | `pd.read_excel()` once → `df.to_parquet()` → all later reads from Parquet |\n| **>= 100k** | **STOP. Load `sn-da-large-file-analysis` skill** | Read its SKILL.md, then follow its streaming read + Parquet pattern. **Do NOT use `pd.read_excel()` at all** — it will OOM or timeout on 100k+ rows. 
|\n\n**For >= 100k rows:**\n```\nread_file(path=\"<skills_base>/sn-da-large-file-analysis/SKILL.md\")\n```\nThen use `stream_excel_to_parquet()` from that skill — it reads via\nopenpyxl `iter_rows` in 50k-row chunks with constant memory.\n\n**For 10k – 100k rows (only):**\n```python\nimport gc\nimport pandas as pd\nparquet_path = \"/tmp/_auto_parquet.parquet\"\ndf = pd.read_excel(file_path, sheet_name=target_sheet)\ndf.to_parquet(parquet_path, engine=\"pyarrow\")\ndel df; gc.collect()\ndf = pd.read_parquet(parquet_path)\n```\n\n→ capability: `excel-reading/large-excel-reading`\n\n### Step 3 — Inspect schema & data types\n\nPreview target sheet structure. **For large files (>= 10k rows), only read\na small sample — never full load just to inspect.**\n\n```python\n# For any file size — read only first N rows for inspection\ndf_head = pd.read_excel(file_path, sheet_name=target_sheet, nrows=20)\nprint(f\"Columns: {df_head.columns.tolist()}\")\nprint(f\"Dtypes:\\n{df_head.dtypes}\")\nprint(df_head.head(10))\n```\n\n→ capability: `excel-reading/range-reading`\n\n### Step 4 — Data cleaning\n\nHandle missing values, normalize text, clean invalid characters.\n\n```python\n# Missing values\nnull_count = df[col].isna().sum()\n\n# Text cleaning: keep only Chinese characters\nimport re\ndef clean_text(val):\n    if pd.isna(val): return val\n    return \"\".join(re.findall(r\"[\\u4e00-\\u9fff]\", str(val))) or \"\"\n\ndf[col] = df[col].apply(clean_text)\n```\n\n⚠️ **Large file rule**: When `total_rows >= 100k`, do NOT use `df.apply(lambda...)`.\nUse vectorized operations or `np.where()` instead. 
See `sn-da-large-file-analysis` skill\nfor the vectorized cheat sheet.\n\n→ capabilities:\n  - `excel-data-cleaning/missing-value-handling`\n  - `excel-data-cleaning/invalid-data-cleaning`\n  - `excel-data-cleaning/text-normalization`\n\n### Step 5 — Filter & extract\n\nApply condition or category filters, aggregate results.\n\n```python\n# Condition filter\nmask = df[col].astype(str).str.strip() == target_value\nfiltered = df[mask]\n\n# Category extraction (for headerless layouts)\ndf_raw = pd.read_excel(file_path, sheet_name=sheet, header=None)\n# Walk rows to find category markers, collect items until next marker\n```\n\n→ capabilities:\n  - `excel-data-filtering/condition-filtering`\n  - `excel-data-filtering/category-filtering`\n  - `excel-data-filtering/threshold-filtering`\n\n### Step 6 — Export results\n\nSave filtered/cleaned data as Excel or CSV. Provide download link.\n\n```python\noutput_path = \"/mnt/data/result.xlsx\"\nresult_df.to_excel(output_path, index=False)\nprint(f\"[Download](sandbox:{output_path})\")\n```\n\n→ capabilities:\n  - `excel-result-export/single-sheet-export`\n  - `excel-result-export/formatted-export`\n\n## Key rules\n\n- **Always count rows first** — gate large-file logic on the 10k threshold.\n- **>= 100k rows → MUST load `sn-da-large-file-analysis` skill** — do not attempt to handle with `pd.read_excel()`.\n- **Column names may contain spaces** (e.g. `'是否通 过'`) — use exact string indexing.\n- **Headerless sheets** — use `header=None` and positional indexing.\n- **Prohibited on large files (>= 100k rows)**:\n  - `pd.read_excel()` for full load (use streaming read → Parquet)\n  - `df.apply(lambda...)` or `df.iterrows()` (use vectorized ops or `itertuples()`)\n  - `fc-list`, `find ... 
fonts`, `subprocess` to search fonts, or `pip install` (use fixed font paths below)\n  - Printing all unique values or full DataFrames (use `.head()`, `.value_counts().head()`)\n\n## CJK Font Setup (mandatory for charts)\n\nWhen generating charts with matplotlib, **copy this block as-is**. Do NOT search for fonts.\n\n```python\nimport os\nimport matplotlib\nimport matplotlib.pyplot as plt\nimport matplotlib.font_manager as fm\n\n_FONT_PATHS = [\n    '/mnt/afs_agents/SimHei.ttf',\n    '/mnt/afs_agents/mnt/data/SimHei.ttf',\n    os.path.expanduser('~/.fonts/SimHei.ttf'),\n    '/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc',\n    '/usr/share/fonts/SimHei.ttf',\n]\nfor _p in _FONT_PATHS:\n    if os.path.exists(_p):\n        fm.fontManager.addfont(_p)\n        matplotlib.rcParams['font.family'] = fm.FontProperties(fname=_p).get_name()\n        break\nmatplotlib.rcParams['axes.unicode_minus'] = False\n```\n\n## How to load sub-skills\n\nEach workflow step references one or more capability sub-skills. 
When you\nneed the detailed code pattern for a step, load the sub-skill on demand:\n\n```\nread_file(path=\"<base_path>/<category>/<sub-skill-name>/SKILL.md\")\n```\n\n**Rules:**\n- Only load the sub-skill(s) needed for your current step.\n- Do NOT load all sub-skills at once — it wastes context.\n- The top-level workflow (this file) is your guide; sub-skills provide\n  detailed implementation patterns.\n\n## Available capability sub-skills\n\nBase path: `<skills_root>/sn-da-excel-workflow/capability/{category}/{sub-skill}/SKILL.md`\n\n### excel-reading — 读取与解析\n\n| Sub-skill | 功能 |\n|---|---|\n| single-sheet-reading | 读取单个工作表，支持合并单元格处理、交叉分析及多维度可视化 |\n| multi-sheet-reading | 读取多工作表，动态评估数据量启用Parquet优化，支持正则清洗、分类汇总与线性拟合 |\n| range-reading | 特定区域数据提取，根据数据量动态选择处理策略 |\n| large-excel-reading | 大型Excel文件处理，支持Parquet转换提速，生成带条件高亮的格式化报告 |\n| multi-file-reading | 多文件读取与统计，支持大文件Parquet转换与可视化报告 |\n| specific-sheet-reading | 跨Sheet特定字段统计、数据清洗与交叉分析，生成带下载链接的汇总报告 |\n| structured-header-reading | 动态识别目标列进行统计，正则清洗文本字段提取中文字符 |\n\n### excel-data-cleaning — 数据清洗\n\n| Sub-skill | 功能 |\n|---|---|\n| missing-value-handling | 多Sheet智能清洗、跨表核对与可视化分析 |\n| duplicate-removal | 多Sheet去重统计，生成摘要与明细报表 |\n| invalid-data-cleaning | 正则清洗指定文本列（如保留中文字符），大文件自动Parquet加速 |\n| text-normalization | 文本标准化清洗（去除异常前缀、提取纯中文字符等） |\n| numeric-format-normalization | 数值格式标准化，支持关键指标合计核对与结果文件导出 |\n| outlier-detection | IQR异常值检测，结合偏度/峰度分析数据分布，适用于非正态数据预处理 |\n\n### excel-data-filtering — 数据筛选\n\n| Sub-skill | 功能 |\n|---|---|\n| condition-filtering | 根据数据规模动态选择处理策略进行条件筛选 |\n| category-filtering | 自定义分类统计、交叉分析，支持文本长度/术语密度/正则匹配等综合评分与分级 |\n| range-filtering | 根据多维数值条件筛选并导出，支持大规模数据自动性能优化 |\n| threshold-filtering | 数值列清洗、条件过滤，使用openpyxl对符合条件的单元格进行样式标记 |\n\n### excel-data-analysis — 数据分析\n\n| Sub-skill | 功能 |\n|---|---|\n| comparison-analysis | 两类分类数据对比分析，统计数量差异与比例关系并生成可视化 |\n| group-by-analysis | 多Sheet数据清洗及分组聚合分析，生成带样式标记的统计表与图表 |\n| kpi-metric-analysis | 提取关键指标进行单位一致性验证与排序分析 |\n| pivot-table-analysis | 
交叉表与热力图进行多维度占比分析，适用于奖项分布/绩效评估/市场占有率 |\n| time-series-analysis | 时间序列趋势分析、百分比清洗、绩效分级建模与预测，生成高分辨率可视化报告 |\n| trend-analysis | 多维度分级评估与趋势预测，差异化增长率计算，适用于绩效评估/目标设定 |\n\n### excel-data-statistics — 统计计算\n\n| Sub-skill | 功能 |\n|---|---|\n| basic-statistics | 基础统计，支持按条件筛选计算均值，指定行区间提取数据去重求和 |\n| category-statistics | 各类别数量与占比统计，生成柱状图/饼图等组合可视化报告 |\n| grouped-statistics | 多Sheet数据合并与前向填充，分组统计 |\n| percentage-calculation | 逐行扫描或列匹配提取关键指标并计算占比/均值，输出结构化报告及图表 |\n\n### excel-data-visualization — 数据可视化\n\n| Sub-skill | 功能 |\n|---|---|\n| bar-chart-visualization | 处理合并单元格，交叉分组统计，生成支持中英文字体的美化柱状图 |\n| histogram-visualization | 数值型分布分析与异常值检测，支持正则提取误差项，生成箱线图与直方图 |\n| line-chart-visualization | 特征清洗与聚类分析，生成趋势对比/分布特征/参数敏感性多维度图表 |\n| pie-chart-visualization | 分类汇总统计，自动识别关键字段生成包含占比/数值的美化饼图 |\n| scatter-plot-visualization | 多维度统计分析与散点图可视化 |\n| stacked-chart-visualization | 百分比字符串数据处理，补全缺失维度，生成堆叠柱状图展示构成变化趋势 |\n\n### excel-cell-coloring — 单元格着色\n\n| Sub-skill | 功能 |\n|---|---|\n| category-coloring | 提取目标指标计算最大值，对特定行进行高亮标注 |\n| duplicate-value-coloring | 对比多表中的特定系数并对异常值进行颜色标记 |\n| outlier-coloring | 识别超限数值与错误单元格并进行高亮标注 |\n| threshold-cell-coloring | 计算时间序列平均值，使用openpyxl输出带条件格式（如低于均值标绿）的报告 |\n| top-value-coloring | 根据数据规模动态选择策略，多表合并、统计筛选，关键指标自动化样式高亮 |\n\n### excel-conditional-formatting — 条件格式\n\n| Sub-skill | 功能 |\n|---|---|\n| data-bar-formatting | 从带单位字符串列提取数值并清洗，生成直方图/饼图/条形图/累积分布图 |\n\n### excel-result-export — 结果导出\n\n| Sub-skill | 功能 |\n|---|---|\n| single-sheet-export | 多Sheet数据探查与条件过滤导出，重命名字段后生成带下载链接的Excel |\n| formatted-export | 条件筛选记录并以整行标红格式导出Excel |\n| chart-embedded-export | 分类分布清洗与统计，生成多维度交叉分析与高分辨率嵌入式图表报告 |\n| report-generation-export | 从Excel提取多类型数据，生成包含可视化图表与下载链接的综合分析报告 |\n\n### excel-table-styling — 表格样式\n\n| Sub-skill | 功能 |\n|---|---|\n| table-theme-styling | 大文件Parquet加速读取，条件筛选/分类汇总与结果导出 
|","tags":["excel","workflow","sensenova","skills","opensensenova","agent","agent-skills","ai-agents","ai-assistant","data-analysis","document-processing","office-automation"],"capabilities":["skill","source-opensensenova","skill-sn-da-excel-workflow","topic-agent","topic-agent-skills","topic-ai-agents","topic-ai-assistant","topic-data-analysis","topic-document-processing","topic-office-automation","topic-presentation-slides"],"categories":["SenseNova-Skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/OpenSenseNova/SenseNova-Skills/sn-da-excel-workflow","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add OpenSenseNova/SenseNova-Skills","source_repo":"https://github.com/OpenSenseNova/SenseNova-Skills","install_from":"skills.sh"}},"qualityScore":"0.700","qualityRationale":"deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 1627 github stars · SKILL.md body (9,366 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-18T18:53:04.358Z","embedding":null,"createdAt":"2026-05-15T06:53:09.557Z","updatedAt":"2026-05-18T18:53:04.358Z","lastSeenAt":"2026-05-18T18:53:04.358Z","tsv":"'/.fonts/simhei.ttf':774 '/category-filtering':579 '/condition-filtering':574 '/formatted-export':625 '/invalid-data-cleaning':511 '/large-excel-reading':348 '/missing-value-handling':506 '/mnt/afs_agents/mnt/data/simhei.ttf':772 '/mnt/afs_agents/simhei.ttf':771 '/mnt/data/result.xlsx':602 '/multi-sheet-reading':199 '/range-reading':415 '/single-sheet-export':620 '/skill.md':836,891 '/sn-da-excel-workflow/capability':886 '/sn-da-large-file-analysis/skill.md':284 '/text-normalization':516 '/threshold-filtering':584 
'/tmp/_auto_parquet.parquet':320 '/usr/share/fonts/simhei.ttf':776 '/usr/share/fonts/truetype/wqy/wqy-zenhei.ttc':775 '0':129 '1':9,82,142 '10':410 '100k':230,243,276,279,310,476,641,684 '10k':16,217,229,309,362,639 '2':14,149,201 '20':398 '3':20,350 '4':25,417 '5':27,518 '50k':302 '50k-row':301 '6':31,586 'across':85 'aggreg':526 'alway':628 'analysi':53,63,251,494,650,1027,1035,1041,1047,1052,1060,1067 'appli':467,521 'as-i':747 'astyp':534 'attempt':654 'avail':879 'axes.unicode':796 'bar':1114,1203 'bar-chart-visu':1113 'base':884 'basic':1082 'basic-statist':1081 'block':746 'break':794 'cach':232 'calcul':1100 'capabl':37,69,195,344,411,501,569,615,812,880 'categori':524,542,562,887,1006,1088,1163 'category-color':1162 'category-filt':1005 'category-statist':1087 'cell':1155,1178 'charact':427,442 'chart':738,741,1115,1128,1136,1148,1230 'chart-embedded-export':1229 'cheat':499 'chines':441 'choos':206 'chunk':304 'cjk':733 'clean':419,425,438,446,468,505,510,515,949,969 'code':821 'col':434,464,466,533 'collect':564 'color':1156,1164,1170,1174,1179,1187 'column':401,660 'comparison':1034 'comparison-analysi':1033 'condit':522,529,1002,1194 'condition-filt':1001 'constant':306 'contain':663 'context':861 'copi':744 'count':83,92,140,156,161,167,181,210,432,629,731 'critic':205 'csv':595 'current':848 'da':3,248,491,647 'data':52,98,124,186,353,418,504,509,514,572,577,582,591,948,968,994,1026,1074,1106,1202 'data-bar-format':1201 'datafram':727 'def':445 'del':336 'demand':832 'detail':78,820,876 'detect':987 'df':220,321,337,339,387,433,463,465,532,540,547 'df.apply':480,695 'df.iterrows':698 'df.to':236,330 'df_head.columns.tolist':402 'df_head.dtypes':407 'df_head.head':409 'direct':218 'download':597,611 'dtype':405 'duplic':962,1168 'duplicate-remov':961 'duplicate-value-color':1167 'e.g':665 'embed':1231 'end':56,58 'end-to-end':55 'engin':334 'exact':669 
'excel':4,6,12,46,51,62,179,197,222,234,267,288,323,346,390,413,503,508,513,550,571,576,581,593,604,617,622,659,687,893,922,947,993,1025,1073,1105,1154,1193,1211,1242 'excel-cell-color':1153 'excel-conditional-format':1192 'excel-data-analysi':1024 'excel-data-clean':502,507,512,946 'excel-data-filt':570,575,580,992 'excel-data-statist':1072 'excel-data-visu':1104 'excel-read':196,345,412,892 'excel-result-export':616,621,1210 'excel-table-styl':1241 'excel/csv':33 'export':587,619,624,1213,1222,1227,1232,1238 'extract':520,543 'f':163,171,400,404,610 'fals':608,798 'fc':705 'fc-list':704 'file':110,119,194,203,223,250,282,324,361,378,391,471,493,551,635,649,683,834,868,929 'filter':519,525,530,539,573,578,583,995,1003,1007,1015,1020 'filtered/cleaned':590 'find':561,707 'first':382,631 'fix':717 'fm':768 'fm.fontmanager.addfont':785 'fm.fontproperties':789 'fname':790 'follow':257 'font':708,712,718,734,754,769,780 'font.family':788 'format':981,1195,1204,1226 'formatted-export':1225 'full':90,370,689,726 'gate':204,632 'gc':115 'gc.collect':338 'generat':740,1237 'get':792 'group':1039,1094 'group-by-analysi':1038 'grouped-statist':1093 'guid':871 'handl':420,656,958 'head':388,729,732 'header':556,675,942 'headerless':545,672 'histogram':1121 'histogram-visu':1120 'implement':877 'import':113,314,443,756,758,760,764 'index':607,671,679 'info':131,158 'inspect':351,374,386 'instal':715 'instead':487 'invalid':426,967 'invalid-data-clean':966 'iqr异常值检测':988 'isna':435 'item':565 'iter':298 'itertupl':703 'join':455 'keep':439 'key':626 'kpi':1045 'kpi-metric-analysi':1044 'lambda':481,696 'larg':193,202,249,360,470,492,634,648,682,921 'large-excel-read':920 'large-fil':633 'later':239 'layout':546 'level':865 'lightweight':88 'line':1127 'line-chart-visu':1126 'link':598 'list':706 'load':76,91,97,184,245,371,644,690,801,826,839,852 'logic':636 'manag':766 'mandatori':736 'map':66 'marker':563,568 'mask':531,541 'matplotlib':743,759 'matplotlib.font':765 
'matplotlib.pyplot':761 'matplotlib.rcparams':787,795 'may':662 'memori':100,188,307 'metric':1046 'min':147 'minus':797 'miss':421,429,956 'missing-value-handl':955 'mode':105 'multi':908,928 'multi-file-read':927 'multi-sheet-read':907 'must':643 'n':383,406 'name':133,138,159,165,226,327,394,554,661,793 'need':818,845 'never':369 'next':567 'none':557,676 'normal':423,975,982 'np.where':486 'nrow':397 'null':431 'numer':980 'numeric-format-norm':979 'one':809 'oom':191,272 'op':701 'openpyxl':102,114,297 'openpyxl.load':117 'oper':484 'os':757 'os.path.exists':783 'os.path.expanduser':773 'outlier':986,1173 'outlier-color':1172 'outlier-detect':985 'output':600,605,613 'p':778,784,786,791 'panda':43,315 'parquet':18,231,237,242,261,290,318,331,332,341,342,694 'path':120,224,283,319,325,333,343,392,552,601,606,614,719,770,781,835,885 'pattern':79,262,822,878 'pd':317 'pd.isna':450 'pd.read':178,221,233,266,322,340,389,549,658,686 'per':94 'percentag':1099 'percentage-calcul':1098 'pie':1135 'pie-chart-visu':1134 'pip':714 'pivot':1050 'pivot-table-analysi':1049 'plot':1143 'plt':763 'posit':678 'preview':355 'print':162,170,399,403,408,609,721 'prohibit':680 'provid':596,875 'pyarrow':335 'python':112,313,375,428,528,599,755 'r':457 'rang':916,1014 'range-filt':1013 'range-read':915 'raw':548 're':444 're.findall':456 'read':103,121,198,219,240,253,260,281,295,347,365,380,414,693,833,894,903,910,917,923,930,936,943 'refer':808 'remov':963 'report':1236 'report-generation-export':1235 'result':527,588,618,623,1212 'result_df.to':603 'return':452,454 'row':84,93,128,139,146,148,154,155,160,166,168,174,182,209,212,277,280,299,303,311,363,384,475,559,630,642,685 'rule':472,627,837 'sampl':368 'sandbox':612 'save':589 'scatter':1142 'scatter-plot-visu':1141 'schema':352 'search':711,752 'see':488 'seri':1059 'setup':735 'sheet':11,29,87,95,130,157,164,225,228,326,329,357,393,396,500,553,555,673,902,909,935,1221 'singl':901,1220 'single-sheet-export':1219 
'single-sheet-read':900 'size':111,379 'skill':39,41,72,252,293,495,651,804,815,830,843,856,874,883,890,898,953,999,1031,1079,1111,1160,1199,1217,1248 'skill-sn-da-excel-workflow' 'skill.md':255 'small':367 'sn':2,247,490,646 'sn-da-excel-workflow':1 'sn-da-large-file-analysi':246,489,645 'source-opensensenova' 'space':664 'specif':934 'specific-sheet-read':933 'stack':1147 'stacked-chart-visu':1146 'statist':1075,1083,1089,1095 'step':65,81,200,349,416,517,585,807,825,849 'stop':244 'str':460,535 'str.strip':536 'strategi':207,213 'stream':259,287,692 'string':670 'structur':61,358,941 'structured-header-read':940 'style':1244,1253 'sub':71,803,814,829,842,855,873,882,889,897,952,998,1030,1078,1110,1159,1198,1216,1247 'sub-skil':70,802,813,828,841,854,872,881,888,896,951,997,1029,1077,1109,1158,1197,1215,1246 'subprocess':709 'sum':141,436 'tabl':1051,1243,1251 'table-theme-styl':1250 'target':227,328,356,395,537 'text':424,437,447,469,974 'text-norm':973 'theme':1252 'threshold':640,1019,1177 'threshold-cell-color':1176 'threshold-filt':1018 'time':1058 'time-series-analysi':1057 'timeout':274 'top':864,1185 'top-level':863 'top-value-color':1184 'topic-agent' 'topic-agent-skills' 'topic-ai-agents' 'topic-ai-assistant' 'topic-data-analysis' 'topic-document-processing' 'topic-office-automation' 'topic-presentation-slides' 'total':127,153,173,211,474 'trend':1066 'trend-analysi':1065 'true':123,126,152 'type':354 'u4e00':458 'u9fff':459 'uniqu':723 'use':101,177,265,286,479,482,668,674,691,699,716,728 'val':448,451,453,461 'valu':150,422,430,538,724,730,957,1169,1186 'vector':483,498,700 'via':296 'visual':1107,1116,1122,1129,1137,1144,1149 'walk':558 'wast':860 'wb':116,137 'wb.close':169 'wb.sheetnames':135 'without':96 'work':107 'workbook':118 'workflow':5,54,59,80,806,866 'ws':136 'ws.iter':145 '不要自行写几行':42 '两类分类数据对比分析':1036 '交叉分析':1009 '交叉分析及多维度可视化':906 '交叉分组统计':1118 '交叉表与热力图进行多维度占比分析':1053 '从excel提取多类型数据':1239 '从带单位字符串列提取数值并清洗':1205 '优化':19 
'使用openpyxl对符合条件的单元格进行样式标记':1023 '使用openpyxl输出带条件格式':1181 '关键指标自动化样式高亮':1191 '分布特征':1132 '分析':47 '分类分布清洗与统计':1233 '分类汇总与线性拟合':914 '分类汇总与结果导出':1256 '分类汇总统计':1138 '分组统计':1097 '功能':899,954,1000,1032,1080,1112,1161,1200,1218,1249 '动态评估数据量启用parquet优化':912 '动态识别目标列进行统计':944 '单元格着色':1157 '去除异常前缀':977 '参数敏感性多维度图表':1133 '各类别数量与占比统计':1090 '均值':1102 '基础统计':1084 '处理合并单元格':1117 '多sheet去重统计':964 '多sheet数据合并与前向填充':1096 '多sheet数据探查与条件过滤导出':1223 '多sheet数据清洗及分组聚合分析':1042 '多sheet智能清洗':959 '多文件读取与统计':931 '多维度分级评估与趋势预测':1068 '多维度统计分析与散点图可视化':1145 '多表合并':1189 '大型excel文件处理':924 '大文件parquet加速读取':1254 '大文件检测':15 '大文件自动parquet加速':972 '如低于均值标绿':1182 '如保留中文字符':971 '子':38 '对比多表中的特定系数并对异常值进行颜色标记':1171 '对特定行进行高亮标注':1166 '导出':32 '就回答':44 '峰度分析数据分布':990 '差异化增长率计算':1069 '市场占有率':1056 '并提供下载链接':34 '总行数':172 '指定行区间提取数据去重求和':1086 '按步骤编排':36 '提取关键指标进行单位一致性验证与排序分析':1048 '提取目标指标计算最大值':1165 '提取纯中文字符等':978 '支持parquet转换提速':925 '支持关键指标合计核对与结果文件导出':984 '支持合并单元格处理':905 '支持大文件parquet转换与可视化报告':932 '支持大规模数据自动性能优化':1017 '支持按条件筛选计算均值':1085 '支持文本长度':1010 '支持正则提取误差项':1124 '支持正则清洗':913 '数值列清洗':1021 '数值型分布分析与异常值检测':1123 '数值格式标准化':983 '数值的美化饼图':1140 '数据':50 '数据分析':49,1028 '数据分析多步编排器':7 '数据可视化':1108 '数据清洗':21,950 '数据清洗与交叉分析':938 '数据筛选':996 '文件并统计行数':13 '文本标准化':23 '文本标准化清洗':976 '无效字符':24 '时间序列趋势分析':1061 '是否通':666 '术语密度':1011 '条件格式':1196 '条件筛选':1255 '条件筛选与分类提取':26 '条件筛选记录并以整行标红格式导出excel':1228 '条件过滤':1022 '条形图':1208 '根据多维数值条件筛选并导出':1016 '根据数据规模动态选择处理策略进行条件筛选':1004 '根据数据规模动态选择策略':1188 '根据数据量动态选择处理策略':919 '正则匹配等综合评分与分级':1012 '正则清洗指定文本列':970 '正则清洗文本字段提取中文字符':945 '特定区域数据提取':918 '特征清洗与聚类分析':1130 '生成包含可视化图表与下载链接的综合分析报告':1240 '生成堆叠柱状图展示构成变化趋势':1152 '生成多维度交叉分析与高分辨率嵌入式图表报告':1234 '生成带下载链接的汇总报告':939 '生成带条件高亮的格式化报告':926 '生成带样式标记的统计表与图表':1043 '生成摘要与明细报表':965 '生成支持中英文字体的美化柱状图':1119 '生成柱状图':1091 '生成直方图':1206 '生成箱线图与直方图':1125 '生成趋势对比':1131 '生成高分辨率可视化报告':1064 '用户出现触发词':45 '百分比字符串数据处理':1150 '百分比清洗':1062 '的报告':1183 '目标设定':1071 '累积分布图':1209 '结合偏度':989 '结果导出':1214 '统计数量差异与比例关系并生成可视化':1037 '统计筛选':1190 '统计聚合':30 '统计计算':1076 '绩效分级建模与预测':1063 
'绩效评估':1055 '缺失值':22 '自动识别关键字段生成包含占比':1139 '自定义分类统计':1008 '行自动':17 '补全缺失维度':1151 '表格分析':48 '表格样式':1245 '覆盖':8 '覆盖从数据读取到报告生成全流程':35 '计算时间序列平均值':1180 '识别超限数值与错误单元格并进行高亮标注':1175 '读取与解析':895 '读取单个工作表':904 '读取多':10 '读取多工作表':911 '跨':28 '跨sheet特定字段统计':937 '跨表核对与可视化分析':960 '输出结构化报告及图表':1103 '过':667 '适用于奖项分布':1054 '适用于绩效评估':1070 '适用于非正态数据预处理':991 '逐行扫描或列匹配提取关键指标并计算占比':1101 '遇到以下任一情况就主动使用本':40 '重命名字段后生成带下载链接的excel':1224 '饼图':1207 '饼图等组合可视化报告':1092","prices":[{"id":"cc5b90fe-e151-475f-9a0c-e5d832da638e","listingId":"ba555121-2cef-4ac7-a8d3-015f3cf76bdd","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"OpenSenseNova","category":"SenseNova-Skills","install_from":"skills.sh"},"createdAt":"2026-05-15T06:53:09.557Z"}],"sources":[{"listingId":"ba555121-2cef-4ac7-a8d3-015f3cf76bdd","source":"github","sourceId":"OpenSenseNova/SenseNova-Skills/sn-da-excel-workflow","sourceUrl":"https://github.com/OpenSenseNova/SenseNova-Skills/tree/main/skills/sn-da-excel-workflow","isPrimary":false,"firstSeenAt":"2026-05-15T06:53:09.557Z","lastSeenAt":"2026-05-18T18:53:04.358Z"}],"details":{"listingId":"ba555121-2cef-4ac7-a8d3-015f3cf76bdd","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"OpenSenseNova","slug":"sn-da-excel-workflow","github":{"repo":"OpenSenseNova/SenseNova-Skills","stars":1627,"topics":["agent","agent-skills","ai-agents","ai-assistant","data-analysis","document-processing","office-automation","presentation-slides"],"license":"mit","html_url":"https://github.com/OpenSenseNova/SenseNova-Skills","pushed_at":"2026-05-15T04:43:37Z","description":"Modular SenseNova skills for building AI-powered office assistants and productivity 
workflows","skill_md_sha":"4c541325e6ad029e284e69edd07ae613078dbdd3","skill_md_path":"skills/sn-da-excel-workflow/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/OpenSenseNova/SenseNova-Skills/tree/main/skills/sn-da-excel-workflow"},"layout":"multi","source":"github","category":"SenseNova-Skills","frontmatter":{"name":"sn-da-excel-workflow","description":"Excel 数据分析多步编排器。覆盖：(1) 读取多 Sheet Excel 文件并统计行数，(2) 大文件检测（≥10k 行自动 Parquet 优化），(3) 数据清洗（缺失值、文本标准化、无效字符），(4) 条件筛选与分类提取，(5) 跨 Sheet 统计聚合，(6) 导出 Excel/CSV 并提供下载链接。覆盖从数据读取到报告生成全流程，按步骤编排 capability 子 skill。**遇到以下任一情况就主动使用本 skill，不要自行写几行 pandas 就回答**：①用户出现触发词：Excel 分析 / 表格分析 / 数据分析 / 数据清洗 / 数据统计 / 数据筛选 / 数据可视化 / 数据导出 / 汇总统计 / 透视表 / 分组统计 / 交叉分析 / 趋势分析 / 对比分析 / 异常值检测 / 去重 / 缺失值处理 / Excel 报告 / 生成报表 / analyze Excel / data analysis / data cleaning / pivot table；②用户上传或指定了 .xlsx / .xls / .csv 文件并要求分析、清洗、统计或可视化；③任务涉及多 Sheet 读取、条件筛选、分类汇总、图表生成中的任意一项；④用户要求导出带格式的 Excel 报告或下载链接。仅不用于：不涉及表格数据的纯文本处理、图片分析（使用 sn-da-image-caption）、单个公式计算的简单问答。"},"skills_sh_url":"https://skills.sh/OpenSenseNova/SenseNova-Skills/sn-da-excel-workflow"},"updatedAt":"2026-05-18T18:53:04.358Z"}}