{"id":"a662f761-3700-465a-8805-943a99a8e0ab","shortId":"DQsqKn","kind":"skill","title":"csv-data-summarizer","tagline":"Analyzes CSV files and automatically generates comprehensive summaries with statistical insights, data quality checks, and visualizations using Python and pandas. No questions asked — just upload a CSV and get a full analysis immediately.","description":"# CSV Data Summarizer\n\nThis skill analyzes any CSV file and delivers a complete statistical summary with visualizations in one shot. It adapts to the columns it finds, whether the file holds sales, customer, financial, operational, survey, or generic tabular data.\n\n## When to Use This Skill\n\n- The user uploads or references a CSV file\n- The user asks to summarize, analyze, or visualize tabular data\n- The user requests insights from a dataset\n- The user wants to understand a dataset's structure and quality\n\n## Behavior Rule\n\n**Do not ask the user what they want. Immediately run the full analysis.**\n\nWhen a CSV is provided, skip questions like \"What would you like me to do?\" and go straight to the analysis.\n\n## Required Tools / Libraries\n\n```bash\npip install pandas matplotlib seaborn\n```\n\n## How It Works\n\nThe function inspects column types and names first, then runs only the analyses that apply to them. When presenting the results, emphasize the focus areas that match the kind of dataset:\n\n| Data type | Focus areas |\n|-----------|-------------|\n| Sales / e-commerce | Time-series trends, revenue, product performance |\n| Customer data | Distributions, segmentation, geographic patterns |\n| Financial | Trend analysis, statistics, correlations |\n| Operational | Time-series, performance metrics, distributions |\n| Survey | Frequency analysis, cross-tabulations |\n| Generic | Adapts based on column types found |\n\nVisualizations are only created when they make sense:\n\n- Time-series plots → only if a column name contains `date` or `time` and at least one numeric column exists\n- Correlation heatmap → only if there are two or more numeric columns\n- Category distributions → only if text (object) columns exist; the first four are plotted\n- Histograms → whenever numeric columns exist; the first four are plotted\n\n## Core Function\n\n```python\nimport matplotlib\nmatplotlib.use('Agg')  # charts are only written to PNG files, so no display is needed\nimport matplotlib.pyplot as plt\nimport pandas as pd\nimport seaborn as sns\n\ndef summarize_csv(file_path):\n    \"\"\"Return a plain-text summary of a CSV file; charts are saved as PNGs in the working directory.\"\"\"\n    df = pd.read_csv(file_path)\n    summary = []\n    charts_created = []\n\n    # --- Overview ---\n    summary.append(\"=\" * 60)\n    summary.append(\"DATA OVERVIEW\")\n    summary.append(\"=\" * 60)\n    summary.append(f\"Rows: {df.shape[0]:,} | Columns: {df.shape[1]}\")\n    summary.append(f\"\\nColumns: {', '.join(df.columns.tolist())}\")\n\n    summary.append(\"\\nDATA TYPES:\")\n    for col, dtype in df.dtypes.items():\n        summary.append(f\"  • {col}: {dtype}\")\n\n    # --- Data quality ---\n    missing = df.isnull().sum().sum()\n    # df.size is rows * columns; skip the division when the file has no data rows\n    missing_pct = (missing / df.size) * 100 if df.size else 0.0\n    summary.append(\"\\nDATA QUALITY:\")\n    if missing:\n        summary.append(f\"Missing values: {missing:,} ({missing_pct:.2f}% of total data)\")\n        for col in df.columns:\n            col_missing = df[col].isnull().sum()\n            if col_missing > 0:\n                summary.append(f\"  • {col}: {col_missing:,} ({(col_missing / len(df)) * 100:.1f}%)\")\n    else:\n        summary.append(\"No missing values — dataset is complete.\")\n\n    # --- Numeric analysis ---\n    numeric_cols = df.select_dtypes(include='number').columns.tolist()\n    if numeric_cols:\n        summary.append(\"\\nNUMERICAL ANALYSIS:\")\n        summary.append(str(df[numeric_cols].describe()))\n\n        if len(numeric_cols) > 1:\n            corr_matrix = df[numeric_cols].corr()\n            summary.append(\"\\nCORRELATIONS:\")\n            summary.append(str(corr_matrix))\n\n            plt.figure(figsize=(10, 8))\n            sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0, square=True, linewidths=1)\n            plt.title('Correlation Heatmap')\n            plt.tight_layout()\n            plt.savefig('correlation_heatmap.png', dpi=150)\n            plt.close()\n            
charts_created.append('correlation_heatmap.png')\n\n    # --- Categorical analysis ---\n    # Text columns only; any name containing 'id' is treated as an identifier and skipped (this also skips names like 'paid')\n    categorical_cols = [c for c in df.select_dtypes(include='object').columns if 'id' not in c.lower()]\n    if categorical_cols:\n        summary.append(\"\\nCATEGORICAL ANALYSIS:\")\n        for col in categorical_cols[:5]:\n            value_counts = df[col].value_counts()\n            summary.append(f\"\\n{col}:\")\n            for val, count in value_counts.head(10).items():\n                summary.append(f\"  • {val}: {count:,} ({(count / len(df)) * 100:.1f}%)\")\n\n    # --- Time series analysis ---\n    # The first non-numeric column whose name mentions 'date' or 'time' becomes the time axis;\n    # numeric columns such as 'runtime_ms' are skipped so they are not misread as epoch timestamps\n    date_cols = [\n        c for c in df.columns\n        if ('date' in c.lower() or 'time' in c.lower())\n        and not pd.api.types.is_numeric_dtype(df[c])\n    ]\n    if date_cols:\n        date_col = date_cols[0]\n        df[date_col] = pd.to_datetime(df[date_col], errors='coerce')  # unparseable values become NaT\n    if date_cols and df[date_col].notna().any():\n        start, end = df[date_col].min(), df[date_col].max()\n        summary.append(\"\\nTIME SERIES ANALYSIS:\")\n        summary.append(f\"Date range: {start} to {end}\")\n        summary.append(f\"Span: {(end - start).days} days\")\n\n        if numeric_cols:\n            n_plots = min(3, len(numeric_cols))\n            # squeeze=False always returns a 2-D array of Axes, so a single plot needs no special case\n            fig, axes = plt.subplots(n_plots, 1, figsize=(12, 4 * n_plots), squeeze=False)\n            for ax, num_col in zip(axes[:, 0], numeric_cols[:3]):\n                # Mean of each numeric column per timestamp; groupby drops NaT keys\n                df.groupby(date_col)[num_col].mean().plot(ax=ax, linewidth=2)\n                ax.set_title(f'{num_col} Over Time')\n                ax.set_xlabel('Date')\n                ax.set_ylabel(num_col)\n                ax.grid(True, alpha=0.3)\n            plt.tight_layout()\n            plt.savefig('time_series_analysis.png', dpi=150)\n            plt.close()\n            charts_created.append('time_series_analysis.png')\n\n    # --- Distribution plots ---\n    if numeric_cols:\n        fig, axes = plt.subplots(2, 2, figsize=(12, 10))\n        axes = axes.flatten()\n        for idx, col in 
enumerate(numeric_cols[:4]):\n            axes[idx].hist(df[col].dropna(), bins=30, edgecolor='black', alpha=0.7)\n            axes[idx].set_title(f'Distribution of {col}')\n            axes[idx].set_xlabel(col)\n            axes[idx].set_ylabel('Frequency')\n            axes[idx].grid(True, alpha=0.3)\n        for idx in range(len(numeric_cols[:4]), 4):\n            axes[idx].set_visible(False)\n        plt.tight_layout()\n        plt.savefig('distributions.png', dpi=150)\n        plt.close()\n        charts_created.append('distributions.png')\n\n    # --- Categorical distribution plots ---\n    if categorical_cols:\n        fig, axes = plt.subplots(2, 2, figsize=(14, 10))\n        axes = axes.flatten()\n        for idx, col in enumerate(categorical_cols[:4]):\n            value_counts = df[col].value_counts().head(10)\n            axes[idx].barh(range(len(value_counts)), value_counts.values)\n            axes[idx].set_yticks(range(len(value_counts)))\n            axes[idx].set_yticklabels(value_counts.index)\n            axes[idx].set_title(f'Top Values in {col}')\n            axes[idx].set_xlabel('Count')\n            axes[idx].grid(True, alpha=0.3, axis='x')\n        for idx in range(len(categorical_cols[:4]), 4):\n            axes[idx].set_visible(False)\n        plt.tight_layout()\n        plt.savefig('categorical_distributions.png', dpi=150)\n        plt.close()\n        charts_created.append('categorical_distributions.png')\n\n    if charts_created:\n        summary.append(\"\\nVISUALIZATIONS CREATED:\")\n        for chart in charts_created:\n            summary.append(f\"  ✓ {chart}\")\n\n    summary.append(\"\\n\" + \"=\" * 60)\n    summary.append(\"ANALYSIS COMPLETE\")\n    summary.append(\"=\" * 60)\n\n    return \"\\n\".join(summary)\n```\n\n## Usage\n\n```\nHere's sales_data.csv. 
Can you summarize this file?\n```\n\n```\nAnalyze this customer data CSV and show me trends.\n```\n\n```\nWhat insights can you find in orders.csv?\n```\n\n## Example Output\n\n```\n============================================================\nDATA OVERVIEW\n============================================================\nRows: 5,000 | Columns: 8\n\nColumns: order_id, date, product, category, quantity, price, region, customer_id\n\nDATA TYPES:\n  • order_id: int64\n  • date: object\n  • price: float64\n  ...\n\nDATA QUALITY:\nMissing values: 100 (0.25% of total data)\n  • price: 100 (2.0%)\n\nNUMERICAL ANALYSIS:\n         quantity        price\ncount    5000.000    4900.000\nmean        3.200      58.200\nstd         1.800      12.400\n...\n\nTIME SERIES ANALYSIS:\nDate range: 2023-01-01 00:00:00 to 2023-12-31 00:00:00\nSpan: 364 days\n\nVISUALIZATIONS CREATED:\n  ✓ correlation_heatmap.png\n  ✓ time_series_analysis.png\n  ✓ distributions.png\n  ✓ categorical_distributions.png\n\n============================================================\nANALYSIS COMPLETE\n============================================================\n```\n\n## Notes\n\n- Date columns are auto-detected if the column name contains `date` or `time`; values that cannot be parsed as dates become `NaT`\n- Text columns whose name contains the substring `id` are excluded from categorical analysis; this catches `order_id` but also names such as `paid` or `valid`\n- All charts are saved as PNG files in the working directory, overwriting any existing files with the same names\n- Missing values are counted per column and otherwise skipped: `describe()`, `corr()`, `value_counts()`, and the per-date means ignore them, and histograms drop them before plotting\n\n## Related Skills\n\n- `json-and-csv-data-transformation` — Clean and reshape CSV data before analysis\n- `database-query-and-export` — Export query results to CSV for analysis\n- `d3js-data-visualization` — Build interactive browser-based charts from the same 
data","tags":["csv","data","summarizer","open","skills","besoeasy","agent-skills","ai-agents","claude-code","clawdbot","clawdbot-skill","llm-tools"],"capabilities":["skill","source-besoeasy","skill-csv-data-summarizer","topic-agent-skills","topic-ai-agents","topic-claude-code","topic-clawdbot","topic-clawdbot-skill","topic-llm-tools","topic-mcp-server","topic-openai","topic-openclaw","topic-vibe-coding","topic-vibecoding"],"categories":["open-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/besoeasy/open-skills/csv-data-summarizer","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add besoeasy/open-skills","source_repo":"https://github.com/besoeasy/open-skills","install_from":"skills.sh"}},"qualityScore":"0.505","qualityRationale":"deterministic score 0.51 from registry signals: · indexed on github topic:agent-skills · 111 github stars · SKILL.md body (8,751 chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-05-02T12:55:03.162Z","embedding":null,"createdAt":"2026-04-18T22:10:38.811Z","updatedAt":"2026-05-02T12:55:03.162Z","lastSeenAt":"2026-05-02T12:55:03.162Z","prices":[{"id":"5b60efff-9088-4796-a0ec-53a2f019e89c","listingId":"a662f761-3700-465a-8805-943a99a8e0ab","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"besoeasy","category":"open-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T22:10:38.811Z"}],"sources":[{"listingId":"a662f761-3700-465a-8805-943a99a8e0ab","source":"github","sourceId":"besoeasy/open-skills/csv-data-summarizer","sourceUrl":"https://github.com/besoeasy/open-skills/tree/main/skills/csv-data-summarizer","isPrimary":false,"firstSeenAt":"2026-04-18T22:10:38.811Z","lastSeenAt":"2026-05-02T12:55:03.162Z"}],"details":{"listingId":"a662f761-3700-465a-8805-943a99a8e0ab","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"besoeasy","slug":"csv-data-summarizer","github":{"repo":"besoeasy/open-skills","stars":111,"topics":["agent-skills","ai","ai-agents","claude-code","clawdbot","clawdbot-skill","llm-tools","mcp-server","openai","openclaw","vibe-coding","vibecoding"],"license":null,"html_url":"https://github.com/besoeasy/open-skills","pushed_at":"2026-03-31T13:05:30Z","description":"Battle-tested skill library for AI agents. Save 98% of API costs with ready-to-use code for crypto, PDFs, search, web scraping & more. 
No trial-and-error, no expensive APIs.","skill_md_sha":"c034b425a128a2da818fbd00b40eb59451be68f3","skill_md_path":"skills/csv-data-summarizer/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/besoeasy/open-skills/tree/main/skills/csv-data-summarizer"},"layout":"multi","source":"github","category":"open-skills","frontmatter":{"name":"csv-data-summarizer","description":"Analyzes CSV files and automatically generates comprehensive summaries with statistical insights, data quality checks, and visualizations using Python and pandas. No questions asked — just upload a CSV and get a full analysis immediately."},"skills_sh_url":"https://skills.sh/besoeasy/open-skills/csv-data-summarizer"},"updatedAt":"2026-05-02T12:55:03.162Z"}}