{"id":"deadee02-81c2-463a-b1e3-cc5e535cd3e4","shortId":"kgf4Hs","kind":"skill","title":"computer-vision-expert","tagline":"SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis.","description":"# Computer Vision Expert (SOTA 2026)\n\n**Role**: Advanced Vision Systems Architect & Spatial Intelligence Expert\n\n## Purpose\nTo provide expert guidance on designing, implementing, and optimizing state-of-the-art computer vision pipelines, from real-time object detection with YOLO26 to foundation model-based segmentation with SAM 3 and visual reasoning with VLMs.\n\n## When to Use\n- Designing high-performance real-time detection systems (YOLO26).\n- Implementing zero-shot or text-guided segmentation tasks (SAM 3).\n- Building spatial awareness, depth estimation, or 3D reconstruction systems.\n- Optimizing vision models for edge device deployment (ONNX, TensorRT, NPU).\n- Bridging classical geometry (calibration) with modern deep learning.\n\n## Capabilities\n\n### 1. Unified Real-Time Detection (YOLO26)\n- **NMS-Free Architecture**: Mastery of end-to-end inference without Non-Maximum Suppression (reducing latency and complexity).\n- **Edge Deployment**: Optimization for low-power hardware via Distribution Focal Loss (DFL) removal and the MuSGD optimizer.\n- **Improved Small-Object Recognition**: Expertise in using ProgLoss and STAL assignment for high precision in IoT and industrial settings.\n\n### 2. Promptable Segmentation (SAM 3)\n- **Text-to-Mask**: Ability to segment objects using natural language descriptions (e.g., \"the blue container on the right\").\n- **SAM 3D**: Reconstructing objects, scenes, and human bodies in 3D from single/multi-view images.\n- **Unified Logic**: One model for detection, segmentation, and tracking, with roughly 2x the accuracy of SAM 2.\n\n### 3. 
Vision Language Models (VLMs)\n- **Visual Grounding**: Leveraging Florence-2, PaliGemma 2, or Qwen2-VL for semantic scene understanding.\n- **Visual Question Answering (VQA)**: Extracting structured data from visual inputs through conversational reasoning.\n\n### 4. Geometry & Reconstruction\n- **Depth Anything V2**: State-of-the-art monocular depth estimation for spatial awareness.\n- **Sub-pixel Calibration**: Chessboard/Charuco pipelines for high-precision stereo/multi-camera rigs.\n- **Visual SLAM**: Real-time localization and mapping for autonomous systems.\n\n## Patterns\n\n### 1. Text-Guided Vision Pipelines\n- Use SAM 3's text-to-mask capability to isolate specific parts during inspection without needing custom detectors for every variation.\n- Combine YOLO26 for fast \"candidate proposal\" and SAM 3 for \"precise mask refinement\".\n\n### 2. Deployment-First Design\n- Leverage YOLO26's simplified ONNX/TensorRT exports (NMS-free).\n- Use MuSGD for significantly faster training convergence on custom datasets.\n\n### 3. Progressive 3D Scene Reconstruction\n- Integrate monocular depth maps with geometric homographies to build accurate 2.5D/3D representations of scenes.\n\n## Anti-Patterns\n\n- **Manual NMS Post-processing**: Adds needless latency and complexity; stick to NMS-free architectures (YOLO26/v10+).\n- **Click-Only Segmentation**: Relying on manual point prompts when SAM 3's text grounding eliminates the need for them in many scenarios.\n- **Legacy DFL Exports**: Using outdated export pipelines that don't take advantage of YOLO26's simplified module structure.\n\n## Sharp Edges (2026)\n\n| Issue | Severity | Solution |\n|-------|----------|----------|\n| SAM 3 VRAM Usage | Medium | Use quantized/distilled versions for local GPU inference. |\n| Text Ambiguity | Low | Use descriptive prompts (\"the 5mm bolt\" instead of just \"bolt\"). |\n| Motion Blur | Medium | Optimize shutter speed or use SAM 3's temporal tracking consistency. 
|\n| Hardware Compatibility | Low | YOLO26's simplified architecture is highly compatible with NPUs and TPUs. |\n\n## Related Skills\n`ai-engineer`, `robotics-expert`, `research-engineer`, `embedded-systems`\n\n## Limitations\n- Use this skill only when the task clearly matches the scope described above.\n- Do not treat the output as a substitute for environment-specific validation, testing, or expert review.\n- Stop and ask for clarification if required inputs, permissions, safety boundaries, or success criteria are missing.","tags":["computer","vision","expert","antigravity","awesome","skills","sickn33","agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding"],"capabilities":["skill","source-sickn33","skill-computer-vision-expert","topic-agent-skills","topic-agentic-skills","topic-ai-agent-skills","topic-ai-agents","topic-ai-coding","topic-ai-workflows","topic-antigravity","topic-antigravity-skills","topic-claude-code","topic-claude-code-skills","topic-codex-cli","topic-codex-skills"],"categories":["antigravity-awesome-skills"],"synonyms":[],"warnings":[],"endpointUrl":"https://skills.sh/sickn33/antigravity-awesome-skills/computer-vision-expert","protocol":"skill","transport":"skills-sh","auth":{"type":"none","details":{"cli":"npx skills add sickn33/antigravity-awesome-skills","source_repo":"https://github.com/sickn33/antigravity-awesome-skills","install_from":"skills.sh"}},"qualityScore":"0.700","qualityRationale":"deterministic score 0.70 from registry signals: · indexed on github topic:agent-skills · 34882 github stars · SKILL.md body (4,008 
chars)","verified":false,"liveness":"unknown","lastLivenessCheck":null,"agentReviews":{"count":0,"score_avg":null,"cost_usd_avg":null,"success_rate":null,"latency_p50_ms":null,"narrative_summary":null,"summary_updated_at":null},"enrichmentModel":"deterministic:skill-github:v1","enrichmentVersion":1,"enrichedAt":"2026-04-24T12:50:49.577Z","embedding":null,"createdAt":"2026-04-18T21:34:53.366Z","updatedAt":"2026-04-24T12:50:49.577Z","lastSeenAt":"2026-04-24T12:50:49.577Z","tsv":"'-2':260 '1':135,325 '2':199,250,262,366 '2.5':405 '2026':9,31,469 '2x':246 '3':15,17,74,104,203,251,333,361,390,435,474,507 '3d':111,224,232,392 '4':284 '5mm':492 'abil':208 'accur':404 'accuraci':247 'advanc':33 'advantag':460 'ai':526 'ai-engin':525 'ambigu':486 'analysi':26 'answer':273 'anti':411 'anti-pattern':410 'anyth':14,288 'architect':36 'architectur':145,423,517 'art':54,294 'ask':570 'assign':190 'autonom':322 'awar':107,300 'base':70 'blue':218 'blur':499 'bodi':230 'bolt':493,497 'boundari':578 'bridg':126 'build':105,403 'calibr':129,304 'candid':357 'capabl':134,339 'chessboard/charuco':305 'clarif':572 'classic':127 'clear':545 'click':429 'click-on':428 'combin':353 'compat':513,520 'complex':161 'comput':2,6,27,55 'computer-vision-expert':1 'consist':511 'contain':219 'converg':386 'convers':282 'criteria':581 'custom':348,388 'd/3d':406 'data':277 'dataset':389 'deep':132 'deploy':120,163,368 'deployment-first':367 'depth':108,287,296,397 'describ':549 'descript':215,489 'design':46,83,370 'detect':63,90,140,241 'detector':349 'devic':119 'dfl':174,450 'distribut':171 'e.g':216 'edg':118,162,468 'elimin':436 'embed':535 'embedded-system':534 'end':149,151 'end-to-end':148 'engin':527,533 'environ':561 'environment-specif':560 'estim':109,297 'everi':351 'expert':4,8,29,39,43,530,566 'expertis':184 'export':376,451,454 'extract':275 'fast':356 'faster':384 'first':369 'florenc':259 'focal':172 'forget':432 'foundat':67 'free':144,379,422 'geometr':400 'geometri':128,285 
'gpu':483 'ground':257,448 'guid':100,328 'guidanc':44 'hardwar':169,512 'high':85,192,309,519 'high-perform':84 'high-precis':308 'homographi':401 'human':229 'imag':235 'implement':47,93 'improv':179 'industri':197 'infer':152,484 'input':280,575 'inspect':345 'instead':494 'integr':395 'intellig':38 'iot':195 'isol':341 'issu':470 'languag':19,214,253 'latenc':159 'learn':133 'legaci':449 'leverag':258,371 'limit':537 'local':318,482 'logic':237 'loss':173 'low':167,487,514 'low-pow':166 'lower':426 'mani':444 'manual':413,440 'map':320,398 'mask':207,338,364 'masteri':146 'match':546 'maximum':156 'medium':477,500 'miss':583 'model':20,69,116,239,254 'model-bas':68 'modern':131 'modul':465 'monocular':295,396 'motion':498 'musgd':177,381 'natur':213 'need':124,347,438 'nms':143,378,414,421 'nms-free':142,377,420 'non':155 'non-maximum':154 'npu':123 'npu/tpus':522 'object':62,182,211,226 'one':238 'onnx':121 'onnx/tensorrt':375 'optim':49,114,164,178,501 'outdat':453 'output':555 'overhead':427 'paligemma':261 'part':343 'pattern':324,412 'perform':86 'permiss':576 'pipelin':57,306,330,455 'pixel':303 'point':441 'post':416 'post-process':415 'power':168 'precis':193,310,363 'process':417 'progloss':187 'progress':391 'prompt':442,490 'promptabl':200 'propos':358 'provid':42 'purpos':40 'quantized/distilled':479 'question':272 'qwen2':265 'qwen2-vl':264 'real':23,60,88,138,316 'real-tim':22,59,87,137,315 'reason':77,283 'recognit':183 'reconstruct':112,225,286,394 'reduc':158 'refin':365 'relat':523 'remov':175 'represent':407 'requir':574 'research':532 'research-engin':531 'review':567 'rig':312 'right':222 'robot':529 'robotics-expert':528 'role':32 'safeti':577 'sam':16,73,103,202,223,249,332,360,434,473,506 'scenario':445 'scene':227,269,393,409 'scope':548 'segment':13,71,101,201,210,242,431 'semant':268 'set':198 'sever':471 'sharp':467 'shot':96 'shutter':502 'signific':383 'simplifi':374,464,516 'single/multi-view':234 'skill':524,540 
'skill-computer-vision-expert' 'slam':314 'small':181 'small-object':180 'solut':472 'sota':5,30 'source-sickn33' 'spatial':25,37,106,299 'special':10 'specif':342,562 'speed':503 'stal':189 'state':51,291 'state-of-the-art':50,290 'stereo/multi-camera':311 'stick':418 'stop':568 'structur':276,466 'sub':302 'sub-pixel':301 'substitut':558 'success':580 'suppress':157 'system':35,91,113,323,536 'take':459 'task':102,544 'tempor':509 'tensorrt':122 'test':564 'text':99,205,327,336,447,485 'text-guid':98,326 'text-to-mask':204,335 'time':24,61,89,139,317 'topic-agent-skills' 'topic-agentic-skills' 'topic-ai-agent-skills' 'topic-ai-agents' 'topic-ai-coding' 'topic-ai-workflows' 'topic-antigravity' 'topic-antigravity-skills' 'topic-claude-code' 'topic-claude-code-skills' 'topic-codex-cli' 'topic-codex-skills' 'track':244,510 'train':385 'treat':553 'understand':270 'unifi':136,236 'usag':476 'use':82,170,186,212,331,380,452,478,488,505,538 'v2':289 'valid':563 'variat':352 'version':480 'via':446 'vision':3,7,18,28,34,56,115,252,329 'visual':76,256,271,279,313 'vl':266 'vlms':79,255 'vqa':274 'vram':475 'without':153,346 'yolo26':12,65,92,141,354,372,462,515 'yolo26/v10':424 'zero':95 
'zero-shot':94","prices":[{"id":"ce425169-3261-4f95-9545-4ede9e0a5343","listingId":"deadee02-81c2-463a-b1e3-cc5e535cd3e4","amountUsd":"0","unit":"free","nativeCurrency":null,"nativeAmount":null,"chain":null,"payTo":null,"paymentMethod":"skill-free","isPrimary":true,"details":{"org":"sickn33","category":"antigravity-awesome-skills","install_from":"skills.sh"},"createdAt":"2026-04-18T21:34:53.366Z"}],"sources":[{"listingId":"deadee02-81c2-463a-b1e3-cc5e535cd3e4","source":"github","sourceId":"sickn33/antigravity-awesome-skills/computer-vision-expert","sourceUrl":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/computer-vision-expert","isPrimary":false,"firstSeenAt":"2026-04-18T21:34:53.366Z","lastSeenAt":"2026-04-24T12:50:49.577Z"}],"details":{"listingId":"deadee02-81c2-463a-b1e3-cc5e535cd3e4","quickStartSnippet":null,"exampleRequest":null,"exampleResponse":null,"schema":null,"openapiUrl":null,"agentsTxtUrl":null,"citations":[],"useCases":[],"bestFor":[],"notFor":[],"kindDetails":{"org":"sickn33","slug":"computer-vision-expert","github":{"repo":"sickn33/antigravity-awesome-skills","stars":34882,"topics":["agent-skills","agentic-skills","ai-agent-skills","ai-agents","ai-coding","ai-workflows","antigravity","antigravity-skills","claude-code","claude-code-skills","codex-cli","codex-skills","cursor","cursor-skills","developer-tools","gemini-cli","gemini-skills","kiro","mcp","skill-library"],"license":"mit","html_url":"https://github.com/sickn33/antigravity-awesome-skills","pushed_at":"2026-04-24T06:41:17Z","description":"Installable GitHub library of 1,400+ agentic skills for Claude Code, Cursor, Codex CLI, Gemini CLI, Antigravity, and more. 
Includes installer CLI, bundles, workflows, and official/community skill collections.","skill_md_sha":"18f33d4941eb3e9f656922e97b03de1d5ec08bb3","skill_md_path":"skills/computer-vision-expert/SKILL.md","default_branch":"main","skill_tree_url":"https://github.com/sickn33/antigravity-awesome-skills/tree/main/skills/computer-vision-expert"},"layout":"multi","source":"github","category":"antigravity-awesome-skills","frontmatter":{"name":"computer-vision-expert","description":"SOTA Computer Vision Expert (2026). Specialized in YOLO26, Segment Anything 3 (SAM 3), Vision Language Models, and real-time spatial analysis."},"skills_sh_url":"https://skills.sh/sickn33/antigravity-awesome-skills/computer-vision-expert"},"updatedAt":"2026-04-24T12:50:49.577Z"}}