Skillquality 0.45

ai-product

Expert in shipping production-grade AI-powered features — LLM integration patterns, RAG architecture, prompt engineering that scales, AI UX that users trust, safety and guardrails, streaming, and cost optimization.

Price

free

Protocol

skill

Verified

Endpoint

https://skills.sh/rkz91/coco/ai-product

What it does

AI Product Development

Use when: building AI features into a product, integrating LLMs, designing RAG pipelines, implementing AI safety/guardrails, optimizing AI costs, building AI UX patterns, prompt engineering for production, handling hallucinations, streaming LLM responses, or evaluating AI output quality.

When This Skill Is Activated

Read this file fully before proceeding
Understand what AI feature the user is building
Apply the relevant patterns below (integration, RAG, UX, safety, cost)
Always address: output validation, error handling, cost awareness, user trust

Core Principle

Demos are easy. Production is hard. Every pattern below exists because something broke in production.

LLM Integration Patterns

Structured Output with Validation

Never parse free-text LLM output with regex. Use structured output modes and validate with a schema.

import { z } from "zod";
import OpenAI from "openai";

// 1. Define your schema
const ProductReviewSchema = z.object({
  sentiment: z.enum(["positive", "negative", "neutral"]),
  score: z.number().min(0).max(10),
  summary: z.string().max(200),
  keyTopics: z.array(z.string()).max(5),
});

type ProductReview = z.infer<typeof ProductReviewSchema>;

// 2. Call with structured output
const openai = new OpenAI();

async function analyzeReview(reviewText: string): Promise<ProductReview> {
  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content: `Analyze the product review. Return JSON matching this schema:
          { sentiment: "positive"|"negative"|"neutral", score: 0-10, summary: string, keyTopics: string[] }`,
      },
      { role: "user", content: reviewText },
    ],
  });

  const raw = JSON.parse(response.choices[0].message.content!);

  // 3. Always validate — the model can return anything
  const result = ProductReviewSchema.parse(raw);
  return result;
}

Streaming with Progress

Stream LLM responses to reduce perceived latency. Show users something is happening immediately.

async function streamResponse(prompt: string, onChunk: (text: string) => void) {
  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: prompt }],
    stream: true,
  });

  let fullText = "";
  for await (const chunk of stream) {
    const delta = chunk.choices[0]?.delta?.content || "";
    fullText += delta;
    onChunk(delta); // Update UI incrementally
  }

  return fullText;
}

// React example: streaming into state
function useStreamingAI() {
  const [text, setText] = useState("");
  const [isStreaming, setIsStreaming] = useState(false);

  const generate = async (prompt: string) => {
    setIsStreaming(true);
    setText("");
    await streamResponse(prompt, (chunk) => {
      setText((prev) => prev + chunk);
    });
    setIsStreaming(false);
  };

  return { text, isStreaming, generate };
}

Prompt Versioning and Testing

Treat prompts as code. Version them. Test with regression suites.

// prompts/v3-review-analyzer.ts
export const REVIEW_ANALYZER_PROMPT = {
  version: "3.0",
  system: `You are a product review analyst. Extract sentiment, score, summary, and topics.
    Always return valid JSON. Never hallucinate topics not mentioned in the review.`,
  temperature: 0.1, // Low temp for consistent structured output
  maxTokens: 500,
};

// tests/prompts/review-analyzer.test.ts
describe("Review Analyzer Prompt v3", () => {
  const testCases = [
    {
      input: "This product is amazing! Great battery life and beautiful screen.",
      expected: { sentiment: "positive", minScore: 7 },
    },
    {
      input: "Terrible. Broke after 2 days. Worst purchase ever.",
      expected: { sentiment: "negative", maxScore: 3 },
    },
    {
      input: "It's okay. Does what it says but nothing special.",
      expected: { sentiment: "neutral", minScore: 4, maxScore: 6 },
    },
  ];

  test.each(testCases)("correctly analyzes: $input", async ({ input, expected }) => {
    const result = await analyzeReview(input);
    expect(result.sentiment).toBe(expected.sentiment);
    if (expected.minScore) expect(result.score).toBeGreaterThanOrEqual(expected.minScore);
    if (expected.maxScore) expect(result.score).toBeLessThanOrEqual(expected.maxScore);
  });
});

RAG Architecture

When to Use RAG

Approach	When	Example
Prompt only	Model already knows the answer	General knowledge questions
RAG	Answer depends on your data	"What's our refund policy?"
Fine-tuning	Model needs new behavior/style	Domain-specific tone or format
RAG + Fine-tuning	Both custom data and custom behavior	Enterprise support bot

RAG Pipeline

User Query
    ↓
Query Processing (rewrite, expand, decompose)
    ↓
Embedding (text → vector)
    ↓
Vector Search (find relevant chunks)
    ↓
Re-ranking (order by relevance)
    ↓
Context Assembly (fit within token budget)
    ↓
LLM Generation (with retrieved context)
    ↓
Citation Extraction + Validation
    ↓
Response with Sources

Implementation

import { OpenAI } from "openai";

const openai = new OpenAI();

// 1. Chunk documents at ingest time
function chunkDocument(text: string, maxChunkSize = 500, overlap = 50): string[] {
  const sentences = text.split(/(?<=[.!?])\s+/);
  const chunks: string[] = [];
  let current = "";

  for (const sentence of sentences) {
    if ((current + sentence).length > maxChunkSize && current) {
      chunks.push(current.trim());
      // Keep overlap for context continuity
      const words = current.split(" ");
      current = words.slice(-overlap).join(" ") + " " + sentence;
    } else {
      current += " " + sentence;
    }
  }
  if (current.trim()) chunks.push(current.trim());
  return chunks;
}

// 2. Embed and store
async function embedChunks(chunks: string[]) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunks,
  });
  return response.data.map((d, i) => ({
    text: chunks[i],
    embedding: d.embedding,
  }));
}

// 3. Retrieve relevant context
async function retrieve(query: string, topK = 5) {
  const queryEmbedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });

  // Search your vector DB (Pinecone, Weaviate, pgvector, etc.)
  const results = await vectorDB.search({
    vector: queryEmbedding.data[0].embedding,
    topK,
  });

  return results.map((r) => r.metadata.text);
}

// 4. Generate with context
async function ragGenerate(query: string) {
  const context = await retrieve(query);

  const response = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "system",
        content: `Answer based ONLY on the provided context. If the context doesn't contain the answer, say "I don't have enough information to answer that."
        
Context:
${context.map((c, i) => `[${i + 1}] ${c}`).join("\n\n")}`,
      },
      { role: "user", content: query },
    ],
  });

  return response.choices[0].message.content;
}

RAG Quality Checklist

Chunk size tuned for your content type (code: larger, FAQ: smaller)
Overlap between chunks prevents losing context at boundaries
Query rewriting improves retrieval for vague user questions
Re-ranking (Cohere, cross-encoder) boosts relevance over pure vector similarity
Token budget management — don't exceed context window
Citation tracking — map output claims back to source chunks
Evaluation set — measure retrieval recall and answer quality regularly

AI UX Patterns — Building Trust

Show Your Work

Users trust AI more when they can see why it produced an answer.

Pattern	Implementation	Trust Impact
Citations	"Based on [Source A] and [Source B]..."	High
Confidence indicators	Color-coded confidence bars	Medium
Editable output	Let users modify AI-generated content	High
Explain reasoning	Step-by-step breakdown	High
Show alternatives	"Here are 3 options..."	Medium
Undo/regenerate	One-click to try again	High

Loading States for AI

// Bad: Spinner for 10 seconds with no context
<Spinner />

// Good: Progressive disclosure
function AILoadingState({ stage }: { stage: string }) {
  return (
    <div className="ai-loading">
      <div className="pulse-dot" />
      <span className="text-sm text-muted">{stage}</span>
    </div>
  );
}

// Usage with stages
const stages = [
  "Understanding your question...",
  "Searching relevant documents...",
  "Generating response...",
];

Error States

// Always have AI-specific error handling
function AIErrorBoundary({ error, onRetry }: { error: Error; onRetry: () => void }) {
  const messages: Record<string, string> = {
    RATE_LIMITED: "Too many requests. Please wait a moment.",
    CONTEXT_TOO_LONG: "Your input is too long. Try shortening it.",
    SAFETY_FILTERED: "This request couldn't be processed. Try rephrasing.",
    API_ERROR: "AI service is temporarily unavailable.",
  };

  return (
    <div className="ai-error">
      <p>{messages[error.message] || "Something went wrong."}</p>
      <button onClick={onRetry}>Try Again</button>
    </div>
  );
}

Safety and Guardrails

Input Sanitization

function sanitizeInput(userInput: string): string {
  // 1. Length limit
  if (userInput.length > 10_000) {
    throw new Error("CONTEXT_TOO_LONG");
  }

  // 2. Strip known injection patterns
  const cleaned = userInput
    .replace(/ignore (all |previous |above )?instructions/gi, "[filtered]")
    .replace(/you are now/gi, "[filtered]")
    .replace(/system:\s/gi, "[filtered]");

  return cleaned;
}

Output Validation

async function safeGenerate(prompt: string) {
  const response = await generate(prompt);

  // 1. Check for refusal (model declined to answer)
  if (response.includes("I cannot") || response.includes("I'm not able to")) {
    return { type: "refused", content: response };
  }

  // 2. Check for hallucinated URLs/emails
  const urls = response.match(/https?:\/\/[^\s]+/g) || [];
  const validatedUrls = await Promise.all(urls.map(validateUrl));
  if (validatedUrls.some((v) => !v)) {
    return { type: "contains_hallucinated_links", content: stripInvalidUrls(response) };
  }

  // 3. Content policy check (use a classifier or moderation endpoint)
  const moderation = await openai.moderations.create({ input: response });
  if (moderation.results[0].flagged) {
    return { type: "flagged", content: "Response filtered for safety." };
  }

  return { type: "ok", content: response };
}

Guardrail Layers

Layer	What It Catches	Implementation
Input validation	Injection, abuse, length	Regex + length checks
System prompt	Role boundaries	Strong system instructions
Output validation	Hallucinated links/data, PII leakage	Post-processing checks
Moderation API	Harmful content	OpenAI moderation endpoint
Human review queue	Edge cases	Flag low-confidence outputs

Cost Optimization

Token Budget Management

function estimateTokens(text: string): number {
  // Rough estimate: 1 token ≈ 4 characters for English
  return Math.ceil(text.length / 4);
}

function fitWithinBudget(
  systemPrompt: string,
  context: string[],
  userQuery: string,
  maxTokens: number = 120_000 // model context window
): string[] {
  const reservedForOutput = 4_000;
  const systemTokens = estimateTokens(systemPrompt);
  const queryTokens = estimateTokens(userQuery);
  let budget = maxTokens - reservedForOutput - systemTokens - queryTokens;

  const fitted: string[] = [];
  for (const chunk of context) {
    const chunkTokens = estimateTokens(chunk);
    if (chunkTokens <= budget) {
      fitted.push(chunk);
      budget -= chunkTokens;
    } else {
      break;
    }
  }
  return fitted;
}

Cost Tracking Per Feature

async function trackAICost(feature: string, userId: string, apiCall: () => Promise<any>) {
  const start = Date.now();
  const result = await apiCall();
  const duration = Date.now() - start;

  await analytics.track("ai_usage", {
    feature,         // "review-analyzer", "chat", "summarizer"
    userId,
    model: result.model,
    inputTokens: result.usage.prompt_tokens,
    outputTokens: result.usage.completion_tokens,
    cost: calculateCost(result.model, result.usage),
    latency: duration,
    cached: result.usage.prompt_tokens_details?.cached_tokens > 0,
  });

  return result;
}

Cost Reduction Playbook

Strategy	Savings	Trade-off
Route simple queries to cheaper models	10–50x	Slight quality drop on edge cases
Cache identical queries (Redis/KV)	40–80%	Stale results for dynamic content
Prompt caching (Anthropic/OpenAI)	Up to 90% on input tokens	Only for repeated system prompts
Batch non-urgent requests	50% (OpenAI Batch API)	24-hour turnaround
Truncate context to relevant chunks only	Variable	Requires good retrieval
Use embeddings for classification instead of LLM	95%+	Only works for classify/match tasks

Resilience Patterns

Retry with Fallback

async function resilientGenerate(prompt: string) {
  const providers = [
    { name: "openai", fn: () => callOpenAI(prompt) },
    { name: "anthropic", fn: () => callAnthropic(prompt) },
    { name: "cached", fn: () => getCachedFallback(prompt) },
  ];

  for (const provider of providers) {
    try {
      return await withRetry(provider.fn, { maxRetries: 2, backoff: "exponential" });
    } catch (error) {
      console.error(`${provider.name} failed:`, error);
      continue;
    }
  }

  throw new Error("API_ERROR");
}

async function withRetry<T>(
  fn: () => Promise<T>,
  opts: { maxRetries: number; backoff: "exponential" }
): Promise<T> {
  let lastError: Error;
  for (let i = 0; i <= opts.maxRetries; i++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error as Error;
      if (i < opts.maxRetries) {
        await sleep(Math.pow(2, i) * 1000); // 1s, 2s, 4s
      }
    }
  }
  throw lastError!;
}

Async Processing for Heavy AI Tasks

// Don't block request handlers with LLM calls

// API: enqueue the job
app.post("/api/analyze", async (req, res) => {
  const jobId = await queue.add("analyze", {
    userId: req.user.id,
    input: req.body.input,
  });
  res.json({ jobId, status: "processing" });
});

// Worker: process async
queue.process("analyze", async (job) => {
  const result = await analyzeWithAI(job.data.input);
  await db.results.create({ jobId: job.id, ...result });
  await notify(job.data.userId, { jobId: job.id, status: "complete" });
});

// Client: poll or use websocket
app.get("/api/analyze/:jobId", async (req, res) => {
  const result = await db.results.findUnique({ where: { jobId: req.params.jobId } });
  res.json(result || { status: "processing" });
});

Anti-Patterns

Demo-ware

Why it fails: Demos deceive. "Works on my laptop" with hand-picked inputs. Production reveals every edge case — hallucinations, latency spikes, cost overruns, adversarial users.

Instead: Test with adversarial inputs from day one. Build evaluation sets. Monitor output quality in production. Ship with guardrails, not just a happy path.

Context Window Stuffing

Why it fails: Expensive (paying for irrelevant tokens), slow (larger prompts = higher latency), hits context limits, dilutes relevant signal with noise.

Instead: Retrieve only the most relevant context. Use re-ranking. Calculate token budget before sending. Summarize long documents before injecting.

Unstructured Output Parsing

Why it fails: Regex on free-text breaks randomly. Format changes between calls. Injection risks when users control part of the prompt.

Instead: Use JSON mode / function calling. Validate with Zod or similar schema library. Always have a fallback for malformed output.

Sharp Edges — Quick Reference

Issue	Severity	Solution
Trusting LLM output without validation	Critical	Zod schema validation on every response
User input directly in prompts	Critical	Sanitize input, separate system/user roles
Context window stuffing	High	Token budget + retrieve only relevant chunks
Blocking UI on LLM response	High	Stream responses, show progressive loading states
No cost monitoring	High	Track per-request cost, set alerts, usage caps
App breaks when LLM API fails	High	Retry + provider fallback + cached fallback
Hallucinated facts in output	Critical	Ground with RAG, validate claims, show citations
Synchronous LLM in request handler	High	Queue heavy tasks, process async, notify on completion

Related Skills

Works well with: ai-wrapper-product, openai-api, openai-agents, api-design-principles, frontend-design

Capabilities

skillsource-rkz91skill-ai-producttopic-agent-skillstopic-agents-mdtopic-ai-agentstopic-claude-codetopic-codextopic-cursortopic-developer-toolstopic-llm-toolstopic-mcptopic-pm-toolstopic-product-managementtopic-productivity

Install

Installnpx skills add rkz91/coco

Sourcehttps://github.com/rkz91/coco/tree/main/skills/ai-product

skills.shhttps://skills.sh/rkz91/coco/ai-product

Transportskills-sh

Protocolskill

Quality

0.45/ 1.00

deterministic score 0.45 from registry signals: · indexed on github topic:agent-skills · 7 github stars · SKILL.md body (17,656 chars)

Provenance

Indexed fromgithub

Enriched2026-05-18 19:14:05Z · deterministic:skill-github:v1 · v1

First seen2026-05-18

Last seen2026-05-18

Agent access

JSONhttps://clawmart.sh/api/listings/pxzJ94