LLM Prompt Caching

Low risk

Author: @sickn33 · Verified source

4.3387 installs · v1.0.0 · Updated May 25, 2026

Usage

Run the following commands in Claude Code

Step 1: Add the marketplace

/plugin marketplace add sickn33/antigravity-awesome-skills

Step 2: Install the plugin

/plugin install prompt-caching@antigravity-awesome-skills

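About

LLM prompt caching strategies, including Anthropic prompt caching and other optimization methods.

name: prompt-caching
description: LLM prompt caching strategies, including Anthropic prompt caching, response caching, and CAG (Cache-Augmented Generation)
risk: none
source: vibeship-spawner-skills (Apache 2.0)
date_added: 2026-02-27

Prompt Caching

LLM prompt caching strategies, including Anthropic prompt caching, response caching, and CAG (Cache-Augmented Generation)

Capabilities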

prompt-cache
response-cache
kv-cache
cag-patterns
cache-invalidation

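Prerequisites

Knowledge: caching fundamentals, LLM API usage, hash functions
Recommended skills: context-window-management

Scope

Out of scope: CDN caching, database query caching, static asset caching
Boundaries: focused on LLM-specific caching, covering prompt caching and response caching

Ecosystem

Primary tools

Anthropic Prompt Caching - native prompt caching in the Claude API
Redis - in-memory cache for responses
OpenAI Caching - automatic caching in the OpenAI API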
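
The two provider caches listed above report cache activity through differently shaped usage objects. The sketch below normalizes both into a single hit ratio. It assumes the usage field names each provider documents at the time of writing (`cache_read_input_tokens` and `cache_creation_input_tokens` for Anthropic, `prompt_tokens_details.cached_tokens` for OpenAI). The helper names `anthropicHitRatio` and `openAIHitRatio` are hypothetical and not part of either SDK.

```typescript
// Usage shapes reduced to the fields this sketch needs.
interface AnthropicUsage {
    input_tokens: number;                        // input tokens not covered by the cache
    cache_creation_input_tokens?: number | null; // tokens written to the cache
    cache_read_input_tokens?: number | null;     // tokens served from the cache
}

interface OpenAIUsage {
    prompt_tokens: number;                       // total prompt tokens, cached ones included
    prompt_tokens_details?: { cached_tokens?: number } | null;
}

// Fraction of all input tokens that were served from the cache.
function anthropicHitRatio(u: AnthropicUsage): number {
    const read = u.cache_read_input_tokens ?? 0;
    const written = u.cache_creation_input_tokens ?? 0;
    const total = u.input_tokens + read + written;
    return total === 0 ? 0 : read / total;
}

// OpenAI counts cached tokens inside prompt_tokens, so no summing is needed.
function openAIHitRatio(u: OpenAIUsage): number {
    const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
    return u.prompt_tokens === 0 ? 0 : cached / u.prompt_tokens;
}
```

Logging this ratio per request is one way to notice when a prompt change has silently broken the cached prefix.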

Patterns

Anthropic Prompt Caching

Use Claude's native prompt caching for repeated prefixes

When to use: you call the Claude API with a stable system prompt or context

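import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();  // Reads ANTHROPIC_API_KEY from the environment by default

// Placeholders, not part of the original snippet: substitute your own stable content.
// Very short prefixes fall below the model's minimum cacheable length and are not cached.
const LONG_SYSTEM_PROMPT = '<your detailed instructions>';
const KNOWLEDGE_BASE = '<your large static context>';

// Cache the stable parts of your prompt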
async function queryWithCaching(userQuery: string) {
    const response = await client.messages.create({
        model: "claude-sonnet-4-20250514",
        max_tokens: 1024,
        system: [
            {
                type: "text",
                text: LONG_SYSTEM_PROMPT,  // Your detailed instructions
                cache_control: { type: "ephemeral" }  // Cache this!
            },
            {
                type: "text",
                text: KNOWLEDGE_BASE,  // Large static context
                cache_control: { type: "ephemeral" }
            }
        ],
        messages: [
            { role: "user", content: userQuery }  // Dynamic part
        ]
    });

    // Check cache usage
    console.log(`Cache read: ${response.usage.cache_read_input_tokens}`);
    console.log(`Cache write: ${response.usage.cache_creation_input_tokens}`);

    return response;
}

// Cost savings: roughly 90% off the cached tokens on cache reads (cache writes cost extra)
// Latency savings: up to roughly 2x faster on long prompts; measure on your own workload
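
The 90% figure applies only to cached tokens on cache reads, so the overall saving depends on how often the prefix is reused. The following is a rough estimate only, assuming cache writes are billed at 1.25x and cache reads at 0.1x the base input price (check current pricing before relying on these multipliers). `CallProfile`, `costWithoutCache`, and `costWithCache` are hypothetical names introduced for illustration, and costs are expressed in base-price token units rather than currency.

```typescript
// Assumed multipliers relative to the base input-token price.
const CACHE_WRITE_MULTIPLIER = 1.25; // writing the prefix to the cache
const CACHE_READ_MULTIPLIER = 0.1;   // reading the prefix from the cache

interface CallProfile {
    cachedPrefixTokens: number; // stable system prompt plus knowledge base
    dynamicTokens: number;      // per-request user content
    calls: number;              // requests made while the cache stays warm
}

// Input cost with no caching: every call pays for the full prompt.
function costWithoutCache(p: CallProfile): number {
    return p.calls * (p.cachedPrefixTokens + p.dynamicTokens);
}

// Input cost with caching: the first call writes the prefix,
// every later call reads it. Dynamic tokens are always full price.
function costWithCache(p: CallProfile): number {
    if (p.calls === 0) return 0;
    const write = p.cachedPrefixTokens * CACHE_WRITE_MULTIPLIER;
    const reads = (p.calls - 1) * p.cachedPrefixTokens * CACHE_READ_MULTIPLIER;
    const dynamic = p.calls * p.dynamicTokens;
    return write + reads + dynamic;
}

const profile: CallProfile = { cachedPrefixTokens: 10_000, dynamicTokens: 100, calls: 20 };
const saved = 1 - costWithCache(profile) / costWithoutCache(profile);
console.log(`Input cost reduction: ${(saved * 100).toFixed(1)}%`);
```

Under these assumptions a single call costs more with caching than without, because the write premium is never recovered. The saving only appears once the prefix is read at least once before it expires.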

Response Caching

Cache complete LLM responses for identical or similar queries

When to use: the same queries are asked repeatedly

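import { createHash } from 'crypto';
import Redis from 'ioredis';

// Assumed interfaces: the original snippet used embed() and vectorCache
// without defining them, so they are injected through the constructor here.
type EmbedFn = (text: string) => Promise<number[]>;

interface VectorCache {
    // Returns the k nearest entries; id must be the hash used as the Redis key
    search(embedding: number[], k: number): Promise<{ id: string; similarity: number }[]>;
}

// Falls back to a local instance when REDIS_URL is unset (assumed default)
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

class ResponseCache {
    private ttl = 3600;  // 1 hour default

    constructor(
        private vectorCache: VectorCache,
        private embed: EmbedFn
    ) {}

    // Exact match caching
    async getCached(prompt: string): Promise<string | null> {
        const key = this.hashPrompt(prompt);
        return await redis.get(`response:${key}`);
    }

    async setCached(prompt: string, response: string): Promise<void> {
        const key = this.hashPrompt(prompt);
        await redis.set(`response:${key}`, response, 'EX', this.ttl);
    }

    private hashPrompt(prompt: string): string {
        return createHash('sha256').update(prompt).digest('hex');
    }

    // Semantic similarity caching
    async getSemanticallySimilar(
        prompt: string,
        threshold: number = 0.95
    ): Promise<string | null> {
        const embedding = await this.embed(prompt);
        const similar = await this.vectorCache.search(embedding, 1);

        if (similar.length > 0 && similar[0].similarity > threshold) {
            return await redis.get(`response:${similar[0].id}`);
        }
        return null;
    }

    // Temperature-aware caching: the key includes the model and temperature
    private hashWithParams(
        prompt: string,
        params: { temperature: number; model: string }
    ): string {
        return this.hashPrompt(`${prompt}|${params.model}|${params.temperature}`);
    }

    async getCachedWithParams(
        prompt: string,
        params: { temperature: number; model: string }
    ): Promise<string | null> {
        // Only cache low-temperature responses
        if (params.temperature > 0.5) return null;

        const key = this.hashWithParams(prompt, params);
        return await redis.get(`response:${key}`);
    }

    // Writer for getCachedWithParams; without it that lookup could never hit
    async setCachedWithParams(
        prompt: string,
        params: { temperature: number; model: string },
        response: string
    ): Promise<void> {
        if (params.temperature > 0.5) return;

        const key = this.hashWithParams(prompt, params);
        await redis.set(`response:${key}`, response, 'EX', this.ttl);
    }
}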
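
Joining key parts with a delimiter such as `|` can collide when the prompt itself contains that delimiter, and TTL expiry alone does not cover the cache-invalidation capability listed earlier. The sketch below shows one possible approach, not the skill's prescribed method: serialize the key fields as a JSON array in a fixed order, and prefix a version tag that is bumped whenever the system prompt or output format changes. `CACHE_VERSION`, `normalizePrompt`, and `buildCacheKey` are hypothetical names.

```typescript
import { createHash } from 'crypto';

// Hypothetical version tag: bump it to invalidate every existing entry at once.
// Old entries are never matched again and simply expire through their TTL.
const CACHE_VERSION = 'v1';

interface CacheKeyParams {
    model: string;
    temperature: number;
    maxTokens?: number;
}

// Collapse whitespace so trivially different prompts share one entry.
function normalizePrompt(prompt: string): string {
    return prompt.trim().replace(/\s+/g, ' ');
}

// Build a deterministic key. Fields are serialized in a fixed order as a
// JSON array, so property order in `params` and delimiter characters in
// the prompt cannot produce colliding or diverging keys.
function buildCacheKey(
    prompt: string,
    params: CacheKeyParams,
    version: string = CACHE_VERSION
): string {
    const canonical = JSON.stringify([
        version,
        params.model,
        params.temperature,
        params.maxTokens ?? null,
        normalizePrompt(prompt),
    ]);
    const digest = createHash('sha256').update(canonical).digest('hex');
    return `response:${version}:${digest}`;
}
```

Keeping the version in the readable part of the key also makes it possible to scan for and delete one generation of entries without touching the others.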

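Cache-Augmented Generation (CAG)

Preload documents into a cached prompt instead of retrieving them with RAG

When to use: the document corpus is stable and fits in the context window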

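import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// CAG: Pre-compute document context, cache in prompt
// Better than RAG when:
// - Documents are stable
// - Total fits in context window
// - Latency is critical

// Minimal document shape used by this example (renamed from Document
// to avoid clashing with the DOM's global Document type)
interface SourceDocument {
    title: string;
    content: string;
}

class CAGSystem {
    private cachedContext: string | null = null;
    private lastUpdate: number = 0;  // Time of the last rebuild (ms since epoch)

    async buildCachedContext(documents: SourceDocument[]): Promise<void> {
        // Pre-process and format documents
        const formatted = documents.map(d =>
            `## ${d.title}\n${d.content}`
        ).join('\n\n');

        // Store with timestamp
        this.cachedContext = formatted;
        this.lastUpdate = Date.now();
    }

    async query(userQuery: string): Promise<string> {
        const context = this.cachedContext;
        if (context === null) {
            throw new Error('Call buildCachedContext() before query()');
        }

        // Use cached context directly in prompt
        const response = await client.messages.create({
            model: "claude-sonnet-4-20250514",
            max_tokens: 1024,
            system: [
                {
                    type: "text",
                    text: context,
                    cache_control: { type: "ephemeral" }
                }
            ],
            messages: [
                { role: "user", content: userQuery }
            ]
        });

        // Content blocks are a union type, so narrow to text before reading it
        const first = response.content[0];
        return first?.type === 'text' ? first.text : '';
    }
}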
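
The two conditions above, a stable corpus that fits in the context window, can be checked before committing to CAG. The sketch below is illustrative only. It assumes a rough heuristic of about 4 characters per token for English text (a provider's token-counting endpoint gives exact numbers), and the names `Doc`, `estimateTokens`, `fitsInContext`, and `corpusFingerprint` are hypothetical.

```typescript
import { createHash } from 'crypto';

interface Doc {
    title: string;
    content: string;
}

// Assumed heuristic: roughly 4 characters per token for English text.
function estimateTokens(text: string): number {
    return Math.ceil(text.length / 4);
}

// True when the formatted corpus fits in the window while leaving
// `reservedTokens` free for the question and the answer.
function fitsInContext(docs: Doc[], contextWindow: number, reservedTokens: number): boolean {
    const total = docs.reduce(
        (sum, d) => sum + estimateTokens(`## ${d.title}\n${d.content}`),
        0
    );
    return total <= contextWindow - reservedTokens;
}

// Order-independent fingerprint of the corpus. A changed value means the
// documents changed and the cached context should be rebuilt.
function corpusFingerprint(docs: Doc[]): string {
    const perDoc = docs
        .map(d => createHash('sha256').update(`${d.title}\n${d.content}`).digest('hex'))
        .sort();
    return createHash('sha256').update(perDoc.join('')).digest('hex');
}
```

A caller could store the fingerprint next to the cached context and rebuild only when it differs, falling back to RAG when `fitsInContext` returns false.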

Compatible tools

Claude Code, Cursor
