
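About
LLM prompt caching strategies, including Anthropic prompt caching and other optimization techniques.
name: prompt-caching
description: LLM prompt caching strategies, including Anthropic prompt caching, response caching, and CAG (Cache-Augmented Generation)
risk: none
source: vibeship-spawner-skills (Apache 2.0)
date_added: 2026-02-27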
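Prompt Caching
LLM prompt caching strategies, including Anthropic prompt caching, response caching, and CAG (Cache-Augmented Generation)
Capabilities
- prompt-cache
- response-cache
- kv-cache
- cag-patterns
- cache-invalidation
Prerequisites
- Knowledge: caching fundamentals, LLM API usage, hash functions
- Recommended skills: context-window-management
Scope
- Not covered: CDN caching, database query caching, static asset caching
- Boundary: focuses on LLM-specific caching, covering prompt caching and response caching
Ecosystem
Primary tools
- Anthropic Prompt Caching - native prompt caching in the Claude API
- Redis - in-memory cache for responses
- OpenAI Caching - automatic caching in the OpenAI API
Patterns
Anthropic Prompt Caching
Use Claude's native prompt caching for repeated prefixes
When to use: you call the Claude API with a stable system prompt or context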
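import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // Reads ANTHROPIC_API_KEY from the environment

// Placeholders: replace with your own stable content. A prefix shorter than
// the model's minimum cacheable length is processed without caching.
const LONG_SYSTEM_PROMPT = '<your detailed instructions>';
const KNOWLEDGE_BASE = '<large static context>';

// Cache the stable parts of your prompt
async function queryWithCaching(userQuery: string) {
  const response = await client.messages.create({
    model: "claude-sonnet-4-20250514",
    max_tokens: 1024,
    system: [
      {
        type: "text",
        text: LONG_SYSTEM_PROMPT, // Your detailed instructions
        cache_control: { type: "ephemeral" } // Cache breakpoint: the prefix up to here is cached
      },
      {
        type: "text",
        text: KNOWLEDGE_BASE, // Large static context
        cache_control: { type: "ephemeral" }
      }
    ],
    messages: [
      { role: "user", content: userQuery } // Dynamic part, placed after the cached prefix
    ]
  });

  // Check cache usage: expect a write on the first call and reads on later calls within the TTL
  console.log(`Cache read: ${response.usage.cache_read_input_tokens}`);
  console.log(`Cache write: ${response.usage.cache_creation_input_tokens}`);
  return response;
}

// Cost savings: up to 90% on cached input tokens; cache writes cost somewhat more than uncached input
// Latency savings: up to 2x faster, most noticeable with long cached prefixes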
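To make the two usage fields logged above easier to interpret, the following sketch turns a response's `usage` object into a hit rate and a relative input cost. It is an illustration only: `summarizeCacheUsage`, the `CacheUsage` shape, and both multipliers are assumptions (roughly 0.1x for cache reads and 1.25x for 5-minute cache writes, relative to the base input price), so check current pricing before relying on the numbers.

```typescript
// Subset of the usage fields reported on a Messages API response.
interface CacheUsage {
  input_tokens: number; // Tokens after the last cache breakpoint (not cached)
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

// Assumed price multipliers relative to the base input-token price.
const CACHE_READ_MULTIPLIER = 0.1;
const CACHE_WRITE_MULTIPLIER = 1.25;

function summarizeCacheUsage(usage: CacheUsage) {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  const uncached = usage.input_tokens;
  const total = read + written + uncached;

  // Cost expressed in "base input token" units.
  const billedUnits =
    uncached + read * CACHE_READ_MULTIPLIER + written * CACHE_WRITE_MULTIPLIER;

  return {
    totalInputTokens: total,
    // Share of input tokens served from the cache.
    hitRate: total === 0 ? 0 : read / total,
    // 1.0 means "same cost as sending everything uncached".
    relativeInputCost: total === 0 ? 1 : billedUnits / total,
  };
}
```

Under these assumed multipliers, a request with 900 cached-read tokens and 100 uncached tokens costs about 0.19x the uncached price, while the first request that writes those 900 tokens costs about 1.225x. Caching therefore pays off only when the prefix is reused before it expires.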
Response Caching
Cache complete LLM responses for identical or similar queries
When to use: the same queries are asked repeatedly
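import { createHash } from 'crypto';
import Redis from 'ioredis';

// Falls back to a local Redis instance (assumed default) when REDIS_URL is unset
const redis = new Redis(process.env.REDIS_URL ?? 'redis://localhost:6379');

// Assumed helpers, not provided by this skill: an embedding function and a
// vector index whose entry ids are the same hashes used in the Redis keys.
declare function embed(text: string): Promise<number[]>;
interface VectorCache {
  search(embedding: number[], topK: number): Promise<{ id: string; similarity: number }[]>;
}

class ResponseCache {
  private ttl = 3600; // 1 hour default, in seconds

  constructor(private vectorCache: VectorCache) {}

  // Exact match caching
  async getCached(prompt: string): Promise<string | null> {
    const key = this.hashPrompt(prompt);
    return await redis.get(`response:${key}`);
  }

  async setCached(prompt: string, response: string): Promise<void> {
    const key = this.hashPrompt(prompt);
    await redis.set(`response:${key}`, response, 'EX', this.ttl);
  }

  private hashPrompt(prompt: string): string {
    return createHash('sha256').update(prompt).digest('hex');
  }

  // Semantic similarity caching
  async getSemanticallySimilar(
    prompt: string,
    threshold: number = 0.95
  ): Promise<string | null> {
    const embedding = await embed(prompt);
    const similar = await this.vectorCache.search(embedding, 1);
    if (similar.length && similar[0].similarity > threshold) {
      return await redis.get(`response:${similar[0].id}`);
    }
    return null;
  }

  // Temperature-aware caching: the key includes model and temperature, so
  // reads and writes must both go through paramsKey() to produce a hit.
  private paramsKey(prompt: string, params: { temperature: number; model: string }): string {
    return this.hashPrompt(`${prompt}|${params.model}|${params.temperature}`);
  }

  async getCachedWithParams(
    prompt: string,
    params: { temperature: number; model: string }
  ): Promise<string | null> {
    // Only cache low-temperature responses
    if (params.temperature > 0.5) return null;
    return await redis.get(`response:${this.paramsKey(prompt, params)}`);
  }

  async setCachedWithParams(
    prompt: string,
    params: { temperature: number; model: string },
    response: string
  ): Promise<void> {
    // High-temperature output is meant to vary, so it is not stored
    if (params.temperature > 0.5) return;
    await redis.set(`response:${this.paramsKey(prompt, params)}`, response, 'EX', this.ttl);
  }
}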
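The semantic lookup above relies on a `vectorCache` with a `search(embedding, k)` method whose implementation is not shown. As a rough illustration of what such an index does, here is a minimal in-memory sketch based on cosine similarity. `InMemoryVectorCache`, `cosineSimilarity`, and the entry shape are hypothetical names introduced for this example; a production system would more likely use a dedicated vector store.

```typescript
interface VectorEntry {
  id: string; // Assumed to match the key under which the response is cached
  embedding: number[];
}

interface SearchHit {
  id: string;
  similarity: number;
}

// Cosine similarity: 1 for identical direction, 0 for orthogonal vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embedding dimensions differ');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

class InMemoryVectorCache {
  private entries: VectorEntry[] = [];

  add(id: string, embedding: number[]): void {
    this.entries.push({ id, embedding });
  }

  // Brute-force scan, returning the topK most similar entries, best first.
  search(query: number[], topK: number): SearchHit[] {
    return this.entries
      .map((e) => ({ id: e.id, similarity: cosineSimilarity(query, e.embedding) }))
      .sort((x, y) => y.similarity - x.similarity)
      .slice(0, topK);
  }
}
```

A linear scan is fine for a few thousand entries; beyond that an approximate nearest-neighbor index is the usual choice. The threshold applied to the returned similarity (0.95 in the class above) trades hit rate against the risk of returning an answer to a subtly different question.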
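Cache-Augmented Generation (CAG)
Pre-cache documents in the prompt instead of retrieving them with RAG
When to use: the document corpus is stable and fits in the context window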
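// CAG: Pre-compute document context, cache in prompt
// Better than RAG when:
// - Documents are stable
// - Total fits in context window
// - Latency is critical
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Assumed document shape; adjust to your own data model
interface Document {
  title: string;
  content: string;
}

class CAGSystem {
  private cachedContext: string | null = null;
  private lastUpdate: number = 0; // Epoch ms of the last rebuild, for staleness checks

  async buildCachedContext(documents: Document[]): Promise<void> {
    // Pre-process and format documents
    const formatted = documents.map(d =>
      `## ${d.title}\n${d.content}`
    ).join('\n\n');

    // Store with timestamp
    this.cachedContext = formatted;
    this.lastUpdate = Date.now();
  }

  async query(userQuery: string): Promise<string> {
    if (this.cachedContext === null) {
      throw new Error('Call buildCachedContext() before query()');
    }

    // Use cached context directly in prompt
    const response = await client.messages.create({
      model: "claude-sonnet-4-20250514",
      max_tokens: 1024,
      system: [
        {
          type: "text",
          text: this.cachedContext,
          cache_control: { type: "ephemeral" }
        }
      ],
      messages: [
        { role: "user", content: userQuery }
      ]
    });

    // Content blocks are a union type; return text only from a text block
    const first = response.content[0];
    return first?.type === 'text' ? first.text : '';
  }
}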
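Two of the conditions above, "fits in the context window" and "documents are stable", can be checked in code before committing to CAG. The sketch below is one possible approach, not part of any SDK: `estimateTokens` uses a crude 4-characters-per-token heuristic (an assumption; a real tokenizer or token-counting endpoint is more accurate), and `corpusFingerprint` hashes the corpus so the cached context is rebuilt only when a document actually changes. All function names and the `CorpusDoc` shape are hypothetical.

```typescript
import { createHash } from 'node:crypto';

interface CorpusDoc {
  title: string;
  content: string;
}

// Assumed heuristic: about 4 characters per token for English text.
const ASSUMED_CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / ASSUMED_CHARS_PER_TOKEN);
}

// True when the formatted corpus plus a reserve for the query and the
// response is estimated to fit in the model's context window.
function fitsInContext(
  docs: CorpusDoc[],
  contextWindowTokens: number,
  reservedTokens: number
): boolean {
  const corpusTokens = docs.reduce(
    (sum, d) => sum + estimateTokens(`## ${d.title}\n${d.content}`),
    0
  );
  return corpusTokens + reservedTokens <= contextWindowTokens;
}

// Stable hash of the corpus; any change to a title or body changes it.
function corpusFingerprint(docs: CorpusDoc[]): string {
  const hash = createHash('sha256');
  for (const d of docs) {
    hash.update(JSON.stringify([d.title, d.content]));
  }
  return hash.digest('hex');
}

// Rebuild the cached context only when the corpus has changed.
function needsRebuild(previousFingerprint: string | null, docs: CorpusDoc[]): boolean {
  return previousFingerprint !== corpusFingerprint(docs);
}
```

Comparing fingerprints avoids rebuilding, and therefore re-writing the prompt cache, on every deploy or poll; a changed prefix invalidates the provider-side cache, so unnecessary rebuilds cost both money and latency.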
Compatible tools
Claude Code, Cursor
Tags
AI & Machine Learning