OpenRouter Launch Response Caching: Same requests with zero charges, latency reduced from seconds to milliseconds

robot
Abstract generation in progress

CoinWorld News, OpenRouter has launched a response caching feature. Developers can enable it by adding x-openrouter-cache: true to the request header. The first call will be billed normally by the provider, and subsequent identical requests will directly return cached results without incurring token costs. After a cache hit, response times range from 80 to 300 milliseconds, with an average query time of 4 milliseconds. When not cached, Gemini 2.5 Flash averages about 1.3 seconds, Kimi K2.6 about 4.6 seconds, and GPT-5.5 approximately 9.1 seconds. This feature differs from the provider’s prompt caching; response caching completely bypasses the provider and returns the full response directly from OpenRouter’s edge cache. Text, images, audio, documents, and tool calls can all be cached, covering four endpoints. Cache isolation is based on API keys, with a default TTL of 5 minutes, configurable from 1 second to 24 hours.

View Original
This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.
  • Reward
  • Comment
  • Repost
  • Share
Comment
Add a comment
Add a comment
No comments
  • Pin