
gemini context caching (openai format) support #5381

Merged

merged 5 commits into main on Aug 27, 2024

Conversation

@krrishdholakia (Contributor) commented Aug 27, 2024

Title

gemini context caching (openai format)

from litellm import completion

completion(
    model="gemini/gemini-1.5-pro",
    messages=[
        # System message: the large document is marked for caching
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "Here is the full text of a complex legal agreement" * 4000,
                    "cache_control": {"type": "ephemeral"},  # 👈 KEY CHANGE
                }
            ],
        },
        # Marked for caching with the cache_control parameter, so that this
        # checkpoint can read from the previous cache.
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What are the key terms and conditions in this agreement?",
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
    ],
)

Relevant issues

Closes #4284
Closes #5213

Type

🆕 New Feature

Changes

  • adds a check for whether messages contain {"cache_control": {"type": "ephemeral"}} (see the sketch after this list)
  • checks whether the message(s) are already in the context cache; if not, adds them to the cache
  • uses the cached value in subsequent requests
  • also includes refactoring work for vertex ai / google ai studio, to make it easier to understand how transformations are applied (cc: @yujonglee)
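The helper below is a minimal sketch of that first check, splitting cache-marked messages from the rest. The function name and structure are illustrative assumptions, not litellm's actual internals:

def separate_cached_messages(messages):
    """Split messages into cache_control-marked messages and regular ones."""
    cached, regular = [], []
    for msg in messages:
        content = msg.get("content")
        is_marked = isinstance(content, list) and any(
            isinstance(block, dict)
            and block.get("cache_control", {}).get("type") == "ephemeral"
            for block in content
        )
        (cached if is_marked else regular).append(msg)
    return cached, regular

# The cached messages are then looked up in (or added to) the provider-side
# context cache; only the regular messages are sent with each request, along
# with a reference to the cached content.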

[REQUIRED] Testing - Attach a screenshot of any new tests passing locally

If UI changes, send a screenshot/GIF of working UI fixes

  • new test added to test_amazing_vertex_completion.py


@krrishdholakia krrishdholakia merged commit 81e62ae into main Aug 27, 2024
1 of 3 checks passed
@NF-Karlo commented Feb 20, 2025

Hi, this does not seem to work on the latest version (1.61.11). Running the exact toy example found in this PR, the following error pops up:

NotFoundError: litellm.NotFoundError: VertexAIException - { "error": { "code": 404, "message": "models/gemini-1.5-pro is not found for API version v1beta, or is not supported for createCachedContent. Call ListModels to see the list of available models and their supported methods.", "status": "NOT_FOUND" } }

Edit: per https://ai.google.dev/gemini-api/docs/caching?lang=python, they added the following:

Note: Context caching is only available for stable models with fixed versions (for example, gemini-1.5-pro-001). You must include the version postfix (for example, the -001 in gemini-1.5-pro-001).

The documentation in litellm should also be updated to account for this change.
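For anyone hitting the same 404, here is a sketch of the toy example adapted to pin a versioned model, per the documentation note above (exact model availability may vary):

from litellm import completion

completion(
    model="gemini/gemini-1.5-pro-001",  # 👈 stable, versioned model (note the -001 postfix)
    messages=[
        {
            "role": "system",
            "content": [
                {
                    "type": "text",
                    "text": "Here is the full text of a complex legal agreement" * 4000,
                    "cache_control": {"type": "ephemeral"},
                }
            ],
        },
        {
            "role": "user",
            "content": "What are the key terms and conditions in this agreement?",
        },
    ],
)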

Development

Successfully merging this pull request may close these issues:

  • [Feature]: LiteLLM SDK - Add support for Google AI Studio context caching
  • Gemini API: Context Caching