experimental: LLMGraphTransformer add missing conditional adding restrictions to prompts for LLM that do not support function calling #22793

Merged (5 commits) on Jul 1, 2024

Conversation

jordyantunes
Contributor

@jordyantunes jordyantunes commented Jun 12, 2024

  • Description: Modified the prompt created by the function create_unstructured_prompt (which is called for LLMs that do not support function calling) by adding conditional checks, so that restrictions on entity types (node_labels) and relation types (rel_types) are added to the prompt only when the user provides them. If the user provides a sufficiently large text, the current prompt may fail to produce results with some LLMs. I first saw this issue when I implemented a custom LLM class that did not support function calling and used Gemini 1.5 Pro, but I was able to reproduce it using OpenAI models.

To reproduce, load a sufficiently large text:

from langchain_community.llms import Ollama
from langchain_openai import ChatOpenAI, OpenAI
from langchain_core.prompts import PromptTemplate
import re
from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_core.documents import Document

with open("texto-longo.txt", "r") as file:
    full_text = file.read()
    partial_text = full_text[:4000]

documents = [Document(page_content=partial_text)] # cropped to fit GPT 3.5 context window

Using the chat class (which supports function calling):

chat_openai = ChatOpenAI(model="gpt-3.5-turbo", model_kwargs={"seed": 42})
chat_gpt35_transformer = LLMGraphTransformer(llm=chat_openai)
graph_from_chat_gpt35 = chat_gpt35_transformer.convert_to_graph_documents(documents)

It works:

>>> print(graph_from_chat_gpt35[0].nodes)
[Node(id="Jesu, Joy of Man's Desiring", type='Music'), Node(id='Godel', type='Person'), Node(id='Johann Sebastian Bach', type='Person'), Node(id='clever way of encoding the complicated expressions as numbers', type='Concept')]

But with the non-chat LLM class (which does not support function calling):

openai = OpenAI(
    model="gpt-3.5-turbo-instruct",
    max_tokens=1000,
)
gpt35_transformer = LLMGraphTransformer(llm=openai)
graph_from_gpt35 = gpt35_transformer.convert_to_graph_documents(documents)

The transformer uses the prompt that has the issue and sometimes does not produce any result:

>>> print(graph_from_gpt35[0].nodes)
[]

After implementing the changes, I was able to use both classes more consistently:

>>> chat_gpt35_transformer = LLMGraphTransformer(llm=chat_openai)
>>> graph_from_chat_gpt35 = chat_gpt35_transformer.convert_to_graph_documents(documents)
>>> print(graph_from_chat_gpt35[0].nodes)
[Node(id="Jesu, Joy Of Man'S Desiring", type='Music'), Node(id='Johann Sebastian Bach', type='Person'), Node(id='Godel', type='Person')]
>>> gpt35_transformer = LLMGraphTransformer(llm=openai)
>>> graph_from_gpt35 = gpt35_transformer.convert_to_graph_documents(documents)
>>> print(graph_from_gpt35[0].nodes)
[Node(id='I', type='Pronoun'), Node(id="JESU, JOY OF MAN'S DESIRING", type='Song'), Node(id='larger memory', type='Memory'), Node(id='this nice tree structure', type='Structure'), Node(id='how you can do it all with the numbers', type='Process'), Node(id='JOHANN SEBASTIAN BACH', type='Composer'), Node(id='type of structure', type='Characteristic'), Node(id='that', type='Pronoun'), Node(id='we', type='Pronoun'), Node(id='worry', type='Verb')]

The results are still a little inconsistent because the GPT 3.5 model may produce incomplete JSON when it hits the token limit, but that could be solved (or mitigated) by checking that the JSON is complete when parsing it.
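A minimal sketch of the "check for complete JSON" idea mentioned above. It is not part of this PR or of the library; the helper name parse_json_or_empty is hypothetical, and it assumes the model is expected to return a JSON list of extracted items:

```python
import json
from typing import Any, List


def parse_json_or_empty(raw: str) -> List[Any]:
    """Hypothetical helper: parse model output, tolerating truncated JSON.

    Returns the parsed items as a list, or an empty list when the text is
    not complete, valid JSON (for example, cut off at the token limit).
    """
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Truncated or malformed output: report "no results" instead of raising.
        return []
    # Normalize a single object into a one-element list.
    return parsed if isinstance(parsed, list) else [parsed]
```

With a guard like this, a response truncated mid-object yields an empty result that the caller can detect and retry, rather than an exception in the middle of a batch.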


@jordyantunes jordyantunes marked this pull request as ready for review June 12, 2024 02:06
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. 🔌: openai Primarily related to OpenAI integrations 🤖:improvement Medium size change to existing code to handle new use-cases labels Jun 12, 2024
@baskaryan
Collaborator

cc @tomasonjo

@tomasonjo
Contributor

I don't really understand what was changed. New prompt seems identical to the previous one?

@jordyantunes
Contributor Author

What changed is that the following sentences are now included only if the variables node_labels and rel_types aren't None.

# ENTITY TYPES:
Use the following relation types, don't use other relation that is not defined below:
{node_labels}

 # RELATION TYPES:
Below are a number of examples of text and their extracted entities and relationships:
{rel_types}

The current version includes these restrictions even if the user does not provide node_labels or rel_types. The resulting prompt specifically says not to use any types other than the ones listed, but then lists none. This causes some LLMs to return nothing.
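The conditional described above can be sketched roughly as follows. This is an illustration only, not the merged code: the function name build_human_prompt, the exact wording of the sentences, and the {input} placeholder are assumptions made for the example.

```python
from typing import List, Optional


def build_human_prompt(
    node_labels: Optional[List[str]] = None,
    rel_types: Optional[List[str]] = None,
) -> str:
    """Hypothetical sketch: add restriction blocks only when types are given."""
    parts = [
        "Based on the following example, extract entities and "
        "relations from the provided text.\n\n"
    ]
    if node_labels:
        # Only restrict entity types when the caller supplied some.
        parts.append(
            "Use the following entity types, don't use other entity "
            "that is not defined below:\n"
            "# ENTITY TYPES:\n"
            f"{node_labels}\n\n"
        )
    if rel_types:
        # Only restrict relation types when the caller supplied some.
        parts.append(
            "Use the following relation types, don't use other relation "
            "that is not defined below:\n"
            "# RELATION TYPES:\n"
            f"{rel_types}\n\n"
        )
    # Placeholder left for the caller to fill in with the document text.
    parts.append("Text: {input}")
    return "".join(parts)
```

Calling build_human_prompt() with no arguments yields a prompt with no "don't use other" instruction at all, which avoids the empty restriction that the original prompt produced.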

@tomasonjo
Contributor

Ok, looks great. Please fix the linting errors and we can merge it in.

@tomasonjo
Contributor

@jordyantunes ping

@jordyantunes
Contributor Author

I'm sorry for the delay. I'll fix the linting errors today.

@tomasonjo
Contributor

Thanks! Ping @ccurme

@ccurme ccurme enabled auto-merge (squash) July 1, 2024 17:23
@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jul 1, 2024
@ccurme ccurme merged commit a50eabb into langchain-ai:master Jul 1, 2024
29 checks passed