fix: children nodes not carrying metadata from source nodes #15254
@@ -89,7 +90,13 @@ def _postprocess_parsed_nodes(

# update metadata
if self.include_metadata:
    node.metadata.update(parent_doc.metadata)
    # Update parent_doc.metadata with node.metadata, giving preference to node's values
    node.metadata = {**parent_doc.metadata, **node.metadata}
How do we resolve conflicts if there are multiple child nodes all updating the same parent? It seems like it would be good to put it in a separate key and merge when you refer to the nodes. That way you get more control over whether you access only the parent attributes or not.
Oh, I think my comment applies to the chunk below. The comment in the code is incorrect: this line merges the parent's metadata into the current node's metadata, with the node's own values taking precedence.
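For anyone following along, here is a minimal sketch of the precedence difference between the two lines in the diff. It uses plain dicts rather than real node objects, and the keys and values (`file_name`, `section`, `page`) are made up for illustration.

```python
# Hypothetical metadata; plain dicts stand in for parent_doc.metadata and node.metadata.
parent_metadata = {"file_name": "report.pdf", "section": "parent"}
node_metadata = {"section": "child", "page": 3}

# Old line: dict.update() lets the parent's values overwrite the node's on conflict.
old_style = dict(node_metadata)  # copy so the original dict is left untouched
old_style.update(parent_metadata)

# New line: with dict unpacking the later mapping wins, so the node's values are kept.
new_style = {**parent_metadata, **node_metadata}

print(old_style["section"])  # parent
print(new_style["section"])  # child
```

In both cases only the node's metadata changes; the parent's dict is read but never written to.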
When two node parsers are run in sequence, the metadata is lost.
reproducible example: