Feature request: modify document ID during sync #170

Open
go-sean-go opened this issue Jun 4, 2023 · 9 comments

Comments

@go-sean-go

(see Question at the end)

I have sync'd an index between Firestore and Algolia. In my case it's a subcollection of feed items, such that when User1 creates a post (let's call it Post1), if User2 and User3 follow User1, they both get a copy of Post1 in a subcollection.

Here is the data structure, then, after creation/replication:

// posts/{postId}
// users/{userId}/feed/{postId}

posts/Post1 // authorId === User1
users/User1/feed/Post1 // User1's copy of their own post
users/User2/feed/Post1 // User2's copy of the post
users/User3/feed/Post1 // User3's copy of the post

You may already see the problem here: I have 3 documents with the same document ID (Post1).

In Firestore land, this is totally fine; typical use cases query a subcollection with this sort of syntax: query.collection(`users/${userId}/feed`).get() - in other words, the path already scopes the returned feed items to a single user, so duplicate IDs never appear in one result set.

In Algolia, it appears that with this configuration each copy simply overwrites the previous one, so only one copy of Post1 remains in the end.
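
To make the collision concrete, here is a minimal sketch that uses a plain Map as a stand-in for an index keyed by object ID. The Map, the syncAll helper, and the docId helper are all hypothetical and only illustrate the behavior described above; they are not the extension's code.

```javascript
// Hypothetical in-memory stand-in for an index: records keyed by objectID.
const feedPaths = [
  "users/User1/feed/Post1",
  "users/User2/feed/Post1",
  "users/User3/feed/Post1",
];

// Last path segment, i.e. the Firestore document ID.
const docId = (path) => path.split("/").pop();

// Upsert every path into a Map, using keyFn(path) as the objectID.
function syncAll(paths, keyFn) {
  const index = new Map();
  for (const path of paths) {
    index.set(keyFn(path), { objectID: keyFn(path), path });
  }
  return index;
}

const byDocId = syncAll(feedPaths, docId);         // keyed by document ID
const byPath = syncAll(feedPaths, (path) => path); // keyed by full path

console.log(byDocId.size); // 1: later copies overwrote earlier ones
console.log(byPath.size);  // 3: every copy survives
```

Keying by the full path keeps all three copies, which is why a path-derived ID is one obvious candidate for a fix.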


Question: perhaps this can be solved with a Transform Function Name function? If so, how would I change the document ID? I can experiment with this next week, but the documentation here is very light, so I don't want to sink too much time into it if the maintainers can simply speak to it here.

@maiconkf

maiconkf commented Jun 6, 2023

Maybe this is the same problem as #171?
It was working properly in v1.1.1.

@go-sean-go
Author

go-sean-go commented Jun 14, 2023

Could be the same root cause/fix, but the goal is different: this issue/request deals with overlapping/colliding document IDs, and #171 is dealing with overlapping/colliding collection names.

So I would say it is a different thing; this is also not a bug, but a feature request.

@andrewkimjoseph

This feature is quite important. No feedback yet?

@smomin
Collaborator

smomin commented Feb 13, 2024

Hello @go-sean-go, I have created an RC of the extension that allows you to change the Object ID in the configuration. Please try it out and let me know if it solves your problem. https://console.firebase.google.com/project/_/extensions/install?ref=algolia/firestore-algolia-search@1.2.1-rc.0

@go-sean-go
Author

@smomin Can you confirm how to use the field?

Are the valid values either (a) (path) or (b) a document field/property name (e.g. authorId or something)?

In the case of option (b), must the value of the property name remain stable? Or, if it changes over time (some time after create), would the object ID of the document on the Algolia side change as well?

@smomin
Collaborator

smomin commented Feb 15, 2024

@go-sean-go

The valid values are below:

  • (path): uses the document path
  • authorId: gets the value of that document attribute
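
As a rough illustration of how the two values listed above could map to an object ID, here is a minimal sketch. The resolveObjectId function and the shape of the doc object are hypothetical; this is not taken from the extension's source and only mirrors the description in this comment.

```javascript
// Hypothetical resolver, NOT the extension's actual code.
// setting: "(path)" or the name of a document field, e.g. "authorId".
// doc: { path: string, data: object } (an assumed shape for this sketch).
function resolveObjectId(setting, doc) {
  if (setting === "(path)") return doc.path;
  const value = doc.data[setting];
  if (value === undefined || value === null) {
    throw new Error(`Field "${setting}" is missing on ${doc.path}`);
  }
  return String(value);
}

const doc = { path: "users/User2/feed/Post1", data: { authorId: "User1" } };
console.log(resolveObjectId("(path)", doc));   // users/User2/feed/Post1
console.log(resolveObjectId("authorId", doc)); // User1
```

Note that with the data from the original post, authorId is User1 on every copy of Post1, so a field used this way would need to hold a value that is unique per document to avoid the same collision.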

@go-sean-go
Author

thanks @smomin - regarding my other question using e.g. the authorId:

must the value of the property name remain stable? Or, if it changes over time (some time after create), would the object ID of the document on the Algolia side change as well?

Basically, if the authorId value changes, what happens? Does it change the object ID at Algolia? Or does it only consider the value on initial sync and leave it alone afterward? Or, would it be naive of the Algolia state and simply create a new document? (Would that leave the old one orphaned?)
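
To spell out the orphaning scenario I'm asking about, here is a minimal sketch with a Map standing in for the index. Both handlers are hypothetical; I don't know which of these (if either) matches what the RC actually does.

```javascript
// Hypothetical stand-in for an index: records keyed by objectID.
const index = new Map();

// Naive handler: only looks at the new document state.
function naiveSync(after) {
  index.set(after.authorId, { objectID: after.authorId, ...after });
}

// Handler that also sees the previous state and removes the stale record.
function syncWithCleanup(before, after) {
  if (before && before.authorId !== after.authorId) {
    index.delete(before.authorId);
  }
  index.set(after.authorId, { objectID: after.authorId, ...after });
}

// Case 1: the field value changes and nothing cleans up the old ID.
naiveSync({ authorId: "User1", title: "Post1" });
naiveSync({ authorId: "User9", title: "Post1" });
const sizeAfterNaive = index.size; // old record under "User1" is orphaned

// Case 2: same change, but the old ID is deleted first.
index.clear();
syncWithCleanup(null, { authorId: "User1", title: "Post1" });
syncWithCleanup(
  { authorId: "User1", title: "Post1" },
  { authorId: "User9", title: "Post1" }
);
const sizeAfterCleanup = index.size;

console.log(sizeAfterNaive, sizeAfterCleanup); // 2 1
```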

I'm trying to consider the practical use case for the property value option.


Separately, a question on the (path) option: what are the limits here on the Algolia side? I ask because Firestore's limits on a path are rather extreme (per their docs): paths may have up to 100 segments, each ID may be up to 1,500 bytes, and a document name may be up to 6 KiB. These paths could therefore be thousands of characters long. I imagine Algolia will choke on that? But maybe not - maybe a 1,000-character Algolia ID is fine...? Probably not preferred, though.


Considering these scenarios, may I suggest a simpler feature that solves the original issue and avoids these edge cases (which would likely be unintended from the user's side): offer a checkbox option that hashes the full path (potentially a VERY LONG STRING) to a standard UUID-length string. This would provide uniqueness, idempotency, and I believe it would handle even the extreme edges of the Firestore quotas - if I'm reading and thinking about it right.
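
A minimal sketch of the hashing idea, assuming Node's built-in crypto module and SHA-256 truncated to 32 hex characters (the same number of hex digits as a UUID). The pathToObjectId name and the choice of hash and length are my assumptions, not anything the extension provides.

```javascript
const { createHash } = require("node:crypto");

// Hash a Firestore document path to a fixed-length ID:
// the first 32 hex characters (128 bits) of its SHA-256 digest.
function pathToObjectId(path) {
  return createHash("sha256").update(path, "utf8").digest("hex").slice(0, 32);
}

const shortPath = "users/User2/feed/Post1";
// An artificially long path: 50 collection/document pairs with 100-character IDs.
const longPath = Array.from(
  { length: 50 },
  (_, i) => `col${i}/${"x".repeat(100)}`
).join("/");

console.log(pathToObjectId(shortPath).length); // 32
console.log(pathToObjectId(longPath).length);  // 32
// Same input always yields the same ID, so re-syncing is idempotent.
console.log(pathToObjectId(shortPath) === pathToObjectId(shortPath)); // true
```

The output length is constant regardless of path length. Uniqueness here is probabilistic rather than guaranteed: truncating to 128 bits makes accidental collisions extremely unlikely, but the original path can no longer be recovered from the ID, so it would probably need to be stored as a separate attribute.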

Anyway, just my two cents. Let me know about the above.

@smomin
Collaborator

smomin commented Mar 4, 2024

Hey @go-sean-go, sorry for missing this, but are your concerns still valid? Let me know your feedback on the RC release.

@go-sean-go
Author

I haven't re-tested since my original comment - so I'm not sure. If no changes have been made to the feature/code, then yeah, my questions would still be outstanding.

Overall, per my comments above, I don't think the current solution is very durable or clear. The hash mechanism I suggested above would be something to explore (not my area of expertise), I believe - as long as it has a sufficiently large capacity.

To repeat my concerns:

  • I don't actually understand the inner workings of the property-name approach, but I have some inferred questions above.
  • I believe the (path) approach is flawed because Firestore has very extreme limits on the number of subcollections and so on (meaning: someone using Firestore in a valid/supported way will probably exceed Algolia's object ID limits, I'm guessing?).
  • I think the right solution to this problem is probably to offer some idempotent mechanism to generate a new, unique document ID - or do nothing bespoke in this extension, but allow the user to modify it with whatever arbitrary code (cloud function?) they like (my understanding is that the existing transform functions don't let you modify the object ID, but that could be incorrect?).


4 participants