-
Notifications
You must be signed in to change notification settings - Fork 30
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error trying to run fullIndex on > 250 documents #196
Comments
I think this may also be something to do with incorrect logic around collectionGroup queries too firestore-algolia-search/functions/src/index.ts Lines 180 to 188 in a62ccda
I feel like this logic is inverse? My collection path is something like "companies/{id}/users" and it thinks this is a collection group, when in fact this is a collection path. |
@fringley That's a good observation. I can understand what the logic was, but it actually creates a bigger problem. When running a full index of existing docs, it needs to happen in batches. In a scenario where the docs you want to index are in a sub-collection of a document (in your case you are indexing a 'users' collection that exists under multiple company docs) it makes sense to query a collectionGroup of 'users' and then each batch starts where the last finished. The big problem here is your database might have other collections called 'users' elsewhere. So a full index would process a lot of documents you never intended to have indexed. I'm not sure of the best solution here. You could continue to use a collectionGroup query, but check that the doc path matches the config.collectionPath convention. However, this could cause a lot of unnecessary reads if you have many collections with the same name in other parts of your db. Any other solution I can think of would get pretty messy if people are using deeply nested sub-collections. |
@StevenH86 I always test the latest extension on my test database which contains over 18K records and I have not had any issues. @fringley At this time, the collection path pattern is set up to treat your path as a CollectionGroup. Can you provide details how this logic is presenting an issue? |
Hi @smomin and @StevenH86 thank you for your comments - I now understand why internally this uses a collectionGroup query as that is the only way that firebase can trigger on a document update where there are nested collections. I actually spotted this issue a while ago and is the reason I have moved the collections that I am indexing into a unique named sub-collection to avoid the issue of accidentally indexing other sub-collections that don't match the same path components. I might suggest a slight update to the documentation to make it clear that there is no value in entering a wildcarded sub-collection path as this gives the wrong impression that the indexing will happen ONLY on these sub-collections, whereas it will actually use a collectionGroup using the last part of the path... which as mentioned, can have unintended side effects. My apologies for bringing up this query on an issue related to the initial indexing... an issue that I am also having after the first 250 records. My solution was to rollback the extension to a previous (1.1.5) version to allow the initial indexing to work. |
@smomin would I be wrong in assuming your collection with 18k records is a root level collection? A root level collection uses the basic collection query, whereas a nested collection will use a collectionGroup query. When you query collection groups, the FieldPath.documentId() is the full path to the document, not just the docId. An easy way you can see this for yourself is by going to the firebase console > firestore > query build and changing the query scope from collection to collection group. The first column in the query builder is the document Id. You will see collection is just the docId, but collection group is the full path to the document. Cheers |
@smomin any update on my above post? Cheers |
Hey @smomin I'm having the same issue where the full index operation only loads 250 docs. I see the following logs in my the respective cloud function logs:
This is a pretty fatal issue, so I'd appreciate help here. I ran this today so I'm on the latest version of the extension I assume |
hey @davidoort this looks like an issues with provisioning of the Algolia Firestore extension which is managed by the Firebase framework. Let me ping the Firebase team if they have a response to it. Let me see what they say. |
hello @davidoort @StevenH86 If you are getting the error that @davidoort reported, the Firebase team has said that user installing the extension does not have the proper IAM message. This is reported in the error, which is
The full index operation uses a queue that needs this permission to create a task queue. |
@smomin that's quite strange cause I'm the owner of the GCP/Firebase project. Just to make sure, in IAM I should be looking at the email of the account from which the extension is installed? |
@davidoort - Can you please create a separate thread for your issue. It's not the same issue that I started this thread about. @smomin - The original thread is about a different issue. Can you please look at my previous comment to you (quoted below). Thanks again.
|
@StevenH86 I was reading through this issue but I am bit lost. I guess, are you having problems indexing more than 250 documents in a subcollection? |
The issue is when you are indexing a subcollection. Eg. When you're indexing a collection group (which is what happens if you are referencing a subcollection) it will use the collectionGroup query. Your code is querying the collectionGroup (on line 192) by FieldPath.documentId() after the first 250 records. In a collection group the FieldPath.documentId() value is the FULL PATH to the document, not just the ID of the document. Therefore, you need to set docId: newCursor.ref.path (on line 229) if it's a collectionGroup query. If you change line 229 to:
It will solve the issue. |
hey @ljrodriguez1 It looks like an API access issue. Have you provided enough permissions for the API key? Have you tried the Admin key to see if that work? Let me know. |
hey @StevenH86 here is the RC with your recommend change. Please try it out. |
@smomin - Also getting the error when wanting to run a full index to a collection with over 250 documents: The principal (user or service account) lacks IAM permission "cloudtasks.tasks.create" for the resource I'm using version 1.2.2 of the extension... |
@nickycdk you will need to open a ticket with Firebase team. The IAM permission is beyond what I can control in the extension. I will check with the Firebase team again but I don't think I have option to assign IAM permission during installation. |
I'm also hitting a 250 record max. I tried uninstalling and reinstalling the extension. Using version 1.2.2. In my logs I see lots of errors like:
I think this started happening after I configured the extension to use Path is a collection group: |
Can you change back to using the document id to see if it resolves the issue? |
I used the default document ID prior to yesterday, and it did work (meaning no errors in my logs, and the index was fully populated with ~1200 records). The problem is my IDs across all sub-collections aren't unique, so I needed to use |
Would it make sense to try the code changes suggested in the first or 2nd comments on this issue? #196 (comment) I'd be happy to test out an RC. |
Hey @smomin, this seems to work! Logs from the extension finished with:
|
The value returned by documentId() is ALWAYS the fully qualified path - mostly because that is actually how Firestore keeps and indexes it. (I have done extensive research on this; Google acknowledges it, and has no intention of changing it. stackoverflow: https://stackoverflow.com/a/58104104/2434784 ) As such, the "Id" only needs to be unique in sub-collections; the rest of the path with keep it unique. But DON'T make the mistake of using only the final leaf (Id) of the documentId. I do make a record in each document of the simple Id for convenience, but don't count on it for global uniqueness. It's worth noting that the Id returned by .doc() is a full UUID, and is generally fully unique. I personally put absolutely no other formatting requirements on my Id's, for that reason. That's what other fields are for, |
Yeah, this isn't at all true - I use this pattern all the time. Firestore triggers use a parameterized approach: This properly triggers on any CollectionBottom document from any leaf of the CollectionTop, {CollectionMiddleId}, CollectionBottom tree. I have an entire application built around this. This is NOT the same as a CollectionGroup - triggers can have parameterized elements/wildcard, but NOT a full Group functionality - so they won't match the collection name from arbitrary levels in the database the way a CollectionGroup can. Your trigger must always point to a document, even if you're using a wildcard. For example, users/{userId}/{messageCollectionId} is not valid because {messageCollectionId} is a collection. However, users/{userId}/{messageCollectionId}/{messageId} is valid because {messageId} will always point to a document. You CAN even make all the entries parameters: On the other hand, the bulk copy is complicated - Collections and CollectionGroups do not allow for wildcards, and Collection Groups only match the "last collection segment", with no "/" - it's kind of a typical Firestore limitation. |
Before I start the release process for this change, are there any other feedback on the requested change? |
Yeah, actually, there is an issue: You use config.collectionPath in two different contexts:
The ISSUE is that the TRIGGER must point to a DOCUMENT or DOCUMENTS, and the parameter for collection & collectionGroup must point to a COLLECTION. Luckily, there is a trivial fix: revise the trigger to:
the added string creates a parameterized wildcard for the documents - you can EITHER use context.params to retrieve it's actual value, or ignore it and get it from the document itself. Everything else should work the same. You are correct that .collection and .collectionGroup do not allow wildcards/params. But Firestore triggers absolutely do. Tracy Hall |
Hi,
There is a mistake in the latest version that causes the fullIndexOperation to break when it attempts to process the second batch of records after the initial 250 records.
This is due to the following line:
let queryCursor = query.where(FieldPath.documentId(), '==', docId);
FieldPath.documentId() requires the full path to the document.
Changing the line above to the following will resolve the issue:
const queryCursor = query.where(FieldPath.documentId(), '==', config.collectionPath + '/' + docId);
Cheers
EDIT:
I actually realised, that updating line 229 from:
docId: newCursor.id,
To:
docId: newCursor.ref.path,
Will solve the issue. As stated above FieldPath.documentId() is the full path to the document.
The text was updated successfully, but these errors were encountered: