Error trying to run fullIndex on > 250 documents #196

StevenH86 · 2024-02-01T03:02:54Z

Hi,

There is a mistake in the latest version that causes the fullIndexOperation to break when it attempts to process the second batch of records after the initial 250 records.

This is due to the following line:
let queryCursor = query.where(FieldPath.documentId(), '==', docId);

FieldPath.documentId() requires the full path to the document.

Changing the line above to the following will resolve the issue:
~~const queryCursor = query.where(FieldPath.documentId(), '==', config.collectionPath + '/' + docId);~~

Cheers

EDIT:

I actually realised, that updating line 229 from:
docId: newCursor.id,

To:

docId: newCursor.ref.path,

Will solve the issue. As stated above FieldPath.documentId() is the full path to the document.

The text was updated successfully, but these errors were encountered:

fringley · 2024-02-03T20:40:41Z

I think this may also be something to do with incorrect logic around collectionGroup queries too

firestore-algolia-search/functions/src/index.ts

Lines 180 to 188 in a62ccda

    
           logs.info('Is Collection Group?', config.collectionPath.indexOf('/') !== -1); 
        
           if (config.collectionPath.indexOf('/') === -1) { 
        
             query = firestoreDB 
        
               .collection(config.collectionPath); 
        
           } else { 
        
             query = firestoreDB 
        
               .collectionGroup(config.collectionPath.split('/').pop()); 
        
           } 
        
           query = query.limit(DOCS_PER_INDEXING);

I feel like this logic is inverse? My collection path is something like "companies/{id}/users" and it thinks this is a collection group, when in fact this is a collection path.

StevenH86 · 2024-02-04T12:15:46Z

@fringley That's a good observation. I can understand what the logic was, but it actually creates a bigger problem.

When running a full index of existing docs, it needs to happen in batches. In a scenario where the docs you want to index are in a sub-collection of a document (in your case you are indexing a 'users' collection that exists under multiple company docs) it makes sense to query a collectionGroup of 'users' and then each batch starts where the last finished.

The big problem here is your database might have other collections called 'users' elsewhere. So a full index would process a lot of documents you never intended to have indexed.

I'm not sure of the best solution here. You could continue to use a collectionGroup query, but check that the doc path matches the config.collectionPath convention. However, this could cause a lot of unnecessary reads if you have many collections with the same name in other parts of your db. Any other solution I can think of would get pretty messy if people are using deeply nested sub-collections.

smomin · 2024-02-09T03:56:26Z

@StevenH86 I always test the latest extension on my test database which contains over 18K records and I have not had any issues.

@fringley At this time, the collection path pattern is set up to treat your path as a CollectionGroup. Can you provide details how this logic is presenting an issue?

fringley · 2024-02-13T10:37:35Z

Hi @smomin and @StevenH86 thank you for your comments - I now understand why internally this uses a collectionGroup query as that is the only way that firebase can trigger on a document update where there are nested collections. I actually spotted this issue a while ago and is the reason I have moved the collections that I am indexing into a unique named sub-collection to avoid the issue of accidentally indexing other sub-collections that don't match the same path components.

I might suggest a slight update to the documentation to make it clear that there is no value in entering a wildcarded sub-collection path as this gives the wrong impression that the indexing will happen ONLY on these sub-collections, whereas it will actually use a collectionGroup using the last part of the path... which as mentioned, can have unintended side effects.

My apologies for bringing up this query on an issue related to the initial indexing... an issue that I am also having after the first 250 records. My solution was to rollback the extension to a previous (1.1.5) version to allow the initial indexing to work.

StevenH86 · 2024-02-17T09:58:08Z

@smomin would I be wrong in assuming your collection with 18k records is a root level collection?

A root level collection uses the basic collection query, whereas a nested collection will use a collectionGroup query. When you query collection groups, the FieldPath.documentId() is the full path to the document, not just the docId.

An easy way you can see this for yourself is by going to the firebase console > firestore > query build and changing the query scope from collection to collection group. The first column in the query builder is the document Id. You will see collection is just the docId, but collection group is the full path to the document.

Cheers

StevenH86 · 2024-03-02T04:57:26Z

@smomin any update on my above post? Cheers

davidoort · 2024-03-02T15:48:11Z

Hey @smomin I'm having the same issue where the full index operation only loads 250 docs.

I see the following logs in my the respective cloud function logs:

ext-firestore-algolia-search-gift-queries-executeFullIndexOperatvjlew8sinxy Unhandled error FirebaseFunctionsError: The principal (user or service account) lacks IAM permission "cloudtasks.tasks.create" for the resource "projects/giftit-207d2/locations/us-central1/queues/ext-firestore-algolia-search-gift-queries-executeFullIndexOperation" (or the resource may not exist). at FunctionsApiClient.toFirebaseError (/workspace/node_modules/firebase-admin/lib/functions/functions-api-client-internal.js:305:16) at FunctionsApiClient.enqueue (/workspace/node_modules/firebase-admin/lib/functions/functions-api-client-internal.js:146:32) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async /workspace/lib/index.js:191:9 at async /workspace/node_modules/firebase-functions/lib/common/providers/tasks.js:74:17 { errorInfo: { code: 'functions/permission-denied', message: 'The principal (user or service account) lacks IAM permission "cloudtasks.tasks.create" for the resource "projects/giftit-207d2/locations/us-central1/queues/ext-firestore-algolia-search-gift-queries-executeFullIndexOperation" (or the resource may not exist).' }, codePrefix: 'functions' }

This is a pretty fatal issue, so I'd appreciate help here. I ran this today so I'm on the latest version of the extension I assume

smomin · 2024-03-04T18:40:03Z

hey @davidoort this looks like an issues with provisioning of the Algolia Firestore extension which is managed by the Firebase framework. Let me ping the Firebase team if they have a response to it. Let me see what they say.

smomin · 2024-03-06T16:54:49Z

hello @davidoort @StevenH86

If you are getting the error that @davidoort reported, the Firebase team has said that user installing the extension does not have the proper IAM message. This is reported in the error, which is

The principal (user or service account) lacks IAM permission "cloudtasks.tasks.create" for the resource "projects/giftit-207d2/locations/us-central1/queues/ext-firestore-algolia-search-gift-queries-executeFullIndexOperation"

The full index operation uses a queue that needs this permission to create a task queue.

davidoort · 2024-03-07T12:55:09Z

@smomin that's quite strange cause I'm the owner of the GCP/Firebase project. Just to make sure, in IAM I should be looking at the email of the account from which the extension is installed?

StevenH86 · 2024-03-08T00:41:53Z

@davidoort - Can you please create a separate thread for your issue. It's not the same issue that I started this thread about.

@smomin - The original thread is about a different issue. Can you please look at my previous comment to you (quoted below). Thanks again.

@smomin would I be wrong in assuming your collection with 18k records is a root level collection?

A root level collection uses the basic collection query, whereas a nested collection will use a collectionGroup query. When you query collection groups, the FieldPath.documentId() is the full path to the document, not just the docId.

An easy way you can see this for yourself is by going to the firebase console > firestore > query build and changing the query scope from collection to collection group. The first column in the query builder is the document Id. You will see collection is just the docId, but collection group is the full path to the document.

smomin · 2024-03-11T21:17:35Z

@StevenH86 I was reading through this issue but I am bit lost. I guess, are you having problems indexing more than 250 documents in a subcollection?

ljrodriguez1 · 2024-03-14T19:46:07Z

Hi I am too having this problem. i am indexing a collection name destinations, and this is the error that the fullIndexOperation give, it is not clear in wich may be the underline problem here.

It used to work fine before.

StevenH86 · 2024-03-15T11:48:02Z

@smomin

The issue is when you are indexing a subcollection. Eg. users/{uid}/user-events/{eventId}

When you're indexing a collection group (which is what happens if you are referencing a subcollection) it will use the collectionGroup query. Your code is querying the collectionGroup (on line 192) by FieldPath.documentId() after the first 250 records. In a collection group the FieldPath.documentId() value is the FULL PATH to the document, not just the ID of the document. Therefore, you need to set docId: newCursor.ref.path (on line 229) if it's a collectionGroup query.

If you change line 229 to:

docId: config.collectionPath.indexOf('/') !== -1 ? newCursor.ref.path : newCursor.id

It will solve the issue.

smomin · 2024-03-18T21:47:58Z

hey @ljrodriguez1

It looks like an API access issue. Have you provided enough permissions for the API key? Have you tried the Admin key to see if that work? Let me know.

smomin · 2024-03-19T02:35:38Z

hey @StevenH86

here is the RC with your recommend change. Please try it out.

https://console.firebase.google.com/project/_/extensions/install?ref=algolia/firestore-algolia-search@1.2.2-rc.0

nickycdk · 2024-04-05T09:05:29Z

@smomin - Also getting the error when wanting to run a full index to a collection with over 250 documents: The principal (user or service account) lacks IAM permission "cloudtasks.tasks.create" for the resource

I'm using version 1.2.2 of the extension...

smomin · 2024-04-05T18:48:14Z

@nickycdk you will need to open a ticket with Firebase team. The IAM permission is beyond what I can control in the extension. I will check with the Firebase team again but I don't think I have option to assign IAM permission during installation.

devth · 2024-04-09T00:05:37Z

I'm also hitting a 250 record max. I tried uninstalling and reinstalling the extension. Using version 1.2.2.

In my logs I see lots of errors like:

Unhandled error Error: When querying a collection group and ordering by FieldPath.documentId(), the corresponding value must result in a valid document path, but 'CCkSvn24CXdzecgCG12nR2SXsTV2' is not because it contains an odd number of segments.

I think this started happening after I configured the extension to use (path) as the Alternative Object ID.

Path is a collection group: networkTypes/{networkType}/networks/{network}/members.

smomin · 2024-04-09T03:04:06Z

I'm also hitting a 250 record max. I tried uninstalling and reinstalling the extension. Using version 1.2.2.

In my logs I see lots of errors like:
Unhandled error Error: When querying a collection group and ordering by FieldPath.documentId(), the corresponding value must result in a valid document path, but 'CCkSvn24CXdzecgCG12nR2SXsTV2' is not because it contains an odd number of segments.
I think this started happening after I configured the extension to use (path) as the Alternative Object ID.

Path is a collection group: networkTypes/{networkType}/networks/{network}/members.

Can you change back to using the document id to see if it resolves the issue?

devth · 2024-04-09T13:15:35Z

I used the default document ID prior to yesterday, and it did work (meaning no errors in my logs, and the index was fully populated with ~1200 records).

The problem is my IDs across all sub-collections aren't unique, so I needed to use (path) in order to guarantee uniqueness for the objectID in Algolia.

devth · 2024-04-11T00:11:55Z

Would it make sense to try the code changes suggested in the first or 2nd comments on this issue? #196 (comment)

I'd be happy to test out an RC.

smomin · 2024-04-11T03:42:52Z

hey @devth

here you go, https://console.firebase.google.com/project/_/extensions/install?ref=algolia/firestore-algolia-search@1.2.3-rc.0

devth · 2024-04-11T13:18:32Z

Hey @smomin, this seems to work! Logs from the extension finished with:

Finished full indexing data. 1252 translations succeeded, 0 errors.

LeadDreamer · 2024-04-12T23:44:10Z

I used the default document ID prior to yesterday, and it did work (meaning no errors in my logs, and the index was fully populated with ~1200 records).

The problem is my IDs across all sub-collections aren't unique, so I needed to use (path) in order to guarantee uniqueness for the objectID in Algolia.

The value returned by documentId() is ALWAYS the fully qualified path - mostly because that is actually how Firestore keeps and indexes it. (I have done extensive research on this; Google acknowledges it, and has no intention of changing it. stackoverflow: https://stackoverflow.com/a/58104104/2434784 )

As such, the "Id" only needs to be unique in sub-collections; the rest of the path with keep it unique. But DON'T make the mistake of using only the final leaf (Id) of the documentId. I do make a record in each document of the simple Id for convenience, but don't count on it for global uniqueness.

It's worth noting that the Id returned by .doc() is a full UUID, and is generally fully unique. I personally put absolutely no other formatting requirements on my Id's, for that reason. That's what other fields are for,

LeadDreamer · 2024-04-13T00:49:41Z

Hi @smomin and @StevenH86 thank you for your comments - I now understand why internally this uses a collectionGroup query as that is the only way that firebase can trigger on a document update where there are nested collections.

Yeah, this isn't at all true - I use this pattern all the time. Firestore triggers use a parameterized approach:
"/CollectionTop/{top_Id}/{CollectionMiddleId}/{middle_Id}/CollectionBottom/{bottom_Id}" with the top_id, middle_id, bottom_id and CoollectionMiddleId in an array/object context.param

This properly triggers on any CollectionBottom document from any leaf of the CollectionTop, {CollectionMiddleId}, CollectionBottom tree. I have an entire application built around this.

This is NOT the same as a CollectionGroup - triggers can have parameterized elements/wildcard, but NOT a full Group functionality - so they won't match the collection name from arbitrary levels in the database the way a CollectionGroup can.

Your trigger must always point to a document, even if you're using a wildcard. For example, users/{userId}/{messageCollectionId} is not valid because {messageCollectionId} is a collection. However, users/{userId}/{messageCollectionId}/{messageId} is valid because {messageId} will always point to a document. You CAN even make all the entries parameters:
"{top1}/{id1}/{top2}/{id2}....etc etc"
BUT the number of "/" will be the same, unlike CollectionGroup. I haven't seen CollectionGroup mentioned for triggers - I suspect it's because of how CollectionGroup indices work.

On the other hand, the bulk copy is complicated - Collections and CollectionGroups do not allow for wildcards, and Collection Groups only match the "last collection segment", with no "/" - it's kind of a typical Firestore limitation.

smomin · 2024-04-13T04:14:11Z

Hey @smomin, this seems to work! Logs from the extension finished with:
Finished full indexing data. 1252 translations succeeded, 0 errors.

Before I start the release process for this change, are there any other feedback on the requested change?

LeadDreamer · 2024-04-14T20:38:59Z

Yeah, actually, there is an issue: You use config.collectionPath in two different contexts:

In the trigger:

functions.firestore
  .document(config.collectionPath)
  .onWrite(async (change, context: EventContext): Promise<void>

in the collection/collectionGroup

    logs.info('Is Collection Group?', config.collectionPath.indexOf('/') !== -1);
    if (config.collectionPath.indexOf('/') === -1) {
      query = firestoreDB
        .collection(config.collectionPath);
    } else {
      query = firestoreDB
        .collectionGroup(config.collectionPath.split('/').pop());
    }

The ISSUE is that the TRIGGER must point to a DOCUMENT or DOCUMENTS, and the parameter for collection & collectionGroup must point to a COLLECTION.

Luckily, there is a trivial fix: revise the trigger to:

functions.firestore
  .document(config.collectionPath+"/{anyDoc}")
  .onWrite(async (change, context: EventContext): Promise<void>

the added string creates a parameterized wildcard for the documents - you can EITHER use context.params to retrieve it's actual value, or ignore it and get it from the document itself. Everything else should work the same.

You are correct that .collection and .collectionGroup do not allow wildcards/params. But Firestore triggers absolutely do.

Tracy Hall

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error trying to run fullIndex on > 250 documents #196

Error trying to run fullIndex on > 250 documents #196

StevenH86 commented Feb 1, 2024 •

edited

fringley commented Feb 3, 2024 •

edited

StevenH86 commented Feb 4, 2024

smomin commented Feb 9, 2024

fringley commented Feb 13, 2024

StevenH86 commented Feb 17, 2024

StevenH86 commented Mar 2, 2024

davidoort commented Mar 2, 2024

smomin commented Mar 4, 2024

smomin commented Mar 6, 2024

davidoort commented Mar 7, 2024

StevenH86 commented Mar 8, 2024 •

edited

smomin commented Mar 11, 2024

ljrodriguez1 commented Mar 14, 2024

StevenH86 commented Mar 15, 2024 •

edited

smomin commented Mar 18, 2024

smomin commented Mar 19, 2024

nickycdk commented Apr 5, 2024

smomin commented Apr 5, 2024

devth commented Apr 9, 2024

smomin commented Apr 9, 2024

devth commented Apr 9, 2024 •

edited

devth commented Apr 11, 2024

smomin commented Apr 11, 2024

devth commented Apr 11, 2024

LeadDreamer commented Apr 12, 2024

LeadDreamer commented Apr 13, 2024 •

edited

smomin commented Apr 13, 2024

LeadDreamer commented Apr 14, 2024 •

edited

Error trying to run fullIndex on > 250 documents #196

Error trying to run fullIndex on > 250 documents #196

Comments

StevenH86 commented Feb 1, 2024 • edited

fringley commented Feb 3, 2024 • edited

StevenH86 commented Feb 4, 2024

smomin commented Feb 9, 2024

fringley commented Feb 13, 2024

StevenH86 commented Feb 17, 2024

StevenH86 commented Mar 2, 2024

davidoort commented Mar 2, 2024

smomin commented Mar 4, 2024

smomin commented Mar 6, 2024

davidoort commented Mar 7, 2024

StevenH86 commented Mar 8, 2024 • edited

smomin commented Mar 11, 2024

ljrodriguez1 commented Mar 14, 2024

StevenH86 commented Mar 15, 2024 • edited

smomin commented Mar 18, 2024

smomin commented Mar 19, 2024

nickycdk commented Apr 5, 2024

smomin commented Apr 5, 2024

devth commented Apr 9, 2024

smomin commented Apr 9, 2024

devth commented Apr 9, 2024 • edited

devth commented Apr 11, 2024

smomin commented Apr 11, 2024

devth commented Apr 11, 2024

LeadDreamer commented Apr 12, 2024

LeadDreamer commented Apr 13, 2024 • edited

smomin commented Apr 13, 2024

LeadDreamer commented Apr 14, 2024 • edited

StevenH86 commented Feb 1, 2024 •

edited

fringley commented Feb 3, 2024 •

edited

StevenH86 commented Mar 8, 2024 •

edited

StevenH86 commented Mar 15, 2024 •

edited

devth commented Apr 9, 2024 •

edited

LeadDreamer commented Apr 13, 2024 •

edited

LeadDreamer commented Apr 14, 2024 •

edited