Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FT.Aggregate with APPLY - text fields data is returing in lowercase if we do the multi column sort #296

Closed
balasahebr92 opened this issue May 2, 2024 · 7 comments

Comments

@balasahebr92
Copy link

balasahebr92 commented May 2, 2024

Thanks for reporting an issue in NRedisStack! Please update the appropriate text below, as much data as possible really helps!

NRedisStack Version:0.12.0

Redis Stack Version:

Description:
I am facing the issue with multi column sort - the text fields data am getting in lowercase, i need this values what i stored. the values will be stored as in capital / pascalcase / camelcase but all values am getting in the lowercase. I have did some observation on this please look into this

FT.Aggreagate & FT.Search Commands
FT.Search -
Its support the single column sorting
If I have decimal fields & its type as numeric with set sortable true then, sorting for this column not doing perfect.
Example - FT.Search indexname "*" SORTBY sales_ty ASC;
FT.Search command not support for multi column sorting

2. FT.AGGREGATE with LOAD
Its support the multi column sort for text & numeric fields is working as expected
but if I have the field like (sales - 23435.3446 ) in that case its not working as expected
Example - FT.AGGREGATE indexname * LOAD * SORTBY 6 @manufacturer_desc DESC @units_ty DESC @sortassist DESC LIMIT 0 40;

3. FT.AGGREGATE with APPLY
Its support the multi column sort for text & numeric fields is working as expected.
For numeric field like (2364 or 3435.345), its also working as expected
But issue is for string/text sortable fields - where its return the data in lowercase
Example - FT.AGGREGATE indexname * SORTBY 6 @manufacturer_desc DESC @units_ty DESC @sortassist DESC APPLY @manufacturer_id AS manufacturer_id APPLY @manufacturer_desc AS manufacturer_desc APPLY @week_id AS week_id APPLY @week_enddate AS week_enddate APPLY @units_ly AS units_ly APPLY @fiscal_week_desc AS fiscal_week_desc APPLY @units_ty AS units_ty APPLY @sortassist AS sortassist LIMIT 0 40

Stored Result

image

Sorting Result

image

Please help on this ASAP

@slorello89
Copy link
Member

slorello89 commented May 2, 2024

Hi @balasahebr92, so this is a RediSearch thing, NRedisStack is just responding with the answer that FT.AGGREGATE provides. FT.AGGREGATE operations are highly optimizes based off of what is in the index, when you declare a field as SORTABLE, the default behavior on a hash is to normalize the text, (strips away the diacritics and send the value to lower case), and it uses that normalized form to order the fields.

It's that normalized value that the FT.AGGREGATE responds with, as I mentioned this process is highly optimized, it does not go back and retrieve the actual value from the document by default so it replies with this normalized version.

There's a couple ways you can override this behavior:

  1. You can explicitly load the values after calling the sortby, this will make Redis go and explicitly retrieve the value of that field and use that in the reply while preserving order:
$ FT.AGGREGATE test * SORTBY 2 @name ASC LOAD 1 name
1) (integer) 3
2) 1) "name"
   2) "albatros"
3) 1) "name"
   2) "Crow"
4) 1) "name"
   2) "Zebra"
  1. You can Override this behavior entirely by using UNF (unnormalized form), this will prevent Redis from normalizing the values in the index. Keep in mind this will cause the order to change (Z comes before a!) also note, if you use the LOAD option above and call the LOAD first, it will use that loaded value in the sortby (overriding the normalization).
$ FT.CREATE test SCHEMA name TEXT SORTABLE UNF
$ FT.AGGREGATE test * SORTBY 2 @name ASC
1) (integer) 3
2) 1) "name"
   2) "Crow"
3) 1) "name"
   2) "Zebra"
4) 1) "name"
   2) "albatros"

hopefully this helps!

@balasahebr92
Copy link
Author

balasahebr92 commented May 3, 2024

Hi slorello89, thanks for help!

If I use this command ( FT.AGGREGATE test * SORTBY 2 @name ASC LOAD * ) by taking LOAD after the sortby its looks good up to page no 3 but after for page no 4, 5, 6 its returning the previous page records. am taking load * becuase for payload, properties are dyanamically created for every scehma.

Examples - scenarion 1

  1. FT.AGGREGATE test * SORTBY 2 @name ASC LOAD * limit 1 10 - page 1 => looks good
  2. FT.AGGREGATE test * SORTBY 2 @name ASC LOAD * limit 11 10 - page 2 => looks good
  3. FT.AGGREGATE test * SORTBY 2 @name ASC LOAD * limit 21 10 - page 3 => looks good
  4. FT.AGGREGATE test * SORTBY 2 @name ASC LOAD * limit 31 10 - page 4 => looks good
  5. FT.AGGREGATE test * SORTBY 2 @name ASC LOAD * limit 41 10 - page 5 => returning the page 4 records
  6. FT.AGGREGATE test * SORTBY 2 @name ASC LOAD * limit 51 10 - page 6 => returning the page 4 records

Examples - scenarion 2
FT.AGGREGATE indexname * SORTBY 4 @textFields DESC @sortassist ASC LOAD 6 text1 text2 numeric1 numeric2 text 3 numeric4 LIMIT 1 10

Add the UNF as a true while creating the index still its display the text fields in lowecase
image

image

for this example - its showing the correct records but the text fields showing in lowercase again.

Examples - scenarion 3

FT.AGGREGATE indexname * LOAD 8 manufacturer_desc fiscal_week_desc sales_ty sales_ly sortassist manufacturer_id week_id week_enddate SORTBY 4 @manufacturer_desc ASC @sortassist ASC LIMIT 1 10

for this scenarios its looks good

image

can we use the load by mension properties / load * - before the SORTBY, it will not impact sortyby for decimal values / any performance issue ?

Please help on this.

@slorello89
Copy link
Member

Hi @balasahebr92 - I'm not able to reproduce either of these issues on my end. Not scenario 1 where you get duplicate pages, not scenario 2 where the UNF does not seem to apply.

For scenario 1, I'd ensure that the index is in the state you think it is before querying it.

For scenario 2 my shot in the dark guess is that you did not drop and recreate the index between runs

I created a gist with some example code so you can see what I'm talking about (this uses UNF and works).

@balasahebr92
Copy link
Author

balasahebr92 commented May 3, 2024

Hi slorello89, thanks for help!

Still am not clear the thing
With this below query its working fine for me but as per your suggestion need to use the loadAll after the sortby
FT.AGGREGATE indexname* LOAD * SORTBY 4 @Column1 ASC @column2 DESC LIMIT 1 10

Example - lets I have the 100K+ records and I need to retrieve the data by page size 10 every time but sometime like ( 20, 50, 100 pagesize) so what type of query you suggest for best performance point of view

Query 1

FT.AGGREGATE indexname* LOAD * SORTBY 4 @Column1 DESC @column2 DESC LIMIT 1 10

Query 2
FT.AGGREGATE indexname* LOAD 5 column1 column2 column3 column4 column5 SORTBY 4 @Column1 DESC @column2 DESC LIMIT 1 10

Query 3
FT.AGGREGATE indexname* SORTBY 4 @Column1 DESC @column2 DESC LOAD * LIMIT 1 10

Query 4

FT.AGGREGATE indexname* SORTBY 4 @Column1 DESC @column2 DESC LOAD 5 column1 column2 column3 column4 column5 LIMIT 1 10

@balasahebr92
Copy link
Author

Hi slorello89

As per above scenarios am using the Query 2 it's working for me as per my expectation but if I use the load after the SORTBY - for text fields its return the data in lowercase even if I declare this text column as UNF while creating the index.

thank you very much!

@slorello89
Copy link
Member

Hi @balasahebr92 - as I mentioned, you need to use the LOAD first to override the normalization. RE create the index with UNF, are you certain that you actually deleted the index first? Otherwise it will not override what's already in the schema.

@balasahebr92
Copy link
Author

Hi slorello89
Yeah, its looks good now.

thanks for help. am closing this issue now from my side

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants