Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sphinx index only published cms pages #3708

Merged
merged 3 commits into from
May 29, 2024
Merged

Conversation

akostadinov
Copy link
Contributor

We filter out unpublished pages in SearchPresenter anyway. So better not index them to begin with.

Also don't index anything out of sphinx/manticore index scope globally.

@akostadinov akostadinov self-assigned this Feb 29, 2024
Comment on lines -54 to -56
# association is copied to child factory because we reference provider
# and it is not yet created, but copying association to this factory fixes it
association :provider, :factory => :provider_account
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to be outdated because factories work alright without it

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will automatically create a :provider for the :cms_page, using the :provider_account factory, unless you manually provide a :provider. Since a provider is mandatory to create a cms_page, I'd say you shouldn't remove this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:cms_template already defines this association. Probably in the past there was some inheritance issue and it had to be duplicated here as well.

@akostadinov akostadinov force-pushed the manticore branch 2 times, most recently from 1ca4d89 to a562646 Compare February 29, 2024 18:50
partial improvement for THREESCALE-10793
@@ -8,5 +8,5 @@
has tenant_id, type: :bigint

indexes :published
scope { CMS::Page.where(searchable: true) }
scope { CMS::Page.where(searchable: true).where.not(published: nil) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just add return false if published.nil? here

def is_searchable?
case
when ! mime_type.html?
false
when liquid_enabled?
template = Liquid::Template.parse(published)
nodelist = template.instance_variable_get("@root").instance_variable_get("@nodelist")
nodelist.none?{ |i| i.is_a?(Liquid::Tag) and not i.is_a?(Liquid::Include) }
else
true
end
end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be intended to set the searchable attribute. And it shouldn't depend on whether page is published or not.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but I mean that searchabledefines some criteria of when a page should be "searchable". We just add a new criteria here "the page must be searchable if it's published, so I just suggested to put it all to the same bucket. But that's fine as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes logical sense. But then we will need to ensure to change this value reliably when page is published/unpublished and I'm not really keen on making sure all details are correct. It also follows previous (bad?) example of filtering out unpublished pages separately in the presenter.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @mayorova says makes sense, and apparently it wouldn't require any additional change, as the model includes a callback to update the searchable value on every change:

before_save :mark_for_searchability

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If they would move in sync, then what is the point to have both of these?

This searchable attribute doesn't make much sense to me to begin with. To make it right IMO a bigger refactoring would be needed.

And I don't find it a priority to make these attributes make sense. It requires more attention that now is due elsewhere.

For now I think this is an improvement and we can have a JIRA to sanitize these attributes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the would move in sync, then what is the point to have both of these?

I don't understand this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"they"

I meant, if both attributes are changed together, then why keep them. I think this searchable is probably redundant and we may just define the proper index scope to skip the elements currently defining it. Why keep a separate field in the database when index can have a scope?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But again, not something I find very useful to spend time on now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood, thanks.

end

def search
return @search if @search
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I don't understand this change... Would the rest of the method ever execute? 🤔 (i.e. in which cases @search would not be present?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just normal memoization, before method is called for the first time, the variable would be nil

Co-authored-by: Daria Mayorova <mayorova@users.noreply.github.com>
Copy link
Contributor

@jlledom jlledom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we lack a test to actually verify the behavior implemented by this PR: that existing but not published cms pages are't indexed.

Comment on lines -54 to -56
# association is copied to child factory because we reference provider
# and it is not yet created, but copying association to this factory fixes it
association :provider, :factory => :provider_account
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That will automatically create a :provider for the :cms_page, using the :provider_account factory, unless you manually provide a :provider. Since a provider is mandatory to create a cms_page, I'd say you shouldn't remove this.

@@ -8,5 +8,5 @@
has tenant_id, type: :bigint

indexes :published
scope { CMS::Page.where(searchable: true) }
scope { CMS::Page.where(searchable: true).where.not(published: nil) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What @mayorova says makes sense, and apparently it wouldn't require any additional change, as the model includes a callback to update the searchable value on every change:

before_save :mark_for_searchability

Comment on lines +24 to +32
def reindex(instance)
ThinkingSphinx::Processor.new(instance: instance).upsert
end

def delete_from_index(model, *ids)
ids.each do |id|
ThinkingSphinx::Processor.new(model: model, id: id).delete
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would need some help to understand this changes and how they relate to cms pages

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code was removed from parent class, so it had to move here. No need to keep these methods in parent if they are not used there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the changes you made in the parent and in initializers/sphinx.rb don't make this unnecessary?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, because here we also remove associated buyers when provider is marked to be removed (but not actually removed to allow their own callbacks to kick in)

ids.each do |id|
ThinkingSphinx::Processor.new(model: model, id: id).delete
end
ThinkingSphinx::Processor.new(model: model, id: id).stage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this do and how it replaces the old code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the stage method implementation. It implements this upsert/delete in a better way, based on index scope, not only on the presence of the object.

Comment on lines +24 to +38
# implement conditionally inserting or deleting from the index
# see https://github.com/pat/thinking-sphinx/pull/1258
ThinkingSphinx::Processor.include(Module.new do
def stage
real_time_indices.each do |index|
found = index.scope.find_by(model.primary_key => id)

if found
ThinkingSphinx::RealTime::Transcriber.new(index).copy found
else
ThinkingSphinx::Deletion.perform(index, id)
end
end
end
end)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read the linked PR and now I'm even more confused. Could you please provide a global picture on what is this for?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When object is found in AR scope, it is indexed in manticore, otherwise it is removed from index.

@github-actions github-actions bot added the Stale label Apr 13, 2024
@akostadinov
Copy link
Contributor Author

I think we lack a test to actually verify the behavior implemented by this PR: that existing but not published cms pages are't indexed.

We have tests that verify something matching scope is indexed and when not found in scope is deindexed. Would be redundant to write a dedicated CMS one.

https://github.com/3scale/porta/pull/3708/files?diff=unified&w=0#diff-b457d353bc438cc33ec35d44c791aa37721418b3588608b31e6725c51cae2038R92

@3scale 3scale deleted a comment from github-actions bot Apr 13, 2024
@github-actions github-actions bot removed the Stale label Apr 14, 2024
Comment on lines +24 to +32
def reindex(instance)
ThinkingSphinx::Processor.new(instance: instance).upsert
end

def delete_from_index(model, *ids)
ids.each do |id|
ThinkingSphinx::Processor.new(model: model, id: id).delete
end
end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And the changes you made in the parent and in initializers/sphinx.rb don't make this unnecessary?

@@ -8,5 +8,5 @@
has tenant_id, type: :bigint

indexes :published
scope { CMS::Page.where(searchable: true) }
scope { CMS::Page.where(searchable: true).where.not(published: nil) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the would move in sync, then what is the point to have both of these?

I don't understand this.

@github-actions github-actions bot added the Stale label May 17, 2024
@akostadinov akostadinov removed the Stale label May 17, 2024
@3scale 3scale deleted a comment from github-actions bot May 17, 2024
@akostadinov akostadinov requested a review from mayorova May 29, 2024 21:56
@akostadinov akostadinov merged commit 42c6152 into 3scale:master May 29, 2024
17 of 21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants