-
Notifications
You must be signed in to change notification settings - Fork 73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sphinx index only published cms pages #3708
Conversation
# association is copied to child factory because we reference provider | ||
# and it is not yet created, but copying association to this factory fixes it | ||
association :provider, :factory => :provider_account |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be outdated because factories work alright without it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will automatically create a :provider
for the :cms_page
, using the :provider_account
factory, unless you manually provide a :provider
. Since a provider is mandatory to create a cms_page, I'd say you shouldn't remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:cms_template
already defines this association. Probably in the past there was some inheritance issue and it had to be duplicated here as well.
b27b0ef
to
8ac5284
Compare
1ca4d89
to
a562646
Compare
partial improvement for THREESCALE-10793
a562646
to
2cd01cc
Compare
@@ -8,5 +8,5 @@ | |||
has tenant_id, type: :bigint | |||
|
|||
indexes :published | |||
scope { CMS::Page.where(searchable: true) } | |||
scope { CMS::Page.where(searchable: true).where.not(published: nil) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just add return false if published.nil?
here
Lines 68 to 83 in e6cf20e
def is_searchable? | |
case | |
when ! mime_type.html? | |
false | |
when liquid_enabled? | |
template = Liquid::Template.parse(published) | |
nodelist = template.instance_variable_get("@root").instance_variable_get("@nodelist") | |
nodelist.none?{ |i| i.is_a?(Liquid::Tag) and not i.is_a?(Liquid::Include) } | |
else | |
true | |
end | |
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be intended to set the searchable
attribute. And it shouldn't depend on whether page is published or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but I mean that searchable
defines some criteria of when a page should be "searchable". We just add a new criteria here "the page must be searchable if it's published, so I just suggested to put it all to the same bucket. But that's fine as it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes logical sense. But then we will need to ensure to change this value reliably when page is published/unpublished and I'm not really keen on making sure all details are correct. It also follows previous (bad?) example of filtering out unpublished pages separately in the presenter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If they would move in sync, then what is the point to have both of these?
This searchable
attribute doesn't make much sense to me to begin with. To make it right IMO a bigger refactoring would be needed.
And I don't find it a priority to make these attributes make sense. It requires more attention that now is due elsewhere.
For now I think this is an improvement and we can have a JIRA to sanitize these attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the would move in sync, then what is the point to have both of these?
I don't understand this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"they"
I meant, if both attributes are changed together, then why keep them. I think this searchable
is probably redundant and we may just define the proper index scope to skip the elements currently defining it. Why keep a separate field in the database when index can have a scope?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But again, not something I find very useful to spend time on now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood, thanks.
end | ||
|
||
def search | ||
return @search if @search |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I don't understand this change... Would the rest of the method ever execute? 🤔 (i.e. in which cases @search
would not be present?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just normal memoization, before method is called for the first time, the variable would be nil
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we lack a test to actually verify the behavior implemented by this PR: that existing but not published cms pages are't indexed.
# association is copied to child factory because we reference provider | ||
# and it is not yet created, but copying association to this factory fixes it | ||
association :provider, :factory => :provider_account |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That will automatically create a :provider
for the :cms_page
, using the :provider_account
factory, unless you manually provide a :provider
. Since a provider is mandatory to create a cms_page, I'd say you shouldn't remove this.
@@ -8,5 +8,5 @@ | |||
has tenant_id, type: :bigint | |||
|
|||
indexes :published | |||
scope { CMS::Page.where(searchable: true) } | |||
scope { CMS::Page.where(searchable: true).where.not(published: nil) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def reindex(instance) | ||
ThinkingSphinx::Processor.new(instance: instance).upsert | ||
end | ||
|
||
def delete_from_index(model, *ids) | ||
ids.each do |id| | ||
ThinkingSphinx::Processor.new(model: model, id: id).delete | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would need some help to understand this changes and how they relate to cms pages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code was removed from parent class, so it had to move here. No need to keep these methods in parent if they are not used there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the changes you made in the parent and in initializers/sphinx.rb
don't make this unnecessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because here we also remove associated buyers when provider is marked to be removed (but not actually removed to allow their own callbacks to kick in)
ids.each do |id| | ||
ThinkingSphinx::Processor.new(model: model, id: id).delete | ||
end | ||
ThinkingSphinx::Processor.new(model: model, id: id).stage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does this do and how it replaces the old code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the stage
method implementation. It implements this upsert/delete in a better way, based on index scope, not only on the presence of the object.
# implement conditionally inserting or deleting from the index | ||
# see https://github.com/pat/thinking-sphinx/pull/1258 | ||
ThinkingSphinx::Processor.include(Module.new do | ||
def stage | ||
real_time_indices.each do |index| | ||
found = index.scope.find_by(model.primary_key => id) | ||
|
||
if found | ||
ThinkingSphinx::RealTime::Transcriber.new(index).copy found | ||
else | ||
ThinkingSphinx::Deletion.perform(index, id) | ||
end | ||
end | ||
end | ||
end) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I read the linked PR and now I'm even more confused. Could you please provide a global picture on what is this for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When object is found in AR scope, it is indexed in manticore, otherwise it is removed from index.
We have tests that verify something matching scope is indexed and when not found in scope is deindexed. Would be redundant to write a dedicated CMS one. |
def reindex(instance) | ||
ThinkingSphinx::Processor.new(instance: instance).upsert | ||
end | ||
|
||
def delete_from_index(model, *ids) | ||
ids.each do |id| | ||
ThinkingSphinx::Processor.new(model: model, id: id).delete | ||
end | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And the changes you made in the parent and in initializers/sphinx.rb
don't make this unnecessary?
@@ -8,5 +8,5 @@ | |||
has tenant_id, type: :bigint | |||
|
|||
indexes :published | |||
scope { CMS::Page.where(searchable: true) } | |||
scope { CMS::Page.where(searchable: true).where.not(published: nil) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the would move in sync, then what is the point to have both of these?
I don't understand this.
We filter out unpublished pages in
SearchPresenter
anyway. So better not index them to begin with.Also don't index anything out of sphinx/manticore index scope globally.