Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[bug] Nokogiri::XML::Document#css no longer matches namespaced XML attributes #3411

Closed
stanhu opened this issue Jan 15, 2025 · 6 comments · Fixed by #3413
Closed

[bug] Nokogiri::XML::Document#css no longer matches namespaced XML attributes #3411

stanhu opened this issue Jan 15, 2025 · 6 comments · Fixed by #3413

Comments

@stanhu
Copy link

stanhu commented Jan 15, 2025

Please describe the bug

I'm not sure if this is a bug, but I noticed when trying to upgrade from nokogiri v1.16.8 to v1.17.2, that this behavior changed:

require 'nokogiri'

body = <<-XML
<OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/" xmlns:moz="http://www.mozilla.org/2006/browser/search/">
  <ShortName>Test</ShortName>
  <Description>Search Test</Description>
  <InputEncoding>UTF-8</InputEncoding>
  <Image width="16" height="16" type="image/x-icon">http://test.host/favicon.ico</Image>
  <Url type="text/html" method="get" template="http://test.host/search?search={searchTerms}"/>
  <moz:SearchForm>http://test.host/search</moz:SearchForm>
</OpenSearchDescription>
XML

doc = Nokogiri::XML.parse(body)

puts doc.css('OpenSearchDescription *').map(&:name)

In nokogiri v1.16.8, this returned:

ShortName
Description
InputEncoding
Image
Url
SearchForm

In nokogiri v1.17.2, SearchForm is excluded:

ShortName
Description
InputEncoding
Image
Url

I see three commits in lib/nokogiri/xml/searchable.rb that might be relevant:

Is this intentional? Do I need to change my selector to handle this?

@stanhu stanhu added the state/needs-triage Inbox for non-installation-related bug reports or help requests label Jan 15, 2025
@stanhu
Copy link
Author

stanhu commented Jan 15, 2025

It seems I have two options. Option 1 is to drop namespaces:

doc.remove_namespaces!

Option 2 is to register the namespaces and update the query:

namespaces = {
  'default' => 'http://a9.com/-/spec/opensearch/1.1/',
  'moz' => 'http://www.mozilla.org/2006/browser/search/'
}

doc.xpath('//default:OpenSearchDescription/*', namespaces)

@flavorjones
Copy link
Member

@stanhu Interesting! I'll take a look as soon as I can.

@flavorjones flavorjones added topic/namespaces and removed state/needs-triage Inbox for non-installation-related bug reports or help requests labels Jan 16, 2025
@flavorjones
Copy link
Member

Behavior change was introduced in 179d74d:

commit 179d74d (HEAD)
Author: Mike Dalessio mike.dalessio@gmail.com
Date: 2024-06-11 12:03:12 -0400

refactor: move namespaces into the XPathVisitor

Previously this was passed into CSS::Parser.new and used during the
parsing phase. Now it's passed into the visitor and we do namespace
munging there, where it should be.

Note the namespaces has also been moved out of the top-level parser
cache key into XPathVisitor#config.

Finally, note that namespaces is nil by default (instead of being an
empty Hash) which should prevent some unnecessary object creation in
the most common use cases.

So this seems like it's a bug, I'll fix it.

@flavorjones
Copy link
Member

Proposed fix at #3413

flavorjones added a commit that referenced this issue Jan 19, 2025

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
…3413)

**What problem is this PR intended to solve?**

When performing a CSS selector query, an XML document's root namespace
declarations are injected into the XPathVisitor, and the default
namespace is applied to any non-namespaced element in the resulting
xpath query.

However, the wildcard selector "*" should not have this namespace
applied. This regressed in v1.17.0 with 179d74d.

Fixes #3411


**Have you included adequate test coverage?**

Yes.


**Does this change affect the behavior of either the C or the Java
implementations?**

N/A
flavorjones added a commit that referenced this issue Jan 19, 2025

Verified

This commit was signed with the committer’s verified signature.
flavorjones Mike Dalessio
When performing a CSS selector query, an XML document's root namespace
declarations are injected into the XPathVisitor, and the default
namespace is applied to any non-namespaced element in the resulting
xpath query.

However, the wildcard selector "*" should not have this namespace
applied. This regressed in v1.17.0 with 179d74d.

Fixes #3411
@flavorjones
Copy link
Member

Backporting to v1.18.x for a bugfix release: #3414

@flavorjones
Copy link
Member

Fixed in Release v1.18.2 / 2024-01-19 · sparklemotion/nokogiri

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants