Comparing changes

base repository: ruby/rexml
base: v3.3.7
head repository: ruby/rexml
compare: v3.3.9
  • 11 commits
  • 10 files changed
  • 3 contributors

Commits on Sep 4, 2024

  1. Bump version

    kou committed Sep 4, 2024

    Verified
    35ee73e

Commits on Sep 24, 2024

  1. Optimize SAX2Parser#get_namespace (#207)

    ```
    RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
    ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
    Calculating -------------------------------------
                             before       after  before(YJIT)  after(YJIT)
                     dom     18.085      17.677        33.086       32.778 i/s -     100.000 times in 5.529372s 5.657097s 3.022471s 3.050832s
                     sax     25.450      26.182        44.797       47.916 i/s -     100.000 times in 3.929249s 3.819475s 2.232309s 2.086982s
                    pull     29.160      29.089        55.407       53.531 i/s -     100.000 times in 3.429304s 3.437757s 1.804825s 1.868072s
                  stream     29.137      29.055        52.780       51.368 i/s -     100.000 times in 3.432007s 3.441754s 1.894649s 1.946724s
    
    Comparison:
                                  dom
            before(YJIT):        33.1 i/s
             after(YJIT):        32.8 i/s - 1.01x  slower
                  before:        18.1 i/s - 1.83x  slower
                   after:        17.7 i/s - 1.87x  slower
    
                                  sax
             after(YJIT):        47.9 i/s
            before(YJIT):        44.8 i/s - 1.07x  slower
                   after:        26.2 i/s - 1.83x  slower
                  before:        25.5 i/s - 1.88x  slower
    
                                 pull
            before(YJIT):        55.4 i/s
             after(YJIT):        53.5 i/s - 1.04x  slower
                  before:        29.2 i/s - 1.90x  slower
                   after:        29.1 i/s - 1.90x  slower
    
                               stream
            before(YJIT):        52.8 i/s
             after(YJIT):        51.4 i/s - 1.03x  slower
                  before:        29.1 i/s - 1.81x  slower
                   after:        29.1 i/s - 1.82x  slower
    ```
    
    - sax
      - YJIT=ON : 1.07x faster
      - YJIT=OFF : 1.03x faster
    naitoh authored Sep 24, 2024

    Verified
    2e1cd64
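
The early return that #207 adds to `get_namespace` can be pictured with a small standalone sketch. `NamespaceStack` and its methods are hypothetical names, not REXML's API, and the lookup is simplified; the only assumption taken from the diff is that the stack holds one prefix-to-URI hash per scope.

```ruby
# Hypothetical stand-in for a SAX-style namespace stack (not REXML's class).
# Assumption: each stack entry is a Hash mapping prefix => URI.
class NamespaceStack
  def initialize
    @stack = []
  end

  def push(mapping)
    @stack.push(mapping)
  end

  def pop
    @stack.pop
  end

  # Returns the innermost URI bound to +prefix+, or nil when unbound.
  def lookup(prefix)
    # Fast path: documents without namespace declarations skip the scan.
    return nil if @stack.empty?

    scope = @stack.reverse_each.find { |ns| ns.key?(prefix) }
    scope && scope[prefix]
  end
end

stack = NamespaceStack.new
no_ns = stack.lookup("foo")
stack.push({ nil => "http://example.org/default", "foo" => "http://example.org/foo" })
stack.push({ "foo" => "http://example.org/inner" })
inner = stack.lookup("foo")
default = stack.lookup(nil)
```

The guard only pays off for documents that never declare a namespace, which would be consistent with the benchmark showing a gain for `sax` and noise-level differences elsewhere.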

Commits on Sep 29, 2024

  1. Fix handling with "xml:" prefixed namespace (#208)

    I found that parsing XHTML documents like the one below has failed since v3.3.3:
    
    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE html>
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <title>XHTML Document</title>
      </head>
      <body>
        <h1>XHTML Document</h1>
        <p xml:lang="ja" lang="ja">この段落は日本語です。</p>
      </body>
    </html>
    ```
    
    The [XML namespace spec][spec] is a little ambiguous, but the document above
    is valid according to an [article served by W3C][article].
    
    I fixed the parsing algorithm. Can you review it?
    
    As an aside, `<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"
    lang="en">` style language declaration is often used in XHTML files
    included in EPUB files because [sample EPUB files][samples] provided by
    IDPF, former EPUB spec authority, use the style.
    
    [spec]: https://www.w3.org/TR/REC-xml-names/#defaulting
    [article]: https://www.w3.org/International/questions/qa-html-language-declarations#attributes
    [samples]: https://github.com/IDPF/epub3-samples
    KitaitiMakoto authored Sep 29, 2024
    78f8712
  2. Add 3.3.8 entry

    kou committed Sep 29, 2024
    4197054
  3. test: avoid using needless non ASCII characters

    kou committed Sep 29, 2024
    036d508
  4. Bump version

    kou committed Sep 29, 2024
    622011f
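
One plausible way to picture the #208 failure, going by the NEWS wording ("attribute namespace conflict error") and the diff that seeds `@namespaces` with the `xml` prefix: if `xml` resolves to no URI, `xml:lang` and `lang` end up with the same expanded name. The helper below is a hypothetical illustration of that idea, not REXML's parsing code.

```ruby
XML_NAMESPACE = "http://www.w3.org/XML/1998/namespace"

# Hypothetical helper (not REXML's code): returns the attribute names whose
# expanded name (namespace URI + local part) collides with an earlier one.
def conflicting_attributes(attribute_names, namespaces)
  seen = {}
  attribute_names.select do |qname|
    prefix, local = qname.include?(":") ? qname.split(":", 2) : [nil, qname]
    # Unprefixed attributes are in no namespace.
    uri = prefix ? namespaces[prefix] : nil
    key = [uri, local]
    duplicate = seen.key?(key)
    seen[key] = qname
    duplicate
  end
end

names = ["xml:lang", "lang"]
# With no binding for "xml", both names expand to [nil, "lang"].
without_xml = conflicting_attributes(names, {})
# With "xml" pre-bound, the two expanded names differ.
with_xml = conflicting_attributes(names, { "xml" => XML_NAMESPACE })
```

Seeding the table once avoids special-casing the `xml` prefix at every lookup site.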

Commits on Oct 9, 2024

  1. Optimize IOSource#read_until method (#210)

    ## Why?
    The result of `encode(term)` can be cached.
    
    ## Benchmark
    
    ```
    RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.4/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
    ruby 3.3.4 (2024-07-09 revision be1089c8ec) [arm64-darwin22]
    Calculating -------------------------------------
                             before       after  before(YJIT)  after(YJIT)
                     dom     17.546      18.512        32.282       32.306 i/s -     100.000 times in 5.699323s 5.402026s 3.097658s 3.095448s
                     sax     25.435      28.294        47.526       50.074 i/s -     100.000 times in 3.931613s 3.534310s 2.104122s 1.997057s
                    pull     29.471      31.870        54.400       57.554 i/s -     100.000 times in 3.393211s 3.137793s 1.838222s 1.737494s
                  stream     29.169      31.153        51.613       52.898 i/s -     100.000 times in 3.428318s 3.209941s 1.937508s 1.890424s
    
    Comparison:
                                  dom
             after(YJIT):        32.3 i/s
            before(YJIT):        32.3 i/s - 1.00x  slower
                   after:        18.5 i/s - 1.75x  slower
                  before:        17.5 i/s - 1.84x  slower
    
                                  sax
             after(YJIT):        50.1 i/s
            before(YJIT):        47.5 i/s - 1.05x  slower
                   after:        28.3 i/s - 1.77x  slower
                  before:        25.4 i/s - 1.97x  slower
    
                                 pull
             after(YJIT):        57.6 i/s
            before(YJIT):        54.4 i/s - 1.06x  slower
                   after:        31.9 i/s - 1.81x  slower
                  before:        29.5 i/s - 1.95x  slower
    
                               stream
             after(YJIT):        52.9 i/s
            before(YJIT):        51.6 i/s - 1.02x  slower
                   after:        31.2 i/s - 1.70x  slower
                  before:        29.2 i/s - 1.81x  slower
    
    ```
    
    - YJIT=ON : 1.00x - 1.06x faster
    - YJIT=OFF : 1.05x - 1.11x faster
    naitoh authored Oct 9, 2024
    1d0c362
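
The caching idea behind #210 is plain memoization keyed by the terminator string. The sketch below uses a hypothetical `TermCache` class and assumes that encoding a terminator is a pure function of the term and the target encoding; it is not REXML's `IOSource`.

```ruby
# Hypothetical memoizing wrapper (not REXML's IOSource).
# Assumption: the encoded form of a terminator never changes for a given
# target encoding, so it is safe to compute it once per term.
class TermCache
  attr_reader :encode_calls

  def initialize(encoding)
    @encoding = encoding
    @cache = {}
    @encode_calls = 0
  end

  def encoded(term)
    # ||= evaluates encode(term) only on the first request for this term.
    @cache[term] ||= encode(term)
  end

  private

  def encode(term)
    @encode_calls += 1
    term.encode(@encoding)
  end
end

cache = TermCache.new(Encoding::UTF_16LE)
first = cache.encoded(">")
second = cache.encoded(">")
```

A parser asks for the same few terminators (`>`, `"`, `'` and so on) many times per document, so the cache stays tiny while removing repeated transcoding.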

Commits on Oct 19, 2024

  1. Fix IOSource#readline for @pending_buffer (#215)

    ## Why?
    Fixes a problem where `@pending_buffer` was not processed when
    `@source.readline` raised `IOError` while reading an XML file, even
    though `@pending_buffer` still held data.
    naitoh authored Oct 19, 2024
    cf0fb9c
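
The fallback described in #215 can be tried in isolation with `StringIO`. `readline_with_pending` is a hypothetical free function that mirrors the shape of the patched `readline`; it relies on the fact that `EOFError` is a subclass of `IOError`.

```ruby
require "stringio"

# Hypothetical free-standing version of the fallback (not REXML's method).
def readline_with_pending(io, pending_buffer, line_break = "\n")
  # Without pending bytes, defer entirely to the underlying IO.
  return io.readline(line_break) if pending_buffer.nil?

  begin
    str = io.readline(line_break)
  rescue IOError
    # EOFError is a subclass of IOError. str stays nil here, so the
    # pending bytes are still handed back instead of being lost.
  end
  str.nil? ? pending_buffer : pending_buffer + str
end

# The IO is already exhausted, but one byte is still pending.
at_eof = readline_with_pending(StringIO.new(""), "a")
# The IO still has data; the pending bytes are prepended to the next line.
joined = readline_with_pending(StringIO.new("bc\nd"), "a")
```

Rescuing only when a pending buffer exists keeps the original error behaviour for the ordinary case.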

Commits on Oct 24, 2024

  1. test: fix indent

    kou committed Oct 24, 2024
    a09646d
  2. parser: fix a bug that &#0x...; is accepted as a character reference

    kou committed Oct 24, 2024
    ce59f2e
  3. Add 3.3.9 entry

    kou committed Oct 24, 2024
    38eaa86
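
The character reference fix can be checked with a standalone sketch modeled on the `baseparser.rb` diff further down. The constant and method names are illustrative only. The old pattern allowed `0*` before the capture group, so the `0` in `&#0x61;` was skipped and the rest was read as hexadecimal; the new pattern drops `0*` and lets `Integer` with an explicit base handle leading zeros.

```ruby
# Pattern and decoding modeled on the diff; names are illustrative only.
CHAR_REF = /&#((?:\d+)|(?:x[a-fA-F0-9]+));/

def expand_character_references(text)
  text.gsub(CHAR_REF) do
    m = Regexp.last_match(1)
    # An explicit base keeps "0097" decimal instead of treating it as octal.
    code_point = m.start_with?("x") ? Integer(m[1..-1], 16) : Integer(m, 10)
    [code_point].pack("U*")
  end
end

# "&#x61;" and "&#0097;" are valid references; "&#0x61;" is not and is
# left untouched.
decoded = expand_character_references("&#x61;&#0x61;&#0097;")
```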
42 changes: 42 additions & 0 deletions NEWS.md
@@ -1,5 +1,47 @@
# News

## 3.3.9 - 2024-10-24 {#version-3-3-9}

### Improvements

* Improved performance.
  * GH-210
  * Patch by NAITOH Jun.

### Fixes

* Fixed a parse bug for text only invalid XML.
  * GH-215
  * Patch by NAITOH Jun.

* Fixed a parse bug that `&#0x...;` is accepted as a character
  reference.

### Thanks

* NAITOH Jun

## 3.3.8 - 2024-09-29 {#version-3-3-8}

### Improvements

* SAX2: Improve parse performance.
  * GH-207
  * Patch by NAITOH Jun.

### Fixes

* Fixed a bug that unexpected attribute namespace conflict error for
  the predefined "xml" namespace is reported.
  * GH-208
  * Patch by KITAITI Makoto

### Thanks

* NAITOH Jun

* KITAITI Makoto

## 3.3.7 - 2024-09-04 {#version-3-3-7}

### Improvements
16 changes: 11 additions & 5 deletions lib/rexml/parsers/baseparser.rb
@@ -150,12 +150,13 @@ module Private
PEDECL_PATTERN = "\\s+(%)\\s+#{NAME}\\s+#{PEDEF}\\s*>"
ENTITYDECL_PATTERN = /(?:#{GEDECL_PATTERN})|(?:#{PEDECL_PATTERN})/um
CARRIAGE_RETURN_NEWLINE_PATTERN = /\r\n?/
-CHARACTER_REFERENCES = /&#0*((?:\d+)|(?:x[a-fA-F0-9]+));/
+CHARACTER_REFERENCES = /&#((?:\d+)|(?:x[a-fA-F0-9]+));/
DEFAULT_ENTITIES_PATTERNS = {}
default_entities = ['gt', 'lt', 'quot', 'apos', 'amp']
default_entities.each do |term|
DEFAULT_ENTITIES_PATTERNS[term] = /&#{term};/
end
+XML_PREFIXED_NAMESPACE = "http://www.w3.org/XML/1998/namespace"
end
private_constant :Private

@@ -166,6 +167,7 @@ def initialize( source )
@entity_expansion_count = 0
@entity_expansion_limit = Security.entity_expansion_limit
@entity_expansion_text_limit = Security.entity_expansion_text_limit
+@source.ensure_buffer
end

def add_listener( listener )
@@ -185,7 +187,7 @@ def stream=( source )
@tags = []
@stack = []
@entities = []
-@namespaces = {}
+@namespaces = {"xml" => Private::XML_PREFIXED_NAMESPACE}
@namespaces_restore_stack = []
end

@@ -568,8 +570,12 @@ def unnormalize( string, entities=nil, filter=nil )
return rv if matches.size == 0
rv.gsub!( Private::CHARACTER_REFERENCES ) {
m=$1
-m = "0#{m}" if m[0] == ?x
-[Integer(m)].pack('U*')
+if m.start_with?("x")
+code_point = Integer(m[1..-1], 16)
+else
+code_point = Integer(m, 10)
+end
+[code_point].pack('U*')
}
matches.collect!{|x|x[0]}.compact!
if filter
@@ -790,7 +796,7 @@ def parse_attributes(prefixes)
@source.match(/\s*/um, true)
if prefix == "xmlns"
if local_part == "xml"
-if value != "http://www.w3.org/XML/1998/namespace"
+if value != Private::XML_PREFIXED_NAMESPACE
msg = "The 'xml' prefix must not be bound to any other namespace "+
"(http://www.w3.org/TR/REC-xml-names/#ns-decl)"
raise REXML::ParseException.new( msg, @source, self )
2 changes: 2 additions & 0 deletions lib/rexml/parsers/sax2parser.rb
@@ -259,6 +259,8 @@ def add( pair )
end

def get_namespace( prefix )
+return nil if @namespace_stack.empty?
+
uris = (@namespace_stack.find_all { |ns| not ns[prefix].nil? }) ||
(@namespace_stack.find { |ns| not ns[nil].nil? })
uris[-1][prefix] unless uris.nil? or 0 == uris.size
2 changes: 1 addition & 1 deletion lib/rexml/rexml.rb
@@ -31,7 +31,7 @@
module REXML
COPYRIGHT = "Copyright © 2001-2008 Sean Russell <ser@germane-software.com>"
DATE = "2008/019"
-VERSION = "3.3.7"
+VERSION = "3.3.9"
REVISION = ""

Copyright = COPYRIGHT
10 changes: 8 additions & 2 deletions lib/rexml/source.rb
@@ -77,6 +77,7 @@ def initialize(arg, encoding=nil)
detect_encoding
end
@line = 0
+@term_encord = {}
end

# The current buffer (what we're going to read next)
@@ -227,7 +228,7 @@ def read(term = nil, min_bytes = 1)

def read_until(term)
pattern = Private::PRE_DEFINED_TERM_PATTERNS[term] || /#{Regexp.escape(term)}/
-term = encode(term)
+term = @term_encord[term] ||= encode(term)
until str = @scanner.scan_until(pattern)
break if @source.nil?
break if @source.eof?
@@ -294,14 +295,19 @@ def current_line

private
def readline(term = nil)
-str = @source.readline(term || @line_break)
if @pending_buffer
+begin
+str = @source.readline(term || @line_break)
+rescue IOError
+end
if str.nil?
str = @pending_buffer
else
str = @pending_buffer + str
end
@pending_buffer = nil
+else
+str = @source.readline(term || @line_break)
end
return nil if str.nil?

6 changes: 6 additions & 0 deletions test/parse/test_character_reference.rb
@@ -13,5 +13,11 @@ def test_linear_performance_many_preceding_zeros
REXML::Document.new('<test testing="&#' + "0" * n + '97;"/>')
end
end

def test_hex_precedding_zero
parser = REXML::Parsers::PullParser.new("<root>&#x61;&#0x61;</root>")
parser.pull # :start_element
assert_equal("a&#0x61;", parser.pull[1]) # :text
end
end
end
17 changes: 17 additions & 0 deletions test/parse/test_text.rb
@@ -4,6 +4,23 @@
module REXMLTests
class TestParseText < Test::Unit::TestCase
class TestInvalid < self
def test_text_only
exception = assert_raise(REXML::ParseException) do
parser = REXML::Parsers::BaseParser.new('a')
while parser.has_next?
parser.pull
end
end

assert_equal(<<~DETAIL.chomp, exception.to_s)
Malformed XML: Content at the start of the document (got 'a')
Line: 1
Position: 1
Last 80 unconsumed characters:
DETAIL
end

def test_before_root
exception = assert_raise(REXML::ParseException) do
parser = REXML::Parsers::BaseParser.new('b<a></a>')
35 changes: 35 additions & 0 deletions test/parser/test_base_parser.rb
@@ -23,5 +23,40 @@ def test_large_xml
parser.position < xml.bytesize
end
end

def test_attribute_prefixed_by_xml
xml = <<-XML
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>XHTML Document</title>
</head>
<body>
<h1>XHTML Document</h1>
<p xml:lang="ja" lang="ja">For Japanese</p>
</body>
</html>
XML

parser = REXML::Parsers::BaseParser.new(xml)
5.times {parser.pull}

html = parser.pull
assert_equal([:start_element,
"html",
{"xmlns" => "http://www.w3.org/1999/xhtml",
"xml:lang" => "en",
"lang" => "en"}],
html)

15.times {parser.pull}

p = parser.pull
assert_equal([:start_element,
"p",
{"xml:lang" => "ja", "lang" => "ja"}],
p)
end
end
end
34 changes: 34 additions & 0 deletions test/test_document.rb
@@ -403,6 +403,40 @@ def test_utf_16
assert_equal(expected_xml, actual_xml)
end
end

class ReadUntilTest < Test::Unit::TestCase
def test_utf_8
xml = <<-EOX.force_encoding("ASCII-8BIT")
<?xml version="1.0" encoding="UTF-8"?>
<message testing=">">Hello world!</message>
EOX
document = REXML::Document.new(xml)
assert_equal("UTF-8", document.encoding)
assert_equal(">", REXML::XPath.match(document, "/message")[0].attribute("testing").value)
end

def test_utf_16le
xml = <<-EOX.encode("UTF-16LE").force_encoding("ASCII-8BIT")
<?xml version="1.0" encoding="UTF-16"?>
<message testing=">">Hello world!</message>
EOX
bom = "\ufeff".encode("UTF-16LE").force_encoding("ASCII-8BIT")
document = REXML::Document.new(bom + xml)
assert_equal("UTF-16", document.encoding)
assert_equal(">", REXML::XPath.match(document, "/message")[0].attribute("testing").value)
end

def test_utf_16be
xml = <<-EOX.encode("UTF-16BE").force_encoding("ASCII-8BIT")
<?xml version="1.0" encoding="UTF-16"?>
<message testing=">">Hello world!</message>
EOX
bom = "\ufeff".encode("UTF-16BE").force_encoding("ASCII-8BIT")
document = REXML::Document.new(bom + xml)
assert_equal("UTF-16", document.encoding)
assert_equal(">", REXML::XPath.match(document, "/message")[0].attribute("testing").value)
end
end
end
end
end
46 changes: 46 additions & 0 deletions test/test_sax.rb
@@ -99,6 +99,52 @@ def test_sax2
end
end

def test_without_namespace
xml = <<-XML
<root >
<a att1='1' att2='2' att3='&lt;'>
<b />
</a>
</root>
XML

parser = REXML::Parsers::SAX2Parser.new(xml)
elements = []
parser.listen(:start_element) do |uri, localname, qname, attrs|
elements << [uri, localname, qname, attrs]
end
parser.parse
assert_equal([
[nil, "root", "root", {}],
[nil, "a", "a", {"att1"=>"1", "att2"=>"2", "att3"=>"&lt;"}],
[nil, "b", "b", {}]
], elements)
end

def test_with_namespace
xml = <<-XML
<root xmlns="http://example.org/default"
xmlns:foo="http://example.org/foo"
xmlns:bar="http://example.org/bar">
<a foo:att='1' bar:att='2' att='&lt;'>
<bar:b />
</a>
</root>
XML

parser = REXML::Parsers::SAX2Parser.new(xml)
elements = []
parser.listen(:start_element) do |uri, localname, qname, attrs|
elements << [uri, localname, qname, attrs]
end
parser.parse
assert_equal([
["http://example.org/default", "root", "root", {"xmlns"=>"http://example.org/default", "xmlns:bar"=>"http://example.org/bar", "xmlns:foo"=>"http://example.org/foo"}],
["http://example.org/default", "a", "a", {"att"=>"&lt;", "bar:att"=>"2", "foo:att"=>"1"}],
["http://example.org/bar", "b", "bar:b", {}]
], elements)
end

class EntityExpansionLimitTest < Test::Unit::TestCase
class GeneralEntityTest < self
def test_have_value