Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent Behaviour of node.removeChild() #135

Closed
noseworthy opened this issue Sep 24, 2020 · 9 comments · Fixed by #494
Closed

Inconsistent Behaviour of node.removeChild() #135

noseworthy opened this issue Sep 24, 2020 · 9 comments · Fixed by #494
Assignees
Labels
breaking change Some thing that requires a version bump due to breaking changes bug Something isn't working help-wanted External contributions welcome needs investigation Information is missing and needs to be researched
Milestone

Comments

@noseworthy
Copy link

Hello,

I'm seeing some weird behaviour when calling node.removeChild(). If the xml source is "minified" (no pretty formatting) then it removes the first child from the node that I'm calling removeChild() on. If I call removeChild() on the dom instance itself, it replaces the dom instance with an empty string.

If however, the file is pretty printed, and I call dom.removeChild() or node.removeChild() and give it a child node (any child node in the tree) I get the tree returned back with just that node removed.

I'm using xmldom version 0.3.0 in nodejs version 14.12.0 and 10.16.0.

Here's a small mocha test that demonstrates what i'm talking about

const { DOMParser } = require('xmldom');
const { expect } = require('chai');

const MINIFIED_XML = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?><parent><child><grandchild attr="value"/></child></parent>`;
const MINIFIED_XML_WITHOUT_GRANDCHILD = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?><parent><child/></parent>`;

const PRETTY_PRINTED_XML = `
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<parent>
  <child>
    <grandchild attr="value"/>
  </child>
</parent>
`.trim();
const PRETTY_PRINTED_XML_WITHOUT_GRANDCHILD = `
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<parent>
  <child>
    
  </child>
</parent>`.trim();

describe('remove child test', () => {
  it('removes all content from dom if minified', async () => {
    const dom = new DOMParser().parseFromString(MINIFIED_XML);
    const grandchildren = dom.getElementsByTagName('grandchild');

    expect(grandchildren.length).to.equal(1);
    expect(dom.toString()).to.equal(MINIFIED_XML);

    const grandchild = dom.removeChild(grandchildren[0]);

    expect(grandchild.toString()).to.equal('<grandchild attr="value"/>');
    expect(dom.toString()).to.equal(MINIFIED_XML_WITHOUT_GRANDCHILD);
    expect(dom.toString()).not.to.equal('');
  });

  it('correctly removes grandchild from dom if called on parent element', async () => {
    const dom = new DOMParser().parseFromString(MINIFIED_XML);
    const children = dom.getElementsByTagName('child');
    const grandchildren = dom.getElementsByTagName('grandchild');

    expect(children.length).to.equal(1);
    expect(grandchildren.length).to.equal(1);
    expect(dom.toString()).to.equal(MINIFIED_XML);

    const grandchild = children[0].removeChild(grandchildren[0]);

    expect(grandchild.toString()).to.equal('<grandchild attr="value"/>');
    expect(dom.toString()).to.equal(MINIFIED_XML_WITHOUT_GRANDCHILD);
    expect(dom.toString()).not.to.equal('');
  });

  it('removes just the grandchild node if pretty formatted', async () => {
    const dom = new DOMParser().parseFromString(PRETTY_PRINTED_XML);
    const grandchildren = dom.getElementsByTagName('grandchild');

    expect(grandchildren.length).to.equal(1);
    expect(dom.toString()).to.equal(PRETTY_PRINTED_XML);

    const grandchild = dom.removeChild(grandchildren[0]);

    expect(grandchild.toString()).to.equal('<grandchild attr="value"/>');
    expect(dom.toString()).not.to.equal('');
    expect(dom.toString()).to.equal(PRETTY_PRINTED_XML_WITHOUT_GRANDCHILD);
  });
});

Looking at the docs on MDN (https://developer.mozilla.org/en-US/docs/Web/API/Node/removeChild) I'd expect calling removeChild on the dom instance itself to throw a TypeError since I don't think the dom instance is supposed to have removeChild as a function? I'd also maybe expect that calling removeChild() on anything but the child's direct parent to throw a NotFoundError because the node isn't a child of the node.

However, I really like the behaviour that I'm seeing in the pretty printed version of the file. It's real handy to be able to call removeChild on the dom instance and get it to find and remove that child from the tree, wherever it is.

What is the expected behaviour of this method supposed to be?

Thanks!
Mike

@karfau karfau added bug Something isn't working question Contains open questions that need to be answered labels Sep 25, 2020
@karfau
Copy link
Member

karfau commented Sep 25, 2020

Thank you for providing all that information and even tests.

Let me think out loud here:

  • If people call node.removeChild they expect it to always work as expected. So I'm with you that we should fix it to work as specified.
  • Since the API for dom.removeChild is not defined we could keep the current implementation (and fix the implementation for "minified" XML. But it's a "nice to have". (And it might be harder to implement after fixing the other one.) Since a Document is a Node it is actually specified.
  • Question: Does it also work to call node.removeChild for nodes that don't contain the argument in their tree?
    Example: Does it work to call a.removeChild(y)
<xml>
  <a>
    <x />
  </a>
  <b>
    <y />
  </b>
</xml>

Whatever we do, changing this is a breaking change, since current code might rely on removing a "child" this way.

@noseworthy
Copy link
Author

Hey! Thanks for the quick response to the issue, greatly appreciated!

Yeah, I'm not sure what approach I'm looking for here, haha. We've got a work around in our code for this right now, but this bug certainly is a tricky one. For what it's worth, my vote is for strict adherence to the spec (as I understand it... I could be wrong):

  • node.removeChild() removes a child node from the tree and returns it iff the node to be removed is a child of the node removeChild() is being called on.
  • If the child node is not a child of the parent node, then throw a NotFoundError DOMException
  • If the child node doesn't exist in the DOM throw a TypeError

Though you're right. That'll be a doozy of a breaking change for many people I think. Anyway, enough of my opinions on the matter. I'm sure you'll figure out what the best approach here is. 😁

To answer your question, calling removeChild() in this way always does something, and I never get any exceptions. Though what happens is certainly unexpected.

const { DOMParser } = require('xmldom');
const { expect } = require('chai');

const ISSUE_CHECK = `
<xml>
  <a>
    <x/>
  </a>
  <b>
    <y/>
  </b>
</xml>
`.trim();

const ISSUE_CHECK_WITHOUT_Y = `
<xml>
  <a>
    <x/>
  </a>
  <b>
    
  </b>
</xml>
`.trim();

const ISSUE_CHECK_MINIFIED = `<xml><a><x/></a><b><y/></b></xml>`;
const ISSUE_CHECK_WITHOUT_Y_MINIFIED = `<xml><a><x/></a><b/></xml>`;

describe('remove child test', () => {
  it('Remove child from non-parent node pretty printed', async () => {
    const dom = new DOMParser().parseFromString(ISSUE_CHECK);
    const ys = dom.getElementsByTagName('y');
    const as = dom.getElementsByTagName('a');

    expect(dom.toString()).to.equal(ISSUE_CHECK);

    const removedY = as[0].removeChild(ys[0]);

    expect(removedY.toString()).to.equal('<y/>');
    expect(dom.toString()).not.to.equal('');
    expect(dom.toString()).to.equal(ISSUE_CHECK_WITHOUT_Y);
  });

  it('Remove child from non-parent node minified', async () => {
    const dom = new DOMParser().parseFromString(ISSUE_CHECK_MINIFIED);
    const ys = dom.getElementsByTagName('y');
    const as = dom.getElementsByTagName('a');

    expect(dom.toString()).to.equal(ISSUE_CHECK_MINIFIED);

    const removedY = as[0].removeChild(ys[0]);

    expect(removedY.toString()).to.equal('<y/>');
    expect(dom.toString()).not.to.equal('');
    expect(dom.toString()).to.equal(ISSUE_CHECK_WITHOUT_Y_MINIFIED);
  });
});

So the first test passes for me. The document is "pretty printed" and I try to remove a child (<y />) from a node that is not it's parent (<a />). I would expect this to throw an exception. What actually happens is that <y /> is removed from the dom as if I had called removeChild() on <b />.

The second test fails for me, but not in the way I'd expect. I'm doing the same operation, but this time the document is "minified". What happens here is that the node <x /> is removed from <b /> even though I'm calling removeChild() on <a /> with the child <y />; which is super weird. e.g. here is the result of the second test:

AssertionError: expected '<xml><a/><b><y/></b></xml>' to equal '<xml><a><x/></a><b/></xml>'
Expected :<xml><a><x/></a><b/></xml>
Actual   :<xml><a/><b><y/></b></xml>

Thanks again for looking in to this. I hope all of this has been helpful 😄

@karfau
Copy link
Member

karfau commented Sep 25, 2020

This is super helpful in deed. At least it documents that this method is basically not reliable for minified XML and behaves not as specified for pretty printed XML.
From my point of view this is the most severe issue I'm aware of so far.

@davidmc24
Copy link
Contributor

Short summary of the behavior: When removeChild is called with a parent other than oldChild's actual parent, and oldChild has no siblings, it results in setting the specified parent node's first and last child to null incorrectly.

My take on this issue:

  • Document should support the removeChild method, as it is a Node.
  • The first and third tests in the issue description are attempting to call removeChild with arguments that are invalid; specifically, it is attempting to remove a node that isn't a direct child of the specified parent node. As such, as per the MDN docs, the expected behavior should be an exception being thrown.
  • We should enhance xmldom to detect attempted non-child removal and throw NotFoundError/TypeError as specified in the standard. (as @noseworthy already called out)
  • The current behavior in the first test: The specified child has no siblings. Thus, it overwrites the specified parent's firstChild and lastChild with null, which would be correct if it were actually the only child of the parent, but since it isn't the parent's child at all, is clobbering data incorrectly.
  • The current behavior in the third test: The specified child has siblings (whitespace text nodes). Thus, the previous/next updating logic updates the grandchild elements's siblings, which results in the desired behavior.
  • In both the first and third tests, the grandChild element is still present in the childNodes due to Node.removeChild for Document's direct children doesn't update childNodes  #145, but the test isn't asserting on that.
  • Workaround for people using xmldom: ensure that you are only attempting to remove a child from its actual parent (such as using the pattern nodeToRemove.parentNode.removeChild(nodeToRemove), which will avoid the broken behavior called out in this ticket.

@karfau
Copy link
Member

karfau commented Oct 12, 2020

For later reference, I found this specification (of the current, so called "Living Standard") related to node removal: https://dom.spec.whatwg.org/#concept-node-remove

@karfau karfau added the needs investigation Information is missing and needs to be researched label Jan 21, 2021
@karfau karfau added this to the 0.6.0 milestone Jan 21, 2021
@karfau karfau added breaking change Some thing that requires a version bump due to breaking changes help-wanted External contributions welcome and removed question Contains open questions that need to be answered labels Aug 21, 2021
@karfau
Copy link
Member

karfau commented Aug 21, 2021

Still very important, but it slipped through. Any contributions (including tests) are welcome to get this fixed earlier, from maintainer capacity it will hopefully be part of the second next release with breaking changes (not a milestone yet).

@karfau karfau modified the milestone: before 1.0.0 Dec 21, 2021
@karfau karfau modified the milestones: before 1.0.0, 0.9.0 Oct 25, 2022
@karfau
Copy link
Member

karfau commented Nov 5, 2022

Update: some of the things discussed here are fixed for sure in 0.7.9, 0.8.6 and 0.9.0-beta.6
I think the main thing missing here is to throw if someone attempts to remove a node that not part of parent, which will be part of 0.9.0 and some previous beta release. (Not going to be back ported to 0.8.0 and 0.7.9 since it is a breaking change, as far as I know.)

@karfau
Copy link
Member

karfau commented Jun 9, 2023

If nothing new pops up that has a higher prio, this will be the next topic I will work on, towards reaching the 0.9.0 milestone.

@karfau karfau self-assigned this Jun 9, 2023
karfau added a commit that referenced this issue Jun 9, 2023
…not a child

BREAKING-CHANGE: Previously it was possible (but not documented) to call `Node.removeChild` with any node in the tree,
and with certain exceptions, it would work. This is no longer the case: calling `Node.removeChild` this way will now throw a NotFoundError DOMException, as it is described by the specs.

https://dom.spec.whatwg.org/#concept-node-pre-remove
Fixes #135
@karfau
Copy link
Member

karfau commented Jun 9, 2023

I looked into it and I think there is no sane way of supporting the current way of "randomly removing any node by calling removeChild on any other node or the document.
According to the specs, Document shouldn't even have it's own implementation, it should use the implementation of Node.
So I will add the check that is specified which will prevent all the weird issues described here and will throw if child is not a child of the parent as part of #494

karfau added a commit that referenced this issue Jun 9, 2023
…meter

BREAKING-CHANGE: Previously it was possible (but not documented) to call `Node.removeChild` with any node in the tree,
and with certain exceptions, it would work. This is no longer the case: calling `Node.removeChild` this way will now throw a NotFoundError DOMException, as it is described by the specs.

https://dom.spec.whatwg.org/#concept-node-pre-remove
Fixes #135
karfau added a commit that referenced this issue Jun 9, 2023
…er (#494)

BREAKING CHANGE: Previously it was possible (but not documented) to call
`Node.removeChild` with any node in the tree,
and with certain exceptions, it would work. This is no longer the case:
calling `Node.removeChild` with an argument that is not a direct child
of the node that it is called from, will throw a NotFoundError
DOMException, as it is described by the specs.

https://dom.spec.whatwg.org/#concept-node-pre-remove 
Fixes #135
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
breaking change Some thing that requires a version bump due to breaking changes bug Something isn't working help-wanted External contributions welcome needs investigation Information is missing and needs to be researched
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

3 participants