poor performance in DOMDifferenceEngine for large XMLs #236

Closed
gerpres opened this issue Nov 8, 2021 · 6 comments

@gerpres

gerpres commented Nov 8, 2021

I'm comparing two large XMLs.
One parent element contains around 45000 children.

The multiple List.indexOf() calls in

org.xmlunit.diff.DOMDifferenceEngine.compareNodeLists(Iterable<Node>, XPathContext, Iterable<Node>, XPathContext)

are quite expensive, since all lists contain 45000 elements and every call is a linear scan. They should be replaced by a more performant data structure.
Since the lists do not seem to be modified during the matches loop, building Map<Node, Integer> instances that hold the indices up front cuts the required comparison time in half in my local tests.

private static <E> Map<E, Integer> index(Collection<E> collection) {
	Map<E, Integer> indizes = new HashMap<>();

	int i = 0;
	for (E e: collection) {
		indizes.put(e, i++);
	}

	return indizes;
}

and use it like:

private ComparisonState compareNodeLists(Iterable<Node> controlSeq, final XPathContext controlContext, Iterable<Node> testSeq, final XPathContext testContext) {
   ...
   Map<Node, Integer> controlListIndizes = index(controlList);
   ...
   for (Map.Entry<Node, Node> pair: matches) {
      ...
      int controlIndex = controlListIndizes.get(control);
      ...
   }
   ...
}
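
For anyone who wants to reproduce the effect outside of XMLUnit, here is a minimal, self-contained sketch of the same idea. It is not the XMLUnit code: the class name IndexLookupSketch is hypothetical, plain Object instances stand in for DOM nodes, and the timings it prints are only a rough indication, not a proper benchmark. One assumption worth calling out: it uses putIfAbsent instead of put, so that an element occurring twice keeps its first position, which is what List.indexOf would return.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of XMLUnit.
class IndexLookupSketch {

    // Builds an element -> position map in a single pass over the input.
    // putIfAbsent keeps the first position of a repeated element,
    // mirroring the result of List.indexOf.
    static <E> Map<E, Integer> index(Iterable<E> elements) {
        Map<E, Integer> positions = new HashMap<>();
        int i = 0;
        for (E e : elements) {
            positions.putIfAbsent(e, i++);
        }
        return positions;
    }

    public static void main(String[] args) {
        int size = 45_000; // size taken from the report above
        List<Object> nodes = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            nodes.add(new Object()); // stand-in for a DOM Node
        }

        // Variant 1: one linear scan per lookup (quadratic overall).
        long start = System.nanoTime();
        long sumScan = 0;
        for (Object n : nodes) {
            sumScan += nodes.indexOf(n);
        }
        long scanNanos = System.nanoTime() - start;

        // Variant 2: build the map once, then constant-time lookups.
        start = System.nanoTime();
        Map<Object, Integer> positions = index(nodes);
        long sumMap = 0;
        for (Object n : nodes) {
            sumMap += positions.get(n);
        }
        long mapNanos = System.nanoTime() - start;

        System.out.println("same result: " + (sumScan == sumMap));
        System.out.println("indexOf ms:  " + scanNanos / 1_000_000);
        System.out.println("map ms:      " + mapNanos / 1_000_000);
    }
}
```

Note that the map relies on equals()/hashCode() of the elements, just as List.indexOf relies on equals(), so both variants agree as long as the elements implement the two consistently.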
@bodewig
Member

bodewig commented Nov 9, 2021

Interesting. The code has not really been optimized in any way - in particular as big inputs will suffer from using DOM anyway. But if a simple change like this really speeds up things, this is wonderful.

Any chance you could create a pull request? If not I'll take care of it myself, but it may take a bit longer.

@bodewig
Member

bodewig commented Dec 16, 2021

@gerpres it would be good if you could check commit 012a36a - I'll publish a 2.8.4-SNAPSHOT version right now.

bodewig added a commit that referenced this issue Dec 16, 2021
@bodewig
Member

bodewig commented Dec 16, 2021

While porting the change to XMLUnit.NET I just realized I could now save keeping a reference to the temporary list -> aa329de

@gerpres
Author

gerpres commented Dec 16, 2021

Checked with my use cases. Significant improvement! Thanks.

@bodewig
Member

bodewig commented Dec 16, 2021

great, thank you

@bodewig bodewig closed this as completed Dec 16, 2021
@bodewig
Member

bodewig commented Dec 16, 2021

I'm in the process of releasing 2.8.4, but Maven Central seems to be very slow at the moment - probably a lot of projects are cutting new releases right now.
