Use a more efficient xml parser library #138

nineinchnick · 2023-03-07T09:26:26Z

What

Reduce the memory usage and allow processing large XML reports (>100 MB) on public GHA runners, that have only 7 GB of memory.

How

Replace xml-js with libxmljs.

I tested memory usage using:

gtime -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem' npm run test

Before:

	0:18.49 real,	16.05 user,	10.00 sys,	0 amem,	230432 mmem

After:

	0:19.70 real,	16.59 user,	10.51 sys,	0 amem,	225952 mmem

Here's another test with a large XML file. This is a test report from the Trino project, it's available in the CI as artifacts.

export INPUT_SKIP_PUBLISHING=true
export INPUT_REPORT_PATHS=$HOME/Downloads/test-reports/test\ report\ test-other-modules/lib/trino-parquet/target/surefire-reports/TEST-TestSuite.xml

The file is almost 100 MB:

% du -mc "$INPUT_REPORT_PATHS" 
89	/Users/jwas/Downloads/test-reports/test report test-other-modules/lib/trino-parquet/target/surefire-reports/TEST-TestSuite.xml

Now run the action:

% gtime -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem' node index.js

Before:

	0:08.11 real,	12.47 user,	1.83 sys,	0 amem,	3198624 mmem

After:

	0:03.00 real,	2.69 user,	0.11 sys,	0 amem,	694784 mmem

3.2 GB drops to 0.7 GB. There are also savings in processing time.

The changes in tests were required because:

in some tests for utils.js, the stacktrace was not properly set
tabs in attributes should be encoded, otherwise they get converted to spaces, which conforms to XML standards

ghaiszaher

@nineinchnick could you fix the lint problems npm run estlint?

nineinchnick · 2023-03-09T10:52:01Z

@ghaiszaher done! I bumped ecmaVersion from 2018 to 2020 to support the optional chaining operator. I think Node 16 should support most of 2020: https://node.green/#ES2020

ghaiszaher

Thanks for your contribution!

This reverts commit 5b7a533.

* Revert "Run the action in a container (#141)" This reverts commit 01b2782. * Revert "Avoid huge input lookup errors (#144)" This reverts commit 7263a78. * Revert "Use a more efficient xml parser library (#138)" This reverts commit 5b7a533.

nineinchnick requested a review from a team as a code owner March 7, 2023 09:26

nineinchnick force-pushed the faster-xml branch 5 times, most recently from 79a3082 to 2be6e74 Compare March 7, 2023 10:59

crypticmind previously approved these changes Mar 7, 2023

View reviewed changes

nineinchnick mentioned this pull request Mar 7, 2023

Potential regression in spilled ORDER BY trinodb/trino#16406

Closed

ghaiszaher self-requested a review March 8, 2023 08:27

ghaiszaher reviewed Mar 9, 2023

View reviewed changes

Use a more efficient xml parser library

6c6c770

nineinchnick dismissed crypticmind’s stale review via 6c6c770 March 9, 2023 10:49

nineinchnick force-pushed the faster-xml branch from 2be6e74 to 6c6c770 Compare March 9, 2023 10:49

ghaiszaher approved these changes Mar 9, 2023

View reviewed changes

ghaiszaher requested a review from crypticmind March 9, 2023 10:53

crypticmind approved these changes Mar 9, 2023

View reviewed changes

ghaiszaher merged commit 5b7a533 into ScaCap:master Mar 9, 2023

nineinchnick deleted the faster-xml branch March 9, 2023 11:46

ghaiszaher mentioned this pull request Mar 9, 2023

Error: Sequence ']]>' not allowed in content #143

Closed

nineinchnick added a commit to nineinchnick/action-surefire-report that referenced this pull request Mar 13, 2023

Revert "Use a more efficient xml parser library (ScaCap#138)"

8657c8b

This reverts commit 5b7a533.

nineinchnick mentioned this pull request Mar 13, 2023

Revert libxml #147

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use a more efficient xml parser library #138

Use a more efficient xml parser library #138

nineinchnick commented Mar 7, 2023 •

edited

ghaiszaher left a comment

nineinchnick commented Mar 9, 2023

ghaiszaher left a comment

Use a more efficient xml parser library #138

Use a more efficient xml parser library #138

Conversation

nineinchnick commented Mar 7, 2023 • edited

What

How

ghaiszaher left a comment

Choose a reason for hiding this comment

nineinchnick commented Mar 9, 2023

ghaiszaher left a comment

Choose a reason for hiding this comment

nineinchnick commented Mar 7, 2023 •

edited