Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use a more efficient xml parser library #138

Merged
merged 1 commit into from Mar 9, 2023

Conversation

nineinchnick
Copy link
Contributor

@nineinchnick nineinchnick commented Mar 7, 2023

What

Reduce the memory usage and allow processing large XML reports (>100 MB) on public GHA runners, that have only 7 GB of memory.

How

Replace xml-js with libxmljs.

I tested memory usage using:

gtime -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem' npm run test

Before:

	0:18.49 real,	16.05 user,	10.00 sys,	0 amem,	230432 mmem

After:

	0:19.70 real,	16.59 user,	10.51 sys,	0 amem,	225952 mmem

Here's another test with a large XML file. This is a test report from the Trino project, it's available in the CI as artifacts.

export INPUT_SKIP_PUBLISHING=true
export INPUT_REPORT_PATHS=$HOME/Downloads/test-reports/test\ report\ test-other-modules/lib/trino-parquet/target/surefire-reports/TEST-TestSuite.xml

The file is almost 100 MB:

% du -mc "$INPUT_REPORT_PATHS" 
89	/Users/jwas/Downloads/test-reports/test report test-other-modules/lib/trino-parquet/target/surefire-reports/TEST-TestSuite.xml

Now run the action:

% gtime -f '\t%E real,\t%U user,\t%S sys,\t%K amem,\t%M mmem' node index.js

Before:

	0:08.11 real,	12.47 user,	1.83 sys,	0 amem,	3198624 mmem

After:

	0:03.00 real,	2.69 user,	0.11 sys,	0 amem,	694784 mmem

3.2 GB drops to 0.7 GB. There are also savings in processing time.

The changes in tests were required because:

  • in some tests for utils.js, the stacktrace was not properly set
  • tabs in attributes should be encoded, otherwise they get converted to spaces, which conforms to XML standards

@nineinchnick nineinchnick requested a review from a team as a code owner March 7, 2023 09:26
@nineinchnick nineinchnick force-pushed the faster-xml branch 5 times, most recently from 79a3082 to 2be6e74 Compare March 7, 2023 10:59
crypticmind
crypticmind previously approved these changes Mar 7, 2023
Copy link
Member

@ghaiszaher ghaiszaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nineinchnick could you fix the lint problems npm run estlint?

@nineinchnick
Copy link
Contributor Author

@ghaiszaher done! I bumped ecmaVersion from 2018 to 2020 to support the optional chaining operator. I think Node 16 should support most of 2020: https://node.green/#ES2020

Copy link
Member

@ghaiszaher ghaiszaher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution!

@ghaiszaher ghaiszaher merged commit 5b7a533 into ScaCap:master Mar 9, 2023
@nineinchnick nineinchnick deleted the faster-xml branch March 9, 2023 11:46
nineinchnick added a commit to nineinchnick/action-surefire-report that referenced this pull request Mar 13, 2023
@nineinchnick nineinchnick mentioned this pull request Mar 13, 2023
ghaiszaher pushed a commit that referenced this pull request Mar 13, 2023
* Revert "Run the action in a container (#141)"

This reverts commit 01b2782.

* Revert "Avoid huge input lookup errors (#144)"

This reverts commit 7263a78.

* Revert "Use a more efficient xml parser library (#138)"

This reverts commit 5b7a533.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants