module: resolve format for all situations with auto module detection on #53044

Open
wants to merge 7 commits into
base: main
from

Conversation

dygabo
Member

@dygabo dygabo commented May 18, 2024

triggered by #53015
solves: #53016

this should be a consistent fix to always resolve the module format correctly.
Enabling module detection by default required adjusting a few other tests, because in this case they no longer generate errors. For example, test-esm-cjs-exports.js used to fail because a .mjs file imports a .js file that uses ESM syntax; now the import succeeds and emits a warning that this should be fixed to avoid the performance penalty.

Kindly please review and let me know what you think (if changes are necessary).

make test && make lint => green

@nodejs/loaders

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/loaders

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. esm Issues and PRs related to the ECMAScript Modules implementation. needs-ci PRs that need a full CI run. labels May 18, 2024
@dygabo
Member Author

dygabo commented May 18, 2024

If the solution is feasible and finds approval, the --experimental-detect-module flag could already be removed in this PR. wdyt?

@RedYetiDev
Member

RedYetiDev commented May 18, 2024

If you set detect_module to true, do all the tests (except the expected failures) pass?

@RedYetiDev
Member

Also,
🎉 Thank you for tackling this :-)

@RedYetiDev RedYetiDev added the loaders Issues and PRs related to ES module loaders label May 18, 2024
Member

@RedYetiDev RedYetiDev left a comment

+1

Tip

While my review shows my support, I am not a core collaborator, and this review has no power / place in the approval process

Member

@GeoffreyBooth GeoffreyBooth left a comment

Great work! Thanks for getting to the bottom of this.

@@ -110,7 +110,7 @@ class EnvironmentOptions : public Options {
  public:
   bool abort_on_uncaught_exception = false;
   std::vector<std::string> conditions;
-  bool detect_module = false;
+  bool detect_module = true;
Member

I think we want to fix the format issue in one PR and then unflag the feature in the follow-up.

Member Author

I will split this into two PRs once the failing test with __filename usage is sorted out

test/es-module/test-esm-cjs-exports.js (outdated, resolved)
@@ -92,6 +92,12 @@ let typelessPackageJsonFilesWarnedAbout;
 function getFileProtocolModuleFormat(url, context = { __proto__: null }, ignoreErrors) {
   const { source } = context;
   const ext = extname(url);
+  const deduceFormat = (fromSource, fromUrl) => {
Member

We should declare this at the top level. I'd also maybe name it determineFormat or something more specific to explain the cases when it's used.

Member

We could also name it what the flag is named, something like detectModuleFormat

Member Author

done

@dygabo
Member Author

dygabo commented May 18, 2024

> Great work! Thanks for getting to the bottom of this.

Will check your comments and split it into two PRs, will happen maybe beginning of next week.

@RedYetiDev
Member

> > Great work! Thanks for getting to the bottom of this.
>
> Will check your comments and split it into two PRs, will happen maybe beginning of next week.

Lovely!

@dygabo dygabo force-pushed the fix-for-unflagging-module-format-detection branch 2 times, most recently from 1d7b4de to b7ec307 Compare May 20, 2024 16:42
@dygabo
Member Author

dygabo commented May 20, 2024

split done, this fixes the --experimental-detect-module edge cases, no unflagging of the option.
The unflagging is prepared and I can submit a PR after this one lands

Comment on lines 95 to 98
if (Buffer.isBuffer(realSource)) {
  // `containsModuleSyntax` requires source to be passed in as string
  realSource = realSource.toString();
}
Contributor

If I remove the following lines, all tests still pass. I suggest we remove them (unless we can have a test, but maybe it should be its own PR).

Suggested change
-if (Buffer.isBuffer(realSource)) {
-  // `containsModuleSyntax` requires source to be passed in as string
-  realSource = realSource.toString();
-}

Member Author

Pushed a commit to fix that. The conversion will probably be necessary, but I agree that a separate PR with a relevant test is better.

@@ -0,0 +1,13 @@
// Flags: --experimental-detect-module --import ./test/fixtures/es-module-loaders/builtin-named-exports.mjs
Member

Rather than make a new file for this, can it please get moved into the file that has the other tests with this fixture?

Member Author

Good point, done in the last commit.

 */
function detectModuleFormat(source, url) {
  try {
    let realSource = source ?? readFileSync(url, 'utf8');
Member

An earlier version of containsModuleSyntax expected a file path, where it would load a file from disk and then parse it all in C++ land. Doing readFileSync here means that we cross from JS to C++ to get the file contents and send that large string back across the boundary, only to then send it right back again from JS to C++ for containsModuleSyntax to work with it.

A more performant approach is probably to update containsModuleSyntax to allow for an undefined input in the source parameter, and another optional parameter that can be a file path. Then in this situation where the JS side doesn’t already have the source, we can tell the C++ side to both read it from disk and analyze it, without multiple trips crossing the boundary. cc @anonrig
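
The difference between the two call shapes can be illustrated with a toy model. This is only a sketch: `makeNativeSideSketch`, its methods, and the regex-based detection are hypothetical stand-ins, not Node.js internals, and the "boundary crossings" are just a counter.

```javascript
// Toy model of the JS <-> C++ boundary. Every name here is hypothetical.
function makeNativeSideSketch(files) {
  let crossings = 0;
  // Crude stand-in for real syntax detection (assumption of this sketch).
  const looksLikeESM = (code) => /^\s*(import|export)\b/m.test(code);
  return {
    // Models reading a file through the native side.
    readFile(path) {
      crossings += 1;
      return files[path];
    },
    // `source` may be undefined; the "native" side then reads `path` itself.
    containsModuleSyntax(source, path) {
      crossings += 1;
      return looksLikeESM(source ?? files[path]);
    },
    crossings: () => crossings,
  };
}

const files = { '/app/entry.js': 'export const answer = 42;\n' };

// Current shape: read in JS, then hand the string back for analysis.
const twoTrips = makeNativeSideSketch(files);
const viaString = twoTrips.containsModuleSyntax(twoTrips.readFile('/app/entry.js'));

// Suggested shape: pass only the path; reading and analysis happen in one call.
const oneTrip = makeNativeSideSketch(files);
const viaPath = oneTrip.containsModuleSyntax(undefined, '/app/entry.js');
```

Both calls reach the same answer; the path-only variant simply avoids moving the file contents across the boundary and back.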

Member Author

I reintroduced the reading from file in the native code implementation for the case where code is not passed from the managed part. Slightly modified the logic as proposed in this comment.

The transpile case is currently the only one for which resolve cannot determine the format correctly. The information needed for that is only available during the load phase, not while the resolve hook is running.

@@ -155,7 +155,7 @@ describe('--experimental-detect-module', { concurrency: !process.env.TEST_PARALL
     });
   }

-  it('should not hint wrong format in resolve hook', async () => {
+  it('should hint format correctly for extensionles modules resolve hook', async () => {
Member

Suggested change
-it('should hint format correctly for extensionles modules resolve hook', async () => {
+it('should hint format correctly for the resolve hook for extensionless modules', async () => {

Member Author

fixed

* @param {string} source
* @param {URL} url
*/
function detectModuleFormat(source, url) {
Member

The format property returned from resolve is optional. The addition of this helper seems like it’s becoming required; if we don’t know the format, we’re going to determine it, rather than waiting for load to figure it out. This means potentially reading the source from disk twice, unless you preserve what you get from the first read, which detectModuleFormat here currently doesn’t. Is there another way to fix the bug where we don’t necessarily read the source in resolve if we weren’t already doing so?

Reading the source within resolve is also problematic because that should be happening in the load hook. If you read file sources here, any custom hooks the user has registered for load won’t get applied before detection is run on whatever source is read here. So a TypeScript file would get misidentified as CommonJS because it can’t parse as ESM (or really, because it can’t parse at all) even if after the hooks are applied it’s transpiled into runnable ESM.
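
The double-read concern on its own could be handled by remembering what the first read returned. The following is a minimal sketch under that assumption; `makeSourceCacheSketch` and the injected `readSource` callback are hypothetical names, not part of this PR, and the sketch does nothing about the second concern (custom load hooks not being applied before detection).

```javascript
// Hypothetical sketch: keep the source read during resolve so load can reuse it.
function makeSourceCacheSketch(readSource) {
  const cache = new Map();
  return {
    // Reads `url` at most once, however many phases ask for it.
    get(url) {
      if (!cache.has(url)) cache.set(url, readSource(url));
      return cache.get(url);
    },
    // Returns the source and drops the entry once load has consumed it.
    take(url) {
      const source = this.get(url);
      cache.delete(url);
      return source;
    },
  };
}

// Stand-in reader that counts how often the "disk" is touched.
let reads = 0;
const sources = makeSourceCacheSketch((url) => {
  reads += 1;
  return `// contents of ${url}`;
});

const duringResolve = sources.get('file:///app/entry.js');
const duringLoad = sources.take('file:///app/entry.js');
```

In this model the reader runs once even though both phases ask for the source.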

Member Author

While thinking about a solution, I followed a similar chain of thought. This change makes the format returned by resolve more reliable in most cases but, as you mentioned, it still does not cover the case where the load operation modifies the source (e.g. a TypeScript transpile step). Maybe a better solution would be to:

  • try compile as CJS
  • if possible => detected CJS
  • if not possible, try to compile as ESM
  • if possible => detected ESM
  • otherwise resolve detects format null (or undefined)

This would at least cover all the current tests. It carries the additional performance penalty of reading the file, but it will either resolve the format correctly or not resolve it at all when the format depends on the load operation chain.
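
The fallback order proposed in the list above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: `compilesAsCommonJS` approximates the CommonJS wrapper with `new Function` (which compiles the body without running it), and `compilesAsESM` is a caller-supplied, hypothetical predicate, since the real check would live in Node's native code.

```javascript
// Hypothetical sketch of the proposed fallback order; not the code in this PR.

// Returns true if `source` compiles as the body of a CommonJS-style wrapper.
// `import`/`export` statements are a SyntaxError inside a function body.
function compilesAsCommonJS(source) {
  try {
    new Function('exports', 'require', 'module', '__filename', '__dirname', source);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

// `compilesAsESM` is an injected predicate (an assumption of this sketch).
function detectFormatSketch(source, compilesAsESM) {
  if (compilesAsCommonJS(source)) return 'commonjs';
  if (compilesAsESM(source)) return 'module';
  // Neither parse succeeded: leave the decision to the load phase.
  return null;
}

const cjsFormat = detectFormatSketch('module.exports = 42;', () => false);
const esmFormat = detectFormatSketch('export const a = 1;', () => true);
const unknownFormat = detectFormatSketch('const a: number = 1;', () => false);
```

With these inputs, the TypeScript-like source fails both checks and yields `null` instead of a `commonjs` hint that a later load hook might contradict.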

I will set the PR to draft until we decide on next steps from here.

> Is there another way to fix the bug where we don’t necessarily read the source in resolve if we weren’t already doing so?

Not as far as I can think of right now. We have to detect the module type because the module author did not define it, and for that we currently try to compile the source. Without the source, we cannot make any valid assumptions. In my opinion it is better to stay with null or undefined in this case than to announce commonjs knowing that this might be wrong and be changed later by the load step. This cannot be solved reliably without looking at the code of the module. Or do you see an alternative?

Concerning performance, the whole auto-detection mechanism is a performance penalty anyway, because we now parse/compile the source for it as well. We also generate a warning telling the user to fix this by specifying the type.

@dygabo dygabo marked this pull request as draft May 21, 2024 12:15
@dygabo dygabo force-pushed the fix-for-unflagging-module-format-detection branch from afb82c8 to b2f221f Compare May 26, 2024 17:35
@dygabo dygabo marked this pull request as ready for review May 27, 2024 12:40
@aduh95
Contributor

aduh95 commented May 27, 2024

The C++ linter is failing

Co-authored-by: Antoine du Hamel <duhamelantoine1995@gmail.com>
Labels
c++ Issues and PRs that require attention from people who are familiar with C++. esm Issues and PRs related to the ECMAScript Modules implementation. loaders Issues and PRs related to ES module loaders needs-ci PRs that need a full CI run.
5 participants