Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[PoC] Parse again modifed files for full Rector processing in a single run #3672

Closed
wants to merge 2 commits into from

Conversation

yguedidi
Copy link
Contributor

@yguedidi yguedidi commented Apr 23, 2023

Goal is to remove the need for consecutive Rector executions in order to have it fully process files in a single run.
And also to make caching after successfully processed files reliable 'see issue raised in #3614 (comment)).

Idea started from #3666 (comment)

After some quick investigation, and based on the exemple with NewlineAfterStatementRector, it seems that that particular rector is not able to process new nodes created by previous rectors.
It relies on startLine/endLine node attributes, and those are not set on new nodes created by previous rectors.

A second run of Rector works because the file is parsed again, and so those attributes are set correctly.

So here a proof of concept showing that parsing the processed file changed content before going through rectors again makes it work (see the new test passes).

Parsing again the changed content will have a performance impact for sure, even when parsing from in memory string instead of reading from the file.
This could be mitigated by doing the new parsing only when a rector that requires a new parsing is involved in configuration (see the todo in the diff).
Such rectors could be marked with a RequireNewParsingRectorInterface maybe.
Identifying them all is left for later, but the rule may be that if they depend on node attributes not properly set/updated by previous rectors on new/modified nodes to do their work.

What do you think?

@yguedidi
Copy link
Contributor Author

yguedidi commented Apr 23, 2023

@samsonasik looks like it's block in an infinite loop in the well named \Rector\Core\Tests\Issues\InfiniteLoop\InfiniteLoopTest::testException test 😓

You told me... #3666 (comment)

on multiple re-print consecutively , rector expected to stop, otherwise, that will be infinite loop.

To be completely honest I don't understand how it enters an infinite loop... is the file modified indefinitely?
Wouldn't Rector benefit from an infinite loop detection?

@yguedidi yguedidi mentioned this pull request Apr 23, 2023
@samsonasik
Copy link
Member

The current implementation is fine. Multiple run is fine on some re-printing use cases. That's also make infinite loop less happen. Thank you for understanding.

@samsonasik samsonasik closed this Apr 26, 2023
@yguedidi
Copy link
Contributor Author

yguedidi commented Apr 26, 2023

@samsonasik

The current implementation is fine

Maybe from a maintainer point of view, but not from a user perspective...
I saw no where in the doc that Rector should be run multiple times to have final changes made, or that multiple runs may be required, maybe I missed it?

Also, how Rector could be reliable when used in CI if a single run can't be trusted to have made all changes?
It's my plan at work to use Rector as an automatic review tool.

Multiple run is fine on some re-printing use cases.

Could you elaborate more here? what you mean exactly by "re-printing" use cases?
I understand it as when the changes made by rectors can be a new input to the rectors to do more changes, is this that?

If yes, then the current code intention is to handle this case with this while loop apparently, as if the file is detected has changes, nodes are traversed again by all rectors.
Except if there is something I misunderstood there.

That's also make infinite loop less happen.

Infinite loops being a thing in Rector's business, I do believe that there should be a dedicated solution to avoid them.
There are patterns to avoid them, like max recursion limit (similar to the xdebug max nesting level). I can come with an implementation for that.

I would not sacrifice Rector reliability and end users expectations because of an implementation detail.
But maybe it's only me here, any way to ask the community?


As to me it's clearly valuable to end users, what about making it an optin feature?

Thank you for the time you spent exchanging with me :)
We all want the best for Rector!

@samsonasik
Copy link
Member

Node can be reprint by apply:

$node->setAttribute(AttributeKey::ORIGINAL_NODE, null);

That use case is make long array() to [].

Again, basically, multi rules with type update is just works, but there are use cases that just another run.

The complexity and possible never exists bug before for try to only singe run is not worth the effort, that's just way too complex to cover :), been there, done that.

That just need another run, you can rely on github action to make commit until no more changes :).

@yguedidi
Copy link
Contributor Author

@samsonasik thank you for the clarification! it helped me have an idea of the root of our misalignment 🙂

I completely understand your perspective, thinking that doing multiple runs to get all the fixes "just works", because you are not using Rector manually, but with a CI that is allowed to alter your repository.
The thing is that not everybody embrassed (yet?) the philosophy of allowing automated systems to commit to the repository. 😕

As a first step toward that, I'd like to use Rector on CI in dry run mode, to enforce good practices before PR merge.
But if not all fixes are made by a single execution of Rector, I can't trust it, and potential issues introduced in a PR will "pop" as errors in CI of another PR completely unrelated to those changes after the merge of the first one.

I do believe, but it's clearly an assumption here, that most of Rector end-users don't know about this behavior of Rector, that they may need multiple sequential executions to get full processing of their files.
And I think it's because full processing is what users expect, like it's done in other CS tools (PHP_CodeSniffer, PHP-CS-Fixer, ... run them a second time and you'll get no changes).
Not having full processing forbid using dry run mode on CI because not being reliable.

Note that a complete other solution to my need would have been to have an equivalent PHPCS rules for every rector, but looks like a tedious work and would be a duplication.

and possible never exists bug before for try to only single run

With this I prevent a future bug that I would have opened for sure regarding "and potential issues introduced in a PR will "pop" as errors in CI of another PR completely unrelated to those changes after the merge of the first one".
Better catch bugs before they reach production, right? :)

My setup is not done yet because it had two requirements: reliable cache, and non breaking run on big project.
By working on the first one in #3614 I discovered this behavior of Rector, so now the setup has three requirements. 😅
For reference there is #3563 as an attempt to the second requirement.

that's just way too complex to cover :)

The second commit of this proof of concept PR shows that it's not that complex to achieve the goal I think.
It just need to be finalized.

As a personal challenge, when I'll have some time, hopefully in coming days, I'll work on a cleaner version of this proposal, making it optional using configuration, properly catch infinite loop, and analyze performance impact on running Rector on rector-src when enabled.
I'll open a new PR so you can check it if you want, and hopefully accept it.
But if you don't it's fine, I'll understand if it's not the maintainers' vision for Rector.
For my use case I'll end up using my own fork instead. Wouldn't be perfect but an acceptable cost to get reliability.

Also, if not accepted as an optional feature to give to user, maybe at least document the Rector maintainers philosophy/vision somewhere then, so that it's written and accessible, clarifying things, and so that future users diving deeper in Rector usage/internals like me don't get confused/surprised. (or i'm the only one?)

From our above discussion this is how I understand this philosophy:

  • that Rector is expected to be run automatically and often by CI and not manually by humans
  • that partially fixed files are OK, they will get more fixes in following runs
  • that the way to fix them fully is by multiple successive runs, or over time with recurring automated commits
  • that it's not compatible with dry run mode on CI as a checker, as it's not a CS tool but a refactoring tool

Thank you for reading and patience!
And sorry if I look very opinionated, I know I tend to look a bit pushy/boring in topics I believe in, did my best to be as objective as possible and clearly state when it's personal opinion.
And hope I didn't hurt you in any way as English is not my native language

@yguedidi yguedidi deleted the full-processing branch May 1, 2023 12:42
@yguedidi yguedidi mentioned this pull request May 1, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants