-
-
Notifications
You must be signed in to change notification settings - Fork 336
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[PoC] Parse again modifed files for full Rector processing in a single run #3672
Conversation
@samsonasik looks like it's block in an infinite loop in the well named You told me... #3666 (comment)
To be completely honest I don't understand how it enters an infinite loop... is the file modified indefinitely? |
The current implementation is fine. Multiple run is fine on some re-printing use cases. That's also make infinite loop less happen. Thank you for understanding. |
Maybe from a maintainer point of view, but not from a user perspective... Also, how Rector could be reliable when used in CI if a single run can't be trusted to have made all changes?
Could you elaborate more here? what you mean exactly by "re-printing" use cases? If yes, then the current code intention is to handle this case with this while loop apparently, as if the file is detected has changes, nodes are traversed again by all rectors.
Infinite loops being a thing in Rector's business, I do believe that there should be a dedicated solution to avoid them. I would not sacrifice Rector reliability and end users expectations because of an implementation detail. As to me it's clearly valuable to end users, what about making it an optin feature? Thank you for the time you spent exchanging with me :) |
Node can be reprint by apply: $node->setAttribute(AttributeKey::ORIGINAL_NODE, null); That use case is make long Again, basically, multi rules with type update is just works, but there are use cases that just another run. The complexity and possible never exists bug before for try to only singe run is not worth the effort, that's just way too complex to cover :), been there, done that. That just need another run, you can rely on github action to make commit until no more changes :). |
@samsonasik thank you for the clarification! it helped me have an idea of the root of our misalignment 🙂 I completely understand your perspective, thinking that doing multiple runs to get all the fixes "just works", because you are not using Rector manually, but with a CI that is allowed to alter your repository. As a first step toward that, I'd like to use Rector on CI in dry run mode, to enforce good practices before PR merge. I do believe, but it's clearly an assumption here, that most of Rector end-users don't know about this behavior of Rector, that they may need multiple sequential executions to get full processing of their files. Note that a complete other solution to my need would have been to have an equivalent PHPCS rules for every rector, but looks like a tedious work and would be a duplication.
With this I prevent a future bug that I would have opened for sure regarding "and potential issues introduced in a PR will "pop" as errors in CI of another PR completely unrelated to those changes after the merge of the first one". My setup is not done yet because it had two requirements: reliable cache, and non breaking run on big project.
The second commit of this proof of concept PR shows that it's not that complex to achieve the goal I think. As a personal challenge, when I'll have some time, hopefully in coming days, I'll work on a cleaner version of this proposal, making it optional using configuration, properly catch infinite loop, and analyze performance impact on running Rector on rector-src when enabled. Also, if not accepted as an optional feature to give to user, maybe at least document the Rector maintainers philosophy/vision somewhere then, so that it's written and accessible, clarifying things, and so that future users diving deeper in Rector usage/internals like me don't get confused/surprised. (or i'm the only one?) From our above discussion this is how I understand this philosophy:
Thank you for reading and patience! |
Goal is to remove the need for consecutive Rector executions in order to have it fully process files in a single run.
And also to make caching after successfully processed files reliable 'see issue raised in #3614 (comment)).
Idea started from #3666 (comment)
After some quick investigation, and based on the exemple with
NewlineAfterStatementRector
, it seems that that particular rector is not able to process new nodes created by previous rectors.It relies on
startLine
/endLine
node attributes, and those are not set on new nodes created by previous rectors.A second run of Rector works because the file is parsed again, and so those attributes are set correctly.
So here a proof of concept showing that parsing the processed file changed content before going through rectors again makes it work (see the new test passes).
Parsing again the changed content will have a performance impact for sure, even when parsing from in memory string instead of reading from the file.
This could be mitigated by doing the new parsing only when a rector that requires a new parsing is involved in configuration (see the todo in the diff).
Such rectors could be marked with a
RequireNewParsingRectorInterface
maybe.Identifying them all is left for later, but the rule may be that if they depend on node attributes not properly set/updated by previous rectors on new/modified nodes to do their work.
What do you think?