Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MBUILDCACHE-86] bugfix / enhancements restoration of outputs on disk #104

Merged
merged 2 commits into from
May 1, 2024

Conversation

kbuntrock
Copy link
Contributor

@kbuntrock kbuntrock commented Oct 18, 2023

Fix https://issues.apache.org/jira/browse/MBUILDCACHE-86

This PR covers some bugfix / enhancements with the restoration of outputs on disk.

It is in draft since I expect some discussions. And the IT test has still do be coded.

"Au menu" :

  • Bugfix / "todo" : files in a base directory containing an underscore were wrongly restored to disk (not at the same location).
    -> To do so, the path is not guessed anymore from the classifier. I introduced a "filePath" property in the "attachedArtifact" section of the buildinfo.xml file.
    -> Because the buildInfo structure change, I changed the cache implementation version from "v1" to "v1.1". I assume it was one of the purpose of this value : we don't have to deal with structure migration. Any previous cache entry is defacto invalidated.
  • Forbid the possibility to extract/restore data in a directory outside the project (like extracting ../../../.ssh for example)
    -> I guess the extraction part is not a vulnerability since someone with commit permissions can guess other ways to extract data. But the possibility of restoring at any place on the disk looks pretty dangerous to me if a remote cache server is compromised.
  • Gives the possibility to restore artefacts on disk, with a dedicated property : maven.build.cache.restoreOnDiskArtefacts (default to true, open for discussion)
  • Introduce "globs" to filter extra attached outputs by filenames.

Following this checklist to help us incorporate your
contribution quickly and easily:

  • Make sure there is a MBUILDCACHE JIRA issue filed
    for the change (usually before you start working on it). Trivial changes like typos do not
    require a JIRA issue. Your pull request should address just this issue, without
    pulling in other changes.
  • Each commit in the pull request should have a meaningful subject line and body.
  • Format the pull request title like [MBUILDCACHE-XXX] - Fixes bug in ApproximateQuantiles,
    where you replace MBUILDCACHE-XXX with the appropriate JIRA issue. Best practice
    is to use the JIRA issue title in the pull request title and in the first line of the
    commit message.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Run mvn clean verify to make sure basic checks pass. A more thorough check will
    be performed on your pull request automatically.
  • You have run the Core IT successfully.

If your pull request is about ~20 lines of code you don't need to sign an
Individual Contributor License Agreement if you are unsure
please ask on the developers list.

To make clear that you license your contribution under
the Apache License Version 2.0, January 2004
you have to acknowledge this by using the following check-box.

Sorry, something went wrong.

@kbuntrock kbuntrock changed the title bugfix / enhancements restoration of outputs on disk [PR-104] bugfix / enhancements restoration of outputs on disk Oct 18, 2023
@kbuntrock kbuntrock force-pushed the evol-generated-element-restoration branch from 429bd8e to db654e7 Compare October 18, 2023 21:52
@j-s-3
Copy link

j-s-3 commented Nov 1, 2023

@kbuntrock My company ran into the issue with clean deleting out the target directories but then not restoring the Jars when the build happens. I tested out your changes against our project and can confirm the jar restoration works as expected. Thank you for adding this! 🎉

@kbuntrock
Copy link
Contributor Author

@kbuntrock My company ran into the issue with clean deleting out the target directories but then not restoring the Jars when the build happens. I tested out your changes against our project and can confirm the jar restoration works as expected. Thank you for adding this! 🎉

Thank you. :) I guess it's more related to issue #92

Related to this one, since there is less discussion than expected, I will code some tests next weekend and then fully submit the PR. :)

@martincekodhima
Copy link

martincekodhima commented Apr 9, 2024

Hey, I think we are running into the same issue as #104 (comment). If the build is cached, the target folder is not being generated and neither is the jar. This causes issues if you have a script that depends on the jar being there.

@kbuntrock Was that fixed in #92 or is that fixed with this MR?

@nobesio
Copy link

nobesio commented Apr 10, 2024

Hi folks! We are experiencing the same issue that @j-s-3 shared here:

@kbuntrock My company ran into the issue with clean deleting out the target directories but then not restoring the Jars when the build happens. I tested out your changes against our project and can confirm the jar restoration works as expected. Thank you for adding this! 🎉

We were wondering if this PR will address it. Thank you!

@j-s-3
Copy link

j-s-3 commented Apr 10, 2024

@martincekodhima @nobesio we've been running a build from this branch since November and it has fixed that issue for us!

@nobesio
Copy link

nobesio commented Apr 10, 2024

Thank you for confirming this, @j-s-3!

@kbuntrock & @AlexanderAshitkin: Is there anything we could do to help you merge and release these changes? I see there is just a comment related a typo.

@kbuntrock
Copy link
Contributor Author

Since no one raised any concerns about to this PR (and even the opposite), I guess I can continue to work on the next steps.
I guess the key element is time. ;)

I have to check how I left this PR in term of tests.

For information, I use on my projects since October a custom version of the cache extension. Including slight modification of this PR. I have to check what was the modification about.

@olamy
Copy link
Member

olamy commented Apr 10, 2024

@kbuntrock Thanks for looking after this. I'm planning to release the cache extension in the coming weeks.
Not sure when do you plan to finish this PR? Would you mind creating a jira for it?
Thanks again

@kbuntrock
Copy link
Contributor Author

@olamy : Surely, will create the Jira about it.
Don't know about the plans too, but a release planned in the coming weeks is a good motivation.

@kbuntrock kbuntrock changed the title [PR-104] bugfix / enhancements restoration of outputs on disk [MBUILDCACHE-86] bugfix / enhancements restoration of outputs on disk Apr 19, 2024
@hboutemy
Copy link
Member

hboutemy commented Apr 20, 2024

@kbuntrock I see you created MBUILDCACHE-86 and renamed the PR title, great
can you also update the Git commit to [MBUILDCACHE-86] instead of [PR-104] please?

Copy link
Member

@hboutemy hboutemy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should just search global "artefact" and fix :)

the new configuration property should probably be documented in https://maven.apache.org/extensions/maven-build-cache-extension/parameters.html

and as written before, please change the PR-104 in the commit message

@AlexanderAshitkin
Copy link
Contributor

Hi, @kbuntrock @hboutemy. This PR looks good. How can we move it from a draft to an open state?

@olamy
Copy link
Member

olamy commented Apr 21, 2024

you should just search global "artefact" and fix :)

Both are correct. It only depends if we consider spelling from UK or from the US
But hey, original English is from the UK :)
I agree everything in the Maven world is Artifact, so yup, for visible API, we should use the same.
But for comments, we can accept Shakespeare's English :)
https://dictionary.cambridge.org/dictionary/english/artefact

@kbuntrock
Copy link
Contributor Author

kbuntrock commented Apr 21, 2024

Hi, @kbuntrock @hboutemy. This PR looks good. How can we move it from a draft to an open state?

Hello @AlexanderAshitkin. I guess this PR needs now a bit of "huile de coude" like we say in France ("elbow oil" ...). Meaning that I have to work. 😊

In details, I need to :

Sorry, something went wrong.

@kbuntrock kbuntrock force-pushed the evol-generated-element-restoration branch from db654e7 to c2878e1 Compare April 22, 2024 21:06
@kbuntrock
Copy link
Contributor Author

The code introduced with MBUILDCACHE-80 has partially the same purpose of this MR. I need to adapt my code.
First version of the merge can be seen in commit : c2878e1

But I still have some work since restored target directory shows some "weird" pom files.

image

@hboutemy
Copy link
Member

some "weird" pom files

oh, this is Maven 4 consumer POMs, that seem to have been generated with random file names

@gnodet do we really want these random file names? Do we need to create a hack in build cache extension to manage this randomness?

@kbuntrock
Copy link
Contributor Author

some "weird" pom files

oh, this is Maven 4 consumer POMs, that seem to have been generated with random file names

@gnodet do we really want these random file names? Do we need to create a hack in build cache extension to manage this randomness?

To add a bit of context, I did some experiment yesterday evening. Here are some extract of the file buildinfo.xml after the execution of the goal package in different contexts:

Test "IncrementalRestoreTest", on this branch:

<goals>
  <goal>package</goal>
</goals>
<artifact>
  <groupId>org.apache.maven.caching.test</groupId>
  <artifactId>mbuildcache-incremental</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <type>jar</type>
  <fileName>mbuildcache-incremental.jar</fileName>
  <fileHash>887c667918b9b71d</fileHash>
  <fileSize>3119</fileSize>
  <filePath>target/mbuildcache-incremental-final.jar</filePath>
</artifact>
<attachedArtifacts>
  <attachedArtifact>
    <groupId>org.apache.maven.caching.test</groupId>
    <artifactId>mbuildcache-incremental</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <classifier>consumer</classifier>
    <type>pom</type>
    <fileName>mbuildcache-incremental-consumer.pom</fileName>
    <fileHash>bf52cc397806673a</fileHash>
    <fileSize>430</fileSize>
    <filePath>target/consumer-4676390733155918308.pom</filePath>
  </attachedArtifact>
</attachedArtifacts>

Test "IncrementalRestoreTest", on the branch master:

<goals>
  <goal>package</goal>
</goals>
<artifact>
  <groupId>org.apache.maven.caching.test</groupId>
  <artifactId>mbuildcache-incremental</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <type>jar</type>
  <fileName>mbuildcache-incremental.jar</fileName>
  <fileHash>41e8d89e0385f771</fileHash>
  <fileSize>3119</fileSize>
</artifact>
<attachedArtifacts>
  <attachedArtifact>
    <groupId>org.apache.maven.caching.test</groupId>
    <artifactId>mbuildcache-incremental</artifactId>
    <version>0.0.1-SNAPSHOT</version>
    <classifier>consumer</classifier>
    <type>pom</type>
    <fileName>mbuildcache-incremental-consumer.pom</fileName>
    <fileHash>bf52cc397806673a</fileHash>
    <fileSize>430</fileSize>
  </attachedArtifact>
</attachedArtifacts>

On a standalone project based on "IncrementalRestoreTest", with the extension code of this branch:

<goals>
  <goal>package</goal>
</goals>
<artifact>
  <groupId>org.apache.maven.caching.test</groupId>
  <artifactId>mbuildcache-incremental</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <type>jar</type>
  <fileName>mbuildcache-incremental.jar</fileName>
  <fileHash>8d6a9a9795c1f249</fileHash>
  <fileSize>3122</fileSize>
  <filePath>target/mbuildcache-incremental-final.jar</filePath>
</artifact>

Meaning that:

  • It might be linked to the IT execution context
  • It is not related to this PR, so I will focus on updating the current tests and put this problem aside.

@hboutemy
Copy link
Member

notice: if you build with Maven 3, you won't have this extra file, which is Maven 4 specific https://cwiki.apache.org/confluence/display/MAVEN/Build+vs+Consumer+POM

@kbuntrock kbuntrock force-pushed the evol-generated-element-restoration branch from c2878e1 to b89e8cc Compare April 25, 2024 19:32
@kbuntrock kbuntrock marked this pull request as ready for review April 25, 2024 19:35
@kbuntrock
Copy link
Contributor Author

kbuntrock commented Apr 25, 2024

Alright, I think I've completed my TODO list. :) (@hboutemy & @AlexanderAshitkin)

EDIT : Hum, will try to fix the javadoc generation this evening

@kbuntrock kbuntrock force-pushed the evol-generated-element-restoration branch from b89e8cc to d715059 Compare April 26, 2024 18:24
@kbuntrock
Copy link
Contributor Author

Alright, javadoc generation should be fixed

@hboutemy hboutemy self-requested a review April 30, 2024 10:38
Copy link
Member

@hboutemy hboutemy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kbuntrock
Copy link
Contributor Author

Just for clarity, #103 should be merged first.

@olamy
Copy link
Member

olamy commented Apr 30, 2024

@kbuntrock #103 has been merged. Thanks!

kbuntrock added 2 commits May 1, 2024 00:10

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
- Bugfix / "todo" : files in a base directory containing an underscore were wrongly restored to disk (not at the same location).
  -> To do so, the path is not guessed anymore from the classifier. I introduced a "filePath" property in the "attachedArtifact" section of the buildinfo.xml file.
  -> Because the buildInfo structure change, I changed the cache implementation version from "v1" to "v1.1". I assume it was one of the purpose of this value : we don't have to deal with structure migration. Any previous cache entry is defacto invalidated.
- Forbid the possibility to extract/restore data in a directory outside the project (like extracting ../../../.ssh for example)
  -> I guess the extraction part is not a vulnerability since someone with commit permissions can guess other ways to extract data. But the possibility of restoring at any place on the disk looks pretty dangerous to me if a remote cache server is compromised.
- Gives the possibility to restore artefacts on disk, with a dedicated property : maven.build.cache.restoreOnDiskArtefacts (default to true, open for discussion)
- Introduce "globs" to filter extra attached outputs by filenames.
+ TI

Fix balises javadoc pour jdk8
@kbuntrock kbuntrock force-pushed the evol-generated-element-restoration branch from d715059 to 4b2455e Compare April 30, 2024 22:12
@kbuntrock
Copy link
Contributor Author

@kbuntrock #103 has been merged. Thanks!

@olamy: rebased, ready to merge :)

@olamy olamy added the bug Something isn't working label May 1, 2024
@olamy olamy merged commit dcc89ad into apache:master May 1, 2024
38 checks passed
@martincekodhima
Copy link

Thanks everyone! 😄

@nobesio
Copy link

nobesio commented May 1, 2024

Thank you, @kbuntrock!

@kbuntrock
Copy link
Contributor Author

Thank you, @kbuntrock!

You're welcome :)

@hacosta
Copy link

hacosta commented May 8, 2024

I'm really looking forward to testing this out, is there a new release planned or a place where I can track it?

@kbuntrock
Copy link
Contributor Author

I'm really looking forward to testing this out, is there a new release planned or a place where I can track it?

Yes, Olivier mentioned on the maven dev list that he is planning to create a release probably this week. (https://lists.apache.org/list.html?dev@maven.apache.org)

@olamy
Copy link
Member

olamy commented May 9, 2024

vote started https://lists.apache.org/thread/do2dc2tpmq2zknyqnf55xpvllop94bq5

if not issues this should be available early next week

@julien-pcd
Copy link

julien-pcd commented May 20, 2024

The change to "Forbid the possibility to extract/restore data in a directory outside the project" is a breaking change for some of my projects.

In my case, I have several multi-module projects with child POMs configured to use ../target/${project.artifactId} as their build directory in order to have a single target dir in the parent rather than N children target directories.

Would you consider adding an opt-out option ? @kbuntrock

@kbuntrock
Copy link
Contributor Author

The change to "Forbid the possibility to extract/restore data in a directory outside the project" is a breaking change for some of my projects.

In my case, I have several multi-module projects with child POMs configured to use ../target/${project.artifactId} as their build directory in order to have a single target dir in the parent rather than N children target directories.

Would you consider adding an opt-out option ? @kbuntrock

Hello @julien-pcd

I'm sorry to ear that. I see 3 options :

  1. An opt-out parameter allowing to disable the security
  2. A configuration parameter allowing a specific path for extract/restore. By default it is project.basedir (the current hardcoded value)
  3. Automatic detection of a wider allowed extract/restore area, based on parents location resolution. But since you can specify relative path for parents, I'm expecting a lot of edge cases.

My favourite solution is by far the number 2.

Could I have your opinion @AlexanderAshitkin and @olamy ?

And I wish a good day yo all of you! 😊

@olamy
Copy link
Member

olamy commented Jun 20, 2024

I would say option 1.
Option 2 might to complicated to implement?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants