Handle cyclic dependency in pip wheels, e.g. PyTorch 2.0 #1076

Closed · siddharthab opened this issue Feb 16, 2023 · 30 comments

@siddharthab (Contributor) commented Feb 16, 2023

🚀 feature request

Relevant Rules

pip_repository, whl_library and py_library

Description

rules_python resolves dependencies at the level of a whole wheel and sets each wheel up as a single py_library. This coarse granularity is why we cannot handle cyclic dependencies between wheels. Some pip wheels have cyclic dependencies, which is valid Python, and pip handles them well. For example, the upcoming PyTorch 2.0 release has a cyclic dependency between the torch and pytorch-triton wheel (.whl) files.

Describe the solution you'd like

The proper solution would be more granular build units, which would also be better from a sandboxing and build-caching efficiency perspective. But as an intermediate solution, maybe we could install wheels that have cyclic dependencies between them into one repo (through one whl_library call) and set up the composite of such wheels as a single py_library. For gazelle, a resolve directive can ensure that all the packages point to the same py_library target.

Describe alternatives you've considered

An alternative could be that I manually build a composite wheel and set it up as a whl_library in my WORKSPACE.

@arrdem (Contributor) commented Feb 16, 2023

Does Gazelle actually have access to this information? Consider a package with platform-dependent dependencies and packaging. Gazelle can't reasonably be made to determine 1) that there is a conditional dependency, and 2) that on some platform+architecture combination it results in a cycle. The dependency graph is only really available after the pip install (or ideally pip download) has occurred.

Maybe one could make pip_parse install the requirements, create a py_library/whl_library implementation target pair with no dependencies, collect dependency information, compute connected components within the dependencies, create library/whl targets which wrap each connected component into a single dependency, and generate alias targets for everything else.

@siddharthab (Contributor, Author) commented Feb 16, 2023

In my intermediate solution, I am not suggesting that Gazelle do anything. The only comment I made regarding Gazelle is that an extra resolve directive would be enough to make things work there without any changes to its current behavior. In fact, looking more closely at how it works, I think Gazelle will work with my suggestion without any changes at all.

My suggestion is that all of this could be handled at the pip_parse level. The requirements.txt file from pip-compile has enough information to construct a directed graph and detect cycles. And even if we do not want to perform cycle detection in pip_parse, we could provide it with another attribute for manually declaring composite wheels. This new attribute could be a list of lists, or a dict of strings to lists.

For PyTorch 2.0, one could imagine a new pip_parse invocation (without automatic cycle detection) to look like this:

```starlark
pip_parse(
    name = "pip",
    requirements_lock = "//:requirements_lock.txt",
    composite_wheels = {
        "torch_group": ["torch", "pytorch-triton"],
    },
)
```

With automatic cycle detection, the invocation would stay as-is.
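
To make the cycle detection concrete, here is a minimal sketch (my illustration, not rules_python code) of what pip_parse could run over dependency edges extracted from the lock file; the function name and input format are hypothetical:

```python
def find_cycles(deps):
    """deps: {package: [direct dependencies]} -> list of dependency cycles.

    A plain DFS is enough for lock-file-sized graphs, although it may
    report the same cycle more than once.
    """
    cycles, stack, on_stack, visited = [], [], set(), set()

    def visit(pkg):
        visited.add(pkg)
        stack.append(pkg)
        on_stack.add(pkg)
        for dep in deps.get(pkg, []):
            if dep not in visited:
                visit(dep)
            elif dep in on_stack:
                # dep is an ancestor on the current DFS path: found a cycle.
                cycles.append(stack[stack.index(dep):] + [dep])
        stack.pop()
        on_stack.discard(pkg)

    for pkg in deps:
        if pkg not in visited:
            visit(pkg)
    return cycles

# The cycle from this issue:
print(find_cycles({"torch": ["pytorch-triton"], "pytorch-triton": ["torch"]}))
# -> [['torch', 'pytorch-triton', 'torch']]
```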

Going back to the longer-term proper solution, I was suggesting something akin to what Gazelle does for Golang, where dependencies are specified per Go package, not per Go module. Gazelle accounts for platform-specific dependencies seamlessly there. It is able to do this by running Gazelle inside each go_repository after downloading and extracting the contents, as part of the implementation of go_repository. Note that, per the specification, Go modules are allowed to have cyclic dependencies, but Go packages are not, so Gazelle's behavior for Golang is appropriate.

@aignas (Collaborator) commented Feb 17, 2023

@arrdem (Contributor) commented Feb 17, 2023

That makes good sense. Even being able to manually specify composite libraries would be a big help. At present my deployment is maintaining some forked packages with "fixed" dependencies to resolve cycles that way.

@arrdem (Contributor) commented Mar 20, 2023

I've pulled together a draft patch which implements the composite_deps flow sketched above, since it blocks some things for us. I'm working with my employer to get approval and a CLA so the code can be released for review.

@Ubehebe commented Apr 6, 2023

I ran into the same problem with the same dependency (PyTorch 2.0). This is an important package, so I think we need to resolve this sooner rather than later. A few observations:

  • The cycle only exists on linux. The macos wheel does not have a dependency on triton at all.
  • An immediate solution is to patch wheel_installer.py here to omit the back-dependency from triton to torch:
```python
if whl.name == "triton":
    sanitised_dependencies = [s for s in sanitised_dependencies if s != '"@pip_torch//:pkg"']
    sanitised_wheel_file_dependencies = [s for s in sanitised_wheel_file_dependencies if s != '"@pip_torch//:whl"']
```

The patch can be maintained locally by changing your rules_python http_archive setup to a local_repository.

At a higher level, I'd like to learn more about how the Python ecosystem uses cyclic dependencies, and how we should translate that into a bazel target graph. A quick grep through the triton source code shows that the back-dependency to torch is not easily removed; it's imported from many places.

A couple of questions for the OP @siddharthab:

Some pip wheels have cyclic dependencies which is valid Python, and pip handles them well.

Do you know if any PEP specifies what should happen in the presence of cyclic imports?

The proper solution for this would be to have more granular build units

It sounds like you have an idea for how this might be done. Could you be more specific? When a .py file does from torch import cuda, would that correspond to a BUILD dependency like @pip_torch//cuda:pkg?

My concern with the composite_wheels approach discussed above is that it leaks implementation details (the presence of a cyclic dep) into the BUILD-level API. For example, if you wanted to use the same bazel repo on macos and linux, you'd have to list the triton dep in requirements.in and use the torch-triton composite wheel everywhere, even though triton has no purpose on macos.

@arrdem (Contributor) commented Apr 6, 2023

I'd argue that all of pip_parse, and even Golang's equivalent repos incantation, is "leaking" specifics of the package-fetching implementation. You already have to specify your interpreter, setup.py env, the requirements and so forth in the pip_parse incantation. This seems entirely consistent with that pattern, although perhaps the config could/should come from a file rather than being written directly into the WORKSPACE.

Re: my patchset, still chasing our lawyers.

@siddharthab (Contributor, Author) commented Apr 6, 2023

Do you know if any PEP specifies what should happen in the presence of cyclic imports?

A point to note here is that we are talking about cyclic dependencies between wheels, not between Python packages. I did not find anything specifically about cyclic imports in wheels, but it is to be expected from the specification of a wheel's file contents, which simply states that the root contains all files to be installed into purelib or platlib; these are usually both the site-packages directory. So any arbitrary subset of Python packages can be made into a wheel, and there can obviously be cyclic dependencies between such subsets. It then comes down to pip and whether it wants to handle cyclic wheels. From what I could see, it forms a transitive closure of the requirements and "installs" them all, so cycles are OK. This is different from other packaging systems (e.g. for R) where dependencies need to be installed in order.

A note on circular Python package imports. While I could not find a PEP or reference doc discussing this specifically either, it is possible per the reference on the import mechanism: within a cycle, a plain import statement does not re-execute a module that is already being initialized, whereas a from-import additionally requires the imported name to already be bound. This can also be tested with a small example program, and there are several community discussions about it. The mechanism that allows circular imports is what lets submodules in a package implicitly depend on __init__.py while __init__.py explicitly depends on the submodules.
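
For example, the following pair of modules (module names are mine, purely illustrative) imports cleanly despite the cycle, while the commented-out from-import would fail:

```python
# a.py
import b  # OK even though b imports a: this only binds the module object

def ping():
    return "ping " + b.pong()

# b.py
import a  # OK mid-cycle: a is already in sys.modules, so its body is not re-run
# from a import ping  # would raise ImportError: a.ping is not yet defined
#                     # while a's module body is still executing

def pong():
    return "pong"

# main.py
import a
print(a.ping())  # -> ping pong
```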

It sounds like you have an idea for how this might be done. Could you be more specific? When a .py file does from torch import cuda, would that correspond to a BUILD dependency like @pip_torch//cuda:pkg?

Yes, I have given this a little more thought since I posted the issue, and I am leaning towards this solution as the best way forward. I also want to implement proper granularity in Gazelle's Python extension, for the benefit of test caching in our internal codebase. Please bear with me if I don't use the right terms or get something wrong; I am new to Python and its packaging system.

As per the spec, there are no restrictions on what file/directory names may exist in a wheel (only setup.py and setup.cfg are explicitly prohibited). All that the spec says is that all the files in a wheel are installed to the site-packages directory, with some notion of a separate data directory that is spread out.

  1. At the top level, subdirectories can be regular or namespace packages, and .py files can be Python modules. It would be unusual to have files that are typically associated with the inside of a package, e.g. __init__.py. These files/directories also need some global uniqueness in their names to avoid conflicts between wheels installed to the same location.
  2. There are two special directories: dist-info and data. The data directory in particular may contain runtime dependencies used by the package. From the language in the spec, I think it would be unusual for a wheel to have non-Python files outside one of the package directories or the data directory.
  3. Unless a Python module depends on a file (Python or not) using paths derived from __file__, we can rely on Python's import mechanism to build a dependency graph between modules (as sketched below), and assume that all runtime dependencies are in the data directory.
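
As an illustration of relying on the import mechanism (my sketch, standard library only; it ignores relative imports and any __file__-based file access):

```python
import ast

def module_imports(path):
    """Return the top-level names of the modules imported by one .py file."""
    with open(path) as f:
        tree = ast.parse(f.read(), filename=path)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module.split(".")[0])
    return found
```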

In my ideal scenario, a Bazel repository initialized from a Python wheel is "spread" out as defined in the spec, and then we run Gazelle to build a finer dependency graph. Gazelle's Python extension currently does not do some of these things correctly.

  1. Python modules can have circular dependencies, and because all packages are modules, packages may also have circular dependencies, although I would imagine this is not very common outside the context of a single package.
  2. Within a single package, all modules implicitly depend on __init__.py, and __init__.py may also depend on some of these modules. The transitive closure of the modules that __init__.py needs forms the default py_library target of the package. All modules outside this transitive closure get their own py_library target that depends on the default target (see the sketch after this list). Note that these other targets can also have transitive closures of their own because of circular imports. We could have a Gazelle option to not decompose explicitly specified packages if it does not get them right by itself.
  3. Any non-Python files in a package may be part of a data filegroup and all modules in the package can depend on it. This filegroup can also include subdirectories if there are no Python files inside them.
  4. Initialization of a package depends on its parent package's initialization (see import mechanism for regular packages). So the default target of a package also depends on the parent's default target.
  5. Every target in a package may also depend on the data files in the wheel from which the package was obtained. But this is optional.
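
As a BUILD sketch of points 2-4 (all target and file names are hypothetical; this is not what the Python extension generates today):

```starlark
py_library(
    name = "mypkg",  # default target: __init__.py plus the closure it needs
    srcs = ["mypkg/__init__.py", "mypkg/core.py"],
    data = [":mypkg_data"],
    deps = ["//parentpkg"],  # point 4: the parent package's default target
)

py_library(
    name = "extras",  # point 2: a module outside the default closure
    srcs = ["mypkg/extras.py"],
    deps = [":mypkg"],
)

filegroup(
    name = "mypkg_data",  # point 3: non-Python files in the package
    srcs = glob(["mypkg/**"], exclude = ["mypkg/**/*.py"]),
)
```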

Short of running Gazelle inside an unpacked wheel (with or without my suggested changes), I think a composite wheel is the only way to keep everything functional (or you could just omit dependencies because you know you won't use that code). For example, if we just descended into the immediate top-level subdirectories inside a wheel and defined py_library targets there, we would still have cycles in the Bazel action graph. To verify, note that torch._inductor imports triton, and triton imports torch, so having the torch and triton trees as separate py_library targets won't be enough.

@siddharthab (Contributor, Author) commented:

Edited the last paragraph of my previous comment.

@groodt (Collaborator) commented Apr 6, 2023

Cyclic dependencies are invalid / unsupported in the rest of the PyPA ecosystem as well. It’s just that pip has historically not enforced it. Poetry, conda and others are a bit stricter.

The upstream issue with OpenAI/triton has been fixed: triton-lang/triton#1374

@groodt groodt closed this as completed Apr 6, 2023
@siddharthab (Contributor, Author) commented:

In the issue you linked above, triton has moved torch to be a test dependency in its metadata file, but its modules still import torch, and likewise torch's modules continue to import triton.

Because this is valid Python, this will continue to work as long as both torch and triton end up installed, with users manually installing the other package if needed. It also means that Bazel users will have to manually specify the other dependency in their BUILD files. This solution just shifts the burden to the users, which may be OK as a special case here. I suppose some other packages do the same.

Thanks for looking into this everyone. I think I am OK with the solution (triton deleting the link in its metadata) for now.

@ccutch commented Apr 8, 2023

Is this being maintained, or is it being ignored? It is clear that including both dependencies does not work, and if this rules package is no longer being maintained, what is the correct package to use?

@siddharthab (Contributor, Author) commented:

The idea here is that it will work once the torch and triton wheels have been updated to remove the cyclic dependency. I am not sure what the plan there is; we may have to wait a few more days.

@ccutch commented Apr 10, 2023

I see. In the meantime, I have found a workaround: running pip at runtime with

```python
import pip

# Note: pip does not officially support programmatic use; this happens to
# work with some pip versions but may break with others.
pip.main(['install', 'torch'])
```
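
A slightly more robust variant of the same workaround, since pip does not officially support being invoked as a library, is to shell out to python -m pip:

```python
import subprocess
import sys

# Run pip against the current interpreter in a subprocess.
subprocess.check_call([sys.executable, "-m", "pip", "install", "torch"])
```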

@Ubehebe commented Jun 28, 2023

This issue should be reopened. The circular dependency is still present as of the latest PyTorch release (2.0.1), as I've documented upstream: triton-lang/triton#1374 (comment).

The patch I described in #1076 (comment) above still works, but carrying such a patch for months is not a good long-term solution.

I'm sympathetic to arguments that the Python ecosystem "shouldn't" support cyclic deps. But PyTorch is an important part of the Python ecosystem. rules_python should support it.

@chrislovecnm chrislovecnm reopened this Jun 28, 2023
@chrislovecnm (Collaborator) commented:

I reopened this issue, but if I remember correctly, this is more of a bazel problem than a rules problem.

@groodt (Collaborator) commented Jul 19, 2023

This particular issue with torch and triton has a workaround: pytorch/pytorch#99622 (comment)

@chrislovecnm (Collaborator) commented:

@alexeagle, @rickeylev et al., I wanted to verify that the bazel binary does not support circular dependencies, in which case we should raise the issue there.

@groodt, do you mind summarizing how a user could work around this problem? It is something we should document, rather than having users dig through this issue.

@groodt (Collaborator) commented Jul 19, 2023

@chrislovecnm Yes, bazel certainly doesn't support cycles between targets. Here are the error messages that you will see:

```
ERROR: /home/zhongmingqu/.cache/bazel/_bazel_zhongmingqu/9dac6aec585d38dc753dcfb84402be22/external/pypi_3_8_triton/BUILD.bazel:22:11: in py_library rule @pypi_3_8_triton//:pkg: cycle in dependency graph:
.-> @pypi_3_8_triton//:pkg (4204372a7bb0ec55b382995a9796bae6a21218e5607b1f1ed1019218df840129)
|   @pypi_3_8_torch//:pkg (4204372a7bb0ec55b382995a9796bae6a21218e5607b1f1ed1019218df840129)
`-- @pypi_3_8_triton//:pkg (4204372a7bb0ec55b382995a9796bae6a21218e5607b1f1ed1019218df840129)
WARNING: errors encountered while analyzing target '@pypi_3_8_triton//:pkg': it will not be built
ERROR: /home/zhongmingqu/.cache/bazel/_bazel_zhongmingqu/9dac6aec585d38dc753dcfb84402be22/external/pypi_3_8_triton/BUILD.bazel:22:11: in py_library rule @pypi_3_8_triton//:pkg: cycle in dependency graph: target depends on an already-reported cycle
WARNING: errors encountered while analyzing target '@pypi_3_8_triton//:pkg': it will not be built
```

Notice the `cycle in dependency graph: target depends on an already-reported cycle` message coming from bazel.

This can be reported to bazelbuild/bazel upstream, but I very much doubt that support for cycles could easily be added to bazel itself. There is some discussion about cycles in the docs, but given that bazel is fundamentally built on DAGs, I'm not really convinced it would be easy to add, or even a good idea.

In terms of docs, guidance and workarounds: yes, we probably should add some user-level docs for the questions and best practices that we constantly get asked about.

There are some discussions that start around this PR: #1166 (comment)

For this particular issue (and cycles in general), we could add support for generic patching, or we could lean more into annotations and borrow the workaround posted in the conversation I linked above.

@bruno-digitbio commented:

Chiming in that this now affects the latest version of Sphinx, another widely used package.

I've got a minimal reproducing example and more details here:

sphinx-doc/sphinx#11567

@lbjcom commented Aug 9, 2023

@bruno-digitbio
I resolved the cyclic dependency issue by following this comment.

@georgevreilly commented:

I worked around the Torch/Triton problem by manually patching the Torch wheel.

@OliverFM (Contributor) commented:

@georgevreilly A very simple follow-up: how does one install the manually generated wheel? I have not been able to figure out how to expose a local copy of the wheel to Bazel. Can you share the last step of adding a bazel rule to include torch as an installed dependency?

@OliverFM (Contributor) commented Aug 14, 2023

Update: I managed to solve this issue by following this comment, as mentioned above. I have run into some CUDA dependency issues, but the minimal repo I link below solves the issue if you replace torch and triton with apache-airflow, as in the original example.

https://github.com/OliverFM/pytorch_with_gazelle

That said, I would still be very interested in seeing the last steps of @georgevreilly's solution, as I think it could work a lot better in some cases.

@georgevreilly commented:

We do some extremely bespoke things when installing Python packages, and we're currently in the process of switching over to rules_py because it fixes the overlong PYTHONPATH problem for us. I've barely used the standard rules_python mechanisms, and I haven't tried installing the custom Torch package that way.

We do use Artifactory to manage a private Python repository, and the package can be installed from there with pip install --index-url=... torch==2.0.1+stripe.2

aignas added a commit to aignas/rules_python that referenced this issue Aug 29, 2023
Before that the users had to rely on patching the actual wheel files and
uploading them as different versions to internal artifact stores if they
needed to modify the wheel dependencies. This is very common when
breaking dependency cycles in `pytorch` or `apache-airflow` packages.
With this feature we can support patching external PyPI dependencies via
unified patches passed into the `pip.whl_mods` extension and the legacy
`package_annotation` macro.

Fixes bazelbuild#1076.
@aignas (Collaborator) commented Aug 29, 2023

FYI, I needed this to be able to use Airflow 2.6.3 in a bazel project, as quite a lot of the providers have a dependency on airflow, and airflow itself has a dependency on those providers.

I have made #1393, which proposes a potential API for bzlmod and legacy WORKSPACE users (see the example changes in the PR), and I would love to hear opinions on whether the patch support is usable in this case. It certainly works to solve my case, and it feels like a more scalable approach in terms of API maintenance.

@groodt and @rickeylev, I reused the package_annotation API because that was the path of least resistance, but I am open to taking other approaches.

@ph03 commented Aug 29, 2023

I would love to hear opinions on whether the patch support is usable in this case. It certainly works to solve my case, and it feels like a more scalable approach in terms of API maintenance.

This is an awesome feature that will also come in handy for resolving other issues that require fiddling with a wheel's contents (I had a couple of other issues that I had to address by updating the wheels themselves, and patching them would be so much easier 👍).

@siddharthab (Contributor, Author) commented:

X-posting from #1166 (comment) for increased visibility.

https://opencollective.com/bazel-python-dep-cycles

@groodt (Collaborator) commented Sep 7, 2023

and patching them would be so much easier 👍)

Yes, totally. Our fork at work has very basic patching functionality. It's a much more pleasant and flexible solution, and it is broadly understandable by most people using bazel, because it's quite a familiar API used for patching external repositories, http_archives etc.

We've got a few different options in the works. I think some form of patching will land at some stage.

For anyone looking for simple workarounds right now, this is a very easy option for you: #1166 (comment)

aignas added a commit to aignas/rules_python that referenced this issue Oct 12, 2023
This class is for being able to more easily recreate a wheel file after
extracting it. This is not intended for usage outside the rules_python
project.

Towards bazelbuild#1076
github-merge-queue bot pushed a commit that referenced this issue Oct 17, 2023
…parate executions (#1487)

Before this PR, the downloading/building of the wheel and the extraction were done as a single step, which meant that any patching of the wheel had to happen within the python script. In order to allow more flexibility in the approach, this PR splits the process into two separate invocations of the wheel_installer, which incidentally also helps in the case where the downloading of the wheel file happens separately via http_file.

Related issues #1076, #1357
github-merge-queue bot pushed a commit that referenced this issue Oct 17, 2023
… RECORD (#1488)

This class is for being able to more easily recreate a wheel file after extracting it. It is not intended for use outside the rules_python project. Also stop sorting the entries when writing a RECORD file, so that the order of the RECORD file matches the order in which files are added to the zip archive.

Towards #1076
github-merge-queue bot pushed a commit that referenced this issue Oct 20, 2023
Before this, users had to rely on patching the actual wheel files and uploading them as different versions to internal artifact stores whenever they needed to modify a wheel's dependencies. This is very common when breaking dependency cycles in `pytorch` or `apache-airflow` packages. With this feature we can support patching external PyPI dependencies via the `pip.override` tag class, to fix package dependencies and/or a broken `RECORD` metadata file.

Overall design:
* Split the `whl_installer` CLI into two parts - downloading and extracting. Merged in #1487.
* Add a starlark function which extracts the downloaded wheel, applies patches, and repackages the wheel (so that the extraction part works as before).
* Add an `override` tag_class to the `pip` extension and allow users to pass patches to be applied to specific wheel files.
* Only the root module is allowed to apply patches, to avoid far-away modules modifying the code of other modules and conflicts between modules and their patches.

Patches have to be in `unified-diff` format.

Related #1076, #1166, #1120
@groodt (Collaborator) commented Oct 25, 2023

Now that #1393 has landed, I'm going to close this issue, because installation cycles can be patched out in userland. We'll keep monitoring the situation to see how frequently packages beyond torch==2.0.0 have cycles, and whether the patching solution is adequate.
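
For illustration, "patching out" an installation cycle amounts to a unified diff that removes the offending Requires-Dist line from the wheel's METADATA. The patch below is a hypothetical example (the paths, hunk position and neighbouring lines are invented), not an actual patch from this issue:

```diff
Hypothetical patch: drop the torch -> triton metadata edge.
--- a/torch-2.0.0.dist-info/METADATA
+++ b/torch-2.0.0.dist-info/METADATA
@@ -30,4 +30,3 @@
 Requires-Dist: networkx
 Requires-Dist: jinja2
-Requires-Dist: triton (==2.0.0) ; platform_system == "Linux" and platform_machine == "x86_64"
 Requires-Dist: sympy
```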

@groodt groodt closed this as completed Oct 25, 2023
github-merge-queue bot pushed a commit that referenced this issue Nov 28, 2023
This patch reworks the `pip_repository` machinery to allow users to manually annotate groups of libraries which form packaging cycles in PyPI and must be installed simultaneously.

The strategy here is to transform any two mutually dependent dependencies `A` and `B`, each with further dependencies of their own,

```mermaid
graph LR;
    A-->B;
    A-->D;
    A-->E;
    B-->A;
    B-->F;
    B-->G;
```

into a new "dependency group" `C` which has `A*` and `B*` as
dependencies, defined as `A` and `B` less any direct dependencies which
are members of the group. This is viable _for python_ because Python
files just need to be emplaced into a runfiles directory for the
interpreter. We don't actually have a true hard dependency between the
build definition of `A` requiring the build product `B` be available
which requires that the build product of `A` be available.

```mermaid
graph LR
     C-->A*;
     A*-->D;
     A*-->E;
     C-->B*;
     B*-->F;
     B*-->G;
```
This gets us most of the way there: a user can now safely write `requirement("A")` and we can provide them with `C`, which has the desired effect of pulling in `A`, `B` and their respective transitive dependencies.

There is one remaining problem: a user writing `deps = [requirement("A"), requirement("B")]` would take a double direct dependency on `C`. So we need to insert a layer of indirection, generating `C_A` and `C_B`, which serve only as unique aliases for `C`, so that we can support the double dependency. Our final dependency graph is then as follows:

```mermaid
graph LR
     C_A-->C;
     C_B-->C;
     C-->A*;
     A*-->D;
     A*-->E;
     C-->B*;
     B*-->F;
     B*-->G;
```
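
A hypothetical BUILD sketch of that final shape (target and repository names are invented to mirror the graphs above; the generated code may differ):

```starlark
py_library(
    name = "A_star",  # A, minus its in-group dependency on B
    srcs = glob(["site-packages/a/**/*.py"]),
    deps = ["@pypi_D//:pkg", "@pypi_E//:pkg"],
)

py_library(
    name = "B_star",  # B, minus its in-group dependency on A
    srcs = glob(["site-packages/b/**/*.py"]),
    deps = ["@pypi_F//:pkg", "@pypi_G//:pkg"],
)

py_library(
    name = "C",  # the dependency group: both members land in the runfiles
    deps = [":A_star", ":B_star"],
)

# Unique aliases so that deps = [requirement("A"), requirement("B")] does not
# list the same label twice.
alias(name = "C_A", actual = ":C")
alias(name = "C_B", actual = ":C")
```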

Addresses #1076, #1188

## To do
- [x] Get rebased
- [x] Get re-validated manually
- [x] Buildifier
- [x] Get CI happy
- [x] Update documentation
- [x] Update changelog

---------

Co-authored-by: Ignas Anikevicius <240938+aignas@users.noreply.github.com>