Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We appear to import the submodule version of gitdb (but never do) #1717

Closed
EliahKagan opened this issue Oct 20, 2023 · 2 comments · Fixed by #1720
Closed

We appear to import the submodule version of gitdb (but never do) #1717

EliahKagan opened this issue Oct 20, 2023 · 2 comments · Fixed by #1720

Comments

@EliahKagan
Copy link
Contributor

EliahKagan commented Oct 20, 2023

Summary

git/__init__.py imports exceptions indirectly from gitdb before it adjusts sys.path, causing the code from the gitdb git-submodule not to be used under any circumstances.

Although this could probably be fixed by moving gitdb.exc imports below the _init_externals call, I recommend against it, because this is an opportunity to further decrease the project's use and dependence on its git-submodules. This is to say that I suggest viewing this as a bug due to the code appearing to do something it doesn't, rather than as a bug due to the code not doing something it should. Even without removing the git-submodule--which GitPython's tests need--the logic to insert it in sys.path could simply be removed.

Details

The unit tests use the direct gitdb git-submodule as well as the indirect smmap git-submodule through it (and for this reason the git-submodules cannot be immediately removed). However, git/__init__.py also appears to use them by cauisng gitdb to be imported from the gitdb git-submodule under some circumstances:

__version__ = "git"
# { Initialization
def _init_externals() -> None:
"""Initialize external projects by putting them into the path"""
if __version__ == "git" and "PYOXIDIZER" not in os.environ:
sys.path.insert(1, osp.join(osp.dirname(__file__), "ext", "gitdb"))
try:
import gitdb
except ImportError as e:
raise ImportError("'gitdb' could not be found in your PYTHONPATH") from e
# END verify import
# } END initialization
#################
_init_externals()
#################

The intent is that, then, importing Python modules some of which contain gitdb imports should use the gitdb Python module from the gitdb git-submodule. Furthermore, it is intended that this happen even if the gitdb package is installed, as evidenced by how the git/ext/gitdb directory is inserted near the beginning of sys.path. The idea is that the git-submodule version of gitdb would be used in these immediately subsequent imports:

# { Imports
try:
from git.config import GitConfigParser # @NoMove @IgnorePep8
from git.objects import * # @NoMove @IgnorePep8
from git.refs import * # @NoMove @IgnorePep8
from git.diff import * # @NoMove @IgnorePep8
from git.db import * # @NoMove @IgnorePep8
from git.cmd import Git # @NoMove @IgnorePep8
from git.repo import Repo # @NoMove @IgnorePep8
from git.remote import * # @NoMove @IgnorePep8
from git.index import * # @NoMove @IgnorePep8
from git.util import ( # @NoMove @IgnorePep8
LockFile,
BlockingLockFile,
Stats,
Actor,
rmtree,
)
except GitError as _exc:
raise ImportError("%s: %s" % (_exc.__class__.__name__, _exc)) from _exc
# } END imports

However, that does not happen. The code from the gitdb git-submodule is actually never used, because before any of that, we import git.exc:

from git.exc import * # @NoMove @IgnorePep8

git.exc imports from the gitdb Python module:

GitPython/git/exc.py

Lines 8 to 18 in 44102f3

from gitdb.exc import ( # noqa: @UnusedImport
AmbiguousObjectName,
BadName,
BadObject,
BadObjectType,
InvalidDBRoot,
ODBError,
ParseError,
UnsupportedOperation,
to_hex_sha,
)

Although that code was touched in #1659, the change there did not affect this behavior in any way. Furthermore, the imports in git.exc from gitdb.exc are not the only imports in git.exc that cause gitdb to be imported. For example, git.exc also imports from git.util, which imports from gitdb.util.

Because the from git.exc import * imports gitdb before sys.path is modified, the installed gitdb dependency is used, rather than the git-submodule version of gitdb. Subsequent imports reuse the already imported gitdb module. That the installed gitdb is being used can be verified in a REPL by importing git and then gitdb and checking gitdb.__file__:

>>> import git
>>> import gitdb
>>> gitdb.__file__
'/home/ek/repos-wsl/GitPython/.venv/lib/python3.12/site-packages/gitdb/__init__.py'

It is using gitdb from the environment's site-packages directory, which is the installed version. This alternative demonstration (in a new REPL) is no more robust, but it is perhaps more compelling since no code outside of GitPython ever imports gitdb:

>>> import git
>>> import sys
>>> sys.modules["gitdb"]
<module 'gitdb' from '/home/ek/repos-wsl/GitPython/.venv/lib/python3.12/site-packages/gitdb/__init__.py'>

sys.path was modified, just too late to have had an effect. Checking it in the same REPL after importing git reveals:

>>> sys.path
['', '/home/ek/repos-wsl/GitPython/git/ext/gitdb', '/usr/lib/python312.zip', '/usr/lib/python3.12', '/usr/lib/python3.12/lib-dynload', '/home/ek/repos-wsl/GitPython/.venv/lib/python3.12/site-packages']

Possible solutions

Restoring the behavior of using the git-submodule dependency (not my recommendation)

Because the actual change to sys.path does take effect, if it were added before anything were imported from gitdb then it would have worked. This experiment, done in a new REPL, verifies this:

>>> import sys
>>> sys.path.insert(1, "/home/ek/repos-wsl/GitPython/git/ext/gitdb")
>>> import git
>>> sys.modules["gitdb"]
<module 'gitdb' from '/home/ek/repos-wsl/GitPython/git/ext/gitdb/gitdb/__init__.py'>

It was probably working prior to 6752fad, which added that import from gitdb.exc. So moving the from git.exc import * line below the _init_externals() call would probably be sufficient to cause the gitdb git-submodule to be used, and used under the exact circumstances under which it was intended, and currently wrongly appears, to be used.

But I recommend against that. 6752fad was part of #1229, which was merged well over two years ago. At least if this issue has not been discovered before now, then importing gitdb from the git-submodule (which is relevant, after all, only to people developing GitPython) is not something anyone is relying on as the default behavior.

Restoring that behavior but in a weakened form (not my recommendation)

gitdb contains similar code, for its smmap dependency. That code in gitdb does work, because there is no direct or indirect import of smmap before it runs. However, its logic is different. Rather than inserting the smmap git-submodule directory near the front of the path as GitPython does...

    if __version__ == "git" and "PYOXIDIZER" not in os.environ:
        sys.path.insert(1, osp.join(osp.dirname(__file__), "ext", "gitdb"))

...it instead inserts it at the end, where a package installed any other way would take precedence:

    if 'PYOXIDIZER' not in os.environ:
        where = os.path.join(os.path.dirname(__file__), 'ext', 'smmap')
        if os.path.exists(where):
            sys.path.append(where)

So we could move the import of exception types below the _init_externals call, but also weaken the logic in _init_externals to match the behavior of the corresponding logic in gitdb.

But I recommend against this, too. I'd much rather do the simpler and, in my opinion, more useful thing of eliminating this logic altogether. I think this is the best approach even if no corresponding change is made in gitdb, but that the same thing should be done in gitdb, and for the same reason.

Removing the _init_externals logic and documenting editable installation of dependencies

I recommend _init_externals be removed, and in the infrequent (but plausible) case that one wants local changes to the gitdb git-submodule to be used when the local GitPython code is run, one can install the gitdb dependency by making an editable install with the git-submodule path, rather than allowing pip to automatically install gitdb from PyPI. (See related discussion in gitdb#90 and smmap#51.)

That is, pass -e git/ext/gitdb in a pip install command. For example:

ek@Glub:~/repos-wsl/GitPython (main $=)$ python3.12 -m venv .venv
ek@Glub:~/repos-wsl/GitPython (main $=)$ . .venv/bin/activate
(.venv) ek@Glub:~/repos-wsl/GitPython (main $=)$ python -m pip install -U pip wheel
  ... (output omitted for brevity) ...
(.venv) ek@Glub:~/repos-wsl/GitPython (main $=)$ pip install -e . -e git/ext/gitdb
Obtaining file:///home/ek/repos-wsl/GitPython
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Installing backend dependencies ... done
  Preparing editable metadata (pyproject.toml) ... done
Obtaining file:///home/ek/repos-wsl/GitPython/git/ext/gitdb
  Installing build dependencies ... done
  Checking if build backend supports build_editable ... done
  Getting requirements to build editable ... done
  Preparing editable metadata (pyproject.toml) ... done
Collecting smmap<6,>=3.0.1 (from gitdb==4.0.10)
  Using cached smmap-5.0.1-py3-none-any.whl.metadata (4.3 kB)
Using cached smmap-5.0.1-py3-none-any.whl (24 kB)
Building wheels for collected packages: GitPython, gitdb
  Building editable for GitPython (pyproject.toml) ... done
  Created wheel for GitPython: filename=GitPython-3.1.40-0.editable-py3-none-any.whl size=9794 sha256=3d31f9b0a44f58d64b73fa78ec5d681d1265def380a44b7725ebc8e887e78782
  Stored in directory: /tmp/pip-ephem-wheel-cache-sy2luyhg/wheels/27/35/44/18948e9bc28191247cebeb25bb595b99e7c611bdd848ca7b24
  Building editable for gitdb (pyproject.toml) ... done
  Created wheel for gitdb: filename=gitdb-4.0.10-0.editable-py3-none-any.whl size=4306 sha256=6a0fba25581d6f08ea1c3396a87cc87f31e168f27a40a39c55938d3026cac811
  Stored in directory: /tmp/pip-ephem-wheel-cache-sy2luyhg/wheels/33/53/45/0fe45259258a26e9d0b60a71ea239d1a2e7d01a8eaadee912e
Successfully built GitPython gitdb
Installing collected packages: smmap, gitdb, GitPython
Successfully installed GitPython-3.1.40 gitdb-4.0.10 smmap-5.0.1
(.venv) ek@Glub:~/repos-wsl/GitPython (main $=)$ python
Python 3.12.0 (main, Oct  2 2023, 15:04:50) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import gitdb
>>> gitdb.__file__
'/home/ek/repos-wsl/GitPython/git/ext/gitdb/gitdb/__init__.py'

More likely than pip install -e . -e git/ext/gitdb is that one would want to use pip install -e ".[test]" -e git/ext/gitdb, which I have also verified works. But I've shown the simpler command because the optional dependencies from the test extra make the output quite long.

This can be done in a separate pip install command, either before or after . is installed:

  • If pip install -e git/ext/gitdb is run first, then pip install -e . or pip install -e .[test] will use the editably installed git-submodule version of gitdb, because the dependency is satisfied already (at least unless GitPython declares a version range for the gitdb dependency that does not include the version in the git-submodule, which I have not tested).
  • If pip install -e git/ext/gitdb is run second, then because the editable install from that path was given explicitly, pip will uninstall the version from PyPI (that pip install -e . installed to satisfy the dependency) and perform the editable install.

So it works in one or two commands and in any order.

Of course, if one also wants an editable install of smmap from the recursive submodule to be used, then one must tell pip to do an editable install from its path as well.

Having developers who want the git-submodule versions of gitdb and/or smmap to be used just install them editably has the additional advantage that static type checkers, at least if installed in the same environment or otherwise aware of what is installed in it (which they should be), could know that the git-submodule version is being used when it is. (This relates to the topic of #1716, though there is no order dependency for fixing these issues.)

In conclusion...

I recommend that _init_externals be removed, imports be regrouped and reordered in a way that makes it easier to understand and edit them, and the technique of making editable installs of gitdb and/or smmap dependencies from git-submodules be documented somewhere in GitPython's documentation as something that may occasionally be useful when working on GitPython and its dependencies together.

Since this would still not commonly be done, I'm not sure it really belongs in README.md, and perhaps it should only go in documentation in doc/. However, that documentation is out of date with respect to installation and development practices, so if this bug is to be fixed before the documentation in doc/ is revised and updated, then it may make sense to add the information about this to README.md, even if it will be moved out of README.md later. Alternatively, documentation could be omitted until then, or it could be noted in comments in git/__init__.py and not otherwise documented, or fixing this could be put off altogether until the doc/ documentation is revised and updated (though I would least prefer that option).

@Byron
Copy link
Member

Byron commented Oct 20, 2023

Thanks so much for uncovering this issue and for the analysis of possible solutions!

Somehow I thought that the _init_externals logic was already gone, as at least in my head I have settled for pip -e as well.

I hope you will feel comfortable enough to go ahead with your preferred solution, with documentation update or not, whatever feels best. Having this documented somewhere (i.e. in README right now) seems good enough. To me it's most important that the sys.path altering code goes away as affecting global interpreter settings as library certainly isn't anything I'd endorse today.

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Oct 20, 2023
This removes the logic that, under some circumstances when not
using a version from PyPI, would insert the location of the gitdb
git-submodule near the front of sys.path.

(As noted in gitpython-developers#1717, the specific way this was being done was not
causing the git-submodule's version of gitdb to actually be used.
But it was still modifying sys.path, which this now prevents.)
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Oct 20, 2023
This removes the logic that, under some circumstances when not
using a version from PyPI, would insert the location of the gitdb
git-submodule near the front of sys.path.

(As noted in gitpython-developers#1717, the specific way this was being done was not
causing the git-submodule's version of gitdb to actually be used.
But it was still modifying sys.path, which this now prevents.)

The installation test, which had verified the insertion into
sys.path, is modified accordingly, except that for now the check
that the very first sys.path entry is undisturbed is kept in place.
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Oct 20, 2023
This removes the logic that, under some (most) circumstances when
not using a version from PyPI, would insert the location of the
gitdb git-submodule near the front of sys.path.

(As noted in gitpython-developers#1717, the specific way this was being done was not
causing the git-submodule's version of gitdb to actually be used.
But it was still modifying sys.path, which this now prevents.)

The installation test, which had verified the insertion into
sys.path, is modified accordingly, except that for now the check
that the very first sys.path entry is undisturbed is kept in place.
EliahKagan added a commit to EliahKagan/gitdb that referenced this issue Oct 20, 2023
This removes the logic that appended the git submodule directory
for smmap to sys.path under most circumstances when the version
of gitdb was not from PyPI. Now gitdb does not modify sys.path.

See gitpython-developers/GitPython#1717
and gitpython-developers/GitPython#1720 for
context. This change is roughly equivalent to the change to
GitPython, though as noted the behavior being eliminated is subtly
different here and there.
@EliahKagan
Copy link
Contributor Author

EliahKagan commented Oct 20, 2023

I hope you will feel comfortable enough to go ahead with your preferred solution

I've opened #1720 and gitdb#102 for this.

Having this documented somewhere (i.e. in README right now) seems good enough.

#1720 includes an addition to GitPython's README.md.

I haven't added anything about this to the gitdb documentation (in gitdb#102 or otherwise).

To me it's most important that the sys.path altering code goes away as affecting global interpreter settings as library certainly isn't anything I'd endorse today.

I agree, though fortunately this only happened when non-PyPI versions were used.

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Oct 20, 2023
This removes the logic that, under some (most) circumstances when
not using a version from PyPI, would insert the location of the
gitdb git-submodule near the front of sys.path.

(As noted in gitpython-developers#1717, the specific way this was being done was not
causing the git-submodule's version of gitdb to actually be used.
But it was still modifying sys.path, which this now prevents.)

The installation test, which had verified the insertion into
sys.path, is modified accordingly, except that for now the check
that the very first sys.path entry is undisturbed is kept in place.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants