Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The API Reference documents too few modules #1854

Closed
EliahKagan opened this issue Mar 2, 2024 · 2 comments · Fixed by #1855
Closed

The API Reference documents too few modules #1854

EliahKagan opened this issue Mar 2, 2024 · 2 comments · Fixed by #1855

Comments

@EliahKagan
Copy link
Contributor

EliahKagan commented Mar 2, 2024

The API Reference in the Sphinx documentation is missing some modules. But I am unsure exactly which ones should be added. There seem to be three cases.

At least one omission should be corrected:

  1. The top-level git module is not listed, and while its __version__ attribute is listed explicitly, git.refresh is not, so the Git.refresh method that should only be called directly in rare, unusual cases has its documentation included in the reference page, while the git.refresh function that is much more reasonable to call is omitted.

But for some of the other modules that aren't listed there, it seems intentional. These appear to divide into two more cases:

  1. Modules that are packages where everything interesting is defined in submodules.

  2. Modules where it may make sense to avoid creating the appearance that users of GitPython should ever directly use anything from them.

These are all the modules of GitPython that are not listed in doc/source/reference.rst, such that they do not have their public symbols automatically collected, and presented with docstrings, by Sphinx autodoc:

git
git.compat
git.db
git.index
git.objects
git.objects.submodule
git.refs
git.repo
git.types

Case 1, which should be included at least to get git.refresh and its docstring, is of course:

git

Case 2, where maybe they shouldn't be included because nothing of interest is directly contained in them, are:

git.index
git.objects
git.objects.submodule
git.refs
git.repo

Case 3, where maybe they shouldn't be included, or shouldn't be included without making clear in their module docstrings that they should not usually be used, are:

git.compat
git.db
git.types

Regarding this third case, symbols in these modules are referenced in docstrings or type annotations. More of these references are references from a technical Sphinx perspective than before, as of #1850, but it was always necessary to be able to understand what was in them, to understand what some names referred to and their significance.

Therefore, for this third case, even if I am correct in my belief that they were omitted to discourage direct use, it seems to me that it may be best include them, with suitable guidance added in module docstrings (which autodoc will ensure is rendered at the top of the relevant section).

I am unsure what should be done here, and unsure if all three of those modules of the third kind are really in the same situation. But even if they are and the approach I suggest is to be taken, I am unsure how strongly their use should be discouraged, and if a reason should be given (and if so, what reasons). This is what I know about them, that is related to this:

  • git.compat is to some extent a relic of when Python 2 was supported, and besides that it is mostly used within GitPython to deal with encoding, yet it may no longer be needed if/when GitPython's handling of strings and bytes is improved in the future, and it may not be useful for use in codebases that handle such things correctly from the start.
  • git.db is related to the gitdb library, and one of the symbols is includes in __all__ is GitDB, whose use is today discouraged. The other symbol is GitCmdObjectDB. That might be seen today as a workaround for the way GitPython expects to receive an injected object database dependency with a GitDB-like interface. However, GitCmdObjectDB appears in various places and it's useful to understand it to understand the significance of it being the default. It seems to me that the git.db module should not only be included in documentation, but that no special effort should be made to discourage its use (though the GitDB could be clarified as not recommended).
  • git.types provides types used throughout GitPython, and the assume_never function. The types do not have docstrings. However, it may still be useful to list them so references in Sphinx documentation to them are rendered as links and so it is clear they really are part of GitPython rather than external. In addition, some of the types are TypedDicts, and I think autodoc will list the members. (If what ought to be done hinges on that, then I can to an experiment first to check; you can let me know.)

My sense is that git.db and git.types should be promoted to have the same status as most modules; while git.compat should either be similarly promoted, or a discouragement to use it added to its docstring, or both.

These are all part of GitPython's public interface: they are named as conventionally public names, and while the documentation currently omits pages for them, it nowhere characterizes them as private or states that they should not be used in code outside GitPython. So to avoid breaking backward compatibility, they should remain public.

But public use of them (use outside GitPython) could be deprecated if appropriate. As far as I can tell, that should either be done in git.compat only, or nowhere. If it is to be done for git.compat, then it could be accompanied by renaming it to git._compat and having uses in GitPython use that name, while git.compat would re-export its public names (and could also emit DeprecationWarning).

@EliahKagan EliahKagan changed the title Too few modules are included in reference.rst The API Reference documents too few modules Mar 2, 2024
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Mar 2, 2024
EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Mar 2, 2024
@Byron
Copy link
Member

Byron commented Mar 2, 2024

Thanks so much, and again, for the analysis!

I perfectly agree with the assessment and generally think that everything that is part of the public interface (i.e. not prefixed with _) should also be documented and available from the generated documentation.

git.compat is a bit of a special case and in order to avoid any chance for downstream breakage, I'd prefer to keep it as is while assuring that people know that they probably don't want to use it or its content.

git.types contains types that are used, but are missing docstrings, and I'd hope these could be added to make them seem more legitimate. Otherwise it would seems these are obscure intentionally, which probably shouldn't be the case anymore.

EliahKagan added a commit to EliahKagan/GitPython that referenced this issue Mar 4, 2024
To make clear that code outside GitPython would not typically
benefit from using anything in that module.

See gitpython-developers#1854 for context.
@EliahKagan
Copy link
Contributor Author

EliahKagan commented Mar 4, 2024

I perfectly agree with the assessment and generally think that everything that is part of the public interface (i.e. not prefixed with _) should also be documented and available from the generated documentation.

Names not prefixed with _, or those that are public by the combination of applicable guidelines including the value of __all__? Sphinx does the latter automatically, which I think is what we want (aside from "protected" things, per #1848), but I want to be sure.

For modules that contain __all__ and introduce functions, classes, and variables whose names do not start with _ but are also not listed in __all__, such names are not usually regarded to be public, though PEP-8 does still recommend giving such names leading underscores.

GitPython seems to deliberately introduce some such names, especially in git.util but also elsewhere, that are meant to be "internal," i.e., not conceptually private to the module, but intended only for consumption in other GitPython code (i.e., private to GitPython).

On the one hand, this is not a practice that I would recommend doing in a new Python codebase. Python's convention-based access/visibility, which the language does not enforce and which is not always feasible to check precisely with static analysis either, is not well-suited to the notion of "internal" or "package private" that are part of some languages' semantics. Instead, in new projects, a module can itself be named with a leading _, in which case it is conceptually nonpublic unless documented otherwise, and if it must introduce members that are meant to be public, those can be imported by a similarly named but public module (e.g., util for _util) and "reexported" by listing them in its __all__.

However, although that technique is usable even in projects with existing APIs that have to be maintained, I don't advocate that it be rushed into for GitPython, where it would lead to a proliferation of new _-named modules. It could perhaps be gradually adopted, if at all. See also the parenthesized paragraph mentioning this technique in 9e5d0aa (#1751).

I think it is reasonable, at least for now, for functions originally (or at some point) intended to be "internal" to stay that way, staying out of their modules' __all__. Of course, this is only applicable when __all__ already existed (and didn't list them). Otherwise their naming conventions have made them public even if that was less than fully deliberate and even if documentation for them was not emitted (unless they were somewhere documented as non-public).

However, another possible approach is to just make them all public, or consider them so. I do not advocate this--I am advocating against it, at least at this time--but I can see an argument for it, in that some developers writing software that uses GitPython may have erroneously assumed they are suitable for use based on them appearing as options for code completion in editors/IDEs, and because there are or at least have been some cases such as #1719 where more non-underscore names than conventionally public had been intended as public.

git.compat is a bit of a special case and in order to avoid any chance for downstream breakage, I'd prefer to keep it as is while assuring that people know that they probably don't want to use it or its content.

git.types contains types that are used, but are missing docstrings, and I'd hope these could be added to make them seem more legitimate. Otherwise it would seems these are obscure intentionally, which probably shouldn't be the case anymore.

I've included such changes in #1859 (though for git.types.Commit_ish, further revision may be needed, due to #1858).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

2 participants