Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure project directory layout and exception hierarchy #271

Merged
merged 3 commits into from
Jul 30, 2021

Conversation

peternowee
Copy link
Member

This PR restructures the project directory layout and exception hierarchy to bring them in line with common practices and provide a better foundation for custom exceptions and logging. It follows up on discussions in #171 and #230.

See the individual commit messages and new ChangeLog entries for additional remarks. I won't copy them here.

Reviews/comments welcome. I won't merge it before 2021-07-26 anyway, later if necessary to resolve any comments.


@ankostis: In response to some of your comments of last year:

The Exceptions should be definitely defined in the package-module.

I now define them in exceptions.py and in __init__.py do from pydot.exceptions import *. That comes down to the same as what you meant, right?

It is tempting to copy-paste the whole pydot.py contents on the package-module
but i would recommend against it.

Hmm, I am not sure if I followed up this advice completely. Taken literally, at least I did not copy-paste everything into __init__.py. Only __author__, __version__ and __license__. However, I did rename pydot.py to core.py and then added from pydot.core import * to __init__.py, which basically comes down to importing "the whole pydot.py" after all. Not sure if you meant to advise against that as well, but I do not really see how we can avoid that without breaking things. See example below.

TIP1:
When a user imports a module further into a package, the package-module gets import
in its entirety - so if it is "heavy" it delays the import. That is important when
the different functionality is split in multiple submodules of the package.
Example:
A package contains 3 sub-modules: base.py, sphinx_extension.py
If base.py contains also code for irrelevant http-server communications,
whenever from <package> import sphinx_extension code is trun,
it has to pay the price for this never-to-be-used http-server functionality.

TIP2
The modules are split based on the 3rd-party imports. If a module is completely disparate,
it might become an "extras" in the setup.py.

I have been thinking about how to apply this to pydot, but run into objections regarding backwards compatibility and user-friendliness.

  • The obvious candidate to be split off from core is the parser-related code, which includes a dependency on PyParsing. Most of it is already in a separate module (dot_parser.py). There are only two functions in core.py that depend on dot_parser: graph_from_dot_data() and graph_from_dot_file(). If we would move those out of core.py and drop the import of dot_parser, the loading time for pydot would be cut by 46% (from 89 ms to 48 ms, as measured by python -X importtime -c "import pydot", Python 3.7.3, Celeron N3350, lowest of multiple runs). This reduction is mostly (98%) attributable to not importing pyparsing anymore.

    However, I am not very eager to remove parsing from the standard functionality offered by pydot. I mean, currently import pydot means you get access to all its functionality immediately. If we take the parsing functionality out, users would need to start thinking about how to import which part of pydot. To use the parsing functions they would need to do import pydot.dot_parser, from pydot import dot_parser or from pydot.dot_parser import graph_from_dot_data, graph_from_dot_file. This is not only a breaking change for existing scripts (we could at best smooth the transition by adding a deprecation warning in pydot 2.0 and postponing the removal from core to pydot 3.0), but it is also inconvenient for new users and new scripts. An additional import would be needed, different prefixes need to be used (e.g. pydot.dot_parser.graph_from_dot_data()) and in the interactive interpreter, using tab completion on pydot. would not show these functions anymore for easy discovery of what pydot offers.

    So, I was thinking if instead we could maybe just turn it around: Keep the parsing as default functionality of pydot, but offer users that are looking to reduce load time a way to manually import only the (core) modules they need. However, I am getting the impression that this is simply impossible, because __init__.py always gets executed, even if the user only imports a specific module inside the package. We need __init__.py to carry the parsing stuff if we want to offer that as default functionality of pydot, but then we do not want that to load when a user chooses to import only the core and exceptions modules for example. Those two wishes are simply incompatible with each other, right? Hope you understand what I am trying to say and please correct me if I am wrong.

    Alternatively, if making pydot lighter is the goal, we could just "lazy import" dot_parser by moving the import to inside the function graph_from_dot_data(). A lot can be said about this practice, see this SO question for example. I think we can get over the style objections, and as I understand the performance overhead of repeated import statements for an already imported module is minimal. However, a more important issue may be that any problems met during the import will only show up once the function is called. On the other hand, I assume parsing is one of the first steps in most user scripts, so any import problems should still show up early on. Hope to hear your thoughts. This can also be done separate of this PR.

  • One might also consider splitting off the functions related to calling Graphviz to a separate module. However, when it comes to offering users the option not to load that module, similar questions as for splitting off the parsing will come up. Calling Graphviz is the reason most people use pydot and the core pydot class Dot has methods Dot.create() and Dot.write() that depend heavily on those functions. Again, we could move the import to inside Dot.create(), but in this case that would really make import problems show up much too late, because users will likely call Graphviz only at the end of their scripts. Finally, the performance benefit is not as strong here: Not loading Graphviz output functions and not importing os and subprocess reduces pydot load time by around 6%.

Again, hope to hear your thoughts. In both cases, we can also decide about it later. No need to do everything in this PR already.

This commit brings the directory layout in line with current
recommendations by the Python Packaging Authority and others.

    Before                        After

    |-- setup.py                  |-- setup.py
    |                             |-- src/
    |                             |   `-- pydot/
    |                             |       |-- __init__.py
    |-- pydot.py                  |       |-- core.py
    |-- dot_parser.py             |       `-- dot_parser.py
    `-- test/                     `-- test/
        `-- pydot_unittest.py         `-- pydot_unittest.py

There are many opinions on what would be the best directory layout for
a Python project, particularly on whether or not to use an additional
`src/` layer. For example:

- https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure
- pypa/packaging.python.org#320
- https://github.com/pypa/packaging.python.org/blob/7866cb69f1aa290a13c94da77cfb46a079cfcfa2/source/tutorials/packaging-projects.rst#a-simple-project
- https://stackoverflow.com/questions/193161/what-is-the-best-project-structure-for-a-python-application

Suffice to say, I can basically understand the arguments in favor of a
`src/` layout and do not find the arguments against it strong enough
not to use it.

Moved the dunder global variables (`__version__` etc.) to `__init__.py`
to ensure they remain available to the user doing `import pydot`. (They
would not be included in `from pydot.core import *` because their names
start with an underscore and I did not want to start keeping `__all__`
just for this.) A test for `pydot.__version__` is added to the
testsuite as well.

A question that remains open for the moment is whether we should reduce
the weight of pydot "core" by splitting off functionality, most notably
the parsing functionality. Although that would reduce pydot loading
time, it would also make it more difficult for users to discover and
use that functionality. For details see:
pydot#271 (comment)
For now, things are kept as-is, namely that `pydot` always imports
`dot_parser`. This is now accomplished via `__init__.py` and `core.py`.

Discussed in pydot#171, pydot#230 and pydot#271.

Special thanks to Kostis Anagnostopoulos (@ankostis) for his advice.
This commit removes a check from the test suite that was introduced by
commit 50bd730 ([pydot#164][164]). The check raised an error in
case the current working directory was the top of the source tree. The
purpose was to prevent testing the pydot sources from the source tree
instead of the installed pydot.

The risk of that inadvertently happening is considerably lower now that
the sources were moved to subdirectory `src/`. Also note that [Python
_running a script_ does not by default search the working directory][1]
anyway.

The removal of the check makes the `--no-check` option redundant.
However, we cannot remove it yet because that would cause an
`unrecognized arguments` error in existing scripts. `argparse` does not
support [deprecating CLI arguments][2] yet. So, instead we print a
deprecation warning. The remaining `parse_args()` and `import argparse`
can be removed in a later major release of pydot.

[1]: https://chrisyeh96.github.io/2017/08/08/definitive-guide-python-imports.html
[2]: https://bugs.python.org/issue39467
[164]: pydot#164
- Module structure following example from NetworkX and other projects.
  This adds an `import pydot` to `core.py`, which is cyclic/circular,
  but necessary to make the exceptions available. Cyclic imports are
  not uncommon and Python can handle this. Using the exceptions in
  pydot source code now requires prefixing `pydot.`, for example:

      raise pydot.Error("Message")

- Exception hierarchy changes. See ChangeLog and class docstrings in
  this commit for details. Additional background notes:

  - Removal of the unused exception class `InvocationException`: It was
    introduced in 842173c of 2008 (v1.0.2), but got in disuse when
    commits 9b3c1a1 and bc639e7 of 2016 (v1.2.0) fell back to using
    `Exception` and `assert` (`AssertionError`) again. If we ever need
    a custom class like this again in the future, it would probably
    have a different signature for context data (e.g. a DOT string,
    input and output), different parent class (or classes, e.g.
    `PydotException, CalledProcessError`) and perhaps a different name
    (e.g. ending in `...Error`), so no need to keep the old class
    around for that.

Further additions to the exception hierarchy are likely before the
final release of pydot 2.0.

Discussed in pydot#171, pydot#230 and pydot#271.
@peternowee peternowee changed the title [WIP] Restructure project directory layout and exception hierarchy Restructure project directory layout and exception hierarchy Jul 30, 2021
@peternowee
Copy link
Member Author

@ankostis I will merge this now already, because I want to keep things moving, but feel free to still comment on it later. Thank you for all your help and hope you are doing well.

@peternowee peternowee merged commit 90936e7 into pydot:master Jul 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant