Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception on parsing dot file #171

Open
goerz opened this issue Apr 26, 2018 · 20 comments
Open

Exception on parsing dot file #171

goerz opened this issue Apr 26, 2018 · 20 comments

Comments

@goerz
Copy link

goerz commented Apr 26, 2018

With structure.dot as

digraph qnet {
    rankdir="TB";
    graph [pad="0", ranksep="0.25", nodesep="0.25"];
    node [penwidth=0.5, height=0.25, color=black, shape=box, fontsize=10, fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans"];
    edge [penwidth=0.5, arrowsize=0.5];
    compound=true;

    subgraph cluster_qnet {
        fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans";
        fontsize=12;
        label="qnet";
        tooltip="qnet";

        subgraph cluster_algebra {
            toolbox          [target="_top"; tooltip="qnet.algebra.toolbox",          href="../API/qnet.algebra.toolbox.html"];
            library          [target="_top"; tooltip="qnet.algebra.library",          href="../API/qnet.algebra.library.html"];
            core             [target="_top"; tooltip="qnet.algebra.core",             href="../API/qnet.algebra.core.html";             width=1.3];
            pattern_matching [target="_top"; tooltip="qnet.algebra.pattern_matching", href="../API/qnet.algebra.pattern_matching.html"; width=1.3];
            toolbox -> library;
            toolbox -> core;
            toolbox -> pattern_matching;
            library -> core;
            core    -> pattern_matching [weight=0, minlen=0];
            library -> pattern_matching;
            href = "../API/qnet.algebra.html"; target="_top";
            label="algebra";
            tooltip="qnet.algebra";
            graph[style=filled; fillcolor="#EEEEEE"];
        }

        convert          [target="_top"; tooltip="qnet.convert",       href="../API/qnet.convert.html"];
        visualization    [target="_top"; tooltip="qnet.visualization", href="../API/qnet.visualization.html"];
        printing         [target="_top"; tooltip="qnet.printing",      href="../API/qnet.printing.html"];
        utils            [target="_top"; tooltip="qnet.utils",         href="../API/qnet.utils.html"; width=1];

        { rank=same; convert[width=0.8]; visualization[width=0.8]; printing[width=0.8]; }
        convert       -> visualization [minlen=3, style=invis];
        visualization -> printing      [minlen=3];
        visualization -> toolbox  [minlen=2, lhead=cluster_algebra];
        printing      -> toolbox  [lhead=cluster_algebra];
        convert       -> toolbox  [lhead=cluster_algebra];

        core             -> utils [ltail=cluster_algebra];
        pattern_matching -> utils [ltail=cluster_algebra];
        convert          -> utils [minlen=6];
        printing         -> utils [minlen=6];
    }

}

I get the following exception:

>>> import pydot
>>> pydot.graph_from_dot_file("structure.dot", encoding='utf-8')
Traceback (most recent call last):
  File "<ipython-input-5-d070a81b3bde>", line 1, in <module>
    pydot.graph_from_dot_file("structure.dot", encoding='utf-8')
  File "/Users/goerz/anaconda3/lib/python3.5/site-packages/pydot.py", line 238, in graph_from_dot_file
    graphs = graph_from_dot_data(s)
  File "/Users/goerz/anaconda3/lib/python3.5/site-packages/pydot.py", line 221, in graph_from_dot_data
    return dot_parser.parse_dot_data(s)
  File "/Users/goerz/anaconda3/lib/python3.5/site-packages/dot_parser.py", line 554, in parse_dot_data
    err)
TypeError: Can't convert 'ParseException' object to str implicitly

That exception is actually raised during the handling of another Exception:

ParseException: Expected "}" (at char 351), (line:8, col:31)

It doesn't seem to me that the file is an invalid dot file (and the graphviz command line utilities convert it fine)

@johnyf johnyf self-assigned this Apr 26, 2018
@johnyf johnyf added the bug label Apr 26, 2018
@johnyf
Copy link
Contributor

johnyf commented Apr 26, 2018

Thanks for reporting these errors. The ParseException appears to be caused by the semicolons that occur as separators within attribute lists. Replacing them with commas avoids the parsing error:

digraph qnet {
    rankdir="TB";
    graph [pad="0", ranksep="0.25", nodesep="0.25"];
    node [penwidth=0.5, height=0.25, color=black, shape=box, fontsize=10, fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans"];
    edge [penwidth=0.5, arrowsize=0.5];
    compound=true;

    subgraph cluster_qnet {
        fontname="Vera Sans, DejaVu Sans, Liberation Sans, Arial, Helvetica, sans";
        fontsize=12;
        label="qnet";
        tooltip="qnet";

        subgraph cluster_algebra {
            toolbox          [target="_top", tooltip="qnet.algebra.toolbox",          href="../API/qnet.algebra.toolbox.html"];
            library          [target="_top", tooltip="qnet.algebra.library",          href="../API/qnet.algebra.library.html"];
            core             [target="_top", tooltip="qnet.algebra.core",             href="../API/qnet.algebra.core.html",             width=1.3];
            pattern_matching [target="_top", tooltip="qnet.algebra.pattern_matching", href="../API/qnet.algebra.pattern_matching.html", width=1.3];
            toolbox -> library;
            toolbox -> core;
            toolbox -> pattern_matching;
            library -> core;
            core    -> pattern_matching [weight=0, minlen=0];
            library -> pattern_matching;
            href = "../API/qnet.algebra.html"; target="_top";
            label="algebra";
            tooltip="qnet.algebra";
            graph[style=filled, fillcolor="#EEEEEE"];
        }

        convert          [target="_top", tooltip="qnet.convert",       href="../API/qnet.convert.html"];
        visualization    [target="_top", tooltip="qnet.visualization", href="../API/qnet.visualization.html"];
        printing         [target="_top", tooltip="qnet.printing",      href="../API/qnet.printing.html"];
        utils            [target="_top", tooltip="qnet.utils",         href="../API/qnet.utils.html", width=1];

        { rank=same; convert[width=0.8]; visualization[width=0.8]; printing[width=0.8]; }
        convert       -> visualization [minlen=3, style=invis];
        visualization -> printing      [minlen=3];
        visualization -> toolbox  [minlen=2, lhead=cluster_algebra];
        printing      -> toolbox  [lhead=cluster_algebra];
        convert       -> toolbox  [lhead=cluster_algebra];

        core             -> utils [ltail=cluster_algebra];
        pattern_matching -> utils [ltail=cluster_algebra];
        convert          -> utils [minlen=6];
        printing         -> utils [minlen=6];
    }

}

A minimal example that demonstrates this issue is:

graph {
    foo[bar; bar]
}

which raises foo[bar; bar] ^Expected "}" (at char 15), (line:2, col:8), as opposed to:

graph {
    foo[bar, bar]
}

which is parsed without errors.

The cause of the error appears to be the parsing rule that defines what an attribute list is:

https://github.com/erocarrera/pydot/blob/d6ac9e9244d1a882103422ac2b35ceef96f5dfe3/dot_parser.py#L473-L474

In particular, the expression comma.suppress() allows only commas as attribute separators, not semicolons. Looking at the DOT language grammar though, either commas or semicolons can occur as attribute separators:

a_list | : | ID '=' ID [ (';' | ',') ] [ a_list ]

Addressed in f504c8b and 2437ed3.

@johnyf johnyf added this to the 1.3.0 milestone Apr 30, 2018
@ankostis
Copy link

ankostis commented Apr 11, 2020

There is still a bug lurking here, in the code handling of the origina-exception,
where it attempts to concatenate exception with strings:

pydot/dot_parser.py

Lines 551 to 554 in 48ba231

print(
err.line +
" "*(err.column-1) + "^" +
err)

It should become:

    print("%s%s^%s" % (err.line, " "*(err.column-1), err))

@ankostis
Copy link

But it is better to let the exception bubble-up, or else, the user will receive a None dot.

ankostis added a commit to pygraphkit/graphtik that referenced this issue Apr 11, 2020
to let exception bubble-up.
@peternowee
Copy link
Member

@ankostis In my PR #219 I also worked on this part of the code. I will reply to your suggestions here:

First, replying to #171 (comment):

There is still a bug lurking here, in the code handling of the origina-exception,
where it attempts to concatenate exception with strings:

pydot/dot_parser.py

Lines 551 to 554 in 48ba231

print(
err.line +
" "*(err.column-1) + "^" +
err)

It should become:

    print("%s%s^%s" % (err.line, " "*(err.column-1), err))

This is an alternative solution for the TypeError error. In my solution, I changed err to str(err). I tested your solution, both in Python 2.7.16 and 3.7.3, and it also works: No more TypeError:

>>> import pydot; pydot.graph_from_dot_data("graph { n [label=<some problematic DOT string>>]; }")
graph { n [label=<some problematic DOT string>>]; }          ^Expected "}", found '['  (at char 10), (line:1, col:11)

I am still missing the newlines though. Aren't you? Perhaps some Linux/Windows difference? See https://github.com/pydot/pydot/pull/219/files#diff-baef597193866f900a2726a8a4667b12R556 for my solution which also adds newlines.

Second, replying to #171 (comment):

But it is better to let the exception bubble-up, or else, the user will receive a None dot.

and to #171 (reference) which points to your monkey patch in the pydot-using software pygraphkit/graphtik@3efd534 which patches out the pydot try... except block altogether:

Letting the exception bubble-up sounds reasonable, but perhaps this can also be accomplished by adding a raise statement at the end of the except-block, perhaps even doing exception chaining? I guess this means that we cannot return None anymore then. Perhaps some users are currently expecting and handling None, so that may be too big of a change for a point release, maybe even for a minor release.

Your patch also shows that you prefer pydot not to print details of the ParseException anymore. This touches on the general principle held by many that a library should not print anything at all. I ran into this issue elsewhere: pydot currently also prints a detailed error message when Graphviz dot returns an error code. I commented on that in #218 (comment). As I argued there, although I can understand the general notion that a library should not print, the practical problem is that not every user might immediately know what to do with an exception. The printing of error details by pydot allows end users to immediately start troubleshooting their DOT graph syntax rather than first having to learn how to dig into a ParseException or having to ask the maintainer of their pydot-using main software to add such exception-handling code. It may be possible to help the end users with this by other means, for example through good exception chaining with a pydot custom exception type that would lead the end user to some good pydot documentation on how to troubleshoot. But without such an alternative in place, I don't know if it is a good idea to remove the printing already. Perhaps use a "debug" or "verbose" flag as suggested by @prmtl in #218 (comment). We could make it a pydot-wide flag, so that it controls both the printing of ParseException and Graphviz dot error details, or two separate flags, so that users may implement their own handling step by step, one pydot component at a time. Maybe at first, we can let the flags default to True, so that default behavior remains unchanged (=printing) and it can be included in a minor release already without much risk. Then anyone who wants to handle these exceptions himself already, can start and set them to False manually. Pydot documentation can suggest downstream developers to start looking into this. Then at a later stage, once we get some idea on how well the change has been received and how else we can help end users troubleshoot exceptions, we can announce the default will change to False (=no more printing) in the next major release for example.

Hope you can let me know what you think. Thanks.

@ankostis
Copy link

ankostis commented Apr 12, 2020

Thank you for you thoughtful reponse.

The printing of error details by pydot allows end users to immediately start troubleshooting their DOT graph syntax rather than first having to learn how to dig into a ParseException or having to ask the maintainer of their pydot-using main software to add such exception-handling code.

The "end-users" of this library are supposed to be other developers, not final users, correct?
So in principal, it should be better to just raise the exception [edit: with all context included], and let the client code handle it.

Now there is a highly problematic point in Python srtandard library, when calling out other processes with [subprocess.Popen](https://docs.python.org/3/library/subprocess.html#popen-objects),
becuase Python tosses out error context (STDOUT, STDERR, cmd-line).
Specifically for such sub-process errors, I'm all for a custom exception class & exception-chaining.

Now, having written Py2/Py3 & Windows/POSIX compatible-code to collect & attach STDOUT, STDERR & cmd-line in a custom exception, i can tell, it is not an easy task
(due to byte-encoding issues in Py2, and OS-exception differences with Windows).

That way, this library would still raise self-evident exceptions, and avoid printing spurious error-messages to stdout that it's easy to miss.

(btw, shouldn't in print stderr instead, or better use logging standard module?)

@ankostis
Copy link

My commenta above expressed in this review: #219 (review)

@peternowee
Copy link
Member

peternowee commented Apr 12, 2020

The printing of error details by pydot allows end users to immediately start troubleshooting their DOT graph syntax rather than first having to learn how to dig into a ParseException or having to ask the maintainer of their pydot-using main software to add such exception-handling code.

The "end-users" of this library are supposed to be other developers, not final users, correct?

Hmm, not really, sorry. At least, I did not mean the developers that use pydot directly. For the problem I described, I was thinking of indirect pydot users, which could be final users, but also include developers in case there are multiple layers between pydot and the final end-user. I mean all the ones that are more than one layer below pydot. So, first, one layer below pydot, you have the developer that directly uses pydot to quickly write a library or program. We can expect him to learn how to troubleshoot pydot, but should we also expect him to add handling for all the possible exceptions that pydot may raise before he publishes his software? Do these automatically bubble-up if he does nothing? Won't he ruin the pydot exception by raising his own exception instead of chaining? Will he keep his handling code up-to-date with changes to pydot exception classes (I know, we should ensure backwards compatibility, but still)? Then, on all the layers below that, so two or more layers below pydot, are the indirect users of pydot. If pydot does not print, and somewhere in the chain of the libraries and programs between pydot and the final user, the exception that pydot raised is dropped, then these indirect users are in a bad position, developer or not. They do not know pydot yet and are suddenly confronted with an exception without any details.

OTOH, we could just say that this is the fault of those developers in-between and that we only support the direct users of pydot. When indirect users or end-users file support questions at pydot, we can always tell them to go ask the intermediary developers first. However, that may not always be easy for the end-user. And the triaging of those initial support requests at pydot also costs time.

I do not know how often this all occurs, so maybe it is a bit of a theoretical discussion, a bit too pessimistic, but I just thought we could try a middle way first, like that debug/verbose-flag with a default of True mentioned in my comment above, so that direct pydot-using developers can flip it to False themselves once they are confident about their own handling of pydot exceptions.

I think we can agree on how things should be done, but we cannot control the developers between pydot and the final user and therefore I am a bit weary of what it would lead to in practice. You seem to have more practical experience, so please tell me if you are more optimistic.

So in principal, it should be better to just raise the exception [edit: with all context included], and let the client code handle it.

In principle, yes, I have been thinking a lot about that as well, see #218 (comment) for my earlier attempt to start this discussion. I was awaiting a response from this project's maintainer @prmtl before writing more about the details. I think you and I are both thinking in the same direction.

Now there is a highly problematic point in Python srtandard library, when calling out other processes with [subprocess.Popen](https://docs.python.org/3/library/subprocess.html#popen-objects),
becuase Python tosses out error context (STDOUT, STDERR, cmd-line).
Specifically for such sub-process errors, I'm all for a custom exception class & exception-chaining.

Now, having written Py2/Py3 & Windows/POSIX compatible-code to collect & attach STDOUT, STDERR & cmd-line in a custom exception, i can tell, it is not an easy task
(due to byte-encoding issues in Py2, and OS-exception differences with Windows).

* GitPython has [done this fully compatibly](https://github.com/gitpython-developers/GitPython/blob/d39bd5345af82e3acbdc1ecb348951b05a5ed1f6/git/exc.py#L28) (if i remember correctly).

* I have [done this for Py3.6+ Windows/POSIX](https://github.com/ankostis/polyvers/blob/master/pvcmd/polyvers/utils/oscmd.py#L26).

Your code there seems to be in the direction that I was thinking about as well, yes: A custom exception class containing everything we want the end user to know. Here are some of the sources I read about this last year:

By the way, there is still an unused class lying around in the pydot code: InvocationException, in disuse since commits 9b3c1a1 and bc639e7 of 2016 (v1.2.0). I think it was actually meant to do something similar as we are discussing here, so maybe its removal in 2016 may actually be considered somewhat of a regression. Maybe we can revive/revert it, extend it and even make it backward compatible with its earlier behavior again. OTOH, we might need to distinguish two separate classes, one for errors from calls to external programs (Graphviz dot, issue 218) and one from errors from calls to Python libraries (pyparsing), because they will carry different context data.

Back to your comment about compatibility, I can imagine that this is an issue. Last year, I looked at subprocess.run and the CalledProcessError exception it can raise. However, that exception class only carries stdout and stderr output since Python 3.5. Also, exception chaining is only available in Python 3. Maybe we should just follow the Python support schedule and only support Python 3.5 and higher?

I did not know about the Windows/POSIX differences, but can also imagine they exist, especially when it comes to OS-related exceptions. Maybe in this case of ParseException, we are sufficiently abstracted from the OS for it to cause trouble, but in the case of Graphviz dot errors (issue 218), we might run into it.

That way, this library would still raise self-evident exceptions, and avoid printing spurious error-messages to stdout that it's easy to miss.

Yes, I see the benefit.

(btw, shouldn't in print stderr instead, or better use logging standard module?)

I found it was difficult to find sources on the general notion of library output, but from what I did find, I understand that printing to stderr is considered almost equally bad. Logging to system logs (I know, not what you meant) is equally problematic. Some sources (not all Python):
https://stackoverflow.com/questions/4201856/error-handling-strategies-in-a-shared-library-c
https://www.reddit.com/r/C_Programming/comments/97ebfj/redirect_stdout_and_stderr_to_a_buffer/e47n9kh/

With regard to the logging standard module of Python, I do not have experience with it yet. I remember looking into it for a while last year, but I think I felt that exception handling, rather than a log, was more fit for conveying the details of the error. For example, it feels a big unnatural to dump an entire DOT file in a log, although the same may be argued for exceptions I guess. Also, what happens to the logs if any of the libraries and programs between pydot and the final user does not use the logging module? Do the logs just get lost then (not good, because the end user is still in the dark), or written to syslog (also not good, because the basic idea seems to be that a library is not supposed to make noise without permission)?

My commenta above expressed in this review: #219 (review)

Ok, I will reply there as I go. I can see already that some of your suggestions require some more investigation. I do not always have as much time.

Thanks for the interesting discussion. :)

@ankostis
Copy link

ankostis commented Apr 12, 2020

[edit:] Thank you for the references to SO for the general problems.

Intro

As they say, there are just 2 hard problems in programming:

  1. naming things,
  2. error handling, and
  3. off-by-one errors :-)

I'm particularly intrigued by the 2nd class of errors, because it is intimately linked to the architecture of program. Actually 80% of code we write is to convey to fellow programmers our thoughts, and to control their reactions to unseen cases (i.e errors).

Error handling & Logging

I'm picking some interesting points you raised to respond:

OTOH, we could just say that this is the fault of those developers in-between and that we only support the direct users of pydot.

👍
In a library, there is simple rule, when you don't know what to do with an error, let it bubble!
Mark in the docs what your method raises, and you've done your duty.
(for the top-level application with UI facing users, the stakes are higher, but it's not our concern here).
If you're feeling really kind, you may add a DEBUG-level log statement with more stuff that only upset users need to see :-).

With regard to the logging standard module of Python, I do not have experience with it yet. I remember looking into it for a while last year, but I think I felt that exception handling, rather than a log, was more fit for conveying the details of the error.

Well, it's not that hard, because for library code you should not fiddle with logging configurations - that's the responsibility of downstream programmers, for the app facing the final-user.
You roughly create a logger instance ate the module level and use it through out:

logger = logging.getLogger(__name__)
... 
log.info("Hello %s", "world") 
...
log.info("Terrible thing happened: %s", "a very long shit!", exc_info=1) 
...
log.debug("Study this stacktrace:", stack_info=1) 

(*) to be pedantic, this pattern is not completely correct, but 99% of code out there does that, and it's bad only for long running processes.

... understand that printing to stderr is considered almost equally bad.

Well, it's not that bad, and certainly it's much better than polluting stdout, but indeed, logging is the preferred way to write to stderr.

Do the logs just get lost then (not good, because the end user is still in the dark), or written to syslog ...

Yes, they are lost. No, they do not end upo to syslog, unless an intermediate developer installs the respective log-handler for that. [edit] Or the library ends up running in some daemon with stderr redirected to syslog.
But again, these are not our concerns.

For example, it feels a big unnatural to dump an entire DOT file in a log, although the same may be argued for exceptions I guess.

The good thing about logging module is that you can have your cake, and eat it only later.
I mean, you dump the whole thing (a big DOT file) in DEBUG level, which is usually disabled,
and since the syntax above does not manipulate strings in any way, there is a very small penalty to pay, and much bigger benefit in case of trouble further down the (future) road.

Maybe we should just follow the Python support schedule and only support Python 3.5 and higher?

Absolutely 👍👍

I suggest then to take a look at my oscmd and the monekypatching of the standard-exception to contain also the cwd and sterr. Check also the TCs.

Bonus track: Are chained-exceptions over-rated?

When you develop a library you still need to configure logging subsystem,
for the test-cases, and for reviewing you log messages if they make sense, have errors, etc.
It's the amazing pytest that comes to the rescue; it avails the --log-level=DEBUG option, saving the lib developer from having to configure the logging sub-system. Which btw it doesn't need much effort:

logging.basicConfig(level=DEBUG)

But as i stressed, lib-devs must stay away from configuring the logging system. And should refrain from interpreting the errors the lb generates - these are not their business.
Actually they should not even try to catch exceptions :;) Too often, a finally pattern like this can deal with cases where you need some clean up after ther error. And that avoid interrupting the stack-trace, so chained-exceptions are not needed:

ok= False
try:
    // do somtehing
    ok = True
finally:
    if not ok:
        //clean up

@peternowee
Copy link
Member

Funny that the apparent error in your quote about hard problems in programming is actually hard to handle for me. How to respond to this? Let me just say your memory of the quote may no longer be valid and should be refreshed next time you read it from disk. :)

For the rest, I just wanted to say that I have read your reply and it all looks very interesting. I will study and think about these subjects a bit more when I have the time. My schedule is quite unpredictable, so no need to wait for me if you get a chance to make improvements you see fit.

@ankostis
Copy link

Let's hope you still have some time left for treading the original quotes :-)

@peternowee
Copy link
Member

Haha, cool, maybe one day your version will make it to that page as well.

@peternowee
Copy link
Member

I guess that from the start, you wanted me to raise it.

peternowee added a commit to peternowee/pydot that referenced this issue Apr 18, 2020
When parsing DOT strings or files, pydot's parser calls the third-party
library PyParsing. If PyParsing meets a problem, it raises a
`ParseException`, which pydot then uses to report some details.

Currently, during the handling of that exception, a `TypeError` occurs.
It is caused by an attempt to concatenate a string with the exception
using the `+` (addition/plus) operator, which is (not allowed)[1]. The
problem was introduced in 2016 by commit (b4a3810)[2].

This current commit now removes that concatenation and goes back to
printing three separate lines. The end result is the same as Lance
Hepler's (original 2013 solution)[3] and (the example code in the
PyParsing documentation)[4].

I tested that behavior is now as expected (printing parser error
details without `TypeError`) under Python 2.7.16 with PyParsing 2.4.2
and Python 3.7.3 with PyParsing 2.4.2 and 3.0.0a2 ((486b1fd2e)[5]).
Unit tests all pass (same three combinations, when run together with
the test suite fix proposed in PR 211).

This (closes pydot#176)[6] ("Error in error handling hides the real error
message"), reported by Shish, and resolves the `TypeError` part of
(issue pydot#171)[7] ("Exception on parsing dot file"), reported by Michael
Goerz.

[1]: https://docs.python.org/3/reference/expressions.html#binary-arithmetic-operations
[2]: pydot@b4a3810#diff-baef597193866f900a2726a8a4667b12R563-R566
[3]: nlhepler/pydot@687eb7e#diff-baef597193866f900a2726a8a4667b12R551-R553
[4]: pyparsing/pyparsing@2c6f881#diff-7ab8c6f0d66d170e10b60b9f65334e18R694-R697
[5]: https://github.com/pyparsing/pyparsing/tree/486b1fd2ef7e98966665f915bc59856996ffb5b0
[6]: pydot#176
[7]: pydot#171
peternowee added a commit to peternowee/pydot that referenced this issue Apr 18, 2020
When parsing DOT strings or files, pydot's parser calls the third-party
library PyParsing. If PyParsing meets a problem, it raises a
`ParseException`, which pydot then uses to report some details.

Currently, during the handling of that exception, a `TypeError` occurs.
It is caused by an attempt to concatenate a string with the exception
using the `+` (addition/plus) operator, which is [not allowed][1]. The
problem was introduced in 2016 by commit [b4a3810][2].

This current commit now removes that concatenation and goes back to
printing three separate lines. The end result is the same as Lance
Hepler's [original 2013 solution][3] and [the example code in the
PyParsing documentation][4].

I tested that behavior is now as expected (printing parser error
details without `TypeError`) under Python 2.7.16 with PyParsing 2.4.2
and Python 3.7.3 with PyParsing 2.4.2 and 3.0.0a2 ([486b1fd2e][5]).
Unit tests all pass (same three combinations, when run together with
the test suite fix proposed in PR 211).

This [closes pydot#176][6] ("Error in error handling hides the real error
message"), reported by Shish, and resolves the `TypeError` part of
[issue pydot#171][7] ("Exception on parsing dot file"), reported by Michael
Goerz.

[1]: https://docs.python.org/3/reference/expressions.html#binary-arithmetic-operations
[2]: pydot@b4a3810#diff-baef597193866f900a2726a8a4667b12R563-R566
[3]: nlhepler/pydot@687eb7e#diff-baef597193866f900a2726a8a4667b12R551-R553
[4]: pyparsing/pyparsing@2c6f881#diff-7ab8c6f0d66d170e10b60b9f65334e18R694-R697
[5]: https://github.com/pyparsing/pyparsing/tree/486b1fd2ef7e98966665f915bc59856996ffb5b0
[6]: pydot#176
[7]: pydot#171
@peternowee
Copy link
Member

@ankostis: I have read up on logging and exception handling a bit and I have seen 'some' light. Thank you for your pointers.

I now agree with most of what you said:

  • Raise an exception.
  • Log a DEBUG-level log line.
  • Don't bother configuring handlers or the root logger.
  • Yes, DEBUG log lines will get lost by default and someone has to make an effort to see them.

However, I am still not sure about what exception to raise exactly. I still want to help that unlucky end-user that gets confronted with it. For example, you wrote:

In a library, there is simple rule, when you don't know what to do with an error, let it bubble!

and

[Lib-devs] should refrain from interpreting the errors the lb generates - these are not their business.

Is there room for an exception to those rules here? Because I do have some idea on what to do with the error and I do think we can actually add some valuable interpretation, like:

  • We know that we were parsing a DOT-string. PyParsing did not.
  • We can point the user to the DOT Language definition and the pydot issue tracker.

I have drafted a basic troubleshooting guide to be included in the pydot documentation that:

  • Explains the user how to get the DOT-string and pyparsing's explanation:
    • From the log.
    • From the exception.
  • Points the user to:
    • The DOT Language definition to check the DOT syntax.
    • The maintainer of any intermediate software they use that may have auto-generated the DOT string.
    • The pydot issue list in case they feel their DOT string was valid, and some general advice on reporting bugs.
    • The PyParsing documentation in case they want to investigate in more detail.

I drafted it now as a new paragraph in the README.md, but it could also be split off to a new TROUBLESHOOTING.md of course.

The only thing I still need now, is a way to point the user to that documentation.

If we just allow the original exception to bubble up without any additional information, users with little experience reading tracebacks or otherwise unaware of pydot's role, will not immediately look at our troubleshooting guide. They will just start searching for the term ParseException and end up in PyParsing's documentation and possibly even ask for support there.

Some of the ways I considered to point the user to our documentation:

  • Write the DEBUG log out by default: No. I chased this for a long time, but finally concluded that this approach is all wrong. I kept running into potential logging configuration conflicts with other modules. And printing by default just goes against the whole nature of the DEBUG level. (Note: I will still log a DEBUG line, but I now accept that the user may have to do some work to see it.)

  • Logging a log line of level ERROR before raising an exception: No. The problem with this is that the exception may get handled at a higher level and then the log line is still there ("You're giving me two errors.", Mario Corchero, Pycon 2019, Logging HOWTO: When to use logging).

  • Add an in-line comment in our code at the end of the line that may indirectly cause the exception, so that the comment will be shown as part of the traceback:

    File "dot_parser.py", line 575, in parse_dot_data
      tokens = graphparser.parseString(s)   # ParseException? See pydot documentation.
    

    No. Drawbacks: Still not very visible in the middle of a traceback. Plus the comment would probably get deleted by another developer during the next code cleanup.

  • Drop the original ParseException and raise a new exception: No(?) It feels wrong to erase history like that. And copying over all the data is ugly and a lot of work.

  • Try to change the original ParseException's message: No(?) It feels so hacky. Example: https://stackoverflow.com/questions/9157210/how-do-i-raise-the-same-exception-with-a-custom-message-in-python/9157277#9157277

  • Raising a new chained exception, chained to the original ParseException: Yes? In your previous comment, under "Are chained-exceptions over-rated?", you showed that chained exceptions were not always necessary and pointed out that not using them avoided interrupting the stack-trace. I see your point, especially in the clean-up example you gave, but here I also want to weigh it against the benefit of being able to give the user some additional information. Can the pain of experienced users having to read two tracebacks be offset by the gain of inexperienced users saving troubleshooting time?

    In case we decide to use chained exceptions, the question still remains what kind of exception we should use:

    • Another one of the same exception (e.g. our ParseException chained to the original ParseException): No. Drawbacks: Confusing to have two of the same exceptions with two different messages. Copying all that data over is ugly, a lot of work and unnecessary duplication. Not copying the data makes the new exception non-conformant with its documented attributes.

    • A built-in exception, such as ValueError, with just the message that we want to get out (chained to the original exception):

    • A custom exception (chained to the original): Yes? I hope we are not going to copy over any data from the original exception, now that it is chained to ours. Also, if we use unique names for our custom exceptions, we might not even need to literally point to our documentation, as a web search for that unique name would already point to pydot. An added benefit of using custom exceptions for both PR 218 and PR 219 is that we can offer a unified way of accessing the DOT-string. The question that then naturally comes up is whether we should create:

      • Two separate, independent exception classes (for starters), or
      • One single custom exception class for both cases (maybe later), or
      • Two custom exceptions derived from a single base class (maybe later).

      With those last two options, I run into even bigger questions, like: Why are pydot.py (from where Graphviz is called) and dot_parser.py (from where PyParsing is called) separate? Does sharing an exception or exception hierarchy between the two fit in with that? Should we define the exception hierarchy in a separate file? What to do with the existing exceptions, which have some problems already? Naming issues, etc. etc. My time is limited, so I hope to prevent such scope creep.
      Maybe we can avoid that discussion for a while now and start with the first option, i.e. create two custom exceptions for PR 218 and PR 219, completely independent from each other. We can use identical names for identical attributes, such as the DOT-string, so that won't stand in the way of any later integration in an exception hierarchy. We could add a comment near each of the exception class definitions to make future developers aware of this. Further integration can be discussed later then, possibly in a separate PR.

To conclude the above list: I am currently thinking of a custom exception class with our own message and carrying only our additional attributes, chained to the original exception which has its own attributes. And for now no integration between the custom exception classes needed for PR 218 and PR 219.

About the contents of our message, I think it can contain a short interpretation from our side (e.g. Supplied string cannot be parsed as DOT or, more presumptuous, Invalid DOT-string or DOT-syntax error) and a pointer to our documentation (e.g. See pydot documentation for help.).

Example using ValueError:

Traceback (most recent call last):
  File "dot_parser.py", line 559, in parse_dot_data
    tokens = graphparser.parseString(s)
  File "pyparsing.py", line 1955, in parseString
    raise exc
  File "pyparsing.py", line 3003, in parseImpl
    raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected {'graph' | 'digraph'}, found 'g'  (at char 0), (line:1, col:1)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "mre-with-logging.py", line 25, in <module>
    G = pydot.graph_from_dot_data("g blah blah")
  File "pydot.py", line 287, in graph_from_dot_data
    return dot_parser.parse_dot_data(s)
  File "dot_parser.py", line 575, in parse_dot_data
    raise ValueError(hint) from err
ValueError: Supplied string cannot be parsed as DOT. See pydot documentation for help.

Now the error ends in a clear message pointing to the pydot documentation.

Alternatively, using a custom exception with a unique name that should help users find pydot documentation by themselves:

[...]
pyparsing.ParseException: Expected {'graph' | 'digraph'}, found 'g'  (at char 0), (line:1, col:1)

The above exception was the direct cause of the following exception:
[...]
  File "dot_parser.py", line 583, in parse_dot_data
    raise new_err from err
dot_parser.InvalidDotError: Supplied string cannot be parsed as DOT

This last one is my current preference. It inherits from ValueError, but adds a custom attribute dot_string containing the string our function received and was also sent to pyparsing.

I hope you can let me know what you think.

Replying to your other comments:

Maybe we should just follow the Python support schedule and only support Python 3.5 and higher?

Absolutely 👍👍

Ok, I will assume Python 3.5+ in #218 and #219 from now on (next force-push). For Python 2, I created a very targeted bug fix for inclusion in a bug fix/point release in #227.

I suggest then to take a look at my oscmd and the monekypatching of the standard-exception to contain also the cwd and sterr. Check also the TCs.

This is relevant only to PR #218, I think. I had a quick look and I will look at it in more detail later and see what I can re-use, but I can already tell you that I don't really feel comfortable monkey-patching the standard library, for the usual risks involved. I would rather use a custom exception, I think. Also, I have come to accept that not all of our additional data needs to be part of the exception string, so maybe there is no need to override __str__.

When you develop a library you still need to configure logging subsystem,
for the test-cases, and for reviewing you log messages if they make sense, have errors, etc.
It's the amazing pytest that comes to the rescue; it avails the --log-level=DEBUG option, saving the lib developer from having to configure the logging sub-system.

Ok, I still need to look at testing. I will come back on this if I have any questions later.

Which btw it doesn't need much effort:

logging.basicConfig(level=DEBUG)

Yes, that should do the trick in many cases. I have put this in the proposed documentation as the first method to try when someone wants to see the DEBUG lines. But there are two possible downsides:

  • If the root logger already has a handler attached to it (for example because an imported module set it up), the user's later call to basicConfig will not have any effect. It will not add a handler, nor will it even change the level of the root logger anymore.
  • The level= argument here sets the level of the root logger to DEBUG (not its handler, btw, which basicConfig sets up to handle everything anyway). A root logger of level DEBUG results in all NOTSET loggers from all modules to inherit DEBUG as well. This could make the log quite noisy.

Therefore, I plan to document the following alternative as well:

logging.getLogger('pydot').addHandler(logging.StreamHandler())
logging.getLogger('pydot').setLevel(logging.DEBUG)
logging.getLogger('pydot.pydot').setLevel(logging.DEBUG)
logging.getLogger('pydot.dot_parser').setLevel(logging.DEBUG)

Notes:

  • Less prone to conflicts with logging configuration by other modules, because it does not touch the root logger/handler.
  • Attaching a handler to a library logger is normally not advised, but this is not going to be part of the code, but of the troubleshooting guide, where I think it is acceptable.
  • I explicitely set the levels of both the parent and the child loggers, because if another module has set the child loggers to a specific level already, they would not inherit from the parent anymore.
  • The above code results in the following logging tree (visualized using the logging_tree package):
    <--""
       Level WARNING
       |
       o<--"pydot"
           Level DEBUG
           Handler Stream <_io.TextIOWrapper name='<stderr>' mode='w' encoding='UTF-8'>
           |
           o<--"pydot.dot_parser"
           |   Level DEBUG
           |
           o<--"pydot.pydot"
               Level DEBUG
    

Finally, two other issues that were brought up earlier in the context of PR 218 and/or PR 219 and on which my thoughts have started shifting:

  • Less immediate need for flags, options or environment variables to influence logging, error raising or verbosity. Partly because I have started to see the light with regard to logging and exception handling, but also because I am running into time constraints and I want to limit the scope of these two PRs. Others can make such suggestions in separate PRs and we can always rebase as things get moving.
  • Less need for a transition/backwards compatibility period for the API changes. This follows on earlier discussion here in Exception on parsing dot file #171 (comment) and in [WIP] Improve our handling of pyparsing errors #219 (comment). For [WIP] Improve our handling of Graphviz errors #218, I think backwards compatibility should be out of the question anyway, because the current behavior is so wrong (see the mess in Problem with graph.write.png...AssertionError: 1 #203 for example). Both PRs now target a future major/minor release in which more API changes will occur, so I will just suffice with clear changelog entries and docstring changes. Here again, though, time constraints do play a role. Not only do I worry about the first implementation of backwards compatibility, but also about the long-term maintainability of the complex transition code and management of the transition schedule. If anyone has the time for it, feel free to send in a detailed proposal and I will reconsider.

@ankostis
Copy link

ankostis commented May 1, 2020

What an excellent analysis document! 😄

Here are my thoughts:

Comments in the raising code

Add an in-line comment in our code at the end of the line ...

Actually that's a neat way to add human context to the error without cluttering the exception workload.
I use that exactly as you did in the example, to question the tendency to panic of the user :-)

If I would order the channels to convery messages to the user, it is the following:

  • ex message,
  • ex-stack-trace (like the comments in the offending site)
  • chained exception msg & stacktrace
  • logging (wich although consulted rarely, it is greatly appreciated when developing on top, or when a desperate user searches more alternatives)

Chained-exception

Yes, when original exception curries a big load, like pyparsing, but it might be irrelevant to you code, chain it.
And if you need to append a lot of special context yourself, craft a new exception type - some care is needed when implementing new exceptions, i've seen a lot of clumsy implementations that interacyt badly with default machinery of exceptions.

My preferred way is to store all info in the original *args tuple, and make virtual @property methods indexing into the tuple with meaningful names. I keep the convention the 1st arg to be the msg, which is constructed "outside" of the exception, and the other arguments are context infos (e.x dot-string, cwd, stderr, etc)
Of course then you need to override __str__() to print simply the message with self[0] while the default __repr__() would print e.g. NewError('Syntax error atr dot line 6: hi five msg', '{graph\n ...', '/tmp/dfgdyut', '<the stderr here>', ...).

I would inherit ValueError for the custom error (since it is a problem in the input).

Another point: My convention is always to include the crux of the chained exception into the top one,
in order for the user to get the idea early, and need to look at the chain just for stack-traces and pin-pointing location of the error,
for instance:

except Exception as ex:
    ...  # contrcuct context infos
    raise TopException(f"Graphviz failed due to: {ex}", cwd, stder, ...) from ex

Package & module organization

I agree, the current organization of python files is "unusual", and complicates crafting a sensible exception hierarchy.
The majority of distributions in PyPi are either a single python-file (a module),
or a package (a folder with __init__.py package-module) containing more modules files inside.

If dot_parser.py us considered a "private" file, it' i's easy to retrofit the existing 2 modules into a more conventional format like that:

pydot/
  +--__init__.py  # all "public" objects must be defined/imported in this module
  +--dot_parser.py 
  +--base.py      # read below 

The Exceptions should be definitely defined in the package-module.

It is tempting to copy-paste the whole pydot.py contents on the package-module
but i would recommend against it.

TIP1:
When a user imports a module further into a package, the package-module gets import
in its entirety - so if it is "heavy" it delays the import. That is important when
the different functionality is split in multiple submodules of the package.

Example:
A package contains 3 sub-modules: base.py, sphinx_extension.py
If base.py contains also code for irrelevant http-server communications,
whenever from <package> import sphinx_extension code is trun,
it has to pay the price for this never-to-be-used http-server functionality.

TIP2
The modules are split based on the 3rd-party imports. If a module is completely disparate,
it might become an "extras" in the setup.py.

Copy all in the base.py (or choose another name for this module, just my convention, try core.py),
and import from package-module only what is needed.

Python 3.X compatibility

Please, lease, consider PY3.6+!
PY3.5 has:

  • Deterministic dict order and in **kw - important for many applications.
  • f-string formated literals
  • file-path protocol (pathlib) for many standard APIs.

Anything before PY3.6 is legacy, along with PY2.7.

logging

I explicitely set the levels of both the parent and the child loggers, because ...

But that is exactly the main architectural design of logging subsystems,
to configure parent and proliferate to children.
Also this is related to the package-restructuring.
The sample code you gave has some strange hierarchy e.g. pydot.pydot & pydot.dot_parser.
Have you already considered restructuring?
Is these string hand-crafted? (and not simply getLogger(__name__)?).
It is better to always have the logger-names coincide with the package structure,
so as not to need to consult the docs for that.

For the package structure as it is, i would recommend just the in the docs just:

logging.basicConfig(level=0)
logging.getLogger().level=0)  # in case already configured

and leave all the rest (i.e. reduce clutter) for the curious user to figure it herself :-)
(most developers are quite familiar with logging)

@peternowee
Copy link
Member

What an excellent analysis document! 😄

Thank you and thank you for sharing your thoughts. My reply again:

Comments in the raising code

Add an in-line comment in our code at the end of the line ...

Actually that's a neat way to add human context to the error without cluttering the exception workload.
I use that exactly as you did in the example, to question the tendency to panic of the user :-)

Great minds think alike. :)

But innovative as it may be, I found the resulting stack trace still too obscure for the user and the use of comments for this purpose too unconventional for a project with multiple contributors. I concluded that my goal of helping out the inexperienced user was better accomplished by having the message in the final line.

If I would order the channels to convery messages to the user, it is the following:

  • ex message,
  • ex-stack-trace (like the comments in the offending site)
  • chained exception msg & stacktrace
  • logging (wich although consulted rarely, it is greatly appreciated when developing on top, or when a desperate user searches more alternatives)

With those first two items you also mean final exceptions that have earlier exceptions chained to them, right? Not just stand-alone exceptions, right?

Chained-exception

[...]

My preferred way is to store all info in the original *args tuple, and make virtual @property methods indexing into the tuple with meaningful names. I keep the convention the 1st arg to be the msg, which is constructed "outside" of the exception, and the other arguments are context infos (e.x dot-string, cwd, stderr, etc)
Of course then you need to override __str__() to print simply the message with self[0] while the default __repr__() would print e.g. NewError('Syntax error atr dot line 6: hi five msg', '{graph\n ...', '/tmp/dfgdyut', '<the stderr here>', ...).

Why choose *args over attributes? I was planning to use attributes, as that seems to be the general advice:

I would inherit ValueError for the custom error (since it is a problem in the input).

Yes, I certainly plan to do that for the parsing problem (#219).

For the Graphviz problems (#218), I still need to check if the problem is always in the input. Perhaps I can differentiate between different possible problems (e.g. DOT-syntax problem, temporary .dot-file not found, Graphviz executable not found, maybe more). It will take some time to investigate this. After that, we can discuss further in #218 if necessary.

Another point: My convention is always to include the crux of the chained exception into the top one,
in order for the user to get the idea early, and need to look at the chain just for stack-traces and pin-pointing location of the error,
for instance:

except Exception as ex:
    ...  # contrcuct context infos
    raise TopException(f"Graphviz failed due to: {ex}", cwd, stder, ...) from ex

Yes, ok, this is possible, but I guess it depends on the case a little bit. It may be useful in cases where we are forced to use a custom exception because we want to add some data, but we do not have anything interesting to add to the original message, perhaps in the case of #218 (Graphviz/CalledProcessError) indeed.

However, in cases where we can provide the user with a better explanation, such as this #171 and #219 (pyparsing/ParseException), inclusion of the original message may not be necessary.

So, thanks for the tip, I will use it where appropriate.

Package & module organization

I agree, the current organization of python files is "unusual", and complicates crafting a sensible exception hierarchy.
The majority of distributions in PyPi are either a single python-file (a module),
or a package (a folder with __init__.py package-module) containing more modules files inside.

If dot_parser.py us considered a "private" file, it' i's easy to retrofit the existing 2 modules into a more conventional format like that:

pydot/
  +--__init__.py  # all "public" objects must be defined/imported in this module
  +--dot_parser.py 
  +--base.py      # read below 

The Exceptions should be definitely defined in the package-module.

It is tempting to copy-paste the whole pydot.py contents on the package-module
but i would recommend against it.

TIP1:
When a user imports a module further into a package, the package-module gets import
in its entirety - so if it is "heavy" it delays the import. That is important when
the different functionality is split in multiple submodules of the package.
Example:
A package contains 3 sub-modules: base.py, sphinx_extension.py
If base.py contains also code for irrelevant http-server communications,
whenever from <package> import sphinx_extension code is trun,
it has to pay the price for this never-to-be-used http-server functionality.

TIP2
The modules are split based on the 3rd-party imports. If a module is completely disparate,
it might become an "extras" in the setup.py.

Copy all in the base.py (or choose another name for this module, just my convention, try core.py),
and import from package-module only what is needed.

I am sorry, but as I said in my previous comment, due to time constraints I hope to update my PRs #218 and #219 separate of the bigger questions related to package and module organization. Making the PRs depend on big questions like this would unnecessarily hamper progress, in my eyes. That is why I proposed to use separate, independent custom exceptions in pydot.py and dot_parser.py for now. Of course, if this bigger question gets solved before PR 218/219 are merged, I will rebase and reconsider using an exception hierarchy. But if PR 218/219 get merged earlier, the exceptions they introduce will just become part of the bigger question. Too bad, but at least we will have enjoyed incremental improvements in the mean time.

Of course, I will try to anticipate on the outcomes of the bigger questions, which is why I proposed to use identical names for identical attributes and to add comments near the exception class definitions to make other developers aware of the possible future integration. If you have anything to add to that, you can still let me know.

Back to the package & module organization: How about we create a new issue for that? Would you like to do that (or maybe even a PR) or shall I do it? You seem to have more experience with this subject than I do. If I would have to create a new issue for it now, it would be nothing more than a reminder/todo-item with just a copy of your above comments. It would probably take me weeks, if not months, to push out a first proposal, while you might be able to come up with a proposal in just a few days.

Python 3.X compatibility

Please, lease, consider PY3.6+!
PY3.5 has:

  • Deterministic dict order and in **kw - important for many applications.
  • f-string formated literals
  • file-path protocol (pathlib) for many standard APIs.

Anything before PY3.6 is legacy, along with PY2.7.

Difficult question. Some additional arguments:

In favor of dropping PY3.5 support:

  • PY3.6 functionality you mentioned, such as f-strings.
  • For the Scientific Python ecosystem, Numpy NEP 29 recommends supporting at Python versions released in the last 42 months months. For PY3.5 that means 2019-03-13 (2015-09-13 + 42 months).

Against dropping PY3.5 support:

Inconclusive:

  • I searched Github.com Issues for combinations of "PY3.5" "python" "3.5" "compatibility" and "support" and had a quick look at discussions in 15 projects, most of them from the last few months. Don't take this as exact science, but here is a tally of the impressions I got from those discussions:
    • 6 of the 15: Drop PY3.5 support.
    • 3 of the 15: Keep PY3.5 support for now (1 even consciously supporting PY2.7 and PY3.4).
    • 3 of the 15: PY3.5 support on best-effort basis / welcoming PRs to re-add PY3.5 support by removal of f-strings.
    • 3 of the 15: Inconclusive: 1 was still discussing, 1 had different strategies for different components, 1 decided to drop based on the wrong assumption that Python's official support for PY3.5 had ended in 2019-11 already.

So, I am still hesitant. Do you use PY3.6-specific functionality in your open PR now, or need it for something else you are working on? If not, I propose we assume PY3.5 support for now and reconsider again when the need arises, or in September 2020. Is that ok?

logging

I explicitely set the levels of both the parent and the child loggers, because ...

But that is exactly the main architectural design of logging subsystems,
to configure parent and proliferate to children.

Yes, that works as long as the child loggers are still NOTSET, but you left out the rest of the sentence:

... if another module has set the child loggers to a specific level already, they would not inherit from the parent anymore.

Remember, this is for use in the documentation on how to get access to the DEBUG log lines as a second, alternative method to try if the first method did not work. Therefore, it has to be stronger and assume things like child loggers not inheriting from the parent because their level is not NOTSET anymore.

Also this is related to the package-restructuring.
The sample code you gave has some strange hierarchy e.g. pydot.pydot & pydot.dot_parser.
Have you already considered restructuring?

I am open for suggestions and willing to consider changes to the logger hierarchy and logger names as long as my changes have not been merged.

What logger hierarchy and names did you expect instead? Something like pydot.core and pydot.dot_parser?

Is these string hand-crafted? (and not simply getLogger(__name__)?).

Yes, handcrafted. getLogger(__name__) currently results in loggers named just pydot and dot_parser, which means no parent/child logger level inheritance at all. I figured that if I was going to have to prefix them with a fixed string, I might as well just hard-code their names altogether.

It is better to always have the logger-names coincide with the package structure,
so as not to need to consult the docs for that.

I don't know about "always". If we agree that the current package structure is "unusual" and are thinking about changing that structure (see Package & module organization above), we could already try to let the logger-names anticipate the most likely future package structure by now hard-coding what we expect __name__ to return in the future.

I understand that there is a risk that we guess wrong, but that risk needs to be weighted against the costs of waiting for the new structure. If someone thinks the new structure should be implemented first, they should contribute to that (again, see Package & module organization above). I am willing to rebase and reconsider the logging setup in my PRs as long as they have not been merged.

For the package structure as it is, i would recommend just the in the docs just:

logging.basicConfig(level=0)
logging.getLogger().level=0)  # in case already configured

Yes, ok, this will work in case:

  • The pydot loggers are still NOTSET, meaning they inherit their level from the root logger. If not (some other module already set the pydot loggers to ERROR for example), the level of the root logger does not affect the level of the pydot loggers, and
  • The existing handler attached to the root logger does not have some unexpected handler level, filter or output channel set.

and leave all the rest (i.e. reduce clutter) for the curious user to figure it herself :-)
(most developers are quite familiar with logging)

Hmm, I don't know. We are talking about a section in the documentation that is dedicated to getting the DEBUG log lines out. I think there is room for giving an second, alternative method in case the first method did not work. I will try not let it result in clutter.

@ankostis
Copy link

ankostis commented May 3, 2020

I agree almost wholeheartedly with everything you concluded.
Even your questions about clarification son something i wrote had the correct priority, sparing an answer from me :-)

It's just your hesitation for "what if the logger has stopped being NOTSET" seems rather peculiar.
How is that possible to happen without the developer or final user noticing that?.

Don't forget, libraries don't touch logging configs (to avoid imbroglios like these).
So it's the final application that would have changed levels/handlers, and it better know what it is doing and for what endeavor - you cannot anticipate that, you should not.
Not even in the docs!

I did that in the past, and tried to anticipate the problems novice users would face in my final application,
and ended up explaining how to setup a new python environment in various OSs and platforms e.g. pypy, egg, pip, whl, native DLLs * libraries, you get the point.
The things a novice user would not know is immense, as StackOverflow shows.

Nowdays, i rarely point to some considerable important 3rdp document - i don't write instructions for other libraries because then i have to maintain instructions about other people's business :-)

And you are not even talking about a final application here.
It's a library.

If you ask me, even the 2-line example code i gave is too much, not fitting for a library project
Developers capable of using a library, must have read at least once some guide and that's it. After that, they know!
The lib docs is enough to write which are the names of the loggers (since here they will not cincide with the package structure)

Given the opportunity, very welcomed to foresee the proper package structure in the logger-names.

@peternowee
Copy link
Member

I agree almost wholeheartedly with everything you concluded.
Even your questions about clarification son something i wrote had the correct priority, sparing an answer from me :-)

Good, thank you, happy to hear that. Thank you for providing new insights. I learned a lot the last few weeks.

It's just your hesitation for "what if the logger has stopped being NOTSET" seems rather peculiar.
How is that possible to happen without the developer or final user noticing that?.

Don't forget, libraries don't touch logging configs (to avoid imbroglios like these).
So it's the final application that would have changed levels/handlers, and it better know what it is doing and for what endeavor - ...

Yes, I was thinking of the final application mostly. I only started studying logging a few weeks ago, so I have not seen much of how final applications use it in practice, but the fear that they might change library logging levels originated from the following things I saw and read:

  • Regarding the motive: In his 2019 Pycon talk on exceptions, Mario Corchero talked about what final applications do when they think a library's logging is too noisy: "You're giving me two errors. [...] Quite often what they do is: I'm going to shut the logger off of that library." (7:19-9:02). His example was about excess logging at ERROR level and we are only talking about DEBUG level now, but the point is that it made me realize how tempting it may be for main applications to just shut library logging off completely. (He also gave a talk on logging on Pycon 2018, but I don't remember he had more on this particular subject then.)
  • Regarding the means: The Python Logging HOWTO warns that logging.config.dictConfig() and fileConfig() both have an option disable_existing_loggers that defaults to True. I can imagine that these configuration methods are especially used in final applications that depend on many libraries. (Btw, I just tried it myself and found out that disable_existing_loggers actually does not touch our logger levels, but sets an undocumented .disabled attribute on them to True. To correct this, I would need to add another two lines to my proposed second, alternative method again. (This underlines your point below that we cannot anticipate everything.))

Anyway, I hope that explains my fear that our loggers may get disabled in ways that the first method (basicConfig) may not solve. I do not know to what extent this actually happens in practice.

... you cannot anticipate that, you should not.
Not even in the docs!

I did that in the past, and tried to anticipate the problems novice users would face in my final application,
and ended up explaining how to setup a new python environment in various OSs and platforms e.g. pypy, egg, pip, whl, native DLLs * libraries, you get the point.
The things a novice user would not know is immense, as StackOverflow shows.

Hehe, funny. Exceptional question and great response.

Nowdays, i rarely point to some considerable important 3rdp document - i don't write instructions for other libraries because then i have to maintain instructions about other people's business :-)

And you are not even talking about a final application here.
It's a library.

If you ask me, even the 2-line example code i gave is too much, not fitting for a library project
Developers capable of using a library, must have read at least once some guide and that's it. After that, they know!
The lib docs is enough to write which are the names of the loggers (since here they will not cincide with the package structure)

Yes, ok, I see. Thanks for sharing your experiences and the balance you found. Very valuable. I will need to find a similar balance.

Specifically with regard to this case, I guess I should also consider that anything I put in the documentation will need to be maintained later as well (e.g. if child logger names get changed in the code). Maybe I will just start out with only the basicConfig() method and maybe a pointer to the Python logging documentation or that guide. Then we can always consider documenting the alternative methods if actual support questions for that come in. I will think about it some more and the result will be in the PRs.

Given the opportunity, very welcomed to foresee the proper package structure in the logger-names.

Ok, I will try, probably something like pydot.core and pydot.dot_parser then, but that may change as long as the relevant PRs are not merged yet and as the future package structure becomes more clear.

About that discussion on package structure (Package & module organization): I still think that deserves a separate issue or PR. Do you want to work on that, or shall I just open an issue for anyone to pick up?

peternowee added a commit to peternowee/pydot that referenced this issue May 11, 2020
Reasons why support for Python 3.5 is not dropped yet:

- We do not urgently need Python 3.6+ specific functionality.
- Python 3.5 is still receiving security support until 2020-09-13 [1].
- Recent PyPI download statistics for pydot [2] show that 20 to 40% of
  the users are still on Python 3.5.

For further details, see [pydot#171][3].

[1]: https://devguide.python.org/#status-of-python-branches
[2]: https://pypistats.org/packages/pydot
[3]: pydot#171 (comment)
peternowee added a commit to peternowee/pydot that referenced this issue May 11, 2020
Why support for Python 3.5 is not dropped yet:

- We do not urgently need Python 3.6+ specific functionality.
- Python 3.5 is still receiving security fixes until 2020-09-13 [1][2].
- Recent PyPI download statistics for pydot [3] show that 20 to 40% of
  the users are still on Python 3.5.

For further details, see [pydot#171][4].

[1]: https://en.wikipedia.org/wiki/History_of_Python#Table_of_versions
[2]: https://devguide.python.org/#status-of-python-branches
[3]: https://pypistats.org/packages/pydot
[4]: pydot#171 (comment)
peternowee added a commit to peternowee/pydot that referenced this issue May 15, 2020
pydot currently prints some error messages to standard output
(`stdout`). Printing anything is considered bad practice for libraries
[1] [2]. This becomes even more disturbing if the prints contain long
Graphviz output or DOT strings.

[1]: https://stackoverflow.com/questions/4201856/error-handling-strategies-in-a-shared-library-c
[2]: https://www.reddit.com/r/C_Programming/comments/97ebfj/redirect_stdout_and_stderr_to_a_buffer/e47n9kh/

This commit lays the groundwork for switching from printing to logging
by providing a basic library logging setup and some user documentation
on how to read the logs. Later commits will replace the actual printing
by logging.

Discussion: pydot#171.
@ankostis
Copy link

Regarding the pesky disable_existing_loggers issue, the solution is simple, as explained in this decade old blog, not to store the logger on the module, but to ask it everytime.

I have not ever seen any code doing that :-).

@peternowee
Copy link
Member

Yes, thank you, I had come across that as well. It is quite smart, but also very unpractical, requiring getLogger() calls sprinkled throughout the code (or using a custom logging method). In the end, as you suggested, I just stopped worrying about all the possible logging setups and just pointed to the Python logging documentation and the logging_tree package (see PR #231).

peternowee added a commit to peternowee/pydot that referenced this issue May 29, 2020
pydot currently prints some error messages to standard output
(`stdout`). Printing anything is considered bad practice for libraries
[1] [2]. This becomes even more disturbing if the prints contain long
Graphviz output or DOT strings.

[1]: https://stackoverflow.com/questions/4201856/error-handling-strategies-in-a-shared-library-c
[2]: https://www.reddit.com/r/C_Programming/comments/97ebfj/redirect_stdout_and_stderr_to_a_buffer/e47n9kh/

This commit lays the groundwork for switching from printing to logging
by providing a basic library logging setup and some user documentation
on how to read the logs. Later commits will replace the actual printing
by logging.

Discussion: pydot#171.
@peternowee peternowee modified the milestones: 1.3.0, 1.5.0 or 2.0.0 Aug 29, 2020
peternowee added a commit to peternowee/pydot that referenced this issue Sep 23, 2020
When parsing DOT strings or files, pydot's parser calls the third-party
library PyParsing. If PyParsing meets a problem, it raises a
`ParseException`, which pydot then uses to report some details.

Currently, during the handling of that exception, a `TypeError` occurs.
It is caused by an attempt to concatenate a string with the exception
using the `+` (addition/plus) operator, which is [not allowed][1]. The
problem was introduced in 2016 by commit [b4a3810][2].

This current commit now removes that concatenation and goes back to
printing three separate lines. The end result is the same as Lance
Hepler's [original 2013 solution][3] and [the example code in the
PyParsing documentation][4].

I tested that behavior is now as expected (printing parser error
details without `TypeError`) under Python 2.7.16 with PyParsing 2.4.2
and Python 3.7.3 with PyParsing 2.4.2 and 3.0.0a2 ([486b1fd2e][5]).
Unit tests all pass (same three combinations, when run together with
the test suite fix proposed in PR 211).

This [closes pydot#176][6] ("Error in error handling hides the real error
message"), reported by Shish, and resolves the `TypeError` part of
[issue pydot#171][7] ("Exception on parsing dot file"), reported by Michael
Goerz.

[1]: https://docs.python.org/3/reference/expressions.html#binary-arithmetic-operations
[2]: pydot@b4a3810#diff-baef597193866f900a2726a8a4667b12R563-R566
[3]: nlhepler/pydot@687eb7e#diff-baef597193866f900a2726a8a4667b12R551-R553
[4]: pyparsing/pyparsing@2c6f881#diff-7ab8c6f0d66d170e10b60b9f65334e18R694-R697
[5]: https://github.com/pyparsing/pyparsing/tree/486b1fd2ef7e98966665f915bc59856996ffb5b0
[6]: pydot#176
[7]: pydot#171
peternowee added a commit to peternowee/pydot that referenced this issue Jun 24, 2021
This starts a series of commits to drop support for Python 2 and 3.4 as
discussed in [pydot#171][171] and [pydot#229][229].

| Python version | Python EOL | PyPI 2020-05-02 | PyPI 2021-06-21 |
|----------------|------------|-----------------|-----------------|
| Python 2.7     | 2020-01-01 |             26% |             12% |
| Python 3.4     | 2019-03-18 |              0% |              0% |
| Python 3.5     | 2020-09-30 |             27% |              3% |
| Python 3.6     | 2021-12    |             21% |             14% |
| Python 3.7     | 2023-06    |             23% |             51% |
| Python 3.8     | 2024-10    |              3% |             13% |
| Python 3.9     | 2025-10    |              0% |              6% |
| Python 3.10    | 2026-10    |               - |              0% |

EOL : End of life, from End of security support on [Wikipedia][1].
PyPI: Python Package Index statistics for pydot from [PyPIstats.org][2].

Without support for Python 2, wheel distributions of pydot can [no
longer be marked as "universal"][3], so removing that from `setup.cfg`.

**USER FEEDBACK REQUESTED**
We are considering if pydot 2.0 should drop support for Python 3.5
and 3.6 as well. If this would affect you, please leave a comment in
[pydot#268][268].

[1]: https://en.wikipedia.org/w/index.php?title=History_of_Python&oldid=1022680403#Table_of_versions
[2]: https://pypistats.org/packages/pydot
[3]: https://github.com/pypa/packaging.python.org/blob/44313e5db4d729eede0bcb91c08d6ec93e89c5c8/source/guides/dropping-older-python-versions.rst#dealing-with-the-universal-wheels
[171]: pydot#171 (comment)
[229]: pydot#229
[268]: pydot#268
peternowee added a commit to peternowee/pydot that referenced this issue Jun 25, 2021
This starts a series of commits to drop support for Python 2 and 3.4 as
discussed in [pydot#171][171] and [pydot#229][229].

| Python version | Python EOL | PyPI 2020-05-02 | PyPI 2021-06-21 |
|----------------|------------|-----------------|-----------------|
| Python 2.7     | 2020-01-01 |             26% |             12% |
| Python 3.4     | 2019-03-18 |              0% |              0% |
| Python 3.5     | 2020-09-30 |             27% |              3% |
| Python 3.6     | 2021-12    |             21% |             14% |
| Python 3.7     | 2023-06    |             23% |             51% |
| Python 3.8     | 2024-10    |              3% |             13% |
| Python 3.9     | 2025-10    |              0% |              6% |
| Python 3.10    | 2026-10    |               - |              0% |

EOL : End of life, from End of security support on [Wikipedia][1].
PyPI: Python Package Index statistics for pydot from [PyPIstats.org][2].

Without support for Python 2, wheel distributions of pydot can [no
longer be marked as "universal"][3], so removing that from `setup.cfg`.

**USER FEEDBACK REQUESTED**
We are considering if pydot 2.0 should drop support for Python 3.5
and 3.6 as well. If this would affect you, please leave a comment in
[pydot#268][268].

[1]: https://en.wikipedia.org/w/index.php?title=History_of_Python&oldid=1022680403#Table_of_versions
[2]: https://pypistats.org/packages/pydot
[3]: https://github.com/pypa/packaging.python.org/blob/44313e5db4d729eede0bcb91c08d6ec93e89c5c8/source/guides/dropping-older-python-versions.rst#dealing-with-the-universal-wheels
[171]: pydot#171 (comment)
[229]: pydot#229
[268]: pydot#268
@peternowee
Copy link
Member

peternowee commented Jun 30, 2021

Hi @SHuang-Broad, I think the issue you reported in #269 is a duplicate of this one. Look at this comment of @johnyf in 2018 regarding the ParsingError. You can ignore the long discussion of 2020, which spawned from the TypeError that was solved in pydot 1.4.2 already and then moved on to other subjects.

Back to the ParsingError: @johnyf proposed a fix in commit f504c8b of branch dev, later rebased to commit a8b0df4 of #241. I have not had time to think about if the fix could lead to any regressions. It was too risky for the point release pydot 1.4.2 and for pydot 2.0 I am focusing on laying the foundations first. After that, there are also some other people waiting for responses for me. Since the fix proposed by @johnyf seems quite straightforward and we are in development stage now, I suppose some limited testing could suffice, I could merge it quickly and then we would just see how it works out in practice, but I still need to think about that. As said, some of the groundwork needs to come first anyway (#229: planned for merging end of this week, #230: PR in the making). A fix should be part of the final pydot 2.0 anyway, but that may still take many months.

Until then, hope you can use the workaround mentioned by @johnyf: Use a comma , instead of a semicolon ; to separate DOT attributes.

peternowee added a commit to peternowee/pydot that referenced this issue Jul 12, 2021
This commit brings the directory layout in line with current
recommendations by the Python Packaging Authority and others.

    Before                        After

    |-- setup.py                  |-- setup.py
    |                             |-- src/
    |                             |   `-- pydot/
    |                             |       |-- __init__.py
    |-- pydot.py                  |       |-- core.py
    |-- dot_parser.py             |       `-- dot_parser.py
    `-- test/                     `-- test/
        `-- pydot_unittest.py         `-- pydot_unittest.py

There are many opinions on what would be the best directory layout for
a Python project, particularly on whether or not to use an additional
`src/` layer. For example:

- https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure
- pypa/packaging.python.org#320
- https://github.com/pypa/packaging.python.org/blob/7866cb69f1aa290a13c94da77cfb46a079cfcfa2/source/tutorials/packaging-projects.rst#a-simple-project
- https://stackoverflow.com/questions/193161/what-is-the-best-project-structure-for-a-python-application

Suffice to say, I can basically understand the arguments in favor of a
`src/` layout and do not find the arguments against it strong enough
not to use it.

Moved the dunder global variables (`__version__` etc.) to `__init__.py`
to ensure they remain available to the user doing `import pydot`. (They
would not be included in `from pydot.core import *` because their names
start with an underscore and I did not want to start keeping `__all__`
just for this.) A test for `pydot.__version__` is added to the
testsuite as well.

.... TODO: Something about separating parsing from core or not? ...

Discussed in pydot#171, pydot#230 and pydot#271.

Special thanks to Kostis Anagnostopoulos (@ankostis) for his advice.
peternowee added a commit to peternowee/pydot that referenced this issue Jul 12, 2021
- Module structure following example from NetworkX and other projects.
  This adds an `import pydot` to `core.py`, which is cyclic/circular,
  but necessary to make the exceptions available. Cyclic imports are
  not uncommon and Python can handle this. Using the exceptions in
  pydot source code now requires prefixing `pydot.`, for example:

      raise pydot.Error("Message")

- Exception hierarchy changes. See ChangeLog and class docstrings in
  this commit for details. Additional background notes:

  - Removal of the unused exception class `InvocationException`: It was
    introduced in 842173c of 2008 (v1.0.2), but got in disuse when
    commits 9b3c1a1 and bc639e7 of 2016 (v1.2.0) fell back to using
    `Exception` and `assert` (`AssertionError`) again. If we ever need
    a custom class like this again in the future, it would probably
    have a different signature for context data (e.g. a DOT string,
    input and output), different parent class (or classes, e.g.
    `PydotException, CalledProcessError`) and perhaps a different name
    (e.g. ending in `...Error`), so no need to keep the old class
    around for that.

Further additions to the exception hierarchy are likely before the
final release of pydot 2.0.

Discussed in pydot#171, pydot#230 and pydot#271.
peternowee added a commit to peternowee/pydot that referenced this issue Jul 30, 2021
This commit brings the directory layout in line with current
recommendations by the Python Packaging Authority and others.

    Before                        After

    |-- setup.py                  |-- setup.py
    |                             |-- src/
    |                             |   `-- pydot/
    |                             |       |-- __init__.py
    |-- pydot.py                  |       |-- core.py
    |-- dot_parser.py             |       `-- dot_parser.py
    `-- test/                     `-- test/
        `-- pydot_unittest.py         `-- pydot_unittest.py

There are many opinions on what would be the best directory layout for
a Python project, particularly on whether or not to use an additional
`src/` layer. For example:

- https://blog.ionelmc.ro/2014/05/25/python-packaging/#the-structure
- pypa/packaging.python.org#320
- https://github.com/pypa/packaging.python.org/blob/7866cb69f1aa290a13c94da77cfb46a079cfcfa2/source/tutorials/packaging-projects.rst#a-simple-project
- https://stackoverflow.com/questions/193161/what-is-the-best-project-structure-for-a-python-application

Suffice to say, I can basically understand the arguments in favor of a
`src/` layout and do not find the arguments against it strong enough
not to use it.

Moved the dunder global variables (`__version__` etc.) to `__init__.py`
to ensure they remain available to the user doing `import pydot`. (They
would not be included in `from pydot.core import *` because their names
start with an underscore and I did not want to start keeping `__all__`
just for this.) A test for `pydot.__version__` is added to the
testsuite as well.

A question that remains open for the moment is whether we should reduce
the weight of pydot "core" by splitting off functionality, most notably
the parsing functionality. Although that would reduce pydot loading
time, it would also make it more difficult for users to discover and
use that functionality. For details see:
pydot#271 (comment)
For now, things are kept as-is, namely that `pydot` always imports
`dot_parser`. This is now accomplished via `__init__.py` and `core.py`.

Discussed in pydot#171, pydot#230 and pydot#271.

Special thanks to Kostis Anagnostopoulos (@ankostis) for his advice.
peternowee added a commit to peternowee/pydot that referenced this issue Jul 30, 2021
- Module structure following example from NetworkX and other projects.
  This adds an `import pydot` to `core.py`, which is cyclic/circular,
  but necessary to make the exceptions available. Cyclic imports are
  not uncommon and Python can handle this. Using the exceptions in
  pydot source code now requires prefixing `pydot.`, for example:

      raise pydot.Error("Message")

- Exception hierarchy changes. See ChangeLog and class docstrings in
  this commit for details. Additional background notes:

  - Removal of the unused exception class `InvocationException`: It was
    introduced in 842173c of 2008 (v1.0.2), but got in disuse when
    commits 9b3c1a1 and bc639e7 of 2016 (v1.2.0) fell back to using
    `Exception` and `assert` (`AssertionError`) again. If we ever need
    a custom class like this again in the future, it would probably
    have a different signature for context data (e.g. a DOT string,
    input and output), different parent class (or classes, e.g.
    `PydotException, CalledProcessError`) and perhaps a different name
    (e.g. ending in `...Error`), so no need to keep the old class
    around for that.

Further additions to the exception hierarchy are likely before the
final release of pydot 2.0.

Discussed in pydot#171, pydot#230 and pydot#271.
peternowee added a commit to peternowee/pydot that referenced this issue Aug 2, 2021
pydot currently prints some error messages to standard output
(`stdout`). Printing anything is considered bad practice for libraries
[1] [2]. This becomes even more disturbing if the prints contain long
Graphviz output or DOT strings.

[1]: https://stackoverflow.com/questions/4201856/error-handling-strategies-in-a-shared-library-c
[2]: https://www.reddit.com/r/C_Programming/comments/97ebfj/redirect_stdout_and_stderr_to_a_buffer/e47n9kh/

This commit lays the groundwork for switching from printing to logging
by providing a basic library logging setup and some user documentation
on how to read the logs. Later commits will replace the actual printing
by logging.

Discussed in pydot#171 and pydot#231.
lkk7 added a commit that referenced this issue Apr 14, 2024
* Implement basic library logging

pydot currently prints some error messages to standard output
(`stdout`). Printing anything is considered bad practice for libraries
[1] [2]. This becomes even more disturbing if the prints contain long
Graphviz output or DOT strings.

[1]: https://stackoverflow.com/questions/4201856/error-handling-strategies-in-a-shared-library-c
[2]: https://www.reddit.com/r/C_Programming/comments/97ebfj/redirect_stdout_and_stderr_to_a_buffer/e47n9kh/

This commit lays the groundwork for switching from printing to logging
by providing a basic library logging setup and some user documentation
on how to read the logs. Later commits will replace the actual printing
by logging.

Discussed in #171 and #231.

* Fixup: Black reformatting

* Fixup: Use `__name__` to name loggers

* Fixup: Add parent logger, Reorder registration

Notes:
- I am aware that in `__init__.py`, the placement of the logger
  registration before the imports of the submodules leads to additional
  warnings in code checkers:
  From pylint:
      C0413: Import "from pydot.exceptions import *" should be placed at the top of the module (wrong-import-position)
      C0413: Import "from pydot.core import *" should be placed at the top of the module (wrong-import-position)
  From flake8:
      E402 module level import not at top of file
      E402 module level import not at top of file
  I do not really see how we can avoid this if we want to log a first
  message early during pydot initialization. Also note that the
  concerned imports are all pydot "internal" submodules and that for
  `__init__.py` importing those is its main task, so this having a
  central place in the file does not seem unnatural.
- In `test/pydot_unittest.py`, added a final `importlib.reload(pydot)`
  to prevent subsequent tests from failing, probably caused by one of
  the caveats mentioned in:
  https://docs.python.org/3/library/importlib.html#importlib.reload

* Fixup: Documentation reordering 1

* Fixup: Documentation reordering 2

* Fixup: Documentation changes (minor)

Notes:
- Removed the hint that the parent logger can be used to control all
  pydot loggers, because this is basic knowledge covered by the Python
  documentation to which we already refer.

* Renumber links, Update PyPI URL in README.md

* Fixup: update the changelog

* Fixup: stabilize black usage

---------

Co-authored-by: lkk7 <lukaszlapinski7@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants