Split attribute quoting from ID quoting, rewrite attribute formatting, store names unquoted #363

ferdnyc · 2024-06-17T09:29:18Z

As the scope of #339 has creeped rather a lot and threatened to push it into a state of perpetual limbo, I've started splitting out some of the changes into more focused PRs. This is another one of those, though the rather far-ranging nature might make it seem otherwise. This PR affects:

Attribute formatting

Separate ID and attribute quoting, the same way Fix quoting logic for attribute values #339 started out doing
Fix quoting for attribute values
Completely rewrite the attribute formatting process, moving the meat of the work into new, centralized Common methods, which all classes' to_string() methods now call to get their formatted attributes
Remove the attribute sorting, stomping on @lkk7's Don't use sorted() on attributes in to_string() methods #361 (sorry! I did cherry-pick your test fixes into this PR, though.)

Name storage & formatting

Remove all quoting of ID strings on input, switching to instead always performing quoting on output, so that strings are stored internally the same way the user inputs them, making lookups via .get_*() methods less confusing/annoying.
Take advantage of the fact that we know when we're outputting a name vs. some other syntax, and quote graphviz keywords used as names to avoid syntax errors in the generated source. Make exception for the default-values node names.

Closes #361
Closes #355
Closes #181
Fixes #358
Fixes #282
Fixes #258
Fixes #246
Fixes #180
Fixes #111
Fixes #85

ferdnyc · 2024-06-17T09:35:04Z

(Actually I think #85 was already fixed by #320, but it never got closed so might as well close it here.)

github-actions · 2024-06-17T18:06:52Z

This pull request has conflicts, please resolve those so that the changes can be evaluated.

github-actions · 2024-06-17T18:09:48Z

All conflicts have been resolved, thanks!

lkk7

Great change, thanks.

I believe that the massive test suite with all the dot files should cover these changes safely?

src/pydot/core.py

ferdnyc · 2024-06-20T22:07:36Z

The tests probably do cover all of this, but I'd feel better adding some extra tests specifically for the rewritten functionality.

I started writing some, then stumbled on the fact that it isn't always easy to compare values in the test suite. (On the plus side, trying to compare Node values was how I stumbled on the broken parenting behavior that led to #365.)

Instead of quote_if_necessary() trying to be all things to all situations, split into quote_id_if_necessary(), which uses the same logic as before, and quote_attr_if_necessary(), which is heavily biased towards quoting, not doing so only if the value is numeric, HTML, or already double-quoted.

Add a _format_attr() function to format single attributes, and call it in a list comprehension when generating formatted attributes for each of the to_string() methods.

`quote_id_if_necessary()` will quote any ID that matches a keyword, except ones it's passed in its `unquoted_keywords` argument. (This is used by `Node.to_string()` to prevent quoting of the `node`, `edge`, and `graph` defaults keywords.)

By storing names without added quotes, using methods like `get_node()` becomes easier, because the string the user has to search for is the same one they initially created the node with. Storing quotes-needed IDs _internally_ unquoted is fine, as long as we always quote them before they're output in `to_string()`.

- Add isalnum() escape-hatch to any_needs_quotes() (instant False) - Add prefix= argument to attrs_string(), to output with a prefix iff a formatted string is produced. (Used to prepend a space when necessary.)

- `test_id_storage_and_lookup` verifies that identifiers are stored in the form they were supplied by the caller (w/r/t quoting of values), can be looked up using the `get_*()` functions in the same format, and are reliably quoted only on output. - `test_keyword_quoting` checks that Dot keywords can be used in graph name fields, node name fields, and attribute fields, and will be quoted to ensure they're parsed correctly. It also verifies that an edge attributes default node will _not_ have its name quoted.

ferdnyc · 2024-07-10T01:24:56Z

@lkk7

OK, I have just now added sufficient new tests targeted specifically at the functionality in this PR, that I think I can declare it working and verified. (I also fixed the items brought up in review.)

I hit on a method of testing that the get_{node,edge,etc.}() methods were returning the expected objects (even though they won't be the same Python objects, because they're recreated each time a value is returned) by comparing the id() value of the obj_dict inside each object.

If, for example, Graph.get_node() returns a particular Node object, it won't be the same Node object that was originally passed to Graph.add_node(), but it will contain the same obj_dict that was originally passed in, because those are all stored by reference and don't get copied internally. So, code like this in the tests works, where direct comparison of the objects would not:

g = pydot.Graph()
a = pydot.Node("my node")
g.add_node(a)
a_out = g.get_node("my node")[0]
self.assertEqual(id(a_out.obj_dict), id(a.obj_dict))

I may even define __eq__ dunder methods for Node, Edge, etc. that implement that check, so that we can use some_node == some_returned_node or self.assertEqual(some_node, some_returned_node) and get the correct results. (It'd make the tests I just wrote a lot cleaner, for one.) But that can be a followup PR.

ferdnyc · 2024-07-10T01:50:15Z

If I'm not mistaken all I've done there is a poor-man's reimplementation of is, since a is b is effectively a concise way of saying id(a) == id(b). unittest does have an assertIs(), so the test could theoretically also be written:

g = pydot.Graph()
a = pydot.Node("my node")
g.add_node(a)
a_out = g.get_node("my node")[0]
self.assertIs(a_out.obj_dict, a.obj_dict)

lkk7

Great PR, thank you!
I'll check and close all these issues (where applicable) this weekend.

tusharsadhwani · 2024-07-11T22:56:51Z

Amazing work!

lkk7 · 2024-07-13T12:32:16Z

@ferdnyc Well, I shouldn't close the issues until the fixes are properly released.
I think now is a good time for a release, so I'll prepare a changelog

ferdnyc · 2024-07-13T14:04:11Z

@lkk7

Oh... Oops. Sorry, they already closed. The PR was set to auto-close all of them.

I shouldn't close the issues until the fixes are properly released.

I mean, I guess there are two schools of thought on that. As soon as the fix is in the repo, the issue is solved. Anyone who wants to run with the master branch can always install the development code with:

pip install git+https://github.com/pydot/pydot.git

(Hmmm... we should probably add that to the README.)

ferdnyc · 2024-07-13T14:05:33Z

(Ugh, that reminds me. As much of a pain in the ass as it can be, it's probably a mitzvah if we take the time to rename the master branch to main.)

lkk7 · 2024-07-13T14:14:10Z

Oh, that's fine. The release will be coming soon anyway.

I'll add that note to README.md and I'll change the branch name. Edit: i see you're the README update. 😄 But I'll do the branch rename

…as been fixed on pydot/pydot#363 Signed-off-by: Christian López Barrón <chris.gfz@gmail.com>

* removed _check_colon_quotes coming from issue pydot/pydot#258 which has been fixed on pydot/pydot#363 --------- Signed-off-by: Christian López Barrón <chris.gfz@gmail.com>

ferdnyc force-pushed the format-attrs branch 3 times, most recently from 9ace32a to bee1cac Compare June 17, 2024 10:07

github-actions bot added the conflicts label Jun 17, 2024

github-actions bot removed the conflicts label Jun 17, 2024

ferdnyc mentioned this pull request Jun 17, 2024

Fix parenting behavior for duplicate-name nodes #365

Merged

lkk7 reviewed Jun 17, 2024

View reviewed changes

src/pydot/core.py Outdated Show resolved Hide resolved

src/pydot/core.py Outdated Show resolved Hide resolved

lkk7 mentioned this pull request Jun 18, 2024

HTML tags #366

Open

ferdnyc added 9 commits July 9, 2024 20:44

Standardize attribute formatting

8d9ddc2

Add a _format_attr() function to format single attributes, and call it in a list comprehension when generating formatted attributes for each of the to_string() methods.

Quote dot keywords, where appropriate

cafd018

`quote_id_if_necessary()` will quote any ID that matches a keyword, except ones it's passed in its `unquoted_keywords` argument. (This is used by `Node.to_string()` to prevent quoting of the `node`, `edge`, and `graph` defaults keywords.)

Don't sort attributes when formatting

fdb2461

Quoting and output-formatting tweaks

094adcb

- Add isalnum() escape-hatch to any_needs_quotes() (instant False) - Add prefix= argument to attrs_string(), to output with a prefix iff a formatted string is produced. (Used to prepend a space when necessary.)

Fix f-string usage for old Python

f0a157a

Fix unquoted_keywords

118cfd1

ferdnyc force-pushed the format-attrs branch from 9bb1e8d to ded7d10 Compare July 10, 2024 01:00

Address review notes

Loading
Loading status checks…

a6aa848

ferdnyc force-pushed the format-attrs branch from ef349b8 to a6aa848 Compare July 10, 2024 01:19

lkk7 self-requested a review July 11, 2024 21:30

lkk7 approved these changes Jul 11, 2024

View reviewed changes

lkk7 merged commit ea01315 into pydot:master Jul 11, 2024
18 checks passed

ferdnyc deleted the format-attrs branch July 13, 2024 13:58

ferdnyc mentioned this pull request Jul 15, 2024

Prepare changelog for 3.0.0 #378

Merged

MarcCote mentioned this pull request Jul 24, 2024

DOC: Fix typo in the code snippet provided in the docstring of nx_pydot.pydot_layout(). networkx/networkx#7572

Merged

chrizzFTD added a commit to chrizzFTD/networkx that referenced this pull request Aug 18, 2024

removed _check_colon_quotes coming from issue pydot/pydot#258 which h…

3ba2c21

…as been fixed on pydot/pydot#363 Signed-off-by: Christian López Barrón <chris.gfz@gmail.com>

chrizzFTD mentioned this pull request Aug 18, 2024

pydot - Remove Colon Check on Strings networkx/networkx#7606

Merged

ferdnyc mentioned this pull request Sep 20, 2024

Fix edge quoting (broken in 3.0.0) #384

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Split attribute quoting from ID quoting, rewrite attribute formatting, store names unquoted #363

Split attribute quoting from ID quoting, rewrite attribute formatting, store names unquoted #363

ferdnyc commented Jun 17, 2024 •

edited

Loading

ferdnyc commented Jun 17, 2024

github-actions bot commented Jun 17, 2024

github-actions bot commented Jun 17, 2024

lkk7 left a comment

ferdnyc commented Jun 20, 2024

ferdnyc commented Jul 10, 2024

ferdnyc commented Jul 10, 2024 •

edited

Loading

lkk7 left a comment •

edited

Loading

tusharsadhwani commented Jul 11, 2024

lkk7 commented Jul 13, 2024

ferdnyc commented Jul 13, 2024

ferdnyc commented Jul 13, 2024

lkk7 commented Jul 13, 2024 •

edited

Loading

Split attribute quoting from ID quoting, rewrite attribute formatting, store names unquoted #363

Split attribute quoting from ID quoting, rewrite attribute formatting, store names unquoted #363

Conversation

ferdnyc commented Jun 17, 2024 • edited Loading

Attribute formatting

Name storage & formatting

ferdnyc commented Jun 17, 2024

github-actions bot commented Jun 17, 2024

github-actions bot commented Jun 17, 2024

lkk7 left a comment

Choose a reason for hiding this comment

ferdnyc commented Jun 20, 2024

ferdnyc commented Jul 10, 2024

ferdnyc commented Jul 10, 2024 • edited Loading

lkk7 left a comment • edited Loading

Choose a reason for hiding this comment

tusharsadhwani commented Jul 11, 2024

lkk7 commented Jul 13, 2024

ferdnyc commented Jul 13, 2024

ferdnyc commented Jul 13, 2024

lkk7 commented Jul 13, 2024 • edited Loading

ferdnyc commented Jun 17, 2024 •

edited

Loading

ferdnyc commented Jul 10, 2024 •

edited

Loading

lkk7 left a comment •

edited

Loading

lkk7 commented Jul 13, 2024 •

edited

Loading