Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split attribute quoting from ID quoting, rewrite attribute formatting, store names unquoted #363

Merged
merged 10 commits into from
Jul 11, 2024

Conversation

ferdnyc
Copy link
Member

@ferdnyc ferdnyc commented Jun 17, 2024

As the scope of #339 has creeped rather a lot and threatened to push it into a state of perpetual limbo, I've started splitting out some of the changes into more focused PRs. This is another one of those, though the rather far-ranging nature might make it seem otherwise. This PR affects:

Attribute formatting

Name storage & formatting

  • Remove all quoting of ID strings on input, switching to instead always performing quoting on output, so that strings are stored internally the same way the user inputs them, making lookups via .get_*() methods less confusing/annoying.
  • Take advantage of the fact that we know when we're outputting a name vs. some other syntax, and quote graphviz keywords used as names to avoid syntax errors in the generated source. Make exception for the default-values node names.

Closes #361
Closes #355
Closes #181
Fixes #358
Fixes #282
Fixes #258
Fixes #246
Fixes #180
Fixes #111
Fixes #85

@ferdnyc
Copy link
Member Author

ferdnyc commented Jun 17, 2024

(Actually I think #85 was already fixed by #320, but it never got closed so might as well close it here.)

@ferdnyc ferdnyc force-pushed the format-attrs branch 3 times, most recently from 9ace32a to bee1cac Compare June 17, 2024 10:07
Copy link

This pull request has conflicts, please resolve those so that the changes can be evaluated.

Copy link

All conflicts have been resolved, thanks!

Copy link
Member

@lkk7 lkk7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great change, thanks.

I believe that the massive test suite with all the dot files should cover these changes safely?

@lkk7 lkk7 mentioned this pull request Jun 18, 2024
@ferdnyc
Copy link
Member Author

ferdnyc commented Jun 20, 2024

The tests probably do cover all of this, but I'd feel better adding some extra tests specifically for the rewritten functionality.

I started writing some, then stumbled on the fact that it isn't always easy to compare values in the test suite. (On the plus side, trying to compare Node values was how I stumbled on the broken parenting behavior that led to #365.)

ferdnyc added 9 commits July 9, 2024 20:44
Instead of quote_if_necessary() trying to be all things to all
situations, split into quote_id_if_necessary(), which uses the
same logic as before, and quote_attr_if_necessary(), which is
heavily biased towards quoting, not doing so only if the value
is numeric, HTML, or already double-quoted.
Add a _format_attr() function to format single attributes, and call
it in a list comprehension when generating formatted attributes for
each of the to_string() methods.
`quote_id_if_necessary()` will quote any ID that matches a keyword,
except ones it's passed in its `unquoted_keywords` argument. (This
is used by `Node.to_string()` to prevent quoting of the `node`,
`edge`, and `graph` defaults keywords.)
By storing names without added quotes, using methods like `get_node()`
becomes easier, because the string the user has to search for is the
same one they initially created the node with. Storing quotes-needed
IDs _internally_ unquoted is fine, as long as we always quote them
before they're output in `to_string()`.
- Add isalnum() escape-hatch to any_needs_quotes() (instant False)
- Add prefix= argument to attrs_string(), to output with a prefix
  iff a formatted string is produced. (Used to prepend a space when
  necessary.)
- `test_id_storage_and_lookup` verifies that identifiers are stored in
  the form they were supplied by the caller (w/r/t quoting of values),
  can be looked up using the `get_*()` functions in the same format,
  and are reliably quoted only on output.
- `test_keyword_quoting` checks that Dot keywords can be used in
  graph name fields, node name fields, and attribute fields, and will
  be quoted to ensure they're parsed correctly. It also verifies that
  an edge attributes default node will _not_ have its name quoted.
@ferdnyc
Copy link
Member Author

ferdnyc commented Jul 10, 2024

@lkk7

OK, I have just now added sufficient new tests targeted specifically at the functionality in this PR, that I think I can declare it working and verified. (I also fixed the items brought up in review.)

I hit on a method of testing that the get_{node,edge,etc.}() methods were returning the expected objects (even though they won't be the same Python objects, because they're recreated each time a value is returned) by comparing the id() value of the obj_dict inside each object.

If, for example, Graph.get_node() returns a particular Node object, it won't be the same Node object that was originally passed to Graph.add_node(), but it will contain the same obj_dict that was originally passed in, because those are all stored by reference and don't get copied internally. So, code like this in the tests works, where direct comparison of the objects would not:

g = pydot.Graph()
a = pydot.Node("my node")
g.add_node(a)
a_out = g.get_node("my node")[0]
self.assertEqual(id(a_out.obj_dict), id(a.obj_dict))

I may even define __eq__ dunder methods for Node, Edge, etc. that implement that check, so that we can use some_node == some_returned_node or self.assertEqual(some_node, some_returned_node) and get the correct results. (It'd make the tests I just wrote a lot cleaner, for one.) But that can be a followup PR.

@ferdnyc
Copy link
Member Author

ferdnyc commented Jul 10, 2024

If I'm not mistaken all I've done there is a poor-man's reimplementation of is, since a is b is effectively a concise way of saying id(a) == id(b). unittest does have an assertIs(), so the test could theoretically also be written:

g = pydot.Graph()
a = pydot.Node("my node")
g.add_node(a)
a_out = g.get_node("my node")[0]
self.assertIs(a_out.obj_dict, a.obj_dict)

@lkk7 lkk7 self-requested a review July 11, 2024 21:30
Copy link
Member

@lkk7 lkk7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great PR, thank you!
I'll check and close all these issues (where applicable) this weekend.

@lkk7 lkk7 merged commit ea01315 into pydot:master Jul 11, 2024
18 checks passed
@tusharsadhwani
Copy link
Contributor

Amazing work!

@lkk7
Copy link
Member

lkk7 commented Jul 13, 2024

@ferdnyc Well, I shouldn't close the issues until the fixes are properly released.
I think now is a good time for a release, so I'll prepare a changelog

@ferdnyc ferdnyc deleted the format-attrs branch July 13, 2024 13:58
@ferdnyc
Copy link
Member Author

ferdnyc commented Jul 13, 2024

@lkk7

Oh... Oops. Sorry, they already closed. The PR was set to auto-close all of them.

I shouldn't close the issues until the fixes are properly released.

I mean, I guess there are two schools of thought on that. As soon as the fix is in the repo, the issue is solved. Anyone who wants to run with the master branch can always install the development code with:

pip install git+https://github.com/pydot/pydot.git

(Hmmm... we should probably add that to the README.)

@ferdnyc
Copy link
Member Author

ferdnyc commented Jul 13, 2024

(Ugh, that reminds me. As much of a pain in the ass as it can be, it's probably a mitzvah if we take the time to rename the master branch to main.)

@lkk7
Copy link
Member

lkk7 commented Jul 13, 2024

Oh, that's fine. The release will be coming soon anyway.

I'll add that note to README.md and I'll change the branch name. Edit: i see you're the README update. 😄 But I'll do the branch rename

chrizzFTD added a commit to chrizzFTD/networkx that referenced this pull request Aug 18, 2024
…as been fixed on pydot/pydot#363

Signed-off-by: Christian López Barrón <chris.gfz@gmail.com>
rossbar pushed a commit to networkx/networkx that referenced this pull request Aug 19, 2024

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
* removed _check_colon_quotes coming from issue
   pydot/pydot#258 which
   has been fixed on pydot/pydot#363

---------

Signed-off-by: Christian López Barrón <chris.gfz@gmail.com>
rossbar pushed a commit to Peiffap/networkx that referenced this pull request Sep 6, 2024
* removed _check_colon_quotes coming from issue
   pydot/pydot#258 which
   has been fixed on pydot/pydot#363

---------

Signed-off-by: Christian López Barrón <chris.gfz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment