Standardisation of common attributes (classes, names) #32

Open
chrisjsewell opened this issue Apr 7, 2022 · 7 comments
Labels
Accepted Small improvements to the AST or MyST syntax have been approved.

Comments

@chrisjsewell
Member

As specified here: https://docutils.sourceforge.io/docs/ref/doctree.html#common-attributes, there are some common attributes associated with all docutils nodes, and this should essentially be the same here.

As an example, here:

This should be classes: ['tip']

@rowanc1
Member

rowanc1 commented Apr 7, 2022

Is there something in the mdast ecosystem to point to?

React uses className and HTML uses class. Both are strings.

Can we have an array of classes with spaces in them?

This shouldn't be allowed:

classes:
    - 'myClass mySecondclass'

Instead:

classes:
    - myClass
    - mySecondclass
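
A minimal Python sketch of the rule above, assuming two hypothetical helpers (`validate_classes` and `split_classes`, neither of which exists in any MyST library): each list entry must be a single class name, and an HTML-style class string is split into such entries.

```python
def validate_classes(classes):
    """Return a copy of the list if every entry is a single class name.

    Hypothetical helper for illustration only. Raises ValueError for
    entries that are not strings, are empty, or contain whitespace.
    """
    for name in classes:
        if not isinstance(name, str) or not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid class entry: {name!r}")
    return list(classes)


def split_classes(value):
    """Convert an HTML-style class string into a list of class names."""
    return value.split()
```

With this, `split_classes("myClass mySecondclass")` gives the two-entry form, while `validate_classes(["myClass mySecondclass"])` is rejected.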

@chrisjsewell
Member Author

@rowanc1 rowanc1 added the Accepted Small improvements to the AST or MyST syntax have been approved. label Apr 7, 2022
@chrisjsewell
Member Author

chrisjsewell commented Apr 10, 2022

@rowanc1 and @fwkoch further to our discussion regarding identifier:
on further thought, I feel it's just irreconcilable with jupyter-book/myst-parser, to only allow a single identifier per element.

Take this simple example:

# main

## subtitle

(target1)=
(target2)=
### Sub-subtitle

[ref1](target1)
[ref2](sub-subtitle)

This is how it is resolved by docutils:

$ myst-docutils-pseudoxml test.md            
<document ids="main" names="main" source="test.md" title="main">
    <title>
        main
    <subtitle ids="subtitle" names="subtitle">
        subtitle
    <target refid="target1">
    <target refid="target2">
    <section ids="sub-subtitle target2 target1" names="sub-subtitle target2 target1">
        <title>
            Sub-subtitle
        <paragraph>
            <reference refid="target1">
                ref1
            
            <reference refid="sub-subtitle">
                ref2

As you can see, not only is the header assigned the identifiers coming from the targets, it is also assigned a "slug" identifier based on its content (which is not an unusual practice when rendering Markdown).

Not allowing multiple identifiers would render this example, and by extension jupyter-book itself, non myst-spec compliant, which is obviously extremely problematic 😬.

To clarify some extra terminology from docutils:

  • targets are deemed explicit identifiers, in that all must be unique across the document, otherwise a warning is emitted
  • heading slugs are deemed implicit identifiers, in that only the first occurrence of an identifier within the document is kept, and any duplicates are silently dropped
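
The two rules can be modelled with a small registry. This is only a sketch of the rules as stated in this comment, not docutils' actual implementation; the `IdRegistry` class and its method names are hypothetical, and how an explicit identifier should interact with an already registered implicit one is an assumption (the sketch keeps whichever came first).

```python
import warnings


class IdRegistry:
    """Toy model of the explicit/implicit identifier rules described above."""

    def __init__(self):
        self.targets = {}  # identifier -> node

    def add_explicit(self, identifier, node):
        # Explicit identifiers (targets) must be unique: warn on a clash.
        # Assumption: the first registration is kept.
        if identifier in self.targets:
            warnings.warn(f"duplicate explicit target: {identifier!r}")
            return False
        self.targets[identifier] = node
        return True

    def add_implicit(self, identifier, node):
        # Implicit identifiers (heading slugs): the first occurrence wins,
        # later duplicates are dropped without a warning.
        if identifier in self.targets:
            return False
        self.targets[identifier] = node
        return True
```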

Here also is the rendering of this example as html/latex:

$ myst-docutils-html5 test.md
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
...
<body>
<main id="main">
<h1 class="title">main</h1>
<p class="subtitle" id="subtitle">subtitle</p>

<section id="sub-subtitle">
<span id="target2"></span><span id="target1"></span>
<h2>Sub-subtitle</h2>
<p>
<a class="reference internal" href="#target1">ref1</a>
<a class="reference internal" href="#sub-subtitle">ref2</a>
</p>
</section>
</main>
</body>
</html>
$ myst-docutils-latex test.md
...
\begin{document}
\title{main%
  \label{main}%
  \\%
  \DUdocumentsubtitle{subtitle}%
  \label{subtitle}}
\author{}
\date{}
\maketitle


\section{Sub-subtitle%
  \label{sub-subtitle}%
  \label{target2}%
  \label{target1}%
}

\hyperref[target1]{ref1}
\hyperref[sub-subtitle]{ref2}

\end{document}

@chrisjsewell
Member Author

FYI, if you want to see how anything is resolved by myst-docutils, install pipx (https://github.com/pypa/pipx) and run pipx install myst-docutils, which will give you access to the CLIs used above.

@rowanc1
Member

rowanc1 commented Apr 10, 2022

Can you point to a user using multiple stacked targets in a Jupyter Book that exists today? Or an example of this being used in a sphinx project?

A few notes:

  1. This is a divergence from what is set out in mdast (which already has precedent for identifier/label)
  2. It is a large complication to support in downstream tools.
  3. Helping users get to a canonical ID should be a goal of our work. This has benefits in science communication (e.g. see work on PIDs).
  4. The HTML example output is not semantic (e.g. $('id').innerText). I think we can do better than docutils here.
  5. The LaTeX compiles, but is also very non-standard, and there are zero tutorials I have found that suggest this is even possible (i.e. it is a relatively unknown quirk of LaTeX).
  6. The python parser could easily: throw a warning on multiple stacked labels before passing on to sphinx, and help the user improve their references to get to a canonical, explicit label.
  7. There is no reason that myst has to support all quirks of docutils

My take: multiple IDs are unused (I have never seen this in any non-contrived user example [1]), bring no additional features to the end user, and can be easily cleaned up by throwing warnings in a post-parsing transform in any implementation. Introducing a list of IDs to refer to a single element is a significant additional complexity that means all state management becomes harder, especially for cross-project linking (e.g. some equivalent of inter-sphinx, or any work around PIDs ongoing in research/library communities).

Looking forward to talking this through on Monday. There are lots of options on how to support this in Python/JB before passing on to sphinx. I am suggesting we support a subset of sphinx's complexity, and provide tools to help users refactor their documents with explicit references/labels.

Footnotes

  1. With the possible exception of implicit references that have subsequently been made explicit. I think this can be taken care of in a state-management task rather than in the MDAST spec though.

@chrisjsewell
Member Author

Looking forward to talking this through on Monday.

Yeh absolutely, happy to discuss. What I want to emphasize is that this is not a trivial choice.
As we have discussed previously, myst-spec should initially represent what myst actually is now, not what we want it to be in the future.

Can you point to a user using multiple stacked targets in a Jupyter Book that exists today?

Any project that refers to headings by both targets and heading slugs.

There are lots of options on how to support this in Python/JB before passing on to sphinx.
The python parser could easily: throw a warning on multiple stacked labels before passing on to sphinx

I feel this is somewhat a misunderstanding of how Jupyter Book (via myst-parser) works:
None of this processing is done by myst-parser; it's all handled by docutils/sphinx.
Getting myst-parser to act in this manner, if it could be done, would at least require a substantial rewrite to override core parts of docutils functionality.

There is no reason that myst has to support all quirks of docutils

I would not say that this is merely a quirk of docutils though, it is a core design aspect: https://docutils.sourceforge.io/docs/ref/doctree.html#common-attributes

significant additional complexity, especially for cross-project linking (e.g. some equivalent of inter-sphinx)

But inter-sphinx already does work with multiple IDs

This is a divergence from what is set out in mdast (which already has precedent for identifier/label)

I feel this is a misunderstanding of what identifier is actually used for in MDAST.
It is not a canonical ID for a node and, whether we use a single ID or multiple IDs for a node, they should not be stored under identifier, specifically to distinguish them from MDAST's identifier.
Take as an example:

[a]

[a]: https://example1.com
[a]: https://example2.com

goes to MDAST resembling

<paragraph>
  <linkReference identifier="a">
<definition identifier="a">
<definition identifier="a">
  1. linkReference has an identifier which is not actually its identity; it is what it is referencing (https://github.com/syntax-tree/mdast#association)
  2. there are multiple definitions with the same identifier (because they are eventually resolved "implicitly")
  3. definition.identifier can only be referenced by a linkReference; the two are completely independent of myst identifiers, e.g. you cannot do {ref}`a`

this is also the same for footnoteReference/footnoteDefinition
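
To make the "implicit" resolution concrete: CommonMark uses the first matching definition when several share an identifier. A minimal Python sketch, assuming nodes are plain dicts with mdast's `type`, `identifier` and `url` fields (the `resolve_definitions` helper itself is hypothetical):

```python
def resolve_definitions(nodes):
    """Return {identifier: url}, keeping the first definition per identifier."""
    resolved = {}
    for node in nodes:
        if node.get("type") == "definition":
            # setdefault leaves an existing entry untouched, so the
            # first definition in document order wins.
            resolved.setdefault(node["identifier"], node["url"])
    return resolved
```

For the example above, the reference [a] would resolve to https://example1.com.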

Whether we use something like mystId (singular) or mystIds (plural), a core requirement should be:
in a "well-formed" document, I am able to walk through the AST, and generate an unambiguous mapping of REFID -> Node, in order to resolve what a {ref} is pointing towards.
For this requirement, note it does not actually matter whether the relationship is one-to-one, or many-to-one
(just as long as it is not one-to-many, or many-to-many)
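
A rough Python sketch of that requirement, assuming a dict-based AST in which each node may carry a hypothetical `ids` list and a `children` list (neither field name is taken from the spec). Many identifiers may point at one node, but one identifier pointing at two different nodes is an error:

```python
def build_ref_map(node, mapping=None):
    """Walk the AST and return {refid: node}.

    Raises ValueError if the same identifier is attached to two
    different nodes (a one-to-many relationship).
    """
    if mapping is None:
        mapping = {}
    for refid in node.get("ids", []):
        if refid in mapping and mapping[refid] is not node:
            raise ValueError(f"ambiguous identifier: {refid!r}")
        mapping[refid] = node
    for child in node.get("children", []):
        build_ref_map(child, mapping)
    return mapping


# The heading from the earlier example: three identifiers, one node.
heading = {"type": "heading", "ids": ["sub-subtitle", "target2", "target1"]}
root = {"type": "root", "children": [heading]}
refs = build_ref_map(root)
```

Here `refs["target1"]` and `refs["sub-subtitle"]` are the same heading node, so the mapping is many-to-one and still unambiguous.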

Helping users get to a canonical ID should be a goal of our work.

Taking the above discussion, I would ask what do you mean by a canonical ID?
Since you can essentially have multiple ID "sets" within a single document: IDs relating to definitions, footnotes, {ref}, Jupyter code cells, intersphinx (there is now a separate external role: sphinx-doc/sphinx#9822).

sphinx essentially handles this via the any role, and the resolution logic underpinning it (https://www.sphinx-doc.org/en/master/usage/restructuredtext/roles.html?highlight=roles#role-any).
Domains can maintain their own identifier maps, for particular reference sets.
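
As an illustration of separate identifier sets, here is a simplified Python sketch of an any-style lookup across per-domain maps. It is not Sphinx's implementation: the `domains` structure and the `resolve_any` function are hypothetical, and where Sphinx emits a warning for zero or multiple matches, the sketch raises to stay short.

```python
def resolve_any(domains, target):
    """Look up `target` in every domain's identifier map.

    `domains` is a hypothetical {domain_name: {identifier: node}} mapping.
    Returns (domain_name, node) for a single match and raises LookupError
    when there is no match or more than one.
    """
    matches = [(name, ids[target]) for name, ids in domains.items() if target in ids]
    if len(matches) != 1:
        found = [name for name, _ in matches]
        raise LookupError(f"{target!r} matched {len(matches)} domains: {found}")
    return matches[0]
```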

@chrisjsewell
Member Author

One thing to consider is also setting a (probably SHA256) UUID for every node in the AST. This would provide an "unequivocal" identifier for all nodes, irrespective of what was referencing it.
Specific reference names would then just be aliases to those.
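
One way this could look, as a Python sketch under stated assumptions: the comment does not say what would be hashed, so the key here is a SHA-256 digest of the node's position and type, and the `key`, `ids` and `children` field names are hypothetical.

```python
import hashlib


def assign_node_keys(node, path="0", keys=None):
    """Give every node a SHA-256 based key and return {key: node}.

    Assumption: the key is derived from the node's tree position and type.
    """
    if keys is None:
        keys = {}
    payload = f"{path}:{node.get('type', '')}".encode("utf-8")
    node["key"] = hashlib.sha256(payload).hexdigest()
    keys[node["key"]] = node
    for index, child in enumerate(node.get("children", [])):
        assign_node_keys(child, f"{path}.{index}", keys)
    return keys


def build_aliases(keys):
    """Map each reference name (hypothetical "ids" field) to a node key."""
    aliases = {}
    for key, node in keys.items():
        for name in node.get("ids", []):
            aliases.setdefault(name, key)
    return aliases
```

Each reference name then resolves in two steps: name to key through the alias table, and key to node through the key table.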
