Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow non-RST parsers to substitute the Locale transform #8852

Open
jpmckinney opened this issue Feb 8, 2021 · 9 comments · May be fixed by #12238
Open

Allow non-RST parsers to substitute the Locale transform #8852

jpmckinney opened this issue Feb 8, 2021 · 9 comments · May be fixed by #12238

Comments

@jpmckinney
Copy link
Contributor

jpmckinney commented Feb 8, 2021

Describe the bug

79650f5 (#4938) added code that adds a line of hyphens under (for example) an admonition's title's text.

However, a line of hyphens is an RST-specific syntax for a heading. It does not indicate a heading in all parsers.

In Markdown, for example, a line of hyphens is a second-level heading https://spec.commonmark.org/0.29/#setext-headings. This can cause the heading level to jump from 0 to 2.

To Reproduce

I can create a test scenario, but the bug should be clear from reading the code, with the above context.

Expected behavior

For Markdown, we can just change the hyphen to an equals sign, which Markdown will interpret as a first-level heading.

For other parsers, I suppose there would need to be hooks for those parsers to convert the string to a heading.

@tk0miya
Copy link
Member

tk0miya commented Feb 9, 2021

I'm interested what is

@tk0miya tk0miya closed this as completed Feb 9, 2021
@jpmckinney
Copy link
Contributor Author

@tk0miya I think your comment got cut off?

Also, the general case isn’t fixed - #8853 just fixed one scenario.

@tk0miya tk0miya reopened this Feb 9, 2021
@tk0miya
Copy link
Member

tk0miya commented Feb 9, 2021

Oops, Sorry. My understanding is current Locale transform is developed only for reST translation. So it does not work well with other parsers. I merged #8853 to Sphinx because it does not affect translating of the reST document. But it does not mean that Locale transform will support the markdown translation. If your goal is translating the whole of markdown document, we need to develop the mechanism to switch a translation-transform for each parser.

@jpmckinney
Copy link
Contributor Author

Thanks, yes, that seems like the appropriate solution.

@jpmckinney jpmckinney changed the title Locale Transform has RST-specific code Allow non-RST parsers to substitute the Locale transform Feb 9, 2021
@jpmckinney
Copy link
Contributor Author

jpmckinney commented Feb 13, 2021

I'm thinking the Locale transform is mostly generic. There are just a few places where it constructs RST strings. I'm wondering if those lines can be moved to methods (e.g. title, literal_block, etc.). That way, another parser can subclass the Locale transform and override those methods.

@n-peugnet
Copy link
Contributor

If I understand correctly, all these issues come from the fact that in publish_msgstr() we parse the source message once again but without the context of the rest of the source file:

doc = reader.read(
source=StringInput(source=source,
source_path=f"{source_path}:{source_line}:<translated>"),
parser=parser,
settings=settings,
)

From what it saw in the extracted messages from the gettext builder, it seems a message can never span over multiple "block elements", like for exemple multiple paragraphs. So a cleaner way to achieve the desired result would be to only parse for "inline elements". For reST it would be Inliner.parse(), for MyST it would be ParserInline.parse.

There could be a function in Sphinx's Parser base class like parse_inline() that children should implement.

sphinx/sphinx/parsers.py

Lines 24 to 31 in 80d5396

class Parser(docutils.parsers.Parser):
"""
A base class of source parsers. The additional parsers should inherit this class instead
of ``docutils.parsers.Parser``. Compared with ``docutils.parsers.Parser``, this class
improves accessibility to Sphinx APIs.
The subclasses can access sphinx core runtime objects (app, config and env).
"""

It seems like it should allow to remove all the strange hacks in the i18n code like:

# Avoid "Literal block expected; none found." warnings.
# If msgstr ends with '::' then it cause warning message at
# parser.parse() processing.
# literal-block-warning is only appear in avobe case.
if msgstr.strip().endswith('::'):
msgstr += '\n\n dummy literal'
# dummy literal node will discard by 'patch = patch[0]'

# Structural Subelements phase1
# There is a possibility that only the title node is created.
# see: https://docutils.sourceforge.io/docs/ref/doctree.html#structural-subelements
if isinstance(node, nodes.title):
# This generates: <section ...><title>msgstr</title></section>
msgstr = msgstr + '\n' + '=' * len(msgstr) * 2

Does this idea make sense to you? Do you think it could work?

@jpmckinney
Copy link
Contributor Author

I'd like @chrisjsewell and @choldgraf from MyST Parser to reflect on your proposed parse_inline.

In general, removing those hacks would be ideal!

@n-peugnet n-peugnet linked a pull request Apr 7, 2024 that will close this issue
@n-peugnet
Copy link
Contributor

n-peugnet commented Apr 7, 2024

@jpmckinney:

I'd like @chrisjsewell and @choldgraf from MyST Parser to reflect on your proposed parse_inline.

In general, removing those hacks would be ideal!

FYI, I made a proof of concept in #12238

@jpmckinney
Copy link
Contributor Author

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants