Allow non-RST parsers to substitute the Locale transform #8852

jpmckinney · 2021-02-08T18:18:44Z

Describe the bug

79650f5 (#4938) added code that appends a line of hyphens under the text of a title (for example, an admonition's title).

However, a line of hyphens is an RST-specific syntax for a heading. It does not indicate a heading in all parsers.

In Markdown, for example, a line of hyphens is a second-level heading (https://spec.commonmark.org/0.29/#setext-headings). This can cause the heading level to jump from 0 to 2.

To Reproduce

I can create a test scenario, but the bug should be clear from reading the code, with the above context.

Expected behavior

For Markdown, we can just change the hyphen to an equals sign, which Markdown will interpret as a first-level heading.

For other parsers, I suppose there would need to be hooks for those parsers to convert the string to a heading.
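To make the Markdown case concrete, here is a minimal sketch. It assumes the transform builds a heading by appending an underline to the title text; `underline_title` and `setext_level` are hypothetical helpers written for this illustration, and the level check is a simplified reading of the CommonMark setext rule, not a full parser.

```python
import re


def underline_title(title: str, char: str) -> str:
    """Build a two-line heading: the title text plus an underline of `char`."""
    return title + "\n" + char * len(title) * 2


def setext_level(source: str):
    """Return the setext heading level Markdown would assign, or None.

    Simplified: '=' underlines give level 1, '-' underlines give level 2.
    """
    lines = source.split("\n")
    if len(lines) != 2 or not lines[0].strip():
        return None
    underline = lines[1].strip()
    if re.fullmatch(r"=+", underline):
        return 1
    if re.fullmatch(r"-+", underline):
        return 2
    return None


hyphen_level = setext_level(underline_title("Note", "-"))
equals_level = setext_level(underline_title("Note", "="))
print(hyphen_level, equals_level)
```

Under these assumptions, the hyphen underline is read as a second-level heading and the equals underline as a first-level one, which is why swapping the character is enough for Markdown.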


tk0miya · 2021-02-09T13:19:14Z

I'm interested what is

jpmckinney · 2021-02-09T14:13:45Z

@tk0miya I think your comment got cut off?

Also, the general case isn’t fixed - #8853 just fixed one scenario.

tk0miya · 2021-02-09T17:34:19Z

Oops, sorry. My understanding is that the current Locale transform was developed only for reST translation, so it does not work well with other parsers. I merged #8853 into Sphinx because it does not affect the translation of reST documents. But that does not mean the Locale transform will support Markdown translation. If your goal is to translate whole Markdown documents, we need to develop a mechanism to switch the translation transform for each parser.

jpmckinney · 2021-02-09T22:43:15Z

Thanks, yes, that seems like the appropriate solution.

jpmckinney · 2021-02-13T04:27:02Z

I'm thinking the Locale transform is mostly generic. There are just a few places where it constructs RST strings. I'm wondering if those lines can be moved to methods (e.g. title, literal_block, etc.). That way, another parser can subclass the Locale transform and override those methods.
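A rough sketch of that idea, assuming the markup-specific strings are isolated in overridable methods. The class names `RstMarkup` and `MarkdownMarkup` are hypothetical and stand in for the Locale transform and a subclass of it; this is not Sphinx's actual code.

```python
class RstMarkup:
    """Hypothetical hooks that build reST source strings from a translated message."""

    def title(self, msgstr: str) -> str:
        # reST: underline the text so the parser produces a section title.
        return msgstr + "\n" + "=" * len(msgstr) * 2

    def literal_block(self, msgstr: str) -> str:
        # reST: a trailing '::' must be followed by an indented block,
        # so a dummy one is appended to avoid a parser warning.
        if msgstr.strip().endswith("::"):
            return msgstr + "\n\n   dummy literal"
        return msgstr


class MarkdownMarkup(RstMarkup):
    """Hypothetical override for a Markdown-based parser."""

    def title(self, msgstr: str) -> str:
        # Markdown: an ATX heading avoids any underline ambiguity.
        return "# " + msgstr

    def literal_block(self, msgstr: str) -> str:
        # '::' has no special meaning in Markdown, so nothing is appended.
        return msgstr


for markup in (RstMarkup(), MarkdownMarkup()):
    print(repr(markup.title("Note")), repr(markup.literal_block("Example::")))
```

With this split, the generic part of the transform would only call `title()` or `literal_block()` and never build markup strings itself.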

n-peugnet · 2024-01-15T23:16:01Z

If I understand correctly, all these issues come from the fact that in publish_msgstr() we parse the source message once again but without the context of the rest of the source file:

sphinx/sphinx/transforms/i18n.py

Lines 73 to 78 in 80d5396

    
    doc = reader.read(
        source=StringInput(source=source,
                           source_path=f"{source_path}:{source_line}:<translated>"),
        parser=parser,
        settings=settings,
    )

From what I saw in the messages extracted by the gettext builder, it seems a message can never span multiple "block elements", for example multiple paragraphs. So a cleaner way to achieve the desired result would be to parse only for "inline elements". For reST that would be Inliner.parse(), and for MyST it would be ParserInline.parse().

There could be a method in Sphinx's Parser base class, such as parse_inline(), that subclasses should implement.
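As a toy illustration of this proposal (not the real Sphinx, docutils, or MyST API), the sketch below assumes a base class with an abstract `parse_inline()` and two stand-in parsers that only recognise inline literals. All class names and the `(kind, content)` return shape are hypothetical.

```python
import re
from abc import ABC, abstractmethod
from typing import List, Tuple


class InlineParser(ABC):
    """Hypothetical stand-in for a parser base class exposing parse_inline()."""

    @abstractmethod
    def parse_inline(self, text: str) -> List[Tuple[str, str]]:
        """Return (kind, content) pairs for the inline elements in `text`."""


class RegexInlineParser(InlineParser):
    """Shared toy logic: split text into plain runs and inline literals."""

    literal_pattern: "re.Pattern[str]"

    def parse_inline(self, text: str) -> List[Tuple[str, str]]:
        result = []
        pos = 0
        for match in self.literal_pattern.finditer(text):
            if match.start() > pos:
                result.append(("text", text[pos:match.start()]))
            result.append(("literal", match.group(1)))
            pos = match.end()
        if pos < len(text):
            result.append(("text", text[pos:]))
        return result


class ToyRstParser(RegexInlineParser):
    literal_pattern = re.compile(r"``(.+?)``")  # reST inline literal


class ToyMarkdownParser(RegexInlineParser):
    literal_pattern = re.compile(r"`(.+?)`")  # Markdown code span


print(ToyRstParser().parse_inline("Run ``make html`` now"))
print(ToyMarkdownParser().parse_inline("Run `make html` now"))
# No block-level interpretation happens, so a trailing '::' stays plain text.
print(ToyRstParser().parse_inline("Example::"))
```

Because only inline markup is interpreted in this sketch, a title needs no underline and a trailing `::` needs no dummy block, which is the property the proposal relies on.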

sphinx/sphinx/parsers.py

Lines 24 to 31 in 80d5396

    
    class Parser(docutils.parsers.Parser):
        """
        A base class of source parsers.  The additional parsers should inherit this class instead
        of ``docutils.parsers.Parser``.  Compared with ``docutils.parsers.Parser``, this class
        improves accessibility to Sphinx APIs.

        The subclasses can access sphinx core runtime objects (app, config and env).
        """

It seems this would allow removing all the strange hacks in the i18n code, such as:

sphinx/sphinx/transforms/i18n.py

Lines 459 to 465 in 80d5396

    
    # Avoid "Literal block expected; none found." warnings.
    # If msgstr ends with '::' then it cause warning message at
    # parser.parse() processing.
    # literal-block-warning is only appear in avobe case.
    if msgstr.strip().endswith('::'):
        msgstr += '\n\n   dummy literal'
        # dummy literal node will discard by 'patch = patch[0]'

sphinx/sphinx/transforms/i18n.py

Lines 472 to 477 in 80d5396

    
    # Structural Subelements phase1
    # There is a possibility that only the title node is created.
    # see: https://docutils.sourceforge.io/docs/ref/doctree.html#structural-subelements
    if isinstance(node, nodes.title):
        # This generates: <section ...><title>msgstr</title></section>
        msgstr = msgstr + '\n' + '=' * len(msgstr) * 2

Does this idea make sense to you? Do you think it could work?

jpmckinney · 2024-01-16T17:03:50Z

I'd like @chrisjsewell and @choldgraf from MyST Parser to reflect on your proposed parse_inline.

In general, removing those hacks would be ideal!

n-peugnet · 2024-04-07T18:41:49Z

@jpmckinney:

> I'd like @chrisjsewell and @choldgraf from MyST Parser to reflect on your proposed parse_inline.
>
> In general, removing those hacks would be ideal!

FYI, I made a proof of concept in #12238

jpmckinney · 2024-05-29T17:04:41Z

LGTM!

jpmckinney added the type:bug label Feb 8, 2021

This was referenced Feb 8, 2021

Issues with sphinx Locale transform (assumes rST) executablebooks/MyST-Parser#302

Open

i18n: Locale transform: Change heading syntax to work for both RST and Markdown #8853

Merged

tk0miya added the internals:internationalisation label Feb 9, 2021

tk0miya added this to the 3.5.0 milestone Feb 9, 2021

tk0miya closed this as completed Feb 9, 2021

tk0miya reopened this Feb 9, 2021

jpmckinney changed the title ~~Locale Transform has RST-specific code~~ Allow non-RST parsers to substitute the Locale transform Feb 9, 2021

tk0miya removed this from the 3.5.0 milestone Feb 14, 2021

jpmckinney mentioned this issue Dec 6, 2021

Error with code-blocks translation executablebooks/MyST-Parser#444

Open

n-peugnet mentioned this issue Sep 6, 2022

Duplicate label in translation only executablebooks/MyST-Parser#357

Open

AA-Turner added this to the some future version milestone Sep 29, 2022

n-peugnet mentioned this issue Jan 1, 2024

Numbered headings (for example starting with 1.) are not translated with Sphinx executablebooks/MyST-Parser#852

Open

n-peugnet linked a pull request Apr 7, 2024 that will close this issue

[8.x] Parser agnostic i18n Locale transform #12238

Open
