Remove documentation gettext resources #9948

Closed

benjaoming
Contributor

@benjaoming benjaoming commented Jan 26, 2023

There's a lot of very outdated content on Transifex, and rather than risk someone making the effort to translate it, I think we should remove it pending another wave of translation activity.

  • We never published any translations
  • All of this content will become even more outdated in the Diataxis refactor

I kept the old invoke task since it seems like a valuable trace to leave behind in case we want to reintroduce translations.

If this sounds good, I will also remove the Transifex project: https://www.transifex.com/readthedocs/readthedocs-docs/dashboard/


📚 Documentation previews 📚

@benjaoming benjaoming requested review from a team as code owners January 26, 2023 11:39
@benjaoming benjaoming changed the title Remove outdated and mostly untranslated documentation gettext resources Remove documentation gettext resources Jan 26, 2023
@ericholscher
Member

We never published any translations

I think this is the big thing... I do wonder about the value of translating our docs overall. Since there hasn't been much interest, I think this probably makes sense.

Also pinging @agjohnson & @humitos who have opinions here I imagine, before I 👍 it.

@agjohnson
Contributor

I raised #9938 for this recently. For purposes of having localized introductory content, and for dogfooding our own features, I see value in having good translation coverage on at least some portion of our docs. If translators want to do more here, I'm happy to have the efforts. We've discussed this in the past and talked about focusing on introductory guide content primarily though. I feel having several pages translated in major locales is attainable.

I'd agree that we should change our doc translation workflow at some point, and that translators can't sustain translations for most of our docs. But we've absolutely neglected the translations in our docs and shouldn't expect much effort from translators if we're not doing our part either. We're not running the update scripts with any frequency right now, so our translations are just set up to fail.

Instead, I would still say we should update our translations more often, probably starting after more refactoring. This project does not get much traffic there, so I don't think any major changes are required right now. Recreating all of the Transifex pieces and teams later is not a fun amount of work either.

There's already a project announcement:
https://www.transifex.com/readthedocs/communication/?q=project%3Areadthedocs-docs

The changes I would see here are:

  • Limit resources we're publishing to Transifex so translators aren't spread thin
  • Actually update the translations
  • Actually use the translations

None of this seems super important work for us now. I think we can revisit this later in the refactor process.

@benjaoming
Contributor Author

None of this seems super important work for us now. I think we can revisit this later in the refactor process.

@agjohnson That's why I was suggesting to delete it and remove the Transifex project:

There's a lot of very outdated content on Transifex, and rather than risk someone making the effort to translate it, I think we should remove it pending another wave of translation activity.

Re: dogfooding, I agree that this is healthy, but we do not actually use this project for dogfooding because it's too big. I do Danish translations happily and I believe other team members can work on Spanish translations. But I think we need a much, much smaller project. Read the Docs' docs has >1 week of translation work, so it's way too big for that purpose.

@benjaoming
Contributor Author

@agjohnson

I raised #9938 for this recently.

I did NOT see that issue, what a coincidence! I guess we were both wondering about the same thing when the suggestion to translate sphinx_rtd_theme's docs came in? :)

I think it's the right solution.

Recreating all of the Transifex pieces and teams later is not a fun amount of work either.

I was thinking the same, that's why I kept this around: https://github.com/readthedocs/readthedocs.org/pull/9948/files#diff-2e746c9eacad6e2c2fdefdc6a665b5ce4607ea384ebf97a729044b79acc778d6 - the creation of a project on Transifex and an API key will be easy. The rest of the workflow has already been designed and implemented in this Invoke task 👍

But I think there's a huge amount of work to be done wrt. deciding what to translate, how often etc. How should a translated subset look to the user? All of that is great to discuss in #9938 👍

But until then, I think we need to have a "blank sheet".

@agjohnson
Contributor

agjohnson commented Jan 27, 2023

I was thinking the same, that's why I kept this around

I was talking more about dropping all the resources and teams from Transifex, and re-engaging translators later to build the teams back up. This is maybe a worse outcome than having stale translation resources for a while longer. Having unused translation resources doesn't seem like a problem we need to solve right now, given translators rightly aren't enthusiastic to update 4-6 year old translation sources anyway.

I guess we were both wondering about the same thing when the suggestion to translate sphinx_rtd_theme's docs came in?

Actually it was a translator asking about these docs specifically. It's probably the same translator, who just finished translating our application in French and was asking about these documentation pages. I'd like to have some forward momentum here.

But I think there's a huge amount of work to be done wrt. deciding what to translate, how often etc. How should a translated subset look to the user?

I don't want to inflate this project too much, I'm still just describing the bare minimum for translations -- introductory guides like the Read the Docs guide and Sphinx guide to start. We should be updating documentation resources during release, along with the application translation updates.

Partial translations are a good point, though I'm not worried about this much -- translations are commonly not complete. We have some technical options here, but I don't want to derail documentation refactoring with this just yet.

But until then, I think we need to have a "blank sheet".

Well, I guess this is what I'm not very convinced of. We can solve stale translation resources any time we want by updating them regularly. And if a language has coverage across our high priority pages, we can use the translation.

Here's pt_BR, for example. Translators got our high priority pages, we could use this translation, but the sources are 4-6 years out of date at this point 😞

image

Deleting our translation resources during the refactor seems unwarranted to me: translators can still work right now with any content that is just moving to a new location during the refactor and is not being rewritten. Gettext will accumulate the tokens the same way, even if they are moved to a different file. But the doc translations don't have traction right now anyway, so I'm not sure stale resources or refactoring churn in translations need to be a strong concern either way.
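
To illustrate the gettext point above: catalog entries are keyed by the source string (the msgid), not by the file the string came from, so a paragraph that only moves can keep its translation. The sketch below models this with plain dictionaries. The file names, strings, and the `carry_over` helper are made up for illustration, and this is not how Sphinx or Transifex actually implement merging.

```python
# Hypothetical translation memory keyed by msgid (the English source string).
old_catalog = {
    "Getting started": "Primeiros passos",
    "Import your project": "Importe seu projeto",
}

# After a refactor, the same paragraphs live in different files and one is new.
new_sources = {
    "tutorial/index.rst": ["Getting started", "Import your project"],
    "explanation/builds.rst": ["How builds work"],
}


def carry_over(memory, sources):
    """Return {file: {msgid: msgstr}}, reusing translations for unchanged msgids."""
    return {
        path: {msgid: memory.get(msgid, "") for msgid in msgids}
        for path, msgids in sources.items()
    }


merged = carry_over(old_catalog, new_sources)
```

Strings that were rewritten rather than moved get an empty msgstr here, which is the part translators would still need to redo.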

So, I would still say that we should update these resources regularly, tune page priority, direct translators to the docs project when the refactor has matured more, and see if translators can keep up with the resources we want translated. If translators can keep up, we can use the translation. If not, we can't use the translation either way, but that's already the trade off with contributing to a translation.

@ericholscher
Member

I feel like RTD doesn't have a great workflow for translating a small subset of content, so folks would just be landing on the docs with a couple headings and pages translated, but nothing else. I'm generally -0 on the effort required here, given how good machine language translation is, and how much work it will take to get something that 99% of our users won't use, given how limited it is :/ I'd be curious if anyone has data showing that a small subset of intro guides being translated is meaningful.

I think having translated content in some other format/repo is likely going to be a much better outcome, even just having them in our normal user docs. I would be +1 on just having a small list of human translated content that we publish in our normal docs, or just an index that links out to other people's blog/hosting of this content so we aren't in charge of it.

@humitos
Member

humitos commented Jan 31, 2023

I'm a pro-translations person 👍🏼 . I've been translating documentation (and books!) since I learnt English. I also hate wasting translators' time 😄

With those things in mind, I want to say that I'd love to see our documentation translated and updated in multiple languages. However, at this moment, I don't think we have the bandwidth required to make this project doable together with all the other things we have in our roadmap without being a big distraction. I prefer a simple "No, we are not translating our docs for now" as an answer over pointing translators to Transifex and wasting their time because we never update/deploy/publish them 😞

@agjohnson

I raised #9938 for this recently. For purposes of having localized introductory content, and for dogfooding our own features, I see value in having good translation coverage on at least some portion of our docs.

This is something we are able to commit to and I'd like to start from here if anything. I think it's not the right time to start just now, tho. However, we can keep talking about this and find a good time to prioritize this work and do it.

@ericholscher

I feel like RTD doesn't have a great workflow for translating a small subset of content, so folks would just be landing on the docs with a couple headings and pages translated, but nothing else

This is a pretty common problem that many translators are hitting. Actually, there is a Sphinx improvement suggestion to allow translators to warn users about this case: sphinx-doc/sphinx#11157

@benjaoming
Contributor Author

benjaoming commented Jan 31, 2023

I also enjoy a bit of translation every now and then, just did sphinx-rtd-theme in Danish :)

For dog-fooding, I think it would be great to translate a smaller documentation project together (and with the community!) while discovering a repeatable workflow and content design for translating subsets of documentation, keeping the user's experience in mind!

The problem we are facing about translating a subset of the project is likely faced by the majority of documentation projects out there.

Member

@ericholscher ericholscher left a comment

What are the next steps here? I'm 👍 on merging this, as I do think this is just outdated and not doing anyone much good.

With those things in mind, I want to say that I'd love to see our documentation translated and updated in multiple languages. However, at this moment, I don't think we have the bandwidth required to make this project doable together with all the other things we have in our roadmap without being a big distraction. I prefer a simple "No, we are not translating our docs for now" as an answer over pointing translators to Transifex and wasting their time because we never update/deploy/publish them 😞

Sounds like that's a 👍 on removing them until we have bandwidth to do it properly?

@humitos
Member

humitos commented Feb 15, 2023

I'm 👍🏼 on removing these from our repository for now, since we are not translating them and we don't plan to do so soon due to the time limitations we have. Together with merging this PR, we should remove the source files from Transifex as well so people don't translate them in vain.

@agjohnson
Contributor

agjohnson commented Feb 15, 2023

It sounds like everyone thinks the main problem is translations not being updated consistently, but we still don't want to update translation sources 😉

I'm still not convinced we need to do anything here, especially if we're mostly trying to be protective of translator time -- translators are already not active on this project. If we want to add translations back later anyways, I'd still say let's just work towards updating our translations more often instead.

However, I do feel comfortable publishing partial translations, especially any that hit high priority pages. If 80/90% completion is our goal, we won't hit that anytime soon on any of our translations and might as well remove translation artifacts.

I'm also a fan of a hybrid approach, one that uses machine translation to seed translations, and let translators manually tune the translation. I've used https://github.com/fkirc/attranslate and it's easy and quick to achieve a translation that doesn't put translation responsibility on the reader.

and how much work it will take to get something that 99% of our users won't use, given how limited it is

Just to clarify, the numbers here definitely aren't great, but they aren't this grim 😆

image

This is all of our docs.readthedocs.io (sampled) users for all of 2022. The top three non-English locales represent ~14% of our doc sessions. But that's still only 51k sessions total.

I'd be curious if anyone has data showing that a small subset of intro guides being translated is meaningful.

We can look at Godot for data. This is the last year of usage for English and simplified Chinese, for their most requested page:

image

Most notable, the simplified Chinese users requesting en/latest vs zh_CN/latest (290k vs 62k).
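
As a rough worked figure, using only the two numbers quoted above (the comment does not say whether they count requests or sessions), the share of simplified Chinese traffic that lands on the translated version can be computed like this:

```python
# Approximate figures quoted in the comment above for simplified Chinese users.
en_latest = 290_000      # traffic to en/latest
zh_cn_latest = 62_000    # traffic to zh_CN/latest

# Fraction of this audience reading the translated docs.
share_translated = zh_cn_latest / (en_latest + zh_cn_latest)
print(f"{share_translated:.1%}")  # prints 17.6%
```

So, under that reading, a little under one in five of these visits use the translation, while the rest go to the English docs.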

I prefer a simple "No, we are not translating our docs for now" as an answer over pointing translators to Transifex and wasting their time because we never update/deploy/publish them

I'd agree that we've wasted translators' time, most especially by not updating the translation resources though. But I wouldn't feel as bad if we were updating translations, since at least we'd have addressed one of their problems.

But if we want to guard their time more, we shouldn't be translating into locales other than our top 3 or 4 really -- Simplified Chinese, Russian, and Portuguese.

For dog-fooding, I think it would be great to translate a smaller documentation project together

Core team probably shouldn't own any of the translating. We should rely on community for translations, probably solely even. If a community can't keep up a translation without our intervention, that's a good sign the translation won't be used either. Also, I don't think any of us are fluent in any of our most popular languages 🙃

This is also why I have more interest in leaving these translations in place. Our work is in managing community, not actually translating.

I feel like RTD doesn't have a great workflow for translating a small subset of content

I'd put the blame on the tooling, and mostly gettext probably. When gettext is used for prose documents, it seems to inevitably result in partially translated pages. But I'm not sure of a great alternative to a partial translation.

At least RTD's modeling around translations might actually help, as users could maintain translations without being tied to Sphinx/gettext based translations. However, all of the translator tooling is already in the gettext ecosystem.

@benjaoming
Contributor Author

benjaoming commented Feb 15, 2023

Nice long discussion here 😊 👍

If 80/90% completion is our goal

I'd argue that's a bad goal for this project's size and development pace. It's very hard to achieve. For such a large documentation set, it's impossible to maintain that level and keep the source documentation alive without constantly breaking translations.

I described the solution/challenge as I see it in the same comment:

discovering a repeatable workflow and content design for translating subsets of documentation, keeping the user's experience in mind!

I'd love to do a case study on what other documentation projects successfully do for their translations and workflows. I think these cases should be given as talks at conferences :)

Giving this a second thought, I don't think that subsets are necessarily the method. We might also have entirely different versions. A complex problem with our documentation is that it's highly interconnected and rich with cross-references.

In the Wagtail community, we talk about 1:1 translations and free-form trees. I made some slides about it that I can't find right now.

Core team probably shouldn't own any of the translating.

My comment was about "dog fooding". And a different project.

I'd still say let's just work towards updating our translations more often instead.

I think that's the worst possible solution: It gives the core team more work, gives translators more work to achieve an impossible >80% translation, brings readers broken experiences and confusion (or rather: we'll probably never publish incomplete translations).

We can look at Godot for data. This is the last year of usage for English and simplified Chinese, for their most requested page:

Is this the user docs? Or something from dev docs? install.html doesn't exist in the user docs, but maybe it got refactored? I don't understand how #4 in the list is a 404 in any case: https://docs.readthedocs.io/zh_CN/latest/install.html - there are 64k visitors on a 404 URL?

I think that these are great insights, it's great to know the impact that a Chinese translation can have. Using such insights, we can create a translation-friendly version (or subset) of the documentation that integrates nicely with the full English documentation set.

@ericholscher
Member

ericholscher commented Feb 15, 2023

I also think we're discussing a bunch of theory here, but the current state of things is that:

  • We have a bunch of translation files that are out of date
  • Our docs translations have only ever gotten a small percentage of pages translated
  • We don't have a good process for shipping a small percentage of translated pages

The takeaway here for me is discussing what our next steps should be, not the theoretical value of translation.

Nobody has really replied to my idea of having translated content in our primary docs, instead of using gettext and our translation support? Similarly, I would be 👍 on third-party translations we link to, that others maintain, and that we have no control over. But having an es version of our docs with just the tutorial translated seems like the worst option, and I'd much prefer to just have /latest/tutorial/es/ or something.

@benjaoming
Contributor Author

I'm 👍 on wrapping this up by removing the current translations in Git and on Transifex, getting a clean sheet, and talking about the next steps.

It's not an irreversible action... We can always re-fetch translated content from Git history if there's anything we want to push into Transifex translation memory on a new resource or project. And we can always reach out to active translators to invite them to a new wave of activity, once the dining table is set for a tasty translation meal.

Nobody has really replied on my idea of having translated content in our primary docs, instead of using gettext and our translation support?

That's a solution for a next step that's in line with what I also had some thoughts about: how to design our documentation such that a subset or version is translated, rather than the bulk, and such that the translation workflow can work with our rapidly changing/living/agile documentation.

I like that a new solution requires some creative thinking and gives people the energy to discuss it 👍

@agjohnson
Contributor

I'd argue that's a bad goal for this project's size and development pace.

Yup agreed. We're not going to hit that without a fair amount of effort.

I'd love to do a case study on what other documentation projects successfully do for their translations and workflows.

Well, they update their translations and have enough traffic to justify whole project translation 😄

I don't think we need to overthink this for our purposes, but understanding customer/user use is almost always going to be a benefit 👍

My comment was about "dog fooding". And a different project.

Roger. Dogfooding is my main priority in all of this. Even if our translations are low traffic, we took a step to understand user's use/workflow better.

I'd still translate our docs though. In the past, our docs have had the most non-English traffic.

I think that's the worst possible solution: It gives the core team more work

I'm still talking about a much less thorough translation workflow, basically the minimum viable translation. Our dashboard translations certainly don't take core team time right now, the translators self sustain this on their own. I don't think docs need to be different here for us. The onus can still be almost entirely on translators.

Is this the user docs? Or something from dev docs?

It's just from GA, the page with the most traffic that wasn't an index. I didn't look into the source content.

Nobody has really replied on my idea of having translated content in our primary docs, instead of using gettext and our translation support?

I touched on this a bit above, but our modeling actually might be a benefit here as we can still use RTD translations even if we do something non-standard. However, translator tooling is mostly gettext based already, so I'd stick inside that ecosystem.

Similarly, I would be +1 on third-party translations we link to, that others maintain, and that we have no control over.

This is what I'm advocating for. Our dashboard is translated in this fashion already. Our sources are updated automatically and translators self sustain translations, without core team intervention. On the dashboard, however, we have no minimum completion defined.

But having an es version of our docs with just the tutorial translated seems like the worst option, and I'd much prefer to just have /latest/tutorial/es/ or something.

I think you were hinting at this too, but keeping subproject+translation would be a good overlap too: docs.readthedocs.io/projects/tutorial/es/. It isolates the project, doesn't create a new workflow, and we test a subproject and translation URL. I think this is a fair approach to work towards.
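
For readers comparing the URL layouts mentioned in this thread, here is a minimal sketch of the main-project and subproject-plus-translation paths. The `docs_url` helper and the `tutorial` subproject slug are hypothetical; only the `/projects/<subproject>/<lang>/<version>/` shape is taken from the URLs quoted in the discussion.

```python
def docs_url(lang, version="latest", subproject=None, domain="docs.readthedocs.io"):
    """Build a docs URL, optionally nested under a subproject prefix."""
    # Subprojects are served under /projects/<slug>/ on the parent's domain.
    prefix = f"/projects/{subproject}" if subproject else ""
    return f"https://{domain}{prefix}/{lang}/{version}/"


main_spanish = docs_url("es")
tutorial_spanish = docs_url("es", subproject="tutorial")
```

With this layout the small tutorial project gets its own language and version tree, while the main docs keep theirs unchanged.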

However, I haven't been pushing towards technical solutions here as translations are still on our non-priority list.

We can always re-fetch translated contents from Git history if there's anything we want to push into Transifex translation memory on a new resource or project. And we can always reach out to active translators to invite them to a new wave of activity

Well, as the person historically managing Transifex and our translations, I am pre-emptively delegating all of this work then 😉

@benjaoming
Contributor Author

Well, as the person historically managing Transifex and our translations, I am pre-emptively delegating all of this work then 😉

I'm willing to run the risk, I'll happily walk the "reverse path" if I'm wrong on something here 😃

@humitos
Member

humitos commented Feb 16, 2023

@ericholscher @agjohnson

Nobody has really replied on my idea of having translated content in our primary docs, instead of using gettext and our translation support?

I touched on this a bit above, but our modeling actually might be a benefit here as we can still use RTD translations even if we do something non-standard. However, translator tooling is mostly gettext based already, so I'd stick inside that ecosystem.

I agree with Anthony here. I've translated different projects using different workflows and tools. The best experience I had was when using all the standard tools that everybody uses. The worst was when we defined our own workflow without using gettext. So, I wouldn't go into "our own creative way". It will just make things even harder.

This matters in particular if we want to delegate the translation to other people (non-core team), since they already know how to use these tools and we don't have to explain anything custom to them.

@ericholscher
Member

I'm not even talking about other tools.. I'm just talking about RST/MD files in our repo that are in another language? I understand that tracking diffs there is not ideal, but just an idea as a starting point for further discussion.

I'm still not understanding what vision people have for a small subset of translated content. Is it just /es/latest/ with 5% of the pages translated, and the rest in English?

The realistic outcome I imagine with our current tooling is something like docs.rtfd.io/projects/tutorial/es/latest/ where we break out the "translatable" content into a smaller repository that changes less frequently, and actually stands a chance of being 100% translated?

@agjohnson
Contributor

Is it just /es/latest/ with 5% of the pages translated, and the rest in English?

More than 5%, but if high priority pages are consistently updated, that's enough for me.

We have several options here immediately:

  • We don't sync low priority pages by excluding them from client push
  • We disable translation for the resource at Transifex:

image

This also has the effect of excluding these resources from statistics, so languages can maintain 100% translation.
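
The effect on statistics can be sketched with a small calculation. Everything below is illustrative: the resource names, string counts, and the `HIGH_PRIORITY` prefixes are invented, and the code does not call Transifex or its client. It only shows how restricting the set of resources changes the completion figure a language can reach.

```python
# Hypothetical per-resource counts: (translated strings, total strings).
resources = {
    "tutorial/index": (40, 40),
    "intro/getting-started": (25, 25),
    "api/v3": (0, 300),
    "config-file/v2": (10, 250),
}

# Assumed prefixes for the pages we would keep open for translation.
HIGH_PRIORITY = ("tutorial/", "intro/")


def completion(stats):
    """Fraction of strings translated across the given resources."""
    translated = sum(done for done, _ in stats.values())
    total = sum(count for _, count in stats.values())
    return translated / total if total else 0.0


def high_priority_only(stats, prefixes=HIGH_PRIORITY):
    """Keep only resources whose name starts with a high-priority prefix."""
    return {name: s for name, s in stats.items() if name.startswith(prefixes)}


overall = completion(resources)                      # 75 of 615 strings
focused = completion(high_priority_only(resources))  # 65 of 65 strings
```

In this made-up example the language sits at about 12% when every resource counts, but at 100% once only the high-priority pages are included.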

The realistic outcome I imagine with our current tooling is something like docs.rtfd.io/projects/tutorial/es/latest/

Noted above in my comment as well. Seems okay, but cross project linking is harder.

@benjaoming
Contributor Author

The realistic outcome I imagine with our current tooling is something like docs.rtfd.io/projects/tutorial/es/latest/ where we break out the "translatable" content into a smaller repository that changes less frequently, and actually stands a chance of being 100% translated?

What @ericholscher mentions here is definitely doable and gives me a good feeling that it's the right path to start on. Getting started is fairly easy, and it can grow organically and based on needs ("can we also include this in the translation?").

Having a translatable breakout project in other important languages allows us to decide when we want to ask translators to update translations. We could for instance have several iterations on "Getting Started" before migrating the English content into the translated version.

Noted above in my comment as well. Seems okay, but cross project linking is harder.

I think that we can be clever about how we migrate an existing page to the translation-friendly version. We can create some simple tooling that maintains a set of patches or whatever that happens in the transformation from "non-translatable page full of references" to "translation-friendly page". Maybe we could even create a nice Sphinx directive translation-alternatives:: or some other notation mechanism to manage the differences when it comes to especially cross-references in translation-friendly subsets. I really think this is a great mission to embark on, but I'd still like to study how other Sphinx projects may have done this. Also, I don't think that we need tooling for a simple start, we can also add tooling as it becomes evident what we want and need.

To me, having a small subset for translation would resonate well, considering the POV of a translator.

As a translator, I would like to

  Put huge efforts into a thorough and accurate translation

Such that

  I can see my work published to readers

There are sooo many things I could write in the negated user story "As a translator, I would NOT like to" 🙃

@ericholscher
Member

ericholscher commented Feb 22, 2023

Notes from our call:

High level status

☑️ Agreed on doing Tutorial content
☑️ Agreed on using gettext & Transifex
☑️ Only show tutorial content on Transifex to translate
- ❓ How to implement this process on RTD
- ⚠️ English content can have a warning banner (/es/latest/api/ for example)

Next steps

v1: "Transifex is not lies" - We update it regularly (hide non-tutorial files in Transifex to start).
v2: "Nice, published tutorial content" - Engage with translators, publish top-level es/fr with only-translated content TOC.

@ericholscher
Member

I put this comment in #9938 as well -- I think we probably close this PR, and move forward with v1 in a few sprints.

@benjaoming benjaoming closed this Feb 22, 2023
@benjaoming benjaoming deleted the delete-docs-translations branch February 22, 2023 20:36
4 participants