Host the Python docs on ReadTheDocs #5
ping @pradyunsg because you're speaking a lot about readthedocs :)
Disclaimer: I work at Read the Docs.
I'm not going to start selling Read the Docs because that's not my topic 😄 Currently, I'm more interested in making sure that the migration is possible from a technical standpoint, and in mentioning some notes to consider.
What kind of support are you referring to here? Keep in mind that Read the Docs adds EthicalAds by default to the documentation it hosts. I suppose there may be different opinions about this, and I think it's worth having a conversation about the different possibilities we can explore.
We are currently using Read the Docs for the Spanish translation of the CPython documentation. It takes ~500 seconds to build the HTML with our smallest builder (2 CPUs, 2 GB RAM, …)

Each commit merged into an active branch in English should trigger a rebuild of all the translations so they are updated as well. So, one single merge will trigger 1 (version) × (8 languages) = 8 builds × ~1000 seconds = ~2 hours (is my math correct? 🤔)

Edit: I originally wrote 8 versions, but that's not correct. Only one version needs to be built: the version where the merge was done. If it was done in

Edit: the unit of the result was incorrect. I changed "2 minutes" to "2 hours"; thanks to @hugovk for noting this.

Keep in mind that translations may require some extra work to be able to clone the CPython repository to get the source files. In the Spanish translation we are using a git submodule for this and then changing the

Note that Read the Docs only supports one PDF output file. So all the pages end up together in one big PDF file, instead of being split into the tutorial and the others as they are currently. There is an issue discussing support for multiple PDF outputs at readthedocs/readthedocs.org#2045
This page looks like a simple index of a directory containing all the versions. There is no feature on RTD that directly matches this kind of listing. I suppose you may need to create an RST file or include an HTML page that lists all the current releases. Also, a URL like
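The listing page could perhaps be generated rather than maintained by hand. Below is a minimal sketch, assuming a hypothetical list of release strings and the `https://docs.python.org/<version>/` URL layout; `render_versions_page` is an illustrative helper, not part of any existing tool. It emits a reStructuredText page with one link per version.

```python
def render_versions_page(title, versions, base_url):
    """Render a reStructuredText page with one bullet link per version.

    `versions` is assumed to be ordered the way it should be displayed
    (e.g. newest first); nothing here is taken from the real docs tooling.
    """
    lines = [title, "=" * len(title), ""]  # RST title with a same-length underline
    for version in versions:
        # Anonymous hyperlink syntax: `text <url>`__
        lines.append(f"* `Python {version} <{base_url}/{version}/>`__")
    lines.append("")
    return "\n".join(lines)


print(render_versions_page(
    "Documentation by version",
    ["3.13", "3.12", "3.11"],
    "https://docs.python.org",
))
```

The resulting file could be included in a Sphinx project like any other `.rst` page, so it picks up the docs theme instead of looking like a raw directory index.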
This shouldn't be a problem if there is a way to regenerate them (e.g. running

To summarize, I think a few initial points came up here:
Hopefully, this helps to understand some of the technical requirements and adds some value to the conversation about the work effort required.
~8,000 seconds = ~133 minutes = ~2h13m
~1,000 seconds = ~17 minutes
We can mostly support custom URLs like this now. It's a beta feature, but if it's a blocker we can certainly manage it. That said, we don't currently support directory indexing, but we could create a version of the page that's branded properly in the docs theme.
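The estimate in this exchange is a simple product of versions × languages × seconds per build. A small sketch, using only the figures quoted in the thread (~1,000 seconds per build, one version, 8 languages) and assuming every build takes the same time, reproduces both conversions:

```python
SECONDS_PER_BUILD = 1_000  # ~1,000 s per HTML build, as quoted in the thread
VERSIONS_PER_MERGE = 1     # only the branch that received the merge is rebuilt
LANGUAGES = 8              # number of languages quoted in the thread


def total_build_seconds(versions, languages, seconds_per_build):
    """Total build time for a versions x languages matrix of equal-length builds."""
    return versions * languages * seconds_per_build


def format_duration(seconds):
    """Format seconds as hours and minutes, rounding to the nearest minute."""
    hours, minutes = divmod(round(seconds / 60), 60)
    return f"{hours}h{minutes:02d}m"


total = total_build_seconds(VERSIONS_PER_MERGE, LANGUAGES, SECONDS_PER_BUILD)
print(total, format_duration(total))               # one merge, all languages
print(format_duration(SECONDS_PER_BUILD))          # a single build
```

Because builds for different languages can run in parallel on separate builders, ~2h13m is total build time consumed, not necessarily the wall-clock delay before all translations are updated.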
Just to be explicit, we're happy to host the Python docs without ads. I know there has been some discussion about having PSF sponsors included on the pages, and we've had separate conversations with the PSF team about us enabling that with our open source EthicalAds platform. We can definitely discuss having an additional business relationship with the PSF for hosting the docs, but I don't consider it a prerequisite. We get a lot of value out of Python, and are happy to support the projects we use heavily ad-free if needed (we already do this with Sphinx).
As a one-off, we could upload pre-built docs if they never change. It's not supported on the platform, but we can do it manually if needed.
Agreed. In the past there were some technical issues that would have prevented this, but I think we've solved most of them. We can do a bit more work on our side to fully replicate the docs.python.org setup with the existing URLs if you'd like to see a tech demo, but we'd hope to do that after the team agrees to move to RTD.

My primary non-technical selling point for hosting docs on RTD is that any work that goes into building features will be shared with the entire Python community via RTD, instead of going into a custom docs hosting tool. Similarly, any work the community does to improve RTD will automatically flow back to Python itself. Hopefully these additional resources will make doc hosting better for the whole ecosystem.

Please let us know if you have any questions about the product features or the feasibility of hosting as this progresses.
Only because a lot of what the docs build scripts do is something RTD provides as well. 😅 And RTD has the benefit of being a platform, and thus being able to provide things like PR previews.
Just wanted to note that this might also be worth thinking about while we're doing #1. It would be awesome if we could get a new theme without having to implement all the various Python-specific code to support it. We'd be interested in helping with this, if it's of interest. Not sure what kind of 👍 we'd need to move forward on it, but I'd love to see more effort go into RTD integration instead of one-off integrations of custom build logic. We've made good progress on a few of the above issues, and would be happy to do custom work for Python in the places where we don't have platform support.
I think it's a great idea to host the docs on Read the Docs:
We've been discussing this in the docs-community monthly meetings and on Discord, and there seems to be support for moving forward.

We have a large build matrix of languages, versions, and output formats (HTML, PDF, EPUB, etc.). I would suggest an incremental approach, starting with just English, dev, and HTML. Once we're happy with it, we can remove that particular build from the server and add another language/version/output.

Perhaps doing English+HTML first for the main builds (currently /dev, /3 and maybe /3.11) will give the biggest initial benefit. Then other languages+HTML. We may keep some builds, like PDF/EPUB, on the server for much longer. I think this is fine: moving the HTML builds off the server reduces its load and will improve the queue time for the other builds.

More concretely, I think the first thing to do is deal with the HTML language/version switcher. Currently this is added to the server builds via https://github.com/python/docsbuild-scripts. We'll need a similar switcher for RTD builds. I don't think it necessarily needs to be identical to the legacy one, but it should be similar, because people will be switching between the old and new sites.

We would develop it on something like https://cpython-previews.readthedocs.io (which is used for PR previews) and, when happy, switch https://docs.python.org to it, stop the old server HTML build, and move on to the next increment.

Thoughts?
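The incremental plan can be pictured as tagging each cell of the build matrix with the increment in which it moves. This is only a sketch: the language, version, and format lists are a hypothetical subset (the real ones live in python/docsbuild-scripts), and `migration_phase` simply encodes the order proposed in the comment above.

```python
from itertools import product

# Hypothetical subset of the docs build matrix, for illustration only.
LANGUAGES = ["en", "es", "fr", "ja"]
VERSIONS = ["dev", "3", "3.11"]
FORMATS = ["html", "pdf", "epub"]


def migration_phase(language, version, fmt):
    """Return the increment in which a build moves to RTD, or None to keep it on the server."""
    if fmt != "html":
        return None  # PDF/EPUB etc. may stay on the server for longer
    if language == "en" and version == "dev":
        return 0  # first increment: English, dev, HTML
    if language == "en":
        return 1  # then the other main English HTML builds
    return 2  # then HTML for the other languages


def plan():
    """Group every (language, version, format) build by migration phase."""
    phases = {}
    for build in product(LANGUAGES, VERSIONS, FORMATS):
        phases.setdefault(migration_phase(*build), []).append(build)
    return phases


for phase, builds in plan().items():
    print(phase, len(builds))
```

Keeping the plan as data makes it easy to see which builds the server still owns at each step, which is what determines how much its queue time improves.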
I don't know how RTD survives financially today (yes, I know it has been very hard). Building more often than we do now (for every push, with PDF US letter, PDF A4, EPUB, plain text, and downloadable HTML) will take a lot of resources, more than the 4 virtual CPUs that we already use around the clock. I also don't know how RTD pays for bandwidth, but I bet docs.python.org consumes a fair amount (@ewdurbin any numbers?). It's probably nothing compared to PyPI, but let's check anyway. On the PSF side the bandwidth is sponsored by Fastly, but I don't know about the RTD side...

I think that the PSF and RTD should talk about money here: at the very least, the PSF has to ensure that we don't penalize RTD. But that's not enough: I think the PSF should back RTD in case RTD loses a vital sponsor (a CPU/bandwidth provider or similar), to secure its future. @ericholscher @ewdurbin.

My non-financial view on this: RTD has proven it is the Python documentation hosting platform. Using RTD will make CPython a good citizen of the ecosystem. In other words,
Building more often: if we don't want to build on every merge, we can write a custom command to skip builds. For example, we could build only when the docs have changed: https://docs.readthedocs.io/en/stable/build-customization.html#cancel-build-based-on-a-condition

Some numbers: during the 30-day Plausible trial, we had 6.5 million total visits and 10.9 million pageviews to the English 3.11 and 3 (= 3.12) sites.
We could feasibly add a proxy/cache via Fastly in front of Read the Docs; we already do this for PEPs. That might also solve keeping A
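A minimal sketch of such a skip command, assuming the behaviour described on the build-customization page linked above (a command exiting with code 183 cancels the build). The watched paths (`Doc/`, `.readthedocs.yaml`) and the use of `HEAD~1` as the pre-merge state (which assumes squash merges and a checkout deep enough to contain the previous commit) are assumptions, not an agreed setup.

```python
import os
import subprocess
import sys

# Exit code that tells Read the Docs to cancel the current build
# (per the build-customization page linked above).
CANCEL_EXIT_CODE = 183

# Assumed paths whose changes should trigger a docs build.
WATCHED_PATHS = ("Doc/", ".readthedocs.yaml")


def changed_files(base_ref):
    """Return the paths changed between base_ref and HEAD, according to git."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base_ref, "HEAD"],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def should_cancel(paths, watched=WATCHED_PATHS):
    """Cancel the build unless at least one changed path is docs-related."""
    return not any(path.startswith(watched) for path in paths)


# Only act inside a Read the Docs build, where READTHEDOCS is set to "True".
if __name__ == "__main__" and os.environ.get("READTHEDOCS") == "True":
    # Assumes squash merges, so HEAD~1 is the branch state before the merge.
    if should_cancel(changed_files("HEAD~1")):
        sys.exit(CANCEL_EXIT_CODE)
```

Such a script could be invoked from a `build.jobs.post_checkout` step in `.readthedocs.yaml` (the script's location in the repository would be a hypothetical choice). Keeping the decision in `should_cancel` makes it easy to test without a git checkout.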
Echoing support for the proxy/cache concept: I don't think there was ever a philosophical objection to hosting on RTD, just technical and sustainability questions. Both the PSF and RTD are much better equipped these days to have the sustainability discussion directly between PSF and RTD staff than either was back when RTD was still relatively new, which leaves the technical side of things for the docs community to consider.

If the existing Fastly + Nginx docs endpoint is retained, with only the building and hosting of the version-specific docs for actively maintained versions shifted to RTD, then it may be feasible to migrate new docs builds without needing too many one-off, CPython-docs-specific hosting capabilities in RTD (those could stay in the existing PSF infrastructure).

I do think it would be important to be explicit about the intended benefits of the hosting change, though. As far as I am aware, the main CPython docs builds are still just periodic (daily?), which would make commit-triggered builds in RTD one of the biggest improvements on offer.
Another benefit of using Read the Docs is server-side search.
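To make the split described in the comment above concrete, here is a sketch of the routing decision such a front-end would make: requests for actively maintained versions go to RTD, everything else keeps hitting the existing origin. In practice this logic would live in the CDN configuration (e.g. Fastly VCL), not in Python; the version set and origin labels are hypothetical, and the path rewriting RTD would need (for example adding its own language prefix) is not handled.

```python
import re

# Hypothetical: versions whose docs are built and hosted on Read the Docs.
RTD_VERSIONS = {"3.12", "3.13"}
RTD_ORIGIN = "python.readthedocs.io"  # placeholder, taken from the experimental project
LEGACY_ORIGIN = "legacy-nginx-origin"  # placeholder label for the existing server

# Optional language prefix such as /es/ or /zh-cn/, then an X.Y version segment.
VERSION_PATH = re.compile(r"^/(?:(?P<lang>[a-z]{2}(?:-[a-z]{2})?)/)?(?P<version>\d+\.\d+)/")


def pick_origin(path):
    """Choose which backend should serve a docs.python.org request path."""
    match = VERSION_PATH.match(path)
    if match and match.group("version") in RTD_VERSIONS:
        return RTD_ORIGIN
    # Old versions, aliases like /3/ and /dev/, and anything else stay on the legacy origin.
    return LEGACY_ORIGIN


print(pick_origin("/3.12/library/os.html"))
print(pick_origin("/2.7/tutorial/"))
```

Routing per version keeps historical docs and their URLs on the existing infrastructure, so RTD would not need one-off features for them.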
Just wanted to chime in here from the RTD perspective. RTD has a similar relationship with sponsored CDN hosting, where we aren't paying for bandwidth, so we don't have much worry about traffic or CDN support there. If it's a benefit to y'all to be able to put your own CDN in front, that's something we can support. However, our CDN is already configured to purge automatically on builds, redirect changes, and other settings changes, and those purges won't flow through to a second CDN in front of it. That's likely not a huge deal, but it might lead to delays in content changes, so it's something to consider.

I don't want to block this discussion on sustainability: RTD is able to host the Python docs without any specific payment. As noted above, we're in a much better place these days, with our server costs and CDN sponsored and a full-time team working on the platform. We would still love some kind of support from the PSF, but it's not required. We'd just ask that the fact that RTD is hosting the docs is shown somewhere on the pages, perhaps in the footer.

Overall, we're excited to be able to host the official Python docs, and want to work with y'all to make it successful. I know @humitos has already chimed in here, but we're both available to answer any questions y'all might have.
Thanks for jumping in @ericholscher. From the PSF infra side, I'm a huge +1 on moving away from bespoke docs build infra as much as possible. Given that RTD's CDN is sponsored as well, I see no reason to put our CDN in front of RTD's CDN (except perhaps as a mechanism for the transition). From the general PSF side, we always like to acknowledge our hosting providers, so assuming it is doable from the docs side of things, adding a "hosted by" acknowledgement is 100% fine. Additional considerations would be another conversation with our Director of Resource Development. I'm happy to start that conversation once we have established a technical plan and committed to RTD.
Edit: Yes, with a 30-day lookback. I believe this is generally sufficient.
Hmm, do we? Hugo & co. recently ran a Plausible trial to get some metrics, after failing to get them from server logs.
Yes, currently Fastly logs are streamed to a server in real time for "right now" analysis and rotated off after 7 days. All logs are archived in segments to S3 for three years. Getting some basics from the real-time logs is pretty straightforward; we have only used the archives a couple of times (more often for other projects like python.org).
I agree! I will investigate automating pulling and storing the basic stats long-term if/when we switch over to RTD.
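Pulling basic stats out of access logs mostly comes down to extracting the request path and counting. A minimal sketch, assuming each log line contains a quoted request such as `"GET /3/library/os.html HTTP/1.1"`; the actual Fastly log format is configurable and is not shown in this thread, so the regex is an assumption. Query strings are dropped and only `.html` pages are counted.

```python
import re
from collections import Counter

# Assumed request field: "GET /path?query HTTP/x.y" inside double quotes.
REQUEST = re.compile(r'"(?:GET|HEAD) (?P<path>[^ ?"]+)(?:\?[^ "]*)? HTTP/[0-9.]+"')


def top_pages(lines, n=10):
    """Return the n most requested .html paths as (path, count) pairs."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match and match.group("path").endswith(".html"):
            counts[match.group("path")] += 1
    return counts.most_common(n)
```

Directory-style URLs ending in `/` are skipped in this sketch; they would need normalising (e.g. to `index.html`) before counting.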
This should be doable from the docs side. The footer is defined in python-docs-theme; we can add a variable that adds the acknowledgement for RTD builds.
If I recall from a past discussion of adding sponsor logos to the docs, the rub there is that we would only want to include it in the online builds and not the offline builds. Is that possible?
Good point, we can use the
Please see the PR adding a "Hosted by" link to the theme's footer if a variable is defined: python/python-docs-theme#165.
Of note, RTD does have some basic analytics functionality built in. It's not got a ton of features, but it might be useful for the basics like "what are the top pages people are reading?": https://docs.readthedocs.io/en/stable/analytics.html
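On the docs side, the Sphinx `conf.py` could set the theme variable only when building on Read the Docs, which sets the `READTHEDOCS` environment variable to `"True"` in its build environment. This is a sketch: the option name `hosted_on` is an assumption about what the python-docs-theme PR (#165) defines, and the link target is illustrative.

```python
import os
from collections.abc import Mapping


def theme_options(environ: Mapping[str, str] = os.environ) -> dict:
    """Build html_theme_options, adding the footer acknowledgement only on RTD."""
    options = {}
    if environ.get("READTHEDOCS") == "True":
        # "hosted_on" is assumed to be the variable introduced by the theme PR.
        options["hosted_on"] = '<a href="https://about.readthedocs.com/">Read the Docs</a>'
    return options


# In conf.py, Sphinx reads this module-level name.
html_theme_options = theme_options()
```

On its own this does not distinguish the online HTML from offline formats that RTD also builds (such as downloadable HTML), so the online-versus-offline concern raised above would still need an additional check.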
Integrate the new Read the Docs Addons JavaScript into the Python Docs Sphinx theme to render the version and language selectors nicely.

References:

* Discord thread: https://discord.com/channels/935215565872693329/1159601953265942589
* Implementation of Addons JavaScript `CustomEvent`: readthedocs/addons#64
* Conversation about using Read the Docs: python/docs-community#5
From time to time the discussion arises about moving docs.python.org to Read the Docs; there's even an experimental python.readthedocs.io.
I think there are many pros and cons to using Read the Docs.
I'd personally be in favor of it if we (the PSF) support them (the CPython docs are a big project with a lot of traffic: it takes around 24 hours of CPU time to build all versions × languages with PDF A4, PDF letter, HTML, plain text, and EPUB).
On the other hand, it's not an easy task: among other things, docs.python.org is not only about generating the docs but also about hosting history: