Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core): make broken link checker detect broken anchors - add onBrokenAnchors config #9528

Merged
merged 70 commits into from Jan 4, 2024

Conversation

OzakIOne
Copy link
Collaborator

@OzakIOne OzakIOne commented Nov 10, 2023

Motivation

  • The broken link checker should now be able to report broken anchors.
    Previously it could only report broken paths.
  • Add siteConfig.onBrokenAnchors option, defaults to warn for v3.x retro-compatibility, no breaking change.
  • Add useBrokenLinks() core hook to collect page links and anchors (for advanced cases and plugin authors only)

Fix #3321

Fix #9057

Test Plan

unit tests + dogfood

Test links

Preview: https://deploy-preview-9528--docusaurus-2.netlify.app/

Docs:

@facebook-github-bot facebook-github-bot added the CLA Signed Signed Facebook CLA label Nov 10, 2023
Copy link

netlify bot commented Nov 10, 2023

[V2]

Name Link
🔨 Latest commit cc74ef8
🔍 Latest deploy log https://app.netlify.com/sites/docusaurus-2/deploys/659455ffc792270008698501
😎 Deploy Preview https://deploy-preview-9528--docusaurus-2.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

Copy link

github-actions bot commented Nov 10, 2023

⚡️ Lighthouse report for the deploy preview of this PR

URL Performance Accessibility Best Practices SEO PWA Report
/ 🟠 74 🟢 98 🟢 100 🟢 100 🟠 89 Report
/docs/installation 🟠 85 🟢 98 🟢 100 🟢 100 🟠 89 Report
/docs/category/getting-started 🟠 76 🟢 100 🟢 100 🟢 90 🟠 89 Report
/blog 🟠 76 🟢 100 🟢 100 🟢 90 🟠 89 Report
/blog/preparing-your-site-for-docusaurus-v3 🟠 63 🟢 97 🟢 100 🟢 100 🟠 89 Report
/blog/tags/release 🟠 70 🟢 100 🟢 100 🟠 80 🟠 89 Report
/blog/tags 🟠 78 🟢 100 🟢 100 🟢 90 🟠 89 Report

@slorber slorber changed the title feat(core): warn broken anchors feat(core): add anchor link support to broken link checker Nov 10, 2023
@Josh-Cena
Copy link
Collaborator

I don't particularly like this solution, because many other things can create anchors without using the Heading component. Since we only run the checker after a build anyway, why don't we read the HTML file and find all IDs? On a related note @slorber do you know why our broken link checker uses the Link component to find valid links and the SPA routes data to find valid targets, instead of using the build output as the source-of-truth?

Here's the checker I built for MDN: https://github.com/jc-verse/mdn-checker/blob/master/packages/mdn-checker/src/rules/anchor-links.ts

@slorber
Copy link
Collaborator

slorber commented Nov 20, 2023

I don't particularly like this solution, because many other things can create anchors without using the Heading component.

I understand the concern, although most achors will usually be created through the heading component. For remaining cases we'll expose a core API to report usage of links and anchors.

The API is not designed yet but ideally users should be able to create their own link/anchor componennts and have it work nicely with the broken link checker.

Since we only run the checker after a build anyway, why don't we read the HTML file and find all IDs?

We could also do that to check links too, not just anchors.
There are already tooling to let users do that in userland (cf this solution or SaaS tooling like Ahrefs)

IMHO users appreciate the fact that we provide an integrated solution for link checking.

And I have good hope to someday be able to do some fancy things with the graph we collect, and have heuristics to report most broken links in dev mode thanks to those new APIs and a bit of caching.

On a related note @slorber do you know why our broken link checker uses the Link component to find valid links and the SPA routes data to find valid targets, instead of using the build output as the source-of-truth?

That's an initial implementation detail, but it turns out it wasn't a bad one IMHO since it works nicely and is quite robust.

And this gives the opportunity to have something more opinionated in the future.

We probably only want to double check html files emitted through the Docusaurus SPA. Not 100% sure it's our responsibility to report broken links on /static html files, javadoc, openapi or other things that Docusaurus sites can generate before build.

Let's also not forget that regular non-SPA links are real HTML navigations and you can't know the redirects set up at the CDN/server level: this can lead to false positives.


For these reasons I'd like for now to pursue in the current direction.

I wish it didn't require introducing a new core API for it, but it's not too bad IMHO.

@Josh-Cena
Copy link
Collaborator

Josh-Cena commented Nov 21, 2023

Okay, that makes sense to me. I have an existing use case with a remark plugin that emits DLs with <dt id="term">term</dt>, so it would be good if the anchor checker can accommodate that. Will I need to emit a React component in the future?

@slorber
Copy link
Collaborator

slorber commented Nov 23, 2023

Okay, that makes sense to me. I have an existing use case with a remark plugin that emits DLs with <dt id="term">term</dt>, so it would be good if the anchor checker can accommodate that. Will I need to emit a React component in the future?

Great

I don't think we'll expose a core React component but only a hook. Now you can decide to create your own component that uses the hook and emit it if it makes things simpler to integrate for your use-case

@slorber slorber marked this pull request as ready for review January 2, 2024 17:46
@slorber slorber changed the title feat(core): add anchor link support to broken link checker feat(core): make broken link checker detect broken anchors - add onBrokenAnchors config Jan 4, 2024
@slorber slorber merged commit fd49301 into facebook:main Jan 4, 2024
32 checks passed
slorber added a commit that referenced this pull request Jan 5, 2024
…rokenAnchors` config (#9528)

Co-authored-by: sebastienlorber <lorber.sebastien@gmail.com>
@slorber slorber added backported This PR has been backported to a stable version of Docusaurus and removed to backport This PR is planned to be backported to a stable version of Docusaurus labels Jan 5, 2024
@johnnyreilly
Copy link
Contributor

johnnyreilly commented Jan 5, 2024

I don't think it's a big issue, but this feature broken relative links to the /atom.xml and /rss.xml feeds:

screenshot of broken build

https://github.com/johnnyreilly/blog.johnnyreilly.com/actions/runs/7426096653/job/20209108231?pr=809

I'm tackling this by making them fully qualified links instead: https://github.com/johnnyreilly/blog.johnnyreilly.com/pull/809/files#diff-78b51ee0667cbd2ab45d9da97c4128aee41826b6a32cf6662a70810068269939

As I say, I don't think it's a big issue but I thought I should mention it!

@slorber
Copy link
Collaborator

slorber commented Jan 8, 2024

Ah yes this is code we removed recently that used to look for existing static files on the build folder.
Maybe it was a mistake to remove it, I don't know.
Or maybe you shouldn't use our built-in Link component for linking to those? and a simple <a> tag should be enough?

If it's a markdown link, you can also use pathname:///atom.xml to opt-out of the SPA link behavior

@johnnyreilly
Copy link
Contributor

It's a markdown link - I wasn't aware of the pathname:///atom.xml mechanism so thanks for that!

Hard coding to the main website isn't actually a problem for me in this case - it's fine. Just thought I'd share in case it wasn't intentional.

@birdofpreyru
Copy link

birdofpreyru commented Jan 9, 2024

Hey @slorber , sorry if it is explained somewhere and I've missed it, but what is the way to make the broken anchor checker to work with custom (non-header) anchors in markdown? I.e. I use <a id="my-anchor-name"></a> to anchor a particular place in my markdown text, where using a header does not fit at all, and the anchor checker reports all links to such anchors as broken. All docs I found so far only seem to cover custom anchors created inside react-coded pages?

@slorber
Copy link
Collaborator

slorber commented Jan 11, 2024

@birdofpreyru this falls under the case where you need to use the useBrokenLinks hook:

https://docusaurus.io/blog/releases/3.1#broken-anchors-checker

CleanShot 2024-01-11 at 11 23 46@2x CleanShot 2024-01-11 at 11 24 14@2x

The example shows a header, but it doesn't necessarily have to be a header. You can just encapsulate this API in any component that create anchors, in your case:

export default MyCustomAnchor({id}) {
  const brokenLinks = useBrokenLinks();
  brokenLinks.collectAnchor(id);
  return <a id={id}></a>
}

I use to anchor a particular place in my markdown text, where using a header does not fit at all, and the anchor checker reports all links to such anchors as broken. All docs I found so far only seem to cover custom anchors created inside react-coded pages?

When you use <a id="my-anchor-name"></a> in MDX, this tag is not "processed by Docusaurus" and is kept as is. I'd suggest using <MyCustomAnchor id="my-anchor-name" /> instead in your MDX documents. You can register that component to MDXComponents to avoid importing it everywhere.

One thing we could do is to map <a> tags to our @docusaurus/Link components automatically, and enhance it to report anchor ids. That's what MDX v1 did in the past but decided to move out of this logic for various reasons, so I'm not sure it's a good idea to restore this behavior on our side.

Eventually we could still enhance our link component and allow you to write <Link id="my-anchor-name"/> in MDX. That would only be a convenient shortcut we provide, but nothing you could not achieve on your own.

@birdofpreyru
Copy link

Thanks @slorber ,

You can register that component to MDXComponents to avoid importing it everywhere.

That clears my primary doubt regarding useBrokenLinks() for MDX — I did not know it was possible to register components in such way, and it did not cross my mind to investigate it; and importing a custom anchor component everywhere felt like a drag.

Eventually we could still enhance our link component and allow you to write in MDX. That would only be a convenient shortcut we provide, but nothing you could not achieve on your own.

That would be awesome, to have a standard out-of-the-box way to anchor. I mean, I can achieve on my own admonitions, code blocks, and what not; but it is a way more convenient they just come ready to use :)

@slorber
Copy link
Collaborator

slorber commented Jan 12, 2024

@birdofpreyru I tried to improve things in #9732

It's now possible to use the <Link> component to create such anchors instead of <a>:

import Link from '@docusaurus/Link';

<Link id="my-anchor-id"/>

Note that I still recommend you add it the Link component to the MDXComponents map, to avoid having to import it everywhre.


I tried to see if it was possible to support collecting anchors for arbitrary <htmlTag id="xyz"/> found in md/mdx in a generic way.

I think there could be a way through a rehype plugin, but it's a bit complicated technically so I'd rather wait a bit for feedback to see if this is really necessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Argos Add this label to run UI visual regression tests. See argos.yml GH action. backported This PR has been backported to a stable version of Docusaurus CLA Signed Signed Facebook CLA pr: new feature This PR adds a new API or behavior.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Broken link check is case-insensitive Detect broken anchor links
6 participants