[Feature Request] Allow user to control whether URL fragments are ignored when crawling #1454

Open
karanbirsingh opened this issue Jun 4, 2021 · 3 comments

Comments

@karanbirsingh
Contributor

karanbirsingh commented Jun 4, 2021

Is your feature request related to a problem? Please describe.

By default, Apify ignores URL fragments when computing URL uniqueness. This means http://www.example.com#foo and http://www.example.com#bar are considered equal, and both are skipped once http://www.example.com has been crawled. This makes sense for many websites because URL fragments often link to sections within the same HTML page.
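To illustrate the default behavior described above, here is a minimal sketch of fragment-insensitive deduplication. The helper `keyIgnoringFragment` is hypothetical and only approximates the idea using the standard `URL` class; it is not Apify's actual normalization logic.

```typescript
// Hypothetical helper (not part of Apify): builds a uniqueness key
// that ignores the URL fragment, using the standard URL class.
function keyIgnoringFragment(rawUrl: string): string {
  const url = new URL(rawUrl);
  url.hash = ""; // drop "#foo", "#bar", ...
  return url.href;
}

const discoveredLinks: string[] = [
  "http://www.example.com",
  "http://www.example.com#foo",
  "http://www.example.com#bar",
];

const seenKeys = new Set<string>();
const skippedLinks: string[] = [];

for (const link of discoveredLinks) {
  const key = keyIgnoringFragment(link);
  if (seenKeys.has(key)) {
    // Same key as an earlier URL, so this link would never be crawled.
    skippedLinks.push(link);
  } else {
    seenKeys.add(key);
  }
}

console.log(skippedLinks);
```

Under this assumption, the two fragment URLs collapse onto the key of the first URL and end up in `skippedLinks`.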

We found an example of an internal website whose URL fragment links load different HTML pages. Running ai-scan --crawl on it produced logs showing that many links were discovered, but the crawler exited without crawling them.

Specifying explicit input URLs doesn't work around the problem, because we still strip URL fragments from input URLs.

Describe the solution you'd like

Clients of the service and the accessibility-insights-scan package should be able to control whether Apify includes URL fragments in its uniqueness check. We may be able to leverage the keepUrlFragment argument in Apify.

Clients who use URL fragments to link to sections of a page (like we do in https://accessibilityinsights.io/) would leave the option off, which avoids scanning the same UI more than once.
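As a rough sketch of the requested behavior, the example below shows an in-memory queue whose deduplication depends on a client-supplied flag. `DedupQueue` and `QueueOptions` are hypothetical names for illustration only; the flag is named after Apify's keepUrlFragment, but nothing here calls Apify or reflects how the scanner would actually wire the option through.

```typescript
// Hypothetical option bag; the field name mirrors Apify's keepUrlFragment.
interface QueueOptions {
  keepUrlFragment: boolean;
}

// Hypothetical in-memory queue that deduplicates URLs before crawling.
class DedupQueue {
  private readonly seen = new Set<string>();
  private readonly options: QueueOptions;
  readonly pending: string[] = [];

  constructor(options: QueueOptions) {
    this.options = options;
  }

  // Returns true if the URL was enqueued, false if treated as a duplicate.
  add(rawUrl: string): boolean {
    const url = new URL(rawUrl);
    if (!this.options.keepUrlFragment) {
      url.hash = ""; // fragments do not count toward uniqueness
    }
    const key = url.href;
    if (this.seen.has(key)) {
      return false;
    }
    this.seen.add(key);
    this.pending.push(rawUrl);
    return true;
  }
}

const fragmentLinks = [
  "http://www.example.com",
  "http://www.example.com#foo",
  "http://www.example.com#bar",
];

// Site whose fragments load different pages: opt in to keep fragments.
const routedQueue = new DedupQueue({ keepUrlFragment: true });
// Site whose fragments point at sections of one page: leave the option off.
const anchorQueue = new DedupQueue({ keepUrlFragment: false });

for (const link of fragmentLinks) {
  routedQueue.add(link);
  anchorQueue.add(link);
}

console.log(routedQueue.pending.length, anchorQueue.pending.length);
```

With the flag on, all three URLs are queued as distinct pages; with it off, only the first is queued and the fragment variants are treated as duplicates.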

@ghost

ghost commented Jun 4, 2021

This issue has been marked as ready for team triage; we will triage it in our weekly review and update the issue. Thank you for contributing to Accessibility Insights!

@ferBonnin

Reviewed with Maxim; marking this as ready for work for the CLI, to be leveraged in the GH action only. This is bug-sized.

@ferBonnin

Adding a note that another user has encountered this issue.
