-
Notifications
You must be signed in to change notification settings - Fork 11.3k
[12.x] Validate paths for UTF-8 characters #54370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thanks for submitting a PR! Note that draft PR's are not reviewed. If you would like a review, please mark your pull request as ready for review in the GitHub user interface. Pull requests that are abandoned in draft may be closed due to inactivity. |
*/ | ||
public function __construct() | ||
{ | ||
parent::__construct(400, 'Malformed URL.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you want to use the \Symfony\Component\HttpFoundation\Response
status code constants?
parent::__construct(400, 'Malformed URL.'); | |
parent::__construct(Response::HTTP_BAD_REQUEST, 'Malformed URL.'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will leave this decision up to the maintainers, it doesn't really matter to me.
TLDR: Currently if there invalid UTF-8 characters in the URL, the framework will fail to call some wildcard and fallback routes. This PR fixes this issue by adding a check for invalid UTF-8 characters in the URL. This PR also adds a test for this issue.
The Problem
Currently if a Person visits a URL like
http://localhost:8000/%C0
, the framework will fail to call wildcards and fallback routes. This is because the URL-Path contains the invalid UTF-8 characters%C0
. This is an encoded character and would be represented as the following binary number:1100 0000
. This is not a valid UTF-8 character, as the first bit of a UTF-8 character must be0
for a single byte character,110
for a two-byte character,1110
for a three-byte character, and1111 0
for a four-byte character. That means that the URL-Path would expect a two-byte character, but the character is only one byte.The URIValidator uses
preg_match
which can returnint|false
(0 - not matching, 1 - matching, false - error).So error and not matching, both have falsy values, which means it will never call the wildcard or fallback routes.
The issue with that is for example that statamic, or any other application which registeres one of the following routes, these routes are never called and not even the fallback route is called.
Some applications use the fallback route, to handle the 404 page with more information, which would then not be available in the 404 page. As these variables are not available the rendering of the 404 page results in an error, resulting in a 500 error.
The solutions
One solution is this PR, which validates the URL-Path and checks if there are invalid UTF-8 characters in the URL. If there are invalid UTF-8 characters in the URL, the framework will abort with 400 malformed url.
One other solution would be a whole rewrite of the fallback handling, which would be a bigger change and would require more time to implement. But then the fallback route would always be called, even if there are invalid UTF-8 characters in the URL.
Why the master / 12.x branch
As there are many applications in the wild and I can not be sure, that this would not break some applications, therefor I opted for the master branch.
Related issues / discussions:
Credits
Thanks to @crynobone for discussing this issue.
Also mentioning @duncanmcclean for the related statamic issue.