Fix method parsing #190

jeddenlea · 2024-10-26T23:52:23Z

The `is_token` function, used exclusively for parsing the method in a request line, allows more values than it should. In particular, it allows a leading space to be parsed. This problem is not exposed in hyper, which revalidates any method extracted by httparse; otherwise I'm sure this would have been noticed sooner!

Checking for a single range of valid bytes is very fast, so I've taken care to make sure that making `is_token` more complicated doesn't slow down the most common case. While exploring a variety of options, I found the existing benchmark scheme a bit misleading because it tests only a single method at a time, so I've added a new benchmark that roughly simulates a mix of requests. Ultimately, what I found to be a reasonable fix without any slowdown for the 99.9999% case is to check `b'A'..=b'Z'` first and then fall back to a "byte map".

Methods and header names share the same set of allowed bytes, a "token", but their uses are slightly different. I thought it would make sense to rename `is_token` to `is_method_token`, to mirror `is_header_name_token`.
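For readers following along, here is a minimal sketch of the approach described above: a fast path for `b'A'..=b'Z'`, then a fallback to a 256-entry lookup table. It is illustrative only and is not the code in this PR. The names `is_tchar`, `build_token_map`, and `TOKEN_MAP` are hypothetical, and the table is built with a plain `const fn` rather than httparse's own `byte_map!` macro. The allowed set follows the `tchar` rule from RFC 9110.

```rust
/// Returns true if `b` is a `tchar` as defined by RFC 9110 (section 5.6.2).
/// Hypothetical helper used only to build the table below.
const fn is_tchar(b: u8) -> bool {
    matches!(
        b,
        b'!' | b'#' | b'$' | b'%' | b'&' | b'\'' | b'*' | b'+' | b'-' | b'.'
            | b'^' | b'_' | b'`' | b'|' | b'~'
            | b'0'..=b'9'
            | b'A'..=b'Z'
            | b'a'..=b'z'
    )
}

/// Builds a 256-entry table at compile time: `map[b]` is true when byte `b`
/// may appear in a token. This stands in for the "byte map" in the description.
const fn build_token_map() -> [bool; 256] {
    let mut map = [false; 256];
    let mut i = 0usize;
    while i < 256 {
        map[i] = is_tchar(i as u8);
        i += 1;
    }
    map
}

/// Hypothetical name; assumed to play the role of the PR's byte map.
static TOKEN_MAP: [bool; 256] = build_token_map();

/// Fast path for uppercase ASCII letters (the overwhelmingly common case
/// for methods such as GET or POST), then a table lookup for everything else.
#[inline]
fn is_method_token(b: u8) -> bool {
    match b {
        b'A'..=b'Z' => true,
        _ => TOKEN_MAP[b as usize],
    }
}
```

With this sketch, `is_method_token(b'G')` and `is_method_token(b'-')` are accepted, while `is_method_token(b' ')` is rejected, which is the leading-space case the description calls out.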


seanmonstar

Thanks!

I wanted to include this in seanmonstar#190, but had to pull it at the last minute when I found the MSRV was 1.36. But, now that it's been updated to 1.47, we can do more things in `const`.


seanmonstar reviewed Oct 28, 2024


jeddenlea force-pushed the master branch from ab65f6a to 53fc6fd Compare October 29, 2024 05:17

seanmonstar approved these changes Oct 29, 2024


seanmonstar merged commit 9f6702b into seanmonstar:master Oct 29, 2024
41 checks passed

jeddenlea mentioned this pull request Jan 27, 2025

Modernize byte_map! #197

Merged
