Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: implement valid AccountId generation from arbitrary data #9107

Merged
merged 6 commits into from
May 26, 2023

Conversation

miraclx
Copy link
Contributor

@miraclx miraclx commented May 25, 2023

Looks like the former implementation of Arbitrary for AccountId would assume all strings were well-formed AccountId's. Skipping the validation logic. Which is untrue.

This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an AccountId is constructed.

@miraclx miraclx requested a review from a team as a code owner May 25, 2023 00:58
@miraclx miraclx requested a review from akhi3030 May 25, 2023 00:58
@miraclx miraclx changed the base branch from master to miraclx/repin-account-id May 25, 2023 00:58
@miraclx miraclx force-pushed the miraclx/fix-arbitrary-account-id branch from 5652ce6 to defbe94 Compare May 25, 2023 01:00
@akhi3030 akhi3030 requested a review from Ekleog-NEAR May 25, 2023 08:33
@akhi3030
Copy link
Collaborator

@Ekleog-NEAR: can you review this please?

Copy link
Collaborator

@Ekleog-NEAR Ekleog-NEAR left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I just sent a few suggestions down there :)

(Also, it’d be nice to have a proper cargo feature flag for Arbitrary rather than rely on cfg(fuzzing), but it’s orthogonal to this PR)

Comment on lines +395 to +400
fn arbitrary_take_rest(u: arbitrary::Unstructured<'a>) -> arbitrary::Result<Self> {
<&str as arbitrary::Arbitrary>::arbitrary_take_rest(u)?
.parse()
.map_err(|_| arbitrary::Error::IncorrectFormat)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT this has a lower success rate than arbitrary. So the default implementation would probably be better than this one, and this function can just be removed.

(arbitrary_take_rest is mostly used for generating arbitrary collections, where the generator knows it can take all that remains. Alternatively it’d also be possible to copy-paste the contents of the arbitrary function and replace the arbitrary::<&str> with an arbitrary_take_rest::<&str> here, but I’m not sure it’s worth the effort)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Figured the implementation of <&str as Arbitrary>::arbitrary_take_rest would also suffice for our AccountId.

Any non-Ok call to the method for &str's should also be unsuccessful when generating AccountId's.

I believe they'd have used that implementation for a reason.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re right that str::arbitrary_take_rest currently does the same thing as your PR; I’ve just submitted rust-fuzz/arbitrary#151, let’s see what they say :) (either way, this is certainly not a blocker, the current behavior is already much better than the old one that just generated invalid AccountId’s)

core/account-id/src/lib.rs Outdated Show resolved Hide resolved
@miraclx
Copy link
Contributor Author

miraclx commented May 25, 2023

Also, it’d be nice to have a proper cargo feature flag for Arbitrary rather than rely on cfg(fuzzing), but it’s orthogonal to this PR

We presently do, at least until #9106 merges. I'm replacing it with the cfg attr because it is present when fuzzing. But it forces us to bump our MSRV to 1.63 even though it's conditional and near-account-id compiles just fine at 1.61.

Nothing should break here.

@miraclx miraclx force-pushed the miraclx/fix-arbitrary-account-id branch from e8cdc87 to 18223ac Compare May 25, 2023 16:30
Base automatically changed from miraclx/repin-account-id to master May 26, 2023 14:53
@miraclx miraclx force-pushed the miraclx/fix-arbitrary-account-id branch from dcc799b to 8d93a45 Compare May 26, 2023 15:02
@near-bulldozer near-bulldozer bot merged commit a782a6a into master May 26, 2023
2 checks passed
@near-bulldozer near-bulldozer bot deleted the miraclx/fix-arbitrary-account-id branch May 26, 2023 15:19
Err(e) => {
if let Some((valid_up_to, _)) = e.char {
let valid = &s[..valid_up_to];
debug_assert!(AccountId::validate(valid).is_ok());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@miraclx I feel this is an invalid assumption. Maybe I miss something, but wouldn’t a|xyz fail validation and this code then try to construct account id a as valid? (It is too short)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch there. Although, it won't actually construct an invalid account ID, due to the validated assertion. It would panic, and we don't want that either. I'll follow this up with a patch. Thank you!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably should be a fix of AccountId::validate (probably coupled with a major release as it’s technically a breaking change, even though it’s actually a bugfix):

  • ParseErrorKind::TooLong should return a char up-to-which it’s valid
  • ParseErrorKind::{InvalidChar,RedundantSeparator} should return None as a char if it’s actually not possible to make a valid account-id from a substring of the provided string

Does that make sense? (I had assumed that was the way validate worked during the review, so if we were two being tricked by this it’s probably worth fixing the root cause, unless there’s a specific reason for the current behavior of validate)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ParseErrorKind::TooLong should return a char up-to-which it’s valid

I'd argue there's nothing to gain from this information that AccountId::MAX_LEN doesn't already provide.

ParseErrorKind::{InvalidChar,RedundantSeparator} should return None as a char if it’s actually not possible to make a valid account-id from a substring of the provided string

I'm not convinced this helps either. The point of validate is to test if the complete input string slice adheres to all the rules and is safe to be represented as a NEAR AccountId. The feedback it provides is to aid eventual modification of the input by the user, and not necessarily dynamically by the program.

validate returning InvalidChar in frol's example only tells you that the second character is invalid. It doesn't provide any guarantees that what has been validated so far (when taken in isolation) will construct a valid ID. Validation is only complete after all the input has been consumed.

To do what you're suggesting, we'll have to do backtracked validation on error detection. And the only utility to that is to slice the input, (like we're doing with this arbitrary implementation). Which is only really useful if the input was computer-generated. AccountId inputs are often sourced from humans.

For example, in a_-b, a_ is perfectly valid until we detect that the next character is -. We can only return information about the current invalid character, as opposed to running a backtracked validation of the a_ as well.

We can move this outside though, and do it directly in our arbitrary implementation, in which case, all we have to do is run recursive validation on slices of the input (provided indexes from the validator). Until either the index reaches 0 or the validation returns an Ok.

✨: Even a slice of a valid AccountId isn't a valid AccountId without further validation.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue there's nothing to gain from this information that AccountId::MAX_LEN doesn't already provide

If eg. the character at MAX_LEN is a ., then the char returned would be the one before that.

I'm not convinced this helps either. The point of validate is to test if the complete input string slice adheres to all the rules and is safe to be represented as a NEAR AccountId. The feedback it provides is to aid eventual modification of the input by the user, and not necessarily dynamically by the program.

I guess the question raised here is what do you want the API of validate to actually be. Good news: I just found out that char is a private API anyway, so it’s not a big deal, and whichever option we choose here (changing validate or changing the arbitrary implementation), it’s not going to have any impact in practice.

I guess my takeaway from this discussion would be, we should rename ParseAccountError::char into something like error_on, so that it’s obvious that it doesn’t indicate the location until which the provided account-id was valid. WDYT?

near-bulldozer bot pushed a commit that referenced this pull request May 31, 2023
Fixes unintended panic in the `Arbitrary` implementation of `AccountId`.

As pointed out in #9107 (comment), certain `Unstructured` data sources like `a|bcd` or `a_-b` causes a panic with the current implementation.

This patch reduces the consumed input to the limit of validity adherence. Failing otherwise. But never panics.

Worth noting this implementation does not provide a fallback for input sources shorter than `AccountId::MIN_LEN` or longer than `AccountId::MAX_LEN`.
nikurt pushed a commit that referenced this pull request May 31, 2023
Looks like the former implementation of `Arbitrary` for `AccountId` would assume all strings were well-formed `AccountId`'s. Skipping the validation logic. Which is untrue.

This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an `AccountId` is constructed.
nikurt pushed a commit to nikurt/nearcore that referenced this pull request Jun 8, 2023
)

Looks like the former implementation of `Arbitrary` for `AccountId` would assume all strings were well-formed `AccountId`'s. Skipping the validation logic. Which is untrue.

This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an `AccountId` is constructed.
nikurt pushed a commit to nikurt/nearcore that referenced this pull request Jun 8, 2023
)

Looks like the former implementation of `Arbitrary` for `AccountId` would assume all strings were well-formed `AccountId`'s. Skipping the validation logic. Which is untrue.

This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an `AccountId` is constructed.
nikurt pushed a commit that referenced this pull request Jun 9, 2023
Fixes unintended panic in the `Arbitrary` implementation of `AccountId`.

As pointed out in #9107 (comment), certain `Unstructured` data sources like `a|bcd` or `a_-b` causes a panic with the current implementation.

This patch reduces the consumed input to the limit of validity adherence. Failing otherwise. But never panics.

Worth noting this implementation does not provide a fallback for input sources shorter than `AccountId::MIN_LEN` or longer than `AccountId::MAX_LEN`.
nikurt pushed a commit that referenced this pull request Jun 13, 2023
Looks like the former implementation of `Arbitrary` for `AccountId` would assume all strings were well-formed `AccountId`'s. Skipping the validation logic. Which is untrue.

This patch swaps out the derive for a custom implementation that ensures all arbitrary byte slices are validated before an `AccountId` is constructed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants