Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Custom keyword validation #473

Merged
merged 19 commits into from
May 1, 2024

Conversation

samgqroberts
Copy link
Contributor

@samgqroberts samgqroberts commented Apr 10, 2024

This was originally put up by @bnjt here #394. That repository was deleted which deleted the PR, so this PR re-creates that PR.

Additionally, this PR adds an improvement to the original PR whereby existing keywords can also have their behavior be overridden by custom logic.

Resolves #379
Resolves #354

Copy link
Owner

@Stranger6667 Stranger6667 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! The main point I'd like to discuss is whether there should be a way to store extra, user-defined state inside custom validators. What do you think?

}
true
}
let definition = CustomKeywordDefinition::Validator {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking about whether it would give more flexibility if there was a trait instead? This way it would be possible to have some inner state inside

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like a good idea, but a few clarifying questions: do you mean creating a trait like CustomValidator with the validate and is_valid functions as methods on it, and then the user would define a struct that implements CustomValidator and provide it to CompilationOptions (in like a Box I think)? If so perhaps it should be combined somehow with the Validator trait?

Also while instinctually this feels like the right move, I'm struggling a bit to come up with a concrete use case example, like for a test case. There's this #394 (comment), but I wonder if that doesn't need state internal to the validator, the validate / is_valid functions could potentially just reference the lookup table. Something like that's probably worth a test case anyway also.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean creating a trait like CustomValidator with the validate and is_valid functions as methods on it, and then the user would define a struct that implements CustomValidator and provide it to CompilationOptions (in like a Box I think)? If so perhaps it should be combined somehow with the Validator trait?

Yep, or, maybe exposing the Validator trait?

Also while instinctually this feels like the right move, I'm struggling a bit to come up with a concrete use case example, like for a test case

An example that I have in mind is a counter. Even though it might sound trivial, it unblocks interesting use cases like collecting usage statistics from different keywords (i.e. if the user "wraps" the existing keyword implementation with such a counter), which in turn is useful for building coverage maps for test cases sent to API endpoints (documented with Open API), which in turn is useful for coverage-guided fuzzing of such endpoints.

Comment on lines 1 to 9
use crate::compilation::context::CompilationContext;
use crate::compilation::options::CustomKeywordDefinition;
use crate::keywords::CompilationResult;
use crate::paths::{InstancePath, JSONPointer, PathChunk};
use crate::validator::Validate;
use crate::ErrorIterator;
use serde_json::{json, Value};
use std::fmt::{Display, Formatter};
use std::sync::Arc;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got to set rustfmt.toml to make sure everything is consistently formatted, but generally the imports are grouped in this crate.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, I merged #474 so, rustfmt now checks for such cases


// An Arc<Value> is used so the borrow checker doesn't need explicit lifetime parameters.
// This would pollute dependents with lifetime parameters.
pub(crate) type CustomValidateFn =
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the CustomIsValidFn function uses regular references, do we need Arc here? I.e. how would the signature look like otherwise as you mention extra lifetimes

// define compilation options that include the custom format and the overridden keyword
let validator = Arc::new(Mutex::new(Box::new(CountingValidator { count: 0 })));
let mut options = JSONSchema::options();
let options = options.with_custom_keyword("countme", validator.clone());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Stranger6667 unfortunately I think I've hit the end of my expertise with trait objects, Arc, Mutex, Box, etc. Everything else compiles, however I can't figure out how to maintain the type of validator here so that I can access the count member later on in assertions.

error[E0308]: mismatched types
   --> src/compilation/mod.rs:623:62
    |
623 |         let options = options.with_custom_keyword("countme", validator.clone());
    |                               -------------------            ^^^^^^^^^^^^^^^^^ expected `Arc<Mutex<Box<...>>>`, found `Arc<Mutex<Box<CountingValidator>>>`
    |                               |
    |                               arguments to this method are incorrect
    |
    = note: expected struct `Arc<std::sync::Mutex<Box<(dyn for<'instance, 'schema> CustomKeywordValidator<'instance, 'schema> + 'static)>>>`
               found struct `Arc<std::sync::Mutex<Box<CountingValidator>>>`
    = help: `CountingValidator` implements `CustomKeywordValidator` so you could box the found value and coerce it to the trait object `Box<dyn CustomKeywordValidator>`, you will have to change the expected type as well

Also, I'm not thrilled with how the with_custom_keyword interface requires an Arc of a Mutex of a Box of the thing... for non inner state use cases that's unnecessary (see the other custom keyword tests above), but I'm not sure how to support both. Any ideas for either problem? (including suggestions like I've modeled this completely wrong lol)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at it right now. I'll push a few commits to your branch which will this work and then we can continue discussing the interface :)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are a few thoughts:

Given the inner state, I think that custom_keywords should store initialization functions (boxed this time, unlike formats), so such validators won't be shared across multiple locations in the schema. This will remove the need for Arc<Mutex<T>>, so no unconditional locking when any custom validator is used - stateless validators will be more performant. Alternatively, we can store something that implements Validator + Default, which is likely a better option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, so, I think again my inexperience is showing. My attempts with Validator + Default fail because that cannot be a trait object, and if we try to store a fn in CompilationOptions we can't capture variables when in the closure and that wouldn't be able to clone the count in the inner state test, and if we try to store a Fn then that's either not thread safe or not cloneable. Any ideas?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I haven't thought about the trait object thing. Let me play with the code a bit

P.S. Offtopic - I've put together some ideas about this library's 1.0 API and would appreciate any feedback (see #475)

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in a240bd5! However, there are a few issues:

  • Initialization functions require &'static so, it is problematic to pass a value there (like a counter) and then read from it without leaking.
  • The API does not spark joy :( However, given that there could be multiple occurrences of the same keyword, every one of them should have its separate state, so it is probably the only way to provide this kind of capability. I.e. - each validator is unique, so we need to give a way to initialize each of them. With stateless validators, things could be simpler.

Not 100% sure if it is worth the trouble and if the use case I mentioned above could be implemented differently

true
}
}

// define compilation options that include the custom format and the overridden keyword
let validator = Arc::new(Mutex::new(Box::new(CountingValidator { count: 0 })));
let count = Arc::new(Mutex::new(0));
let boxed: Box<dyn CustomKeywordValidator> = Box::new(CountingValidator {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's interesting that Rust can't preserve that this is a CountingValidator while passing it to a function that expects a dyn CustomKeywordValidator. That's what I was getting hung up on. The inner state being behind another Arc<Mutex<T>> seems like a good alternative.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, Rust preserves this information, but behind an indirection. Dynamic dispatch "knows" at runtime what kind of object is there, so it can do a vtable lookup and find the relevant implementation of methods from CustomKeywordValidator. It is also possible to downcast dyn CustomKeywordValidator to CountingValidator via downcast_ref, though it is not that convenient to use.

Without this kind of indirection, it would not be clear how much space to allocate on the stack for that type + as we store it in a hashmap, we can only use boxed trait objects.

@Stranger6667
Copy link
Owner

Let me know if there is anything I can do to help with moving this forward :)

@samgqroberts
Copy link
Contributor Author

@Stranger6667 sounds good! day job has taken over this week but I'll dig back into this PR this weekend

@Stranger6667
Copy link
Owner

I am thinking about moving the internal state of CompiledCustomKeywordValidator to the inner validator, so if they are not needed, the user may choose to keep the overall structure smaller, i.e. pushing this decision to the user rather than always storing it (as in compile_custom_keyword_validator)

So, instead of closure without arguments, it could be an equivalent of compile-like functions which will return Box<CustomValidator> which in turn will be stored in the wrapper.

However, at this point, the CustomKeywordValidator trait looks very akin to Validator, so I am thinking if it should be used instead, this way there will be one less indirection + we'll reuse the existing code. Let me know what you think!

@Stranger6667
Copy link
Owner

Stranger6667 commented Apr 22, 2024

I have an idea how what the API could look like:

#[derive(Debug)]
struct AsciiKeyword {
    size: usize
}

impl jsonschema::CustomKeyword for AsciiKeyword {
    fn is_valid(&self, instance: &Value) -> bool {
        todo!()
    }
}

fn ascii_keyword_factory(schema: &Value) -> Arc<dyn jsonschema::CustomKeyword> {
    Arc::new(AsciiKeyword { size: 42 })
}

let validator = JSONSchema::options()
    .with_keyword(
        "ascii",
        |schema| -> Arc<dyn jsonschema::CustomKeyword> {
            Arc::new(AsciiKeyword { size: 42 })
        }
    )
    .with_keyword("also-ascii", ascii_keyword_factory)
    .compile(&schema)
    .unwrap();

I've implemented it for the new jsonschema version - at least it compiles and the code above works. Here is the implementation:

  • Traits - CustomKeyword that provides the relevant methods (is_valid, etc), CustomKeywordFactory is what is actually stored inside options, and is implemented for functions that accept 1 argument of the Value type. Those should be extended to match all the needed arguments from the current implementation
  • Function to add new keywords

With some adjustments (mainly for other arguments like path, etc) it should work

P.S. I haven't solved how to use such trait objects just yet, but I think it should be possible

@Stranger6667
Copy link
Owner

I'll push some changes to make things work with the API I described above, though there is still enough room to adjust compilation / is_valid / validate signatures so that your use case is still possible

@samgqroberts
Copy link
Contributor Author

@Stranger6667 Thanks for running with this. I'll be ready to take another look after your updates.

@Stranger6667
Copy link
Owner

I'll add some more changes soon! Here are a few observations:

  • Some logic should be moved to the "build" phase, so as soon as we know that the schema is invalid it should be reported. Hence the keyword factory should return Result
  • Should custom validators return a single error? It may simplify the logic a bit

Otherwise, I think that the path forward is more or less clear

@Stranger6667
Copy link
Owner

Stranger6667 commented Apr 30, 2024

Update:

There are not that many things left:

  • Tune privacy of some types (InstancePath)
  • Update docs (missing docs + doctests)
  • Tests for schema errors inside custom keywords
  • Custom validation errors (there should be a way to provide a custom error message)
  • Expose Validate instead of Keyword (they are more or less the same)
  • Try to make keyword factories work with closures without adding a lifetime to CompilationOptions
  • Extract changes related to paths.rs to a separate PR

Benjamin Tobler and others added 15 commits May 1, 2024 14:20
Add support for user defined custom keyword validation. The user
provides and registers custom validator functions when configuring a
JSONSchema.

Custom keyword validators may be used when the user wants to enforce
constraints that can't, or can't easily, be expressed in JSON schema.
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Copy link

codecov bot commented May 1, 2024

Codecov Report

Attention: Patch coverage is 94.66192% with 15 lines in your changes are missing coverage. Please review.

Project coverage is 90.02%. Comparing base (092b573) to head (b284f75).

Files Patch % Lines
jsonschema/src/compilation/mod.rs 94.59% 12 Missing ⚠️
jsonschema/src/keywords/custom.rs 87.50% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #473      +/-   ##
==========================================
+ Coverage   89.70%   90.02%   +0.32%     
==========================================
  Files          57       58       +1     
  Lines        9643     9923     +280     
==========================================
+ Hits         8650     8933     +283     
+ Misses        993      990       -3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
Signed-off-by: Dmitry Dygalo <dmitry@dygalo.dev>
@Stranger6667
Copy link
Owner

Stranger6667 commented May 1, 2024

I think it is ready to merge!

From what I see it should be possible to implement all the cases mentioned in different issues:

Please, let me know if there is anything missing to cover your use cases. I'd be glad to work on them here, or in a follow-up PR.

Thank you all for opening those issues and providing a lot of details and context, I appreciate it a lot! Sorry for the delay, but I hope to get this feature released this week.

cc @eirnym @platoscave as you participated in #394, feel free to leave your comments here :)

@Stranger6667 Stranger6667 merged commit aa94a4b into Stranger6667:master May 1, 2024
30 of 31 checks passed
@samgqroberts
Copy link
Contributor Author

👏 Thank you @Stranger6667 – the final draft looks great for my use case

@samgqroberts samgqroberts deleted the custom-validators branch May 18, 2024 16:56
@Stranger6667
Copy link
Owner

Thank you @samgqroberts ! Let me know if you need anything else from my side :) I’d be happy to chat

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Custom Validators
3 participants