Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some general improvements / discussion #243

Conversation

cpaulik
Copy link
Contributor

@cpaulik cpaulik commented Feb 19, 2025

This adds three things

  1. Allows validation of extensions for collections.
  2. Improve error reporting by using best_match to select the error.
  3. Add a schema map argument similar to the node-stac-validator schemaMap to allow validating against local copies of schemas during development.

I'm happy to open separate MRs if one of the features is not wanted.
I'm also happy to add tests once we agree that a feature is going to be included.

@cpaulik cpaulik force-pushed the validate-collections-add-schema-map-improve-error-reporting branch from fb75fca to e37cf56 Compare February 19, 2025 12:52
@cpaulik cpaulik force-pushed the validate-collections-add-schema-map-improve-error-reporting branch from e37cf56 to acffbf1 Compare February 20, 2025 09:42
@cpaulik
Copy link
Contributor Author

cpaulik commented Feb 20, 2025

Force pushed a fix for the Mypy issues and updated existing tests as well.

@jonhealy1
Copy link
Collaborator

Hi @cpaulik all good ideas here :)

@cpaulik cpaulik force-pushed the validate-collections-add-schema-map-improve-error-reporting branch from d988288 to 7c60300 Compare February 20, 2025 13:32
@cpaulik
Copy link
Contributor Author

cpaulik commented Feb 20, 2025

Thanks. Tests should hopefully pass after the latest push. I forgot a breakpoint() in the code. 🤦

@cpaulik
Copy link
Contributor Author

cpaulik commented Feb 20, 2025

I'll add tests for the schema_map feature next. Still need to figure out if additional tests for the collection extensions functionality are needed. At least one of the tests I adapted seems to do that already.

cpaulik and others added 2 commits February 20, 2025 22:22
This also adds a schema property to make sure the validation message references
the overridden schema and not the original one
@cpaulik
Copy link
Contributor Author

cpaulik commented Feb 21, 2025

I've added another test for making sure references to sub-schemas also are replaced by the schema_map correctly.
So from my side this is done. Looking forward to the review.

@jonhealy1
Copy link
Collaborator

Hi @cpaulik. Looks really good. Just a couple of things. 1. An explanation and an example of a schema map, how to use them, to the readme. 2. An entry in the changelog, highlighting these changes. The changes would be added under the Unreleased section for now. Thanks!

@cpaulik
Copy link
Contributor Author

cpaulik commented Feb 25, 2025

Thanks, I added both. Let me know if that is what you had in mind.

@cpaulik
Copy link
Contributor Author

cpaulik commented Feb 25, 2025

I did just discover a small bug - stac-validator doesn't seem to validate extensions for collections if recursive mode is used. I'll need to look into that

@cpaulik
Copy link
Contributor Author

cpaulik commented Feb 25, 2025

That was just a matter of not keeping all the schemas in the message. That' fixed with the latest commit and handled similar to how it it done with item during recursive validation

@cpaulik
Copy link
Contributor Author

cpaulik commented Feb 25, 2025

I did just find another issue. It seems in recursive mode errors are not correctly reported since the default_validator doesn't actually raise a jsonschema.exceptions.ValidationError but just returns a message with valid_stac set to False.

I'll fix that as well.

@cpaulik
Copy link
Contributor Author

cpaulik commented Feb 25, 2025

The last commit should fix recursive reporting of errors for collections.

I do have a question about error reporting in the recursive case.
At the moment stac-validator reports all the messages. Wouldn't it be more useful to only show the messages that have valid_stac set to False in case of error?

At the moment in recursive mode we could print out 100s of messages that are valid which makes it hard to find the ones that actually cause the error.

@jonhealy1
Copy link
Collaborator

@cpaulik You are right about recursive messaging. People may want some feedback so they can see the validation is working. An optional verbose mode for recursive would probably be the best.

@cpaulik
Copy link
Contributor Author

cpaulik commented Mar 4, 2025

the last commit does the following in recursive mode

  • if an error is found -> only show the items with errors and set exitcode 1
  • if an error is found and verbose mode is on -> show all items and set exitcode 1
  • if no error is found show all items and set exitcode 0

Exitcode is unchanged and verbose mode with no errors is also unchanged.

@cpaulik
Copy link
Contributor Author

cpaulik commented Mar 5, 2025

I found another bug when validating a catalog with multiple child collections. Previously the StacValidate.valid was not set correctly if the last child collection was valid. I've fixed this and also added a test for it.

Copy link
Collaborator

@jonhealy1 jonhealy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi. Great work here. What do you think about a config json (maybe containing key value pairs) that could be used instead of specifying numerous --schema-map pairs. It wouldn't replace what you are doing here, it would just be another option.

README.md Outdated
default mode.
-c, --custom TEXT Validate against a custom schema (local
filepath or remote schema).
--schema-map <TEXT TEXT>... Schema path to replaced by (local) schema path
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there should be a -s option for example, to make it a little easier.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

"schema": [
"https://cdn.staclint.com/v1.0.0-beta.1/collection.json",
],
"schema": [],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to still show the schema here that was validated against,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure actually. This is an interesting case. This test runs with extensions = True which means only validate extensions if I understand correctly.

The json file we validate against here has an empty list in stac_extensions so in this case it seems correct to not validate against anything.

I've changed this to fall back on the core-validator in this case but I would argue that at least a warning should be logged in this case.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If only extensions are to be validated you are correct. Otherwise, every item validates against the relevant item spec schema by default. You can change this back if you would like.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

--extensions is for extensions only, so we should change this back before we merge

@cpaulik
Copy link
Contributor Author

cpaulik commented Mar 7, 2025

What do you think about a config json (maybe containing key value pairs) that could be used instead of specifying numerous --schema-map pairs. It wouldn't replace what you are doing here, it would just be another option.

I think it could be useful but I would want to make that a separate MR. We could take some inspiration from the config file from stac-node-validator

@jonhealy1 jonhealy1 self-requested a review March 8, 2025 01:49
Copy link
Collaborator

@jonhealy1 jonhealy1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really great work here, thank you!

@jonhealy1
Copy link
Collaborator

The config file from stac-node-validator looks really useful. A separate PR would be better.

@jonhealy1 jonhealy1 self-requested a review March 8, 2025 02:05
@jonhealy1 jonhealy1 merged commit 57f8191 into stac-utils:main Mar 12, 2025
6 checks passed
@cpaulik
Copy link
Contributor Author

cpaulik commented Mar 13, 2025

Thanks for merging and making a release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants