Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix 1611 incorrect errors when validating data with a missiing required column #1620

Conversation

amelie-rondot
Copy link
Contributor


This PR fixes incorrect errors which occur when validating data with missing required column, for the specific case of schema_syncoption.

After investigating, the better way to fix these incorrect errors we found is to remove missing columns from field_info Row.__init__() parameter at the creation of the Row in row_stream() local method.

Test cases have been added to traduce expected behavior:
Validation of tabular data with schema with one or many required fields, and data with one or many missing required columns and schema_sync option is set to true

Other test cases have not been broken.

Expected behavior could also be observed with command line:

frictionless run validate data.csv --schema schema.json --schema-sync

with:

schema.json
----
{
    "$schema": "https://frictionlessdata.io/schemas/table-schema.json",
    "fields": [
        {
            "name": "A",
            "title": "Field A",
            "type": "string",
            "constraints": {"required": true}
        },
        {"name": "B", "title": "Field B", "type": "string"},
        {"name": "C", "title": "Field C", "type": "string"}
    ]
}

data.csv
----
B, C
b, c

Only one error missing-label in the header's field "A" occurs in the validation report, as expected.

@roll
Copy link
Member

roll commented Jan 24, 2024

Thanks @amelie-rondot !

Can you please make format?

@amelie-rondot
Copy link
Contributor Author

@roll thanks a lot for reviewing!
I committed format fixes.

…hen-validating-data-with-a-missiing-required-column
Copy link
Member

@roll roll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I added a comment

@@ -310,7 +310,18 @@ def row_stream():
for row_number, cells in enumerated_content_stream:
self.stats.rows += 1

# Create row
# NB: missing required labels are not included in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to move it from the row iteration cycle? Currently, this code will be executed for every row in a table

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!
Indeed, you're right, it's possible to move this block before initializing row_stream TableResource attribute.
I've just committed it. I also added some new test cases to insure that resulting behaviour corresponds to the expected one.

Copy link
Member

@roll roll left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot!

@roll roll merged commit 75be599 into frictionlessdata:main Jan 29, 2024
2 of 6 checks passed
@amelie-rondot amelie-rondot deleted the Fix-1611-incorrect-errors-when-validating-data-with-a-missiing-required-column branch February 6, 2024 11:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Incorrect errors when validating data with a missing required column
2 participants