
Maybe inconsistent duplicate check in record_artifacts_as_dict #601

Open
lukpueh opened this issue May 16, 2023 · 5 comments

Comments


lukpueh commented May 16, 2023

The `record_artifacts_as_dict` helper, called from the `runlib.in_toto_*` interface functions, returns a dictionary of artifact URIs with their hashes. The hashes are generated in batches per artifact resolver and combined here:

in-toto/in_toto/runlib.py, lines 182 to 187 at 4581e4b:

```python
# Hash artifacts in a batch per resolver
# FIXME: The behavior may change if we hash each artifact individually,
# because the left-prefix duplicate check in FileResolver only works for the
# artifacts hashed in one batch.
for resolver, uris in resolver_for_uris.items():
    artifact_hashes.update(resolver.hash_artifacts(uris))
```

Currently, there is no URI duplicate check when combining these batches. This may not be necessary, because artifact URIs are usually distinguished by their schemes, but artifact resolver implementations may arbitrarily mangle URIs before returning them. Relying on a resolver not to return URIs we already have is brittle.
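To make the gap concrete, here is a minimal sketch of what a cross-batch duplicate check could look like. The function name `combine_batches` and the use of `ValueError` are placeholders for illustration, not in-toto code (the issue discusses what error type to use below):

```python
def combine_batches(batches):
    """Merge per-resolver hash dicts, raising on duplicate URIs.

    Hypothetical sketch: `dict.update` (as in the snippet above) would
    silently overwrite duplicates; this loop detects them instead.
    """
    artifact_hashes = {}
    for batch in batches:
        for uri, hashes in batch.items():
            if uri in artifact_hashes:
                # A real fix might warn or raise a dedicated error here.
                raise ValueError(f"duplicate artifact URI '{uri}'")
            artifact_hashes[uri] = hashes
    return artifact_hashes
```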

At the same time, we do raise on duplicate URIs as a result of file mangling (lstrip) in FileResolver.hash_artifacts:

```python
# Fail if left-stripping above results in duplicates
if self._lstrip_paths and path in existing_paths:
    raise PrefixError(
        "Prefix selection has resulted in non unique dictionary key "
        f"'{path}'"
    )
```
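For readers unfamiliar with the lstrip feature, this toy example (not in-toto code) shows how stripping path prefixes can collapse two distinct artifacts onto one dictionary key, which is the situation the check above guards against:

```python
def lstrip_path(path, prefixes):
    """Strip the first matching prefix from path (illustrative helper)."""
    for prefix in prefixes:
        if path.startswith(prefix):
            return path[len(prefix):]
    return path

paths = ["build/app.py", "src/app.py"]
stripped = [lstrip_path(p, ["build/", "src/"]) for p in paths]
# Both entries become "app.py", so a dict keyed by the stripped path
# would silently lose one artifact without the duplicate check.
```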

Expected behaviour
A user should consistently be informed if, when using an interface function that records artifacts, the results may not be as expected because of duplicate URIs. IMO a warning should be enough.


adityasaky commented May 24, 2023

I agree with adding a check at the point marked in the first snippet. However, I think a warning may not suffice. Are there scenarios where an overwrite is fine and not an error? I can't think of a valid use case, and I would rather error than (relatively) hide the behaviour away. If the warning is missed, the metadata is less complete than the user thinks, which I'm not a fan of.

@chasen-bettinger
Contributor

@adityasaky @lukpueh Did you two want to move forward with erroring instead of warning?


lukpueh commented May 31, 2023

@adityasaky is right that this shouldn't go undetected, so erroring out does make sense. I'm not sure which error to use, though. PrefixError does not fit here. Adding a new custom error seems like overkill, given that a caller is unlikely to handle it, but it probably makes sense for semantics. What do you think?

@chasen-bettinger
Contributor

@lukpueh I'm all for being explicit and clear when the option is available. Overkill? Yea, I see it. A better user experience? I think so.

@adityasaky
Member

Let's introduce a new, generic exception for any artifact collisions, and replace PrefixError with it when we're ready for in-toto 3.0.0.
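A minimal sketch of what that could look like. The class name `ArtifactCollisionError` and the subclassing approach are assumptions for illustration, not decided in-toto API:

```python
class ArtifactCollisionError(Exception):
    """Hypothetical generic error: two artifacts resolved to the same key."""


class PrefixError(ArtifactCollisionError):
    # Sketch only: keeping PrefixError as a subclass would let existing
    # `except PrefixError` handlers keep working until the alias is
    # dropped in a major release (e.g. 3.0.0, as discussed above).
    pass
```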
