Add data provider #391
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #391      +/-   ##
==========================================
+ Coverage   80.73%   82.96%   +2.22%
==========================================
  Files           5        6       +1
  Lines         244      317      +73
==========================================
+ Hits          197      263      +66
- Misses         47       54       +7
==========================================

see 3 files with indirect coverage changes
            yield model

    def new(self, match_criteria, new_dependencies=True, data_only=True):
        matching_definitions = self._select_definitions(match_criteria)
Should we do something when match_criteria doesn't match anything in the configuration file?
Right now this produces an empty generator, so tests that want to handle the no-match case need to do:
obj = next(data_provider.obj.new(...), None)
if not obj:
# handle this gracefully
If a test expects something to match and blindly uses the generated value, it will fail with an uncaught StopIteration. I think a failing test is OK, but StopIteration is not a very obvious failure root cause.
Yeah, I think explicitly checking for and handling None like this is better than letting a test blow up with StopIteration, as long as the checks we add log or raise an error with more specific details.
Alternatively, if we expect this sanity-checking to become a common pattern, we could write a context manager to wrap the generator call; that context manager would be responsible for checking for None or catching the StopIteration, and would also log an error or raise a more appropriate exception. I don't necessarily love this idea because it might get ugly or not work in all cases; I'm just dumping it here in case it inspires a better one.
Or the suggestion from my other comment about having a regular non-generator function/method for the use cases that don't need to iterate results.
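A minimal sketch of what that context manager idea might look like (the names expect_match and NoMatchingDataError are hypothetical, not part of Camayoc):

```python
from contextlib import contextmanager


class NoMatchingDataError(Exception):
    """Raised instead of a bare StopIteration when nothing matched."""


@contextmanager
def expect_match(generator, description="data provider query"):
    # Pull the first value eagerly so an empty generator fails right
    # here with a descriptive error, not deep inside the test body.
    try:
        first = next(generator)
    except StopIteration:
        raise NoMatchingDataError(
            f"{description} did not match anything; check your config file"
        ) from None
    yield first
```

A test would then do `with expect_match(data_provider.obj.new(...)) as obj:` and get a readable failure when the config has no matching entry.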
I think I'm just going to raise something inheriting from ValueError with a message along the lines of "provided match_criteria did not match anything; check your config file".
In rare cases where it's OK-ish to not match anything, the caller can just catch this exception.
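That approach could look roughly like this (select_definitions and NoMatchingDefinitionError are illustrative stand-ins for the internal selection step, not the actual Camayoc code):

```python
class NoMatchingDefinitionError(ValueError):
    """match_criteria selected nothing from the configuration file."""


def select_definitions(definitions, match_criteria):
    # Keep only definitions whose fields equal every key/value pair in
    # match_criteria, and fail loudly instead of returning nothing.
    matching = [
        definition
        for definition in definitions
        if all(definition.get(key) == value for key, value in match_criteria.items())
    ]
    if not matching:
        raise NoMatchingDefinitionError(
            f"provided match_criteria {match_criteria!r} did not match anything; "
            "check your config file"
        )
    return matching
```

Because the exception inherits from ValueError, callers that tolerate a no-match situation can catch either class.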
-cred = Credential(cred_type="network", client=shared_client, password=uuid4())
-cred.create()
-# add the id to the list to destroy after the test is done
-cleanup.append(cred)
+cred = next(
+    data_provider.credentials.new(
+        match_criteria={"type": "network", "password": Table.is_not_null()},
+        data_only=False,
+    )
+)
This is more of an exercise. We never run a scan with that credential, so we don't need a valid password. However, it shows how we can save typing cred.create() and cleanup.append(cred) over and over - both can be handled by DataProvider.
-cred.ssh_keyfile = f"/sshkeys/{tmp_dir}/{sshkeyfile_name}"
+cred.ssh_keyfile = ssh_network_cred_data.ssh_keyfile
I'm really happy with how this turned out. As long as we can assume that the config contains at least one credential with a valid SSH key, we can use DataProvider to generate an endless stream of different credentials using that SSH key. This effectively turns DISCOVERY-284 from a blocker into a wishlist item, and allows us to run the entire suite sooner than I initially expected.
sshkeyfile.touch()
ssh_network_cred_data = next(
    data_provider.credentials.new(
        match_criteria={"type": "network", "sshkeyfile": Table.is_not_null()},
This is what I meant - I'm not super happy that we're making littletable a core part of the Camayoc API.
This is something I don't like either - having Table exposed - but I'm not seeing this as a blocker issue; we can live with that.
@@ -65,7 +65,7 @@ def test_restart(shared_client, cleanup, source_type):
         6) Assert that the scan completes
     :expectedresults: A restarted scan completes
     """
-    scn = util.prepare_scan(source_type, cleanup)
+    scn = util.prepare_scan(source_type, data_provider)
I admit this is one use case I missed - sometimes you want an object, but you have certain requirements about its dependencies. It's not very clear how DataProvider could fill this need, especially while maintaining backwards compatibility.
Right now your only option is to ask for the first object in the dependency chain that you have requirements for, and then construct the rest manually. That still requires you to call QPCModel.create() and cleanup(). You can hide this in a helper function, like here, but it does feel like something DataProvider should provide.
I'm taking some time to digest. Sorry for the delay 😅
            model = self._create_from_definition(
                definition=definition, new_dependencies=new_dependencies, data_only=data_only
            )
            yield model
Maybe I've overlooked something in this PR, but are you ever actually utilizing the iterability of these generators yet? It looks like you're currently only calling next once before discarding the generator (like in test_update_password_to_sshkeyfile), which makes the generator feel like an unnecessary abstraction.
I assume you plan to add more iteration logic around this framework after it is established. Seeing an example of that would help complete my understanding of how everything will work together.
If you expect a common use case to require only one result from the generator, where iteration is not relevant or useful, I might consider adding an alternate method that acts like a regular function and returns exactly one value. Maybe just some syntactic sugar to wrap next(iterator, None) and make it a bit more readable.
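A sketch of that sugar, assuming a simplified worker where dicts stand in for model instances (ModelWorker, new, and new_one here are illustrative, not the actual Camayoc classes):

```python
class ModelWorker:
    def __init__(self, definitions):
        self._definitions = definitions

    def new(self, match_criteria):
        # Generator for the plural case: yields every matching definition.
        for definition in self._definitions:
            if all(definition.get(k) == v for k, v in match_criteria.items()):
                yield definition

    def new_one(self, match_criteria):
        # Sugar over next(iterator, None): exactly one result (or None),
        # with no generator exposed to the caller.
        return next(self.new(match_criteria), None)
```

Tests that only need a single object would call new_one and never see a generator; tests that want many keep iterating new.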
I was thinking about cases like tests.qpc.cli.test_credentials::test_clear_all, where you want multiple credentials/sources/scans. Initially I had separate methods for single and multiple objects, but eventually decided it's easier to treat a single object as a special case - even if it's a pretty common special case.
Since naming is hard, how would you approach this? I'm thinking:
class ModelWorker:
    def defined(self, match_criteria) -> ModelClass: ...
    def new(self, match_criteria) -> ModelClass: ...
    def defined_generator(self, match_criteria) -> Generator[ModelClass, None, None]: ...
    def new_generator(self, match_criteria) -> Generator[ModelClass, None, None]: ...
(The Generator type signature is weird - TIL you can do generator.send(value).)
But maybe it should be many_defined? Or defined_many? Or leave defined as a generator and add single_defined / one_defined?
I agree. Naming is hard. 🙂
I don't have any super strong opinions, and I believe your style preference matters more here since you spend much more time in this repo than I do, but I think my preference would be to be explicit about both singular and plural outputs, like define_one, define_many, new_one, and new_many. That removes all ambiguity at a glance.
Also, yes, type hints are weird for generators. An alternate syntax I would suggest is the simpler Iterator[ModelClass]. You can use it if you treat the output only as a simple iterator and are not using the generator-specific send and throw. Iterator is the parent ABC of Generator: https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes
The CI is currently broken due to typing-extensions 4.6.0 (released earlier today) and pydantic: typing-extensions bug, pydantic bug. Locally you can
@quipucords/development @ruda this is now ready for review and merge. For the current CI failure, see the comment above. I'm open to suggestions and comments, but at this point my plan is to merge it sooner rather than later and move all SSH key tests to the data provider, so we can reap the benefits. Let the results speak for themselves.
LGTM
As for the issue with dependency versions, shouldn't we lock dependencies in requirements.txt or adopt poetry?
@bruno-fs We should adopt poetry, yes. DISCOVERY-298 is about that, sitting at low priority.
Let's go ahead and merge it.
This is a proposal for a new data-providing facility. It's far from complete, but it may work for certain cases, and I feel it's complete enough to enable discussion about something tangible.
This is fundamental framework work that - once completed - will enable us to tackle DISCOVERY-293. As it turned out, it will also let us mostly tackle DISCOVERY-284. In other words, with this done, we will be set up to run all API and CLI tests in CI.
Commit 2f62d0f is for illustrative purposes only. I will remove it before opening the PR for merge.
The basic assumption is that most of the time, tests want data that is valid and that can be safely removed afterwards. Sometimes tests want only the data (e.g. negative object creation tests), sometimes they want created data (e.g. negative object update tests). Sometimes they want shared dependencies (to save the time needed to create them), sometimes they want a completely new chain of dependencies (to ensure independence from other tests).
Another important point is that data should be created lazily, only when actually needed. This is a large problem with the current setup - scans run automatically when running any test, because some tests might need scan data, even if we deselected all of them in the current pytest session.
The data provider is meant as a helper for the most common use cases, not a mandatory solution for all your data needs. You can still create and maintain QPCModels manually, as you did in the past. (Although I do feel that the data provider should be solely responsible for data cleanup, whether or not objects were created manually.)
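The lazy-creation and provider-owned-cleanup points above can be sketched as follows (this DataProvider is illustrative; plain dicts stand in for QPCModel instances, and the real selection and creation logic is elided):

```python
class DataProvider:
    def __init__(self, definitions):
        self._definitions = definitions
        self.created = []  # the provider tracks everything it created

    def new(self, match_criteria):
        # Lazy by construction: nothing is selected or created until the
        # caller advances the generator, so deselected tests pay no
        # setup cost at all.
        for definition in self._definitions:
            if all(definition.get(k) == v for k, v in match_criteria.items()):
                model = dict(definition)  # stand-in for QPCModel.create()
                self.created.append(model)
                yield model

    def cleanup(self):
        # Single owner of teardown, whether or not a test touched the data.
        while self.created:
            self.created.pop()  # stand-in for deleting the remote object
```

Calling new() merely builds a generator; only next() (or iteration) triggers creation, and cleanup() tears down exactly what was actually created.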
Open questions: