Add data provider #391

mirekdlugosz · 2023-04-12T08:46:46Z

This is a proposal of new data providing facility. It's far from complete, but it may work for certain cases and I feel it's complete enough to enable discussion about something tangible.

This is fundamental framework work that - once completed - will enable us to tackle DISCOVERY-293. As it turned out, it will also enable to mostly tackle DISCOVERY-284. In other words, with this done, we will be set up to run all API and CLI tests in CI.

Commit 2f62d0f is for illustrative purposes only. I will remove it before opening PR for merge.

The basic assumption is that most of the time, tests want data that is valid, and that can be safely removed afterwards. Sometimes tests want only data (i.e. negative object creation tests), sometimes they want created data (i.e. negative object update tests). Sometimes they want shared dependencies (to save time needed to create them), sometimes they want completely new chain of dependencies (to ensure independence from other tests).

Another important point is that data should be created lazily, only when actually needed. This is large problem with current setup - scans are run automatically when running any test, because some tests might need scan data - even if we deselected them all in current pytest session.

The data provider is meant as a helper for most common use cases, not mandatory solution for all your data needs. You can still create and maintain QPCModels manually, as you did in the past. (Although I do feel that data provider should be the only one responsible for data cleanup, no matter if objects were created manually or not.)

Open questions:

Should dependencies be created through the same interface as object under test itself? Let's say you are running CLI test for scans. With current implementation, you ask data provider for valid data of scan, but source and credential of this potential scan are going to be created through HTTP API anyway.
Using littletable for object querying. Alternative projects with similar goals are asq and PyFunctional. I picked littletable because it seems to be the most active right now. But whatever we pick, I'm not super happy that it is becoming the core API camayoc is depending on. On the other hand, I am picking something precisely because I don't want to invent my own iterable querying API.

codecov · 2023-04-12T08:49:36Z

Codecov Report

Merging #391 (6027752) into master (eb87cc2) will increase coverage by 2.22%.
The diff coverage is n/a.

❗ Current head 6027752 differs from pull request most recent head ad3ac85. Consider uploading reports for the commit ad3ac85 to get more accurate results

@@            Coverage Diff             @@
##           master     #391      +/-   ##
==========================================
+ Coverage   80.73%   82.96%   +2.22%     
==========================================
  Files           5        6       +1     
  Lines         244      317      +73     
==========================================
+ Hits          197      263      +66     
- Misses         47       54       +7

see 3 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

mirekdlugosz · 2023-04-12T08:47:34Z

camayoc/data_provider.py

+            yield model
+
+    def new(self, match_criteria, new_dependencies=True, data_only=True):
+        matching_definitions = self._select_definitions(match_criteria)


Should we do something when match_criteria do not match anything in configuration file?

Right now this will be an empty generator, so tests that want to handle a case when nothing matches need to do:

obj = next(data_provider.obj.new(...), None) if not obj: # handle this gracefully

If test expects something to match and blindly uses generated value, it will fail with uncaught StopIteration. I think failing test is OK, but StopIteration is not very obvious failure root cause.

Yeah, I think explicitly checking and handling None like this is better than letting a test blow up with StopIteration, as long as the checks we add will log/raise an error with more specific details.

Alternatively, if we expect this sanity-checking to be a common pattern, we could write a context manager to wrap the generator call, and that context manager could be responsible for checking None or excepting the StopIteration and also log an error or raise a more appropriate exception. I don't necessarily love this idea because it might get ugly or not work in all cases; I'm just dumping an idea here in case it inspires a better idea.

Or the suggestion from my other comment about having a regular non-generator function/method for the use cases that don't need to iterate results.

I think I'm just going to raise something inheriting from ValueError with a message along the lines of "provided match_criteria did not match anything; check your config file".

In rare cases where it's ok-ish to not match anything, caller can just catch this exception.

mirekdlugosz · 2023-04-12T08:48:30Z

camayoc/tests/qpc/api/v1/credentials/test_network_creds.py

-    cred = Credential(cred_type="network", client=shared_client, password=uuid4())
-    cred.create()
-    # add the id to the list to destroy after the test is done
-    cleanup.append(cred)
+    cred = next(
+        data_provider.credentials.new(
+            match_criteria={"type": "network", "password": Table.is_not_null()},
+            data_only=False,
+        )
+    )


This is more of an exercise. We never run scan with that credential, so we don't need a valid password. However, it shows how we can save typing cred.create() and cleanup.append(cred) all over again - both can be handled by DataProvider.

mirekdlugosz · 2023-04-12T08:49:02Z

camayoc/tests/qpc/api/v1/credentials/test_network_creds.py


-    cred.ssh_keyfile = f"/sshkeys/{tmp_dir}/{sshkeyfile_name}"
+    cred.ssh_keyfile = ssh_network_cred_data.ssh_keyfile


I'm really happy how this turned out. As long as we can assume that config contains at least one credential with valid ssh key, we can use DataProvider to generate an endless stream of different credentials using SSH key. This effectively turns DISCOVERY-284 from a blocker to wishlist item, and allows us to run entire suite sooner than I initially expected.

mirekdlugosz · 2023-04-12T08:49:21Z

camayoc/tests/qpc/api/v1/credentials/test_network_creds.py

-    sshkeyfile.touch()
+    ssh_network_cred_data = next(
+        data_provider.credentials.new(
+            match_criteria={"type": "network", "sshkeyfile": Table.is_not_null()},


This is what I meant - I'm not super happy that we make littletable a core of Camayoc API.

This is something I don't like either, to have Table exposed but I'm not seeing this as a blocker issue we can live with that.

mirekdlugosz · 2023-04-12T08:49:46Z

camayoc/tests/qpc/api/v1/scanjobs/test_pause_cancel_restart.py

@@ -65,7 +65,7 @@ def test_restart(shared_client, cleanup, source_type):
        6) Assert that the scan completes
    :expectedresults: A restarted scan completes
    """
-    scn = util.prepare_scan(source_type, cleanup)
+    scn = util.prepare_scan(source_type, data_provider)


I admit this is one use case I have missed - sometimes you want an object, but you have certain requirements about its dependencies. It's not very clear how DataProvider could fill this, especially while maintaining backwards compatibility.

Right now your only option is to ask for a first object in dependency chain that you have requirements for, and then construct the rest manually. This requires you to still call QPCModel.create() and cleanup(). You can hide this in helper function, like here, but it does feel like something DataProvider should provide.

bruno-fs · 2023-04-13T15:42:42Z

I'm taking some time to digest. sorry for the delay 😅

infinitewarp · 2023-05-05T15:38:12Z

camayoc/data_provider.py

+            model = self._create_from_definition(
+                definition=definition, new_dependencies=new_dependencies, data_only=data_only
+            )
+            yield model


Maybe I've overlooked something in this PR, but are you ever actually utilizing the iterability of these generators yet? It looks like you're currently only calling next once before discarding the generator (like in test_update_password_to_sshkeyfile), which feels like the generator is an unnecessary abstraction.

I assume you plan to add more iteration logic around this framework after it is established. Maybe seeing an example of that would help complete my understanding of how everything will work together.

If you expect a common use case is to require only one result from the generator, where iteration is not relevant or useful, I might consider adding an alternate method that is expected to act like a regular function and return only exactly one value. Maybe just some syntactic sugar to wrap next(iterator, None) and make it a bit more readable.

I was thinking about cases like tests.qpc.cli.test_credentials::test_clear_all, where you want multiple credentials/sources/scans. Initially I had separate methods for single object and multiple objects, but eventually decided it's easier to consider single object to be a special case - even if it's pretty common special case.

Since naming is hard, how would you approach this? I'm thinking:

class ModelWorker: def defined(self, match_criteria) -> ModelClass: def new(self, match_criteria) -> ModelClass: def defined_generator(self, match_criteria) -> Generator[ModelClass, None, None]: def new_generator(self, match_criteria) -> Generator[ModelClass, None, None]:

(Generator type signature is weird - TIL you can do generator.send(value))

But maybe it should be many_defined? Or defined_many? Or leave defined as generator and add single_defined / one_defined?

I agree. Naming is hard. 🙂

I don't have any super strong opinions, and I believe your style opinion is more important here since you spend much more time in this repo than I do, but I think my preference would be explicit about both singular and plural outputs like define_one, define_many, new_one, and new_many. This removes all ambiguity at a glance.

Also, yes, type hints are weird for generators. An alternate syntax that I would suggest is the simpler Iterator[ModelClass]. You can use this if you use the output only as a simple iterator and if you are not using generator-specific send and throw. Iterator is the parent ABC of Generator: https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes

mirekdlugosz · 2023-05-23T12:22:37Z

The CI is currently broken due to typing-extensions 4.6.0 (released earlier today) and pydantic: typing-extensions bug , pydantic bug. Locally you can pip install typing-extensions==4.5.0 and all unit tests will pass. I don't want to introduce any workarounds, as pydantic team is working now on releasing a fix, so this should start working again tomorrow or so.

mirekdlugosz · 2023-05-23T12:29:23Z

@quipucords/development @ruda this is now ready for review and merge. For current CI failure, see comment above.

I'm open to suggestions and comments, but at this point my plan is to merge it sooner rather than later and move all ssh key tests to data provider, so we can reap the benefits. Let the results speak for themselves.

bruno-fs

LGTM

as for the issue with dependency versions, shouldn't we lock dependencies on requirements.txt or addopt poetry?

mirekdlugosz · 2023-05-23T13:37:11Z

@bruno-fs We should adopt poetry, yes. DISCOVERY-298 is about that, sitting at low priority.

ruda

Let's go ahead and merge it.

ruda · 2023-04-14T14:31:53Z

camayoc/tests/qpc/api/v1/credentials/test_network_creds.py

-    sshkeyfile.touch()
+    ssh_network_cred_data = next(
+        data_provider.credentials.new(
+            match_criteria={"type": "network", "sshkeyfile": Table.is_not_null()},


This is something I don't like either, to have Table exposed but I'm not seeing this as a blocker issue we can live with that.

mirekdlugosz added enhancement not ready for merge For branches or PRs that should not be merged without contacting the author labels Apr 12, 2023

mirekdlugosz requested review from ruda, infinitewarp, abellotti, bruno-fs and nicolearagao April 12, 2023 08:46

mirekdlugosz commented Apr 12, 2023

View reviewed changes

infinitewarp reviewed May 5, 2023

View reviewed changes

ruda added this to the Discovery 1.3 milestone May 22, 2023

Add Data Provider

ad3ac85

mirekdlugosz force-pushed the add-data-provider branch from 6027752 to ad3ac85 Compare May 23, 2023 11:56

mirekdlugosz marked this pull request as ready for review May 23, 2023 12:23

mirekdlugosz removed the not ready for merge For branches or PRs that should not be merged without contacting the author label May 23, 2023

mirekdlugosz requested a review from infinitewarp May 23, 2023 12:23

bruno-fs approved these changes May 23, 2023

View reviewed changes

ruda approved these changes May 23, 2023

View reviewed changes

ruda merged commit 1d6c055 into master May 23, 2023
16 of 20 checks passed

ruda deleted the add-data-provider branch November 23, 2023 19:11

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add data provider #391

Add data provider #391

mirekdlugosz commented Apr 12, 2023

codecov bot commented Apr 12, 2023 •

edited

mirekdlugosz Apr 12, 2023

infinitewarp May 5, 2023

mirekdlugosz May 11, 2023

mirekdlugosz Apr 12, 2023

mirekdlugosz Apr 12, 2023

mirekdlugosz Apr 12, 2023

ruda Apr 14, 2023

mirekdlugosz Apr 12, 2023

bruno-fs commented Apr 13, 2023

infinitewarp May 5, 2023

mirekdlugosz May 11, 2023

infinitewarp May 11, 2023 •

edited

mirekdlugosz commented May 23, 2023

mirekdlugosz commented May 23, 2023

bruno-fs left a comment

mirekdlugosz commented May 23, 2023

ruda left a comment

ruda Apr 14, 2023


		cred.ssh_keyfile = f"/sshkeys/{tmp_dir}/{sshkeyfile_name}"
		cred.ssh_keyfile = ssh_network_cred_data.ssh_keyfile

Add data provider #391

Add data provider #391

Conversation

mirekdlugosz commented Apr 12, 2023

codecov bot commented Apr 12, 2023 • edited

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bruno-fs commented Apr 13, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

infinitewarp May 11, 2023 • edited

Choose a reason for hiding this comment

mirekdlugosz commented May 23, 2023

mirekdlugosz commented May 23, 2023

bruno-fs left a comment

Choose a reason for hiding this comment

mirekdlugosz commented May 23, 2023

ruda left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Apr 12, 2023 •

edited

infinitewarp May 11, 2023 •

edited