Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add data provider #391

Merged
merged 1 commit into from
May 23, 2023
Merged

Add data provider #391

merged 1 commit into from
May 23, 2023

Conversation

mirekdlugosz
Copy link
Collaborator

This is a proposal of new data providing facility. It's far from complete, but it may work for certain cases and I feel it's complete enough to enable discussion about something tangible.

This is fundamental framework work that - once completed - will enable us to tackle DISCOVERY-293. As it turned out, it will also enable to mostly tackle DISCOVERY-284. In other words, with this done, we will be set up to run all API and CLI tests in CI.

Commit 2f62d0f is for illustrative purposes only. I will remove it before opening PR for merge.

The basic assumption is that most of the time, tests want data that is valid, and that can be safely removed afterwards. Sometimes tests want only data (i.e. negative object creation tests), sometimes they want created data (i.e. negative object update tests). Sometimes they want shared dependencies (to save time needed to create them), sometimes they want completely new chain of dependencies (to ensure independence from other tests).

Another important point is that data should be created lazily, only when actually needed. This is large problem with current setup - scans are run automatically when running any test, because some tests might need scan data - even if we deselected them all in current pytest session.

The data provider is meant as a helper for most common use cases, not mandatory solution for all your data needs. You can still create and maintain QPCModels manually, as you did in the past. (Although I do feel that data provider should be the only one responsible for data cleanup, no matter if objects were created manually or not.)

Open questions:

  • Should dependencies be created through the same interface as object under test itself? Let's say you are running CLI test for scans. With current implementation, you ask data provider for valid data of scan, but source and credential of this potential scan are going to be created through HTTP API anyway.
  • Using littletable for object querying. Alternative projects with similar goals are asq and PyFunctional. I picked littletable because it seems to be the most active right now. But whatever we pick, I'm not super happy that it is becoming the core API camayoc is depending on. On the other hand, I am picking something precisely because I don't want to invent my own iterable querying API.

@mirekdlugosz mirekdlugosz added enhancement not ready for merge For branches or PRs that should not be merged without contacting the author labels Apr 12, 2023
@codecov
Copy link

codecov bot commented Apr 12, 2023

Codecov Report

Merging #391 (6027752) into master (eb87cc2) will increase coverage by 2.22%.
The diff coverage is n/a.

❗ Current head 6027752 differs from pull request most recent head ad3ac85. Consider uploading reports for the commit ad3ac85 to get more accurate results

@@            Coverage Diff             @@
##           master     #391      +/-   ##
==========================================
+ Coverage   80.73%   82.96%   +2.22%     
==========================================
  Files           5        6       +1     
  Lines         244      317      +73     
==========================================
+ Hits          197      263      +66     
- Misses         47       54       +7     

see 3 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

yield model

def new(self, match_criteria, new_dependencies=True, data_only=True):
matching_definitions = self._select_definitions(match_criteria)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do something when match_criteria do not match anything in configuration file?

Right now this will be an empty generator, so tests that want to handle a case when nothing matches need to do:

obj = next(data_provider.obj.new(...), None)
if not obj:
	# handle this gracefully

If test expects something to match and blindly uses generated value, it will fail with uncaught StopIteration. I think failing test is OK, but StopIteration is not very obvious failure root cause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think explicitly checking and handling None like this is better than letting a test blow up with StopIteration, as long as the checks we add will log/raise an error with more specific details.

Alternatively, if we expect this sanity-checking to be a common pattern, we could write a context manager to wrap the generator call, and that context manager could be responsible for checking None or excepting the StopIteration and also log an error or raise a more appropriate exception. I don't necessarily love this idea because it might get ugly or not work in all cases; I'm just dumping an idea here in case it inspires a better idea.

Or the suggestion from my other comment about having a regular non-generator function/method for the use cases that don't need to iterate results.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm just going to raise something inheriting from ValueError with a message along the lines of "provided match_criteria did not match anything; check your config file".

In rare cases where it's ok-ish to not match anything, caller can just catch this exception.

Comment on lines 39 to 45
cred = Credential(cred_type="network", client=shared_client, password=uuid4())
cred.create()
# add the id to the list to destroy after the test is done
cleanup.append(cred)
cred = next(
data_provider.credentials.new(
match_criteria={"type": "network", "password": Table.is_not_null()},
data_only=False,
)
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is more of an exercise. We never run scan with that credential, so we don't need a valid password. However, it shows how we can save typing cred.create() and cleanup.append(cred) all over again - both can be handled by DataProvider.


cred.ssh_keyfile = f"/sshkeys/{tmp_dir}/{sshkeyfile_name}"
cred.ssh_keyfile = ssh_network_cred_data.ssh_keyfile
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm really happy how this turned out. As long as we can assume that config contains at least one credential with valid ssh key, we can use DataProvider to generate an endless stream of different credentials using SSH key. This effectively turns DISCOVERY-284 from a blocker to wishlist item, and allows us to run entire suite sooner than I initially expected.

sshkeyfile.touch()
ssh_network_cred_data = next(
data_provider.credentials.new(
match_criteria={"type": "network", "sshkeyfile": Table.is_not_null()},
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what I meant - I'm not super happy that we make littletable a core of Camayoc API.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I don't like either, to have Table exposed but I'm not seeing this as a blocker issue we can live with that.

@@ -65,7 +65,7 @@ def test_restart(shared_client, cleanup, source_type):
6) Assert that the scan completes
:expectedresults: A restarted scan completes
"""
scn = util.prepare_scan(source_type, cleanup)
scn = util.prepare_scan(source_type, data_provider)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit this is one use case I have missed - sometimes you want an object, but you have certain requirements about its dependencies. It's not very clear how DataProvider could fill this, especially while maintaining backwards compatibility.

Right now your only option is to ask for a first object in dependency chain that you have requirements for, and then construct the rest manually. This requires you to still call QPCModel.create() and cleanup(). You can hide this in helper function, like here, but it does feel like something DataProvider should provide.

@bruno-fs
Copy link
Contributor

I'm taking some time to digest. sorry for the delay 😅

model = self._create_from_definition(
definition=definition, new_dependencies=new_dependencies, data_only=data_only
)
yield model
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I've overlooked something in this PR, but are you ever actually utilizing the iterability of these generators yet? It looks like you're currently only calling next once before discarding the generator (like in test_update_password_to_sshkeyfile), which feels like the generator is an unnecessary abstraction.

I assume you plan to add more iteration logic around this framework after it is established. Maybe seeing an example of that would help complete my understanding of how everything will work together.

If you expect a common use case is to require only one result from the generator, where iteration is not relevant or useful, I might consider adding an alternate method that is expected to act like a regular function and return only exactly one value. Maybe just some syntactic sugar to wrap next(iterator, None) and make it a bit more readable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about cases like tests.qpc.cli.test_credentials::test_clear_all, where you want multiple credentials/sources/scans. Initially I had separate methods for single object and multiple objects, but eventually decided it's easier to consider single object to be a special case - even if it's pretty common special case.

Since naming is hard, how would you approach this? I'm thinking:

class ModelWorker:
    def defined(self, match_criteria) -> ModelClass:
    def new(self, match_criteria) -> ModelClass:
    def defined_generator(self, match_criteria) -> Generator[ModelClass, None, None]:
    def new_generator(self, match_criteria) -> Generator[ModelClass, None, None]:

(Generator type signature is weird - TIL you can do generator.send(value))

But maybe it should be many_defined? Or defined_many? Or leave defined as generator and add single_defined / one_defined?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. Naming is hard. 🙂

I don't have any super strong opinions, and I believe your style opinion is more important here since you spend much more time in this repo than I do, but I think my preference would be explicit about both singular and plural outputs like define_one, define_many, new_one, and new_many. This removes all ambiguity at a glance.

Also, yes, type hints are weird for generators. An alternate syntax that I would suggest is the simpler Iterator[ModelClass]. You can use this if you use the output only as a simple iterator and if you are not using generator-specific send and throw. Iterator is the parent ABC of Generator: https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes

@ruda ruda added this to the Discovery 1.3 milestone May 22, 2023
@mirekdlugosz
Copy link
Collaborator Author

The CI is currently broken due to typing-extensions 4.6.0 (released earlier today) and pydantic: typing-extensions bug , pydantic bug. Locally you can pip install typing-extensions==4.5.0 and all unit tests will pass. I don't want to introduce any workarounds, as pydantic team is working now on releasing a fix, so this should start working again tomorrow or so.

@mirekdlugosz mirekdlugosz marked this pull request as ready for review May 23, 2023 12:23
@mirekdlugosz mirekdlugosz removed the not ready for merge For branches or PRs that should not be merged without contacting the author label May 23, 2023
@mirekdlugosz
Copy link
Collaborator Author

@quipucords/development @ruda this is now ready for review and merge. For current CI failure, see comment above.

I'm open to suggestions and comments, but at this point my plan is to merge it sooner rather than later and move all ssh key tests to data provider, so we can reap the benefits. Let the results speak for themselves.

Copy link
Contributor

@bruno-fs bruno-fs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

as for the issue with dependency versions, shouldn't we lock dependencies on requirements.txt or addopt poetry?

@mirekdlugosz
Copy link
Collaborator Author

@bruno-fs We should adopt poetry, yes. DISCOVERY-298 is about that, sitting at low priority.

Copy link
Member

@ruda ruda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's go ahead and merge it.

sshkeyfile.touch()
ssh_network_cred_data = next(
data_provider.credentials.new(
match_criteria={"type": "network", "sshkeyfile": Table.is_not_null()},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I don't like either, to have Table exposed but I'm not seeing this as a blocker issue we can live with that.

@ruda ruda merged commit 1d6c055 into master May 23, 2023
16 of 20 checks passed
@ruda ruda deleted the add-data-provider branch November 23, 2023 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants