Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e framework: test labels #112894

Merged
merged 6 commits into from
Oct 13, 2023

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Oct 6, 2022

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

This is an implementation of the idea outlined in
https://github.com/kubernetes/enhancements/pull/3042/files#r988729747.

It becomes possible to select tests in different ways:

    ginkgo -v -focus="should respect internalTrafficPolicy.*\[FeatureGate:ServiceInternalTrafficPolicy\]"

    ginkgo -v -label-filter="FeatureGate:ServiceInternalTrafficPolicy"

    ginkgo -v -label-filter="Beta"

When the test runs, ginkgo shows it as:

    [It] should respect internalTrafficPolicy=Local Pod to Pod [FeatureGate:ServiceInternalTrafficPolicy] [Beta] [FeatureGate:ServiceInternalTrafficPolicy, Beta]

The test name and the labels at the end are in different colors. Embedding the
annotations inside the text is redundant and only done because users of the e2e
suite might expect it. Also, our tooling that consumes test results currently
doesn't know about ginkgo labels.

Notes for reviewers

Please look at this commit-by-commit.

For an overall impression on how to use the new APIs, see test/e2e/framework/internal/unittests/bugs/bugs.go.

Does this PR introduce a user-facing change?

NONE

@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Oct 6, 2022
@k8s-ci-robot k8s-ci-robot added area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/test sig/network Categorizes an issue or PR as relevant to SIG Network. sig/testing Categorizes an issue or PR as relevant to SIG Testing. and removed do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Oct 6, 2022
@@ -31,3 +37,114 @@ func AnnotatedLocation(annotation string) types.CodeLocation {
codeLocation = types.NewCustomCodeLocation(annotation + " | " + codeLocation.String())
return codeLocation
}

// It is a wrapper around ginkgo.It which supports [WithFeatureGate],
// [WithFeature], [WithEnvironment], [WithConformance], [WithNodeConformance]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whether these functions should have the With prefix is open for debate. A common prefix groups them together, but also makes the invocation longer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aojea: any preference from your side here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the prefix With

@@ -2517,7 +2518,7 @@ var _ = common.SIGDescribe("Services", func() {
}
})

ginkgo.It("should respect internalTrafficPolicy=Local Pod to Pod [Feature:ServiceInternalTrafficPolicy]", func() {
framework.It("should respect internalTrafficPolicy=Local Pod to Pod", framework.WithFeatureGate(features.ServiceInternalTrafficPolicy), func() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could also accept "ServiceInternalTrafficPolicy" as string here. But then we loose the compile-time verification that the string is actually a valid feature gate. It would still get checked at runtime whenever the suite gets invokved.

The advantages would be:

  • no need for the extra package import
  • shorter
  • no need to change anything here when a feature graduates and (later) gets removed from k8s.io/kubernetes/pkg/features (but see the KEP discussion how the test should get annotated for a stable feature)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to maintain the link to the code, to avoid drifting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's also my preference. Using the pre-defined string constant also has the advantage that tools like gopls can reliably find tests that reference it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I extended the PR with sets of valid features and environments. The framework itself doesn't register any of those, this will have to be added elsewhere (test/e2e/environments and test/e2e/features?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added pre-defined features and node features in their own shared packages (test/e2e/feature, test/e2e/nodefeature). The names were chosen so that using these constants reads well. Tests are not allowed to add their own features or node features (registry gets frozen after those pre-defined ones).

There is nothing for environments yet because that's currently unused.

We have to allow arbitrary labels - there are simply too many of those and they might be useful. Those arbitrary labels don't get registered. That leads to a loophole: framework.WithLabel("Feature:SomeCustomFeature" is currently valid, but should be feature.SomeCustomFeature with a corresponding entry in the shared package. I think we can check those custom labels and complain when one of the other functions should be used instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need a way to verify that tests are registered properly before we can merge this. There are some existing verify scripts that need to be updated. My plan is to do the verification as much as possible inside the E2E suite itself:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've completely revamped how issues discovered during test registration are handled. Each E2E suite now records such problems, then at the end does a sanity check that there were no issues. This works for -dry-run and -focus=some tests, so it'll be impossible to miss.

A verify script also checks this as part of pull-kubernetes-verify.


// WithFeatureGate adds a certain feature gate as label to a [framework.It].
// The feature gate must be listed in
// [k8s.io/apiserver/pkg/util/feature.DefaultMutableFeatureGate].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will solve the problem of feature promotion, that is common to left e2e tests behind and not update them ... I think is something we should do

@aojea
Copy link
Member

aojea commented Oct 7, 2022

/assign @spiffxp @aojea

// the empty string.
var level string
if spec.PreRelease == "" {
level = "Stable"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my vote is about not tagging stables at all, as they must be part of the core system I don't see the value in the feature gate tags

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I added it only because kubernetes/enhancements#3042 has it. The actual usage is not clear to me yet.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is for those rare cases when feature gate is GA, but not locked to default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a GA feature that can still be disabled? I can see why continuing to list such a feature as test tag may be useful.

But how does marking such a feature as "Stable" help? In other words, how would the "Stable" tag be used? If it's not used, we don't need it...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not attached to the tag name. Just thought we need something so we can find jobs that set the FeatureGate, but haven't set it's stage

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we still need to clarify what's supposed to happen when a feature graduates to GA and then later gets removed from the set of defined features.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my vote is about not tagging stables at all

Agree, I feel like this could imply [Stable] should be added to everything that is currently not [Alpha], [Beta] or [Deprecated] and I don't see the value in it. Also featuregate.GA is already "" so we're keeping parity in some sense.

this is for those rare cases when feature gate is GA, but not locked to default.

Some examples to motivate discussion:

ExecProbeTimeout: {Default: true, PreRelease: featuregate.GA}, // lock to default and remove after v1.22 based on KEP #1972 update
ExpandCSIVolumes: {Default: true, PreRelease: featuregate.GA}, // remove in 1.26
ExpandInUsePersistentVolumes: {Default: true, PreRelease: featuregate.GA}, // remove in 1.26
ExpandPersistentVolumes: {Default: true, PreRelease: featuregate.GA}, // remove in 1.26

Just thought we need something so we can find jobs that set the FeatureGate, but haven't set it's stage

regex or label equivalent to +FeatureGate:.* -Alpha -Beta -Deprecated but again I'm not sure why we would want or need this.

If we set the convention that everything uses WithFeatureGate which auto-derives alpha/beta/deprecated I'm not sure this matters so much?

we still need to clarify what's supposed to happen when a feature graduates to GA and then later gets removed from the set of defined features

Same thing, no stable/GA label? I think you're talking Feature, not FeatureGate here, and I'm not entirely sure why the difference matters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was talking FeatureGate and how it transitions (or doesn't?) to Feature.

Let's look at a fictional test that depends on FeatureGate foo. While that is in alpha/beta, we annotate the test with [FeatureGate: foo] [alpha/beta], with '[alpha/beta]' derived from the definition in kube_features.go.

Now it graduates to GA and remains in kube_features.go, either because it isn't locked to its default yet or because all features are kept around a bit longer to avoid breaking code that uses --feature-gates foo=true. During that time we annotate the test with [FeatureGate: foo] and nothing else.

The last step is removal from kube_features.go. Now we remove [FeatureGate: foo]. We have to do something, because it won't compile anymore.

Or do we turn it into [Feature: foo]? In all cases or only where it matters? It matters if the feature, even though GA, is not available everywhere.

For features that are available everywhere, adding [Feature: foo] has the advantage that it becomes easy to run all tests related to a certain feature. The downside is that many jobs currently exclude tests marked with [Feature: <something>] and would therefore exclude the feature. I'm undecided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we should not create this relation Feature <---> FeatureGate, those are independent things that some times be related, a FeatureGate doesn't have to be related to a Feature necessarily.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So once a feature gate gets removed from kube_features.go, we also remove the test annotation. Works for me.

// TODO: Describe and Context wrappers.

func WithFeature(name string) withFeature {
// TODO: verify that the feature is known.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aojea: do you agree that WithFeature and WithEnvironment should only accept "known" values?

We would have to create a registry similar to FeatureGate and add all values that are currently used in-tree to it by default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really think we it should be typed, this avoids drifting and "forces" the discussion when adding new types, @spiffxp @BenTheElder @SergeyKanzhelev WDYT, this is related to the KEP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 12, 2022
@dims
Copy link
Member

dims commented Dec 12, 2022

If you still need this PR then please rebase, if not, please close the PR

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 12, 2022
@pohly pohly mentioned this pull request Dec 12, 2022
@aojea
Copy link
Member

aojea commented Oct 9, 2023

you have some error with the vendor folder

+func withCancelCause(parent context.Context) (context.Context, func(error)) {

  • return context.WithCancelCause(parent)

This LGTM from a quick look, but others should check

/assign @SergeyKanzhelev @BenTheElder

@pohly
Copy link
Contributor Author

pohly commented Oct 10, 2023

you have some error with the vendor folder

Let's merge the gingko+gomega update separately in #121082, then I can remove that part from this PR.

Always printing "Enabling in-tree volume drivers" whenever the E2E suite is
initializing doesn't provide any useful information and makes output of the
upcoming -list-tests look weird.
-list-tests is a more concise alternative for `ginkgo --dry-run` with one line
per test. In contrast to `--dry-run`, it really lists all tests. `--dry-run`
without additional parameters uses the default skip expression from the E2E
context, which filters out flaky and feature-gated tests. The output includes
the source code location where each test is defined. It is sorted by test
name (not source code location) because that order is independent of
reorganizing the source code and ordering by location can be achieved with
"sort".

-list-labels has no corresponding feature in Ginkgo.

One possible usage is to figure out what values might make sense for
-focus/skip/label-filter.

Unit tests will follow in a future commit.
@pohly
Copy link
Contributor Author

pohly commented Oct 10, 2023

/hold cancel

Ginkgo update was merged separately and isn't in this PR anymore.

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. and removed do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Oct 10, 2023
If something goes wrong during the test registration phase, the only solution
so far was to panic. This is not user-friendly and only allows to report one
problem at a time.

If initialization can continue, then a better solution is to record a bug,
continue, and then report all bugs together.

This also works when just listing tests. The new verify-e2e-suites.sh uses that
to check all test suites (identified as "packages that call
framework.AfterReadingAllFlags", with some exceptions) as part of
pull-kubernetes-verify.

Example output for a fake

    framework.RecordBug(framework.NewBug("fake bug during SIGDescribe", 0))

in test/e2e/storage/volume_metrics.go:
```
$ hack/verify-e2e-suites.sh
go version go1.21.1 linux/amd64
ERROR: E2E test suite invocation failed for test/e2e.
   ERROR: E2E suite initialization was faulty, these errors must be fixed:
   ERROR: test/e2e/storage/volume_metrics.go:49: fake bug during SIGDescribe
E2E suite test/e2e_kubeadm passed.
E2E suite test/e2e_node passed.
```
These wrapper functions set labels in addition to injecting the annotation into
the test text. It then becomes possible to select tests in different ways:

    ginkgo -v --focus="should respect internalTrafficPolicy.*\[FeatureGate:ServiceInternalTrafficPolicy\]"

    ginkgo -v --label-filter="FeatureGate:ServiceInternalTrafficPolicy"

    ginkgo -v --label-filter="Beta"

When a test runs, ginkgo shows it as:

    [It] should respect internalTrafficPolicy=Local Pod to Pod [FeatureGate:ServiceInternalTrafficPolicy] [Beta] [FeatureGate:ServiceInternalTrafficPolicy, Beta]

The test name and the labels at the end are in different colors. Embedding the
annotations inside the text is redundant and only done because users of the e2e
suite might expect it. Also, our tooling that consumes test results currently
doesn't know about ginkgo labels.

Environments, features and node features as described by
https://github.com/kubernetes/enhancements/tree/master/keps/sig-testing/3041-node-conformance-and-features
are also supported.

The framework and thus (at the moment) test/e2e do not have any pre-defined
environments and features. Adding those and modifying tests will follow in
a separate commit.
framework.SIGDescribe is better because:
- Ginkgo uses the source code location of the test, not of the wrapper,
  when reporting progress.
- Additional annotations can be passed.

To make this a drop-in replacement, framework.SIGDescribe generates a function
that can be used instead of the former SIGDescribe functions.

windows.SIGDescribe contained some additional code to ensure that tests are
skipped when not running with a suitable node OS. This gets moved into a
separate wrapper generator, to allow using framework.SIGDescribe as intended.
To ensure that all callers were modified, the windows.sigDescribe isn't
exported anymore (wasn't necessary in the first place!).
The list is based on the -list-tests output.
@pohly
Copy link
Contributor Author

pohly commented Oct 10, 2023

/retest

@akhilerm
Copy link
Member

Retriggering the below presubmits as there has been a GCE image change in the jobs in kubernetes/test-infra#31008

pull-kubernetes-e2e-gci-gce-ingress
pull-kubernetes-e2e-ubuntu-gce-network-policies
pull-kubernetes-e2e-gci-gce-ipvs

@akhilerm
Copy link
Member

/test pull-kubernetes-e2e-gci-gce-ingress
/test pull-kubernetes-e2e-ubuntu-gce-network-policies
/test pull-kubernetes-e2e-gci-gce-ipvs

@aojea
Copy link
Member

aojea commented Oct 12, 2023

/lgtm
/approve

Thanks @pohly for the multiple iterations ,

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 12, 2023
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: cffbf8d02759d8885c69ad05fe4e87662a4f2ea6

@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aojea, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@aojea
Copy link
Member

aojea commented Oct 12, 2023

@pohly should we announce this to the wide community for awareness?

@k8s-ci-robot
Copy link
Contributor

k8s-ci-robot commented Oct 12, 2023

@pohly: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-conformance-image-test 2f4be302428f61c79778654a666cb282c6de591a link false /test pull-kubernetes-conformance-image-test
pull-kubernetes-node-e2e-crio-dra af0336b3864e73b0ad01ee42125501bac724de3c link false /test pull-kubernetes-node-e2e-crio-dra
pull-kubernetes-node-e2e-containerd-1-7-dra af0336b3864e73b0ad01ee42125501bac724de3c link false /test pull-kubernetes-node-e2e-containerd-1-7-dra
pull-kubernetes-e2e-kind-kms af0336b3864e73b0ad01ee42125501bac724de3c link false /test pull-kubernetes-e2e-kind-kms
pull-kubernetes-kind-dra af0336b3864e73b0ad01ee42125501bac724de3c link false /test pull-kubernetes-kind-dra

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@k8s-ci-robot k8s-ci-robot merged commit 4c8fca2 into kubernetes:master Oct 13, 2023
24 checks passed
SIG Node CI/Test Board automation moved this from PRs Waiting on Author to Done Oct 13, 2023
SIG Node PR Triage automation moved this from Waiting on Author to Done Oct 13, 2023
@k8s-ci-robot k8s-ci-robot added this to the v1.29 milestone Oct 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/apiserver area/cloudprovider area/code-generation area/conformance Issues or PRs related to kubernetes conformance tests area/dependency Issues or PRs related to dependency changes area/e2e-test-framework Issues or PRs related to refactoring the kubernetes e2e test framework area/kube-proxy area/kubeadm area/kubectl area/network-policy Issues or PRs related to Network Policy subproject area/test cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lgtm "Looks good to me", indicates that a PR is ready to be merged. lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. release-note-none Denotes a PR that doesn't merit a release note. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. sig/architecture Categorizes an issue or PR as relevant to SIG Architecture. sig/auth Categorizes an issue or PR as relevant to SIG Auth. sig/autoscaling Categorizes an issue or PR as relevant to SIG Autoscaling. sig/cli Categorizes an issue or PR as relevant to SIG CLI. sig/cloud-provider Categorizes an issue or PR as relevant to SIG Cloud Provider. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. sig/network Categorizes an issue or PR as relevant to SIG Network. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. sig/storage Categorizes an issue or PR as relevant to SIG Storage. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
Archived in project
Archived in project
Archived in project
Archived in project
Archived in project
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet