Backport of FLAKEY_TEST: Add retry to outbound request for ProxyLifecycleShutdown… into release/1.4.x #4007

hc-github-team-consul-core

Backport

This PR is auto-generated from #3999 to be assessed for backporting due to the inclusion of the label backport/1.4.x.

The below text is copied from the body of the original PR.


Changes proposed in this PR

  • The test fails intermittently in CI (for some reason it fails only in cloud environments, not on kind clusters); see example here.
  • Add a retry to the outbound API calls to improve the chance of success.
    • If an error occurs, log the error, mark the current attempt as failed, and trigger a retry.
    • If no error occurs, check whether the output contains the specific error message. If it does not, log the error message and trigger a retry.
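
The retry behaviour described above can be sketched as follows. This is a minimal standalone illustration, not the code from the PR: `retryUntil`, `errTransient`, the attempt count, the wait interval, and the expected message are all hypothetical names and values chosen for the example, and the real test may well rely on the repository's own retry helpers instead.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// errTransient is a hypothetical error standing in for a failed outbound call.
var errTransient = errors.New("transient failure")

// retryUntil calls fn up to attempts times, sleeping wait between tries.
// An attempt succeeds only when fn returns no error AND its output contains want.
func retryUntil(attempts int, wait time.Duration, want string, fn func() (string, error)) error {
	var lastErr error
	for i := 1; i <= attempts; i++ {
		out, err := fn()
		switch {
		case err != nil:
			// The call itself failed: record the error and retry.
			lastErr = fmt.Errorf("attempt %d: %w", i, err)
		case !strings.Contains(out, want):
			// The call returned, but the output lacks the expected message.
			lastErr = fmt.Errorf("attempt %d: output %q does not contain %q", i, out, want)
		default:
			return nil
		}
		fmt.Println("attempt failed:", lastErr)
		if i < attempts {
			time.Sleep(wait)
		}
	}
	return lastErr
}

func main() {
	calls := 0
	// Hypothetical outbound request: errors once, returns unexpected
	// output once, then returns the message the test is waiting for.
	request := func() (string, error) {
		calls++
		switch calls {
		case 1:
			return "", errTransient
		case 2:
			return "still serving", nil
		default:
			return "expected message", nil
		}
	}
	err := retryUntil(5, 10*time.Millisecond, "expected message", request)
	fmt.Println("calls:", calls, "err:", err)
}
```

Treating "no error but wrong output" as a failed attempt is the important part: the call can succeed at the transport level while the condition the test is waiting for has not yet been reached.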

How I've tested this PR

CI should pass

How I expect reviewers to test this PR

CI should pass

Checklist


Overview of commits

jm96441n and others added 30 commits February 1, 2024 12:44
* Fix meshgw tests

* change protocol on mesh gw tests to tcp from mesh
* stub api-gateway-controller

* Add setup to v2 controller
* updated script to point at RC version correctly
* bump versions to next version

* updated script to handle new Consul-k8s images
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
* add status structs

* update status

* fixtures for v2

* checkpoint

* add hook to only run test when flag is enabled

* clean up reversions, delete extra files

* remove http listeners

* delete extra file

* revert accidental IDE changes

* clean up lint issues
…3537)

* Add GatewayClass[Config] watches for MeshGateway controller

* Update merge logic for deployment + service

* Add test coverage for MergeDeployment

* Add test coverage for MergeService

* Copy over owner references to new Service + Deployment
* Ensure signals are passed to commands

Change `/bin/sh -ec "<command>"` to
`/bin/sh -ec "exec <command>"`. Adding `exec` ensures that `<command>`
is not executed as a child process but replaces the `/bin/sh` process.
This ensures that `<command>` receives any signals.

Specifically this is an issue when attempting to trap SIGTERMs as part
of graceful pod shutdown. Without this change, we weren't receiving any
signals because they aren't passed down by `/bin/sh -c`.

* Fix broken bats tests and add changelog

Signed-off-by: Ashwin Venkatesh <ashwin.what@gmail.com>

---------

Signed-off-by: Ashwin Venkatesh <ashwin.what@gmail.com>
Co-authored-by: Ashwin Venkatesh <ashwin.what@gmail.com>
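
As an illustration of the `exec` change described in the "Ensure signals are passed to commands" commit above, the sketch below builds the argument list for a container command. It is only an assumption about how such a wrapper might look in Go; `shellCommand` and the sample command `my-binary --flag` are hypothetical and do not come from the repository.

```go
package main

import "fmt"

// shellCommand returns the argv for running command through /bin/sh.
// Prefixing the command with exec makes it replace the shell process
// instead of running as the shell's child, so signals such as SIGTERM
// are delivered to the command itself.
func shellCommand(command string) []string {
	return []string{"/bin/sh", "-ec", "exec " + command}
}

func main() {
	// "my-binary --flag" is a hypothetical command used for illustration.
	fmt.Printf("%q\n", shellCommand("my-binary --flag"))
}
```

Note that `exec` only replaces the shell with the first simple command. For a compound command line such as `a && b`, only `a` would be exec'd, so this pattern fits cases where the shell runs a single long-lived process.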
* Add hooks for CRUD side effects for apigateway controller

* Added tests for controller
…ateways (#3531)

* Respect connectInject.initContainer.resources for v1 API gateways

* Add changelog entry

* Add test coverage for init container resources on API gateway Pods
…h Gateway (NET-6463) (#3549)

* Add NET_BIND_SERVICE to the security context in the deployment of Mesh Gateway
…ly (#3559)

* Define GatewayClass's spec model locally instead of consuming proto from Consul

* Update gateway resources job to use new types, constants

* Make description optional, regenerate CRD definitions

* Remove GatewayClass columns related to syncing into Consul
* make controller setup for gateway controllers generic and reusable, add
indices onto gateway resources in k8s for more efficient lookups

* cleanup from PR review

* Update control-plane/controllers/resources/gateway_controller_setup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/controllers/resources/gateway_indices.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/controllers/resources/gateway_controller_setup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/controllers/resources/gateway_controller_setup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* clean up from PR review

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
…ting + desired deployments are equal (#3575)

* Consider init container resources when determining if existing + desired deployments are equal

* Add test coverage for compareDeployments

* Update control-plane/api-gateway/gatekeeper/deployment_test.go
…removed (#3581)

[NET-7657] Consume version of proto-public with GatewayClass + GatewayClassConfig removed

Co-authored-by: skpratt <sarah.pratt@hashicorp.com>
* update gateway builder to be generic

* Add api gateway to gateway builder

* Updated service test for gateway listeners/ports

* update test names

* update listener functions

* remove check for listener name

* fix tests
add new make target for go mod tidy check
added linting back
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
* datadog-integration: updated consul-server agent telemetry-config.json with dd specific items as well as additional missing VM based options, unit tests, dd unix socket integration, dd agent acl token generation, deployment override failsafes

* datadog-integration: updated consul-server agent telemetry-config.json with dd specific items as well as additional missing VM based options, unit tests, dd unix socket integration, dd agent acl token generation | final initial-push

* changelog entry update

* datadog-integration: updated consul-server agent server.config (enable_debug) and telemetry.config update | enable_debug to server.config

* curt pr review changes (minus extraConfig templating verification changes)

* global.metrics.AgentMetrics -> global.metrics.enableAgentMetrics

* dogstatsd and otlp mutually exclusive verification checks

* breaking changes now incorporated into consul.validateExtraConfig helper template function as precheck

* extraConfig hash updates post merge conflict update

* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets

* update changelog .txt to match new PR number

* updated server-statefulset.yaml to correct ad.datadoghq.com/consul.logs annotation to valid single quote string

* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets

* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets

* update UDP dogstatsdPort behavior to exclude including a port value if using a kube service address (as determined by user overrides)

* update _helpers.tpl consul.ValidateDatadogConfiguration func to account for using 'https' as protocol => should fail

* update server-statefulset.yaml to exclude prometheus.io annotations if enabling datadog openmetrics method for consul server metrics scrape. conflict present with http vs https that breaks openmetrics scrape on consul

* update server-statefulset.yaml to exclude prometheus.io annotations if enabling datadog openmetrics method for consul server metrics scrape. conflict present with http vs https that breaks openmetrics scrape on consul

* correct otlp protocol helpers.tpl check to lower-case the protocol to match the open-telemetry-deployment.yaml behavior

* fix server-acl-init command_test.go for datadog token policy - datacenter should have been dc1

* add in server-statefulset bats test for extraConfig validation testing
…tewayclass and gatewayclassconfig (#3564)

* configmap update

* update chart to respect api-gateway-config

* fix typo

* added unit tests, added some stuff missed in initial pass

* added thorough unit tests for gateway-resources-configmap.yaml

* remove unneeded extra line

* additional debugging

* test

* test

* remove extra escapes

* final test

* test again

* one more test

* this should work

* fix spacing issue
DanStough and others added 21 commits May 2, 2024 15:35
* Consume controller-runtime v0.16.3

This is the version required by gateway-api v1.0.0, which will be consumed in a future PR

* Reconcile breaking changes in controller-runtime

* Fix linter errors

* gofmt

* Update controller tests to handle new fake client requirements

* Update test assertion to handle changes in controller-runtime

* Restore incorrectly-removed flags

* Use a proper delete on the fake client since DeletionTimestamp is immutable

* Update enterprise tests to specify status subresources

* Update controller-runtime dependency for acceptance tests

* Explicitly inject decoder into webhooks

* Appease the linter

* Use SetupWithManager pattern from controllers for webhook setup

* Consume consistent version of k8s.io/client-go everywhere

* Upgrade related dependencies for CLI, including helm/v3

* Consume latest release of helm/v3

* changelog

* Inline function calls for testing

* Consume controller-runtime v0.16.5

---------

Co-authored-by: Ronald Ekambi <ronekambi@gmail.com>
…formed (#3956)

* Check if an upstream is malformed, if so ignore it.

* support multiple upstreams separator (<space>, <comma>) add tests

* add /n as a separator

* add changelog

* added log when upstream is skipped
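
The upstream handling listed in the commit above (several separators, malformed entries skipped with a log line) could look roughly like the sketch below. It is an assumption-laden illustration rather than the repository's code: it assumes an upstream is written as `name:port` with a numeric port, and `splitUpstreams`, `validUpstream`, and `parseUpstreams` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// splitUpstreams splits a raw value on spaces, commas, and newlines.
// Consecutive separators do not produce empty entries.
func splitUpstreams(raw string) []string {
	return strings.FieldsFunc(raw, func(r rune) bool {
		return r == ' ' || r == ',' || r == '\n'
	})
}

// validUpstream reports whether s looks like "name:port" with a port
// in the range 1-65535. The format is an assumption for this sketch.
func validUpstream(s string) bool {
	name, port, ok := strings.Cut(s, ":")
	if !ok || name == "" {
		return false
	}
	p, err := strconv.Atoi(port)
	return err == nil && p > 0 && p <= 65535
}

// parseUpstreams returns the well-formed upstreams and logs each one
// that is skipped instead of failing the whole list.
func parseUpstreams(raw string) []string {
	var out []string
	for _, u := range splitUpstreams(raw) {
		if !validUpstream(u) {
			fmt.Printf("skipping malformed upstream %q\n", u)
			continue
		}
		out = append(out, u)
	}
	return out
}

func main() {
	// "cache:abc" has a non-numeric port and is skipped.
	fmt.Println(parseUpstreams("api:8080, db:5432\ncache:abc web:80"))
}
```

Skipping a malformed entry rather than returning an error keeps one bad upstream from blocking the valid ones, which matches the "if so ignore it" wording of the commit.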
* service is registering

* add all the fields

* health checks working

* handle finalizers to clean up

* Add status to registration CRD

* Added initial unit test for reconcile

* success paths for registration and deregistration

* added failure tests, moved finalizer removal logic so it occurs after
service is successfully deregistered

* first test for to catalog registration type

* maximal registration to catalog test

* test all the things

* deregistration tests

* update some comments and fields, re-run generators

* Added changelog

* linting all the things

* fixing test setup for new controller runtime

* Handle errors for parsing duration
* Add readOnlyRootFilesystem to security context (#2771)

* readOnlyRootFilesystem

* Add mount for /tmp

* Add /tmp mountpoint

* Update ingress-gateways-deployment.yaml

* Update terminating-gateways-deployment.yaml

* Update helm unit tests

* Create 2781.txt

* rename changelog file

* rename changelog file

* Mount /tmp to volume for snapshots

* rename changelog

* changelog

---------

Co-authored-by: mr-miles <miles.waller@gmail.com>
Co-authored-by: Paul Glass <pglass@hashicorp.com>
Co-authored-by: Sarah Alsmiller <sarah.alsmiller@hashicorp.com>
…3974)

* activate tproxy mode even when a cluster IP is not assigned to pod.

* add changelog

* fix failing tests
* don't error if role already exists on restart

* changelog

* lint
* first pass at creating write policy for service and updating term gw acl
role

* handle deregistering, update tests for registering with acls

* existing deregister tests passing

* failures with term gw role not existing

* clean up

* reorg code

* Move to own package

* watch for terminating gateways

* move files back, handle multiple terminating gateways

* handle errors and ensure finalizer is set

* Add tests for finalizers

* remove unused file

* fix import naming

* linting

* fix comment, extract constant
* Add validating webhook for registrations

* cleaned up registration webhook setup

* fix setup for webhook, updated docs

* fix typo, remove debugging log, rename variables for readability
#3991)

* Adds ability to set the imagePullPolicy for all Consul images (consul, consul-dataplane, consul-k8s, consul-telemetry-collector)
* Add set for adding and removing services

* remove service add

* first pass at populating cache

* cache is working, need to fix how statuses are handled

* move to new directory, fix up the status conditions (still todos on this), handle results

* updated tests

* unexport methods that don't need to be exported

* handle consul deregistrations

* clean up before code review

* show ACLUpdate as false if consul deregistered service

* fix issue with updating acl status on consul deregistration

* fix linting errors
hc-github-team-consul-core force-pushed the backport/fix_flakey_ProxyLifecycleShutdownTest/mildly-blessed-starling branch from 256fe12 to 7bf6ff6 May 17, 2024 17:19
hc-github-team-consul-core merged commit 705f459 into release/1.4.x May 21, 2024
26 of 50 checks passed
hc-github-team-consul-core deleted the backport/fix_flakey_ProxyLifecycleShutdownTest/mildly-blessed-starling branch May 21, 2024 19:58