Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - fix flaky test: TestSpacemeshApp_NodeService #4728

Closed
wants to merge 9 commits into from

Conversation

fasmat
Copy link
Member

@fasmat fasmat commented Jul 19, 2023

Motivation

Closes: #4729

Changes

  • The app during initialization for set the logging level to zapcore.InfoLevel which overwrites what ever logging configuration is passed in via WithLog. This was fixed, the app still starts up with zapcore.InfoLevel by default.
  • There has been an issue with logging in tests because the logging middleware for GRPC wasn't unset when the test ended which could result in components logging after the test ended and thereby failing a test. This is now done in app.stopServices (since it is set up in app.startServices)
  • After making logs visible the issue was identified: the node is shut down while it verifies its own post proof. This only happens on the macos-latest runner because other runners might not have completed their initial post proof yet or are already done verifying it by that time.
    • Changed verifying to only emit and log errors when an error other than context.Cancelled is returned.
  • These changes allowed me to run the test 100 times in a row without it failing a single time:
go test -v -timeout 150s -run ^TestSpacemeshApp_NodeService$ github.com/spacemeshos/go-spacemesh/node -count 100

passes without a single iteration of the test failing.

Test Plan

n/a

TODO

  • Explain motivation or link existing issue(s)
  • Test changes and document test plan
  • Update documentation as needed

DevOps Notes

  • This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
  • This PR does not affect public APIs
  • This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
  • This PR does not make changes to log messages (which monitoring infrastructure may rely on)

@fasmat fasmat self-assigned this Jul 19, 2023
@codecov
Copy link

codecov bot commented Jul 19, 2023

Codecov Report

Merging #4728 (3cefb86) into develop (68d196b) will increase coverage by 0.2%.
The diff coverage is 58.8%.

@@            Coverage Diff            @@
##           develop   #4728     +/-   ##
=========================================
+ Coverage     77.0%   77.2%   +0.2%     
=========================================
  Files          255     255             
  Lines        28846   28855      +9     
=========================================
+ Hits         22213   22286     +73     
+ Misses        5250    5184     -66     
- Partials      1383    1385      +2     
Impacted Files Coverage Δ
activation/activation.go 77.8% <55.5%> (-0.2%) ⬇️
node/node.go 65.5% <62.5%> (+0.2%) ⬆️

... and 6 files with indirect coverage changes

@fasmat fasmat marked this pull request as ready for review July 19, 2023 14:17
@fasmat
Copy link
Member Author

fasmat commented Jul 19, 2023

bors try

bors bot added a commit that referenced this pull request Jul 19, 2023
node/bad_peer_test.go Show resolved Hide resolved
@bors
Copy link

bors bot commented Jul 19, 2023

try

Build failed:

@fasmat fasmat changed the title flaky test: ongoing investigation, do not merge fix flaky test: TestSpacemeshApp_NodeService Jul 19, 2023
@fasmat
Copy link
Member Author

fasmat commented Jul 19, 2023

bors merge

bors bot pushed a commit that referenced this pull request Jul 19, 2023
## Motivation
Closes: #4729

## Changes
- The app during initialization for set the logging level to `zapcore.InfoLevel` which overwrites what ever logging configuration is passed in via `WithLog`. This was fixed, the app still starts up with `zapcore.InfoLevel` by default.
- There has been an issue with logging in tests because the logging middleware for GRPC wasn't unset when the test ended which could result in components logging after the test ended and thereby failing a test. This is now done in `app.stopServices` (since it is set up in `app.startServices`)
- After making logs visible the issue was identified: the node is shut down while it verifies its own post proof. This only happens on the `macos-latest` runner because other runners might not have completed their initial post proof yet or are already done verifying it by that time.
   - Changed verifying to only emit and log errors when an error other than `context.Cancelled` is returned.
- These changes allowed me to run the test 100 times in a row without it failing a single time:

```
go test -v -timeout 150s -run ^TestSpacemeshApp_NodeService$ github.com/spacemeshos/go-spacemesh/node -count 100
```

passes without a single iteration of the test failing.

## Test Plan
n/a

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)
@bors
Copy link

bors bot commented Jul 19, 2023

Build failed:

@fasmat
Copy link
Member Author

fasmat commented Jul 19, 2023

bors merge

bors bot pushed a commit that referenced this pull request Jul 19, 2023
## Motivation
Closes: #4729

## Changes
- The app during initialization for set the logging level to `zapcore.InfoLevel` which overwrites what ever logging configuration is passed in via `WithLog`. This was fixed, the app still starts up with `zapcore.InfoLevel` by default.
- There has been an issue with logging in tests because the logging middleware for GRPC wasn't unset when the test ended which could result in components logging after the test ended and thereby failing a test. This is now done in `app.stopServices` (since it is set up in `app.startServices`)
- After making logs visible the issue was identified: the node is shut down while it verifies its own post proof. This only happens on the `macos-latest` runner because other runners might not have completed their initial post proof yet or are already done verifying it by that time.
   - Changed verifying to only emit and log errors when an error other than `context.Cancelled` is returned.
- These changes allowed me to run the test 100 times in a row without it failing a single time:

```
go test -v -timeout 150s -run ^TestSpacemeshApp_NodeService$ github.com/spacemeshos/go-spacemesh/node -count 100
```

passes without a single iteration of the test failing.

## Test Plan
n/a

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)
@bors
Copy link

bors bot commented Jul 19, 2023

Build failed:

  • ci-status

node/node_test.go Outdated Show resolved Hide resolved
@fasmat
Copy link
Member Author

fasmat commented Jul 20, 2023

bors merge

bors bot pushed a commit that referenced this pull request Jul 20, 2023
## Motivation
Closes: #4729

## Changes
- The app during initialization for set the logging level to `zapcore.InfoLevel` which overwrites what ever logging configuration is passed in via `WithLog`. This was fixed, the app still starts up with `zapcore.InfoLevel` by default.
- There has been an issue with logging in tests because the logging middleware for GRPC wasn't unset when the test ended which could result in components logging after the test ended and thereby failing a test. This is now done in `app.stopServices` (since it is set up in `app.startServices`)
- After making logs visible the issue was identified: the node is shut down while it verifies its own post proof. This only happens on the `macos-latest` runner because other runners might not have completed their initial post proof yet or are already done verifying it by that time.
   - Changed verifying to only emit and log errors when an error other than `context.Cancelled` is returned.
- These changes allowed me to run the test 100 times in a row without it failing a single time:

```
go test -v -timeout 150s -run ^TestSpacemeshApp_NodeService$ github.com/spacemeshos/go-spacemesh/node -count 100
```

passes without a single iteration of the test failing.

## Test Plan
n/a

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)
@bors
Copy link

bors bot commented Jul 20, 2023

Pull request successfully merged into develop.

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot changed the title fix flaky test: TestSpacemeshApp_NodeService [Merged by Bors] - fix flaky test: TestSpacemeshApp_NodeService Jul 20, 2023
@bors bors bot closed this Jul 20, 2023
@bors bors bot deleted the investigate-failing-test branch July 20, 2023 08:47
dshulyak added a commit to dshulyak/go-spacemesh that referenced this pull request Jul 21, 2023
@fasmat fasmat mentioned this pull request Jul 21, 2023
7 tasks
bors bot pushed a commit that referenced this pull request Jul 21, 2023
## Motivation
#4728 broke debug logging, this reverts part of the PR to make it possible to log debug messages again when enabled via config.

## Changes
Revert some changes from #4728 

## Test Plan
n/a

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)
bors bot pushed a commit that referenced this pull request Jul 21, 2023
## Motivation
#4728 broke debug logging, this reverts part of the PR to make it possible to log debug messages again when enabled via config.

## Changes
Revert some changes from #4728 

## Test Plan
n/a

## TODO
<!-- This section should be removed when all items are complete -->
- [x] Explain motivation or link existing issue(s)
- [x] Test changes and document test plan
- [x] Update documentation as needed

## DevOps Notes
<!-- Please uncheck these items as applicable to make DevOps aware of changes that may affect releases -->
- [x] This PR does not require configuration changes (e.g., environment variables, GitHub secrets, VM resources)
- [x] This PR does not affect public APIs
- [x] This PR does not rely on a new version of external services (PoET, elasticsearch, etc.)
- [x] This PR does not make changes to log messages (which monitoring infrastructure may rely on)
dshulyak added a commit to dshulyak/go-spacemesh that referenced this pull request Jul 21, 2023
dshulyak added a commit to dshulyak/go-spacemesh that referenced this pull request Jul 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky TestSpacemeshApp_NodeService on macos-latest runner in CI
3 participants