Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Merged by Bors] - serve malfeasance proof over grpc #4851

Closed

Conversation

countvonzero
Copy link
Contributor

Motivation

as title
depends on spacemeshos/api#259

Changes

  • add MeshService.MalfeasanceStream and MeshService:MalfeasanceQuery
  • add malfeasance proof to ActivationService.Get
  • add own malfeasance proof to node event update

output

{
  "proof": {
    "smesherId": {
      "id": "8JhffQsAWuCBQ8YRWrLpOj8tesNx/SKV611TDbPnXCA="
    },
    "layer": {
      "number": 8112
    },
    "kind": "MALFEASANCE_HARE",
    "debugInfo": "generate layer: 8112\nsmesher id: f0985f7d0b005ae08143c6115ab2e93a3f2d7ac371fd2295eb5d530db3e75c20\ncause: smesher published multiple hare messages in layer 8112 round 3\n1st message hash: 9c40b8789bad681d25081f85af7f9a4af87c5d70ac0637934b4e6fc935b9566c\n1st message signature: efdb895917b2303442a3b0c8f455c5cea79f8a2969af2c7f5b3bbbbfa09a77441e3d886e049f39dbf4dc0d49e78537fe3035de2d25d9d295f02c4928677ea107\n2nd message hash: d005c8ff6ece20cb87acc7ea97d5473559a3e197f9a66621505383132515a2ae\n2nd message signature: f3c014fd0f1353896222f22ee71b7fb2a10623394d4905b8ea447ac2e45350e487de04df71448cb19012b897728bb8580499b3e16a5edb546333b432a927d301\n"
  }
}

@codecov
Copy link

codecov bot commented Aug 16, 2023

Codecov Report

Merging #4851 (0c23579) into develop (e0575cb) will decrease coverage by 0.2%.
Report is 7 commits behind head on develop.
The diff coverage is 63.7%.

@@            Coverage Diff            @@
##           develop   #4851     +/-   ##
=========================================
- Coverage     76.9%   76.7%   -0.2%     
=========================================
  Files          261     262      +1     
  Lines        29689   29905    +216     
=========================================
+ Hits         22831   22965    +134     
- Misses        5406    5473     +67     
- Partials      1452    1467     +15     
Files Changed Coverage Δ
events/reporter.go 63.6% <25.0%> (-0.9%) ⬇️
datastore/store.go 67.8% <35.7%> (-2.2%) ⬇️
api/grpcserver/mesh_service.go 76.9% <56.0%> (-2.3%) ⬇️
api/grpcserver/activation_service.go 79.6% <57.8%> (-12.5%) ⬇️
events/events.go 88.0% <58.8%> (-6.0%) ⬇️
events/malfeasance.go 58.8% <58.8%> (ø)
malfeasance/handler.go 69.6% <73.6%> (-0.4%) ⬇️
common/types/malfeasance.go 49.6% <100.0%> (+13.0%) ⬆️
node/node.go 65.5% <100.0%> (+<0.1%) ⬆️

... and 7 files with indirect coverage changes

events/events.go Outdated
return result
}

func MalfeasanceInfo(smesher types.NodeID, mp *types.MalfeasanceProof) string {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@noamnelke this is the debug info provided so far. anything i should add?

api/grpcserver/activation_service.go Outdated Show resolved Hide resolved
malfeasance/handler_test.go Outdated Show resolved Hide resolved
api/grpcserver/activation_service.go Show resolved Hide resolved
api/grpcserver/mesh_service.go Outdated Show resolved Hide resolved
if sub == nil {
return status.Errorf(codes.FailedPrecondition, "event reporting is not enabled")
}
eventch, fullch := consumeEvents[events.EventMalfeasance](stream.Context(), sub)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT (and unrelated to this PR):

I don't think it makes sense to have eventch and fullch as return value in consumeEvents if the buffer in eventch runs full, the listener doesn't read from it. Why should the listener read from fullch?

The buffer is also huge -> 32768 events. So if fullch is closed the listener hasn't read events for quite some time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i see this more as a safeguard to stopgap a programming error from OOM e.g. doing something blocking when processing an event. not sure about the history here.

Copy link
Member

@fasmat fasmat Aug 16, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... I am not sure if we are doing it correctly. I see 2 possible approaches:

  • Events can be dropped / there is no guarantee that every event will be sent to every consumer. This is the "safe" approach since busy consumers won't cause the producer to hang because they are not listening
  • Events cannot be dropped / every event is guaranteed to be sent to every consumer. This way a malfunctioning consumer might cause the producer to get stuck because it cannot send more events.

Both approaches can give some leeway to busy consumers by using buffered channels. Here it looks like the first approach was taken (if the consumer isn't listening eventually the producer will stop sending). Just that the consumer will also not handle the events any more that were buffered for it when it becomes ready again. Additionally his happens not immediately: the channels are read in the same select statement, so some events might still be processed before the consumer gets the message that no events will be coming any more and stops reading from eventch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other stream APIs are implemented this way as well. if you feel strongly about changing the behavior, maybe file an issue for what you want to fix?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the idea is that stream subscriber should never be able to block consensus code and we shouldn't silently drop events either. so once we see that subscriber is not reading fast enough error is returned from api and channel is unregistered

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand. I think there is an issue on the consumer side with this approach where overflows of the eventch are not handled deterministically:

Both eventch and fullch are always read from in the same select statement (not just here). If fullch is closed the cosumer might read anything between 0 to 10 more events (more than 10 become extremely unlikely) from eventch before they actually stop processing events.

api/grpcserver/mesh_service.go Show resolved Hide resolved
api/grpcserver/mesh_service_test.go Show resolved Hide resolved
api/grpcserver/mesh_service.go Show resolved Hide resolved
if sub == nil {
return status.Errorf(codes.FailedPrecondition, "event reporting is not enabled")
}
eventch, fullch := consumeEvents[events.EventMalfeasance](stream.Context(), sub)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I understand. I think there is an issue on the consumer side with this approach where overflows of the eventch are not handled deterministically:

Both eventch and fullch are always read from in the same select statement (not just here). If fullch is closed the cosumer might read anything between 0 to 10 more events (more than 10 become extremely unlikely) from eventch before they actually stop processing events.

@countvonzero
Copy link
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Aug 17, 2023
## Motivation
as title
depends on spacemeshos/api#259

## Changes
- add MeshService.MalfeasanceStream and MeshService:MalfeasanceQuery
- add malfeasance proof to ActivationService.Get
- add own malfeasance proof to node event update

## output
```
{
  "proof": {
    "smesherId": {
      "id": "8JhffQsAWuCBQ8YRWrLpOj8tesNx/SKV611TDbPnXCA="
    },
    "layer": {
      "number": 8112
    },
    "kind": "MALFEASANCE_HARE",
    "debugInfo": "generate layer: 8112\nsmesher id: f0985f7d0b005ae08143c6115ab2e93a3f2d7ac371fd2295eb5d530db3e75c20\ncause: smesher published multiple hare messages in layer 8112 round 3\n1st message hash: 9c40b8789bad681d25081f85af7f9a4af87c5d70ac0637934b4e6fc935b9566c\n1st message signature: efdb895917b2303442a3b0c8f455c5cea79f8a2969af2c7f5b3bbbbfa09a77441e3d886e049f39dbf4dc0d49e78537fe3035de2d25d9d295f02c4928677ea107\n2nd message hash: d005c8ff6ece20cb87acc7ea97d5473559a3e197f9a66621505383132515a2ae\n2nd message signature: f3c014fd0f1353896222f22ee71b7fb2a10623394d4905b8ea447ac2e45350e487de04df71448cb19012b897728bb8580499b3e16a5edb546333b432a927d301\n"
  }
}
```
@bors
Copy link

bors bot commented Aug 17, 2023

This PR was included in a batch that was canceled, it will be automatically retried

bors bot pushed a commit that referenced this pull request Aug 17, 2023
## Motivation
as title
depends on spacemeshos/api#259

## Changes
- add MeshService.MalfeasanceStream and MeshService:MalfeasanceQuery
- add malfeasance proof to ActivationService.Get
- add own malfeasance proof to node event update

## output
```
{
  "proof": {
    "smesherId": {
      "id": "8JhffQsAWuCBQ8YRWrLpOj8tesNx/SKV611TDbPnXCA="
    },
    "layer": {
      "number": 8112
    },
    "kind": "MALFEASANCE_HARE",
    "debugInfo": "generate layer: 8112\nsmesher id: f0985f7d0b005ae08143c6115ab2e93a3f2d7ac371fd2295eb5d530db3e75c20\ncause: smesher published multiple hare messages in layer 8112 round 3\n1st message hash: 9c40b8789bad681d25081f85af7f9a4af87c5d70ac0637934b4e6fc935b9566c\n1st message signature: efdb895917b2303442a3b0c8f455c5cea79f8a2969af2c7f5b3bbbbfa09a77441e3d886e049f39dbf4dc0d49e78537fe3035de2d25d9d295f02c4928677ea107\n2nd message hash: d005c8ff6ece20cb87acc7ea97d5473559a3e197f9a66621505383132515a2ae\n2nd message signature: f3c014fd0f1353896222f22ee71b7fb2a10623394d4905b8ea447ac2e45350e487de04df71448cb19012b897728bb8580499b3e16a5edb546333b432a927d301\n"
  }
}
```
@bors
Copy link

bors bot commented Aug 17, 2023

Build failed (retrying...):

bors bot pushed a commit that referenced this pull request Aug 17, 2023
## Motivation
as title
depends on spacemeshos/api#259

## Changes
- add MeshService.MalfeasanceStream and MeshService:MalfeasanceQuery
- add malfeasance proof to ActivationService.Get
- add own malfeasance proof to node event update

## output
```
{
  "proof": {
    "smesherId": {
      "id": "8JhffQsAWuCBQ8YRWrLpOj8tesNx/SKV611TDbPnXCA="
    },
    "layer": {
      "number": 8112
    },
    "kind": "MALFEASANCE_HARE",
    "debugInfo": "generate layer: 8112\nsmesher id: f0985f7d0b005ae08143c6115ab2e93a3f2d7ac371fd2295eb5d530db3e75c20\ncause: smesher published multiple hare messages in layer 8112 round 3\n1st message hash: 9c40b8789bad681d25081f85af7f9a4af87c5d70ac0637934b4e6fc935b9566c\n1st message signature: efdb895917b2303442a3b0c8f455c5cea79f8a2969af2c7f5b3bbbbfa09a77441e3d886e049f39dbf4dc0d49e78537fe3035de2d25d9d295f02c4928677ea107\n2nd message hash: d005c8ff6ece20cb87acc7ea97d5473559a3e197f9a66621505383132515a2ae\n2nd message signature: f3c014fd0f1353896222f22ee71b7fb2a10623394d4905b8ea447ac2e45350e487de04df71448cb19012b897728bb8580499b3e16a5edb546333b432a927d301\n"
  }
}
```
@bors
Copy link

bors bot commented Aug 17, 2023

Build failed:

@countvonzero
Copy link
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Aug 17, 2023
## Motivation
as title
depends on spacemeshos/api#259

## Changes
- add MeshService.MalfeasanceStream and MeshService:MalfeasanceQuery
- add malfeasance proof to ActivationService.Get
- add own malfeasance proof to node event update

## output
```
{
  "proof": {
    "smesherId": {
      "id": "8JhffQsAWuCBQ8YRWrLpOj8tesNx/SKV611TDbPnXCA="
    },
    "layer": {
      "number": 8112
    },
    "kind": "MALFEASANCE_HARE",
    "debugInfo": "generate layer: 8112\nsmesher id: f0985f7d0b005ae08143c6115ab2e93a3f2d7ac371fd2295eb5d530db3e75c20\ncause: smesher published multiple hare messages in layer 8112 round 3\n1st message hash: 9c40b8789bad681d25081f85af7f9a4af87c5d70ac0637934b4e6fc935b9566c\n1st message signature: efdb895917b2303442a3b0c8f455c5cea79f8a2969af2c7f5b3bbbbfa09a77441e3d886e049f39dbf4dc0d49e78537fe3035de2d25d9d295f02c4928677ea107\n2nd message hash: d005c8ff6ece20cb87acc7ea97d5473559a3e197f9a66621505383132515a2ae\n2nd message signature: f3c014fd0f1353896222f22ee71b7fb2a10623394d4905b8ea447ac2e45350e487de04df71448cb19012b897728bb8580499b3e16a5edb546333b432a927d301\n"
  }
}
```
@bors
Copy link

bors bot commented Aug 17, 2023

Build failed:

@countvonzero
Copy link
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Aug 17, 2023
## Motivation
as title
depends on spacemeshos/api#259

## Changes
- add MeshService.MalfeasanceStream and MeshService:MalfeasanceQuery
- add malfeasance proof to ActivationService.Get
- add own malfeasance proof to node event update

## output
```
{
  "proof": {
    "smesherId": {
      "id": "8JhffQsAWuCBQ8YRWrLpOj8tesNx/SKV611TDbPnXCA="
    },
    "layer": {
      "number": 8112
    },
    "kind": "MALFEASANCE_HARE",
    "debugInfo": "generate layer: 8112\nsmesher id: f0985f7d0b005ae08143c6115ab2e93a3f2d7ac371fd2295eb5d530db3e75c20\ncause: smesher published multiple hare messages in layer 8112 round 3\n1st message hash: 9c40b8789bad681d25081f85af7f9a4af87c5d70ac0637934b4e6fc935b9566c\n1st message signature: efdb895917b2303442a3b0c8f455c5cea79f8a2969af2c7f5b3bbbbfa09a77441e3d886e049f39dbf4dc0d49e78537fe3035de2d25d9d295f02c4928677ea107\n2nd message hash: d005c8ff6ece20cb87acc7ea97d5473559a3e197f9a66621505383132515a2ae\n2nd message signature: f3c014fd0f1353896222f22ee71b7fb2a10623394d4905b8ea447ac2e45350e487de04df71448cb19012b897728bb8580499b3e16a5edb546333b432a927d301\n"
  }
}
```
@bors
Copy link

bors bot commented Aug 17, 2023

Build failed:

@countvonzero
Copy link
Contributor Author

bors merge

bors bot pushed a commit that referenced this pull request Aug 17, 2023
## Motivation
as title
depends on spacemeshos/api#259

## Changes
- add MeshService.MalfeasanceStream and MeshService:MalfeasanceQuery
- add malfeasance proof to ActivationService.Get
- add own malfeasance proof to node event update

## output
```
{
  "proof": {
    "smesherId": {
      "id": "8JhffQsAWuCBQ8YRWrLpOj8tesNx/SKV611TDbPnXCA="
    },
    "layer": {
      "number": 8112
    },
    "kind": "MALFEASANCE_HARE",
    "debugInfo": "generate layer: 8112\nsmesher id: f0985f7d0b005ae08143c6115ab2e93a3f2d7ac371fd2295eb5d530db3e75c20\ncause: smesher published multiple hare messages in layer 8112 round 3\n1st message hash: 9c40b8789bad681d25081f85af7f9a4af87c5d70ac0637934b4e6fc935b9566c\n1st message signature: efdb895917b2303442a3b0c8f455c5cea79f8a2969af2c7f5b3bbbbfa09a77441e3d886e049f39dbf4dc0d49e78537fe3035de2d25d9d295f02c4928677ea107\n2nd message hash: d005c8ff6ece20cb87acc7ea97d5473559a3e197f9a66621505383132515a2ae\n2nd message signature: f3c014fd0f1353896222f22ee71b7fb2a10623394d4905b8ea447ac2e45350e487de04df71448cb19012b897728bb8580499b3e16a5edb546333b432a927d301\n"
  }
}
```
@bors
Copy link

bors bot commented Aug 17, 2023

Pull request successfully merged into develop.

Build succeeded!

The publicly hosted instance of bors-ng is deprecated and will go away soon.

If you want to self-host your own instance, instructions are here.
For more help, visit the forum.

If you want to switch to GitHub's built-in merge queue, visit their help page.

@bors bors bot changed the title serve malfeasance proof over grpc [Merged by Bors] - serve malfeasance proof over grpc Aug 17, 2023
@bors bors bot closed this Aug 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants