[Merged by Bors] - serve malfeasance proof over grpc #4851
Conversation
Codecov Report
```
@@            Coverage Diff            @@
##           develop    #4851    +/-  ##
=========================================
- Coverage     76.9%    76.7%    -0.2%
=========================================
  Files          261      262       +1
  Lines        29689    29905     +216
=========================================
+ Hits         22831    22965     +134
- Misses        5406     5473      +67
- Partials      1452     1467      +15
```
events/events.go (outdated)
```go
	return result
}

func MalfeasanceInfo(smesher types.NodeID, mp *types.MalfeasanceProof) string {
```
@noamnelke this is the debug info provided so far. Anything I should add?
```go
	if sub == nil {
		return status.Errorf(codes.FailedPrecondition, "event reporting is not enabled")
	}
	eventch, fullch := consumeEvents[events.EventMalfeasance](stream.Context(), sub)
```
NIT (and unrelated to this PR): I don't think it makes sense to have `eventch` and `fullch` as return values in `consumeEvents`. If the buffer in `eventch` runs full, the listener doesn't read from it, so why should the listener read from `fullch`? The buffer is also huge (32768 events), so by the time `fullch` is closed the listener hasn't read events for quite some time.
I see this more as a safeguard to stop a programming error from causing OOM, e.g. doing something blocking while processing an event. Not sure about the history here.
Hm... I am not sure we are doing this correctly. I see 2 possible approaches:
- Events can be dropped / there is no guarantee that every event will be sent to every consumer. This is the "safe" approach, since busy consumers won't cause the producer to hang because they are not listening.
- Events cannot be dropped / every event is guaranteed to be sent to every consumer. This way a malfunctioning consumer might cause the producer to get stuck because it cannot send more events.

Both approaches can give some leeway to busy consumers by using buffered channels. Here it looks like the first approach was taken (if the consumer isn't listening, eventually the producer will stop sending). But the consumer will then also never handle the events that were buffered for it when it becomes ready again. Additionally, this happens not immediately: the channels are read in the same `select` statement, so some events might still be processed before the consumer gets the message that no more events will be coming and stops reading from `eventch`.
Other stream APIs are implemented this way as well. If you feel strongly about changing the behavior, maybe file an issue for what you want to fix?
The idea is that a stream subscriber should never be able to block consensus code, and we shouldn't silently drop events either. So once we see that the subscriber is not reading fast enough, an error is returned from the API and the channel is unregistered.
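A minimal sketch of that producer-side contract, under assumed names (`publisher`, `publish`, `eventch`, `fullch`) rather than the actual `consumeEvents` internals: the send is non-blocking, and on overflow `fullch` is closed so the API layer can return an error and unregister the subscriber.

```go
package main

import "fmt"

// Illustrative sketch only: the real implementation lives in
// go-spacemesh's event subscription code; names here are hypothetical.
type publisher struct {
	eventch chan int      // buffered event channel handed to the subscriber
	fullch  chan struct{} // closed once to signal that the buffer overflowed
	closed  bool
}

func newPublisher(buf int) *publisher {
	return &publisher{
		eventch: make(chan int, buf),
		fullch:  make(chan struct{}),
	}
}

// publish attempts a non-blocking send so the producer can never be
// blocked by a slow subscriber; on overflow it closes fullch and stops
// delivering, mirroring the "return an error and unregister" behavior.
func (p *publisher) publish(ev int) bool {
	if p.closed {
		return false
	}
	select {
	case p.eventch <- ev:
		return true
	default:
		close(p.fullch)
		p.closed = true
		return false
	}
}

func main() {
	p := newPublisher(2)
	fmt.Println(p.publish(1), p.publish(2), p.publish(3)) // prints: true true false
}
```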
Yes, I understand. I think there is an issue on the consumer side with this approach, where overflows of `eventch` are not handled deterministically: both `eventch` and `fullch` are always read from in the same `select` statement (not just here). If `fullch` is closed, the consumer might read anywhere between 0 and 10 more events (more than 10 becomes extremely unlikely) from `eventch` before it actually stops processing events.
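The non-determinism described here can be demonstrated with a toy consumer loop (names are illustrative assumptions, not the actual `consumeEvents` API): once `fullch` is closed, `select` picks pseudo-randomly among ready cases, so a few buffered events may still be handled before the consumer notices the overflow.

```go
package main

import "fmt"

// consume is a toy version of the consumer loop: it reads events and the
// overflow signal in the same select, so after fullch is closed it may
// still handle any number of already-buffered events before stopping.
func consume(eventch chan int, fullch chan struct{}) (handled int, overflowed bool) {
	for {
		select {
		case ev, ok := <-eventch:
			if !ok {
				return handled, overflowed
			}
			_ = ev // a real consumer would process the event here
			handled++
		case <-fullch:
			// The producer overflowed; stop even though events may
			// still sit unread in eventch.
			return handled, true
		}
	}
}

func main() {
	eventch := make(chan int, 4)
	fullch := make(chan struct{})
	eventch <- 1
	eventch <- 2
	close(fullch) // simulate an overflow while two events are still buffered
	handled, overflowed := consume(eventch, fullch)
	// handled is anywhere from 0 to 2: select chooses pseudo-randomly
	// among ready cases, which is exactly the non-determinism above.
	fmt.Printf("handled=%d overflowed=%v\n", handled, overflowed)
}
```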
bors merge
## Motivation

as title

depends on spacemeshos/api#259

## Changes

- add MeshService.MalfeasanceStream and MeshService.MalfeasanceQuery
- add malfeasance proof to ActivationService.Get
- add own malfeasance proof to node event update

## Output

```
{
  "proof": {
    "smesherId": {
      "id": "8JhffQsAWuCBQ8YRWrLpOj8tesNx/SKV611TDbPnXCA="
    },
    "layer": {
      "number": 8112
    },
    "kind": "MALFEASANCE_HARE",
    "debugInfo": "generate layer: 8112\nsmesher id: f0985f7d0b005ae08143c6115ab2e93a3f2d7ac371fd2295eb5d530db3e75c20\ncause: smesher published multiple hare messages in layer 8112 round 3\n1st message hash: 9c40b8789bad681d25081f85af7f9a4af87c5d70ac0637934b4e6fc935b9566c\n1st message signature: efdb895917b2303442a3b0c8f455c5cea79f8a2969af2c7f5b3bbbbfa09a77441e3d886e049f39dbf4dc0d49e78537fe3035de2d25d9d295f02c4928677ea107\n2nd message hash: d005c8ff6ece20cb87acc7ea97d5473559a3e197f9a66621505383132515a2ae\n2nd message signature: f3c014fd0f1353896222f22ee71b7fb2a10623394d4905b8ea447ac2e45350e487de04df71448cb19012b897728bb8580499b3e16a5edb546333b432a927d301\n"
  }
}
```
This PR was included in a batch that was canceled; it will be automatically retried.

Build failed (retrying...):

Build failed:

bors merge

Build failed:

bors merge

Build failed:

bors merge

Pull request successfully merged into develop. Build succeeded!