-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add node health status into config #4758
Conversation
@@ -94,6 +94,12 @@ struct ClusterNodeInfo { | |||
} | |||
}; | |||
|
|||
enum class NodeHealth : std::uint8_t { NONE, FAIL, LOADING, ONLINE }; | |||
|
|||
struct ClusterExtendedNodeInfo : ClusterNodeInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you need this and not just add the new member to ClusterNodeInfo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because ClusterNodeInfo is used for migration
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Ok
@@ -149,21 +149,24 @@ ClusterShardInfo ClusterFamily::GetEmulatedShardInfo(ConnectionContext* cntx) co | |||
? static_cast<uint16_t>(absl::GetFlag(FLAGS_port)) | |||
: cluster_announce_port; | |||
|
|||
info.master = {.id = id_, .ip = preferred_endpoint, .port = preferred_port}; | |||
info.master = {{.id = id_, .ip = preferred_endpoint, .port = preferred_port}, NodeHealth::NONE}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not online here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure regarding online for emulated cluster so I decided to make NONE, when I will do the next step I will have better understanding what we need and it will be easy to find this place and refactor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
today we alway reply with online for all nodes in emulated mode, so unless we want to do any change there you should use online here
info.replicas.push_back({{.id = replica.id, | ||
.ip = replica.address, | ||
.port = static_cast<uint16_t>(replica.listening_port)}, | ||
NodeHealth::NONE}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to support this for cluster emulated mode we need here to check what is the status of the replica. i.e if its in loading state.
@romange do we want do this for emulated mode as well?
@BorysTheDev LGTM. |
auto health = json.at_or_null("health"); | ||
if (!health.is_null()) { | ||
if (!health.is_string()) { | ||
LOG(WARNING) << kInvalidConfigPrefix << "invalid health status for node " << json; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BorysTheDev Why does this only warn, but not return an error? (same below)
From the control plane, we'd rather have an error response to the cluster config command rather than silently fail (with a warning we don't monitor)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We removed the most part of the errors according to your request
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this code doesn't affect the stable work of Dragonfly and isn't critical
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We removed the most part of the errors according to your request
What request?
If the control plane ever sends invalid configuration I think it's better to just fail the request, then we'll be notified and can investigate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
#4741