Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[EventEngine] PosixEventEngine DNS Resolver #32701

Merged
merged 190 commits into from
Jul 21, 2023
Merged

[EventEngine] PosixEventEngine DNS Resolver #32701

merged 190 commits into from
Jul 21, 2023

Conversation

yijiem
Copy link
Member

@yijiem yijiem commented Mar 24, 2023

This PR implements a c-ares based DNS resolver for EventEngine with the reference from the original grpc_ares_wrapper.h. The PosixEventEngine DNSResolver is implemented on top of that. Tests which use the client channel resolver API (resolver.h) are ported, namely the resolver_component_test.cc and the cancel_ares_query_test.cc. The WindowsEventEngine DNSResolver will use the same EventEngine's grpc_ares_wrapper and will be worked on next.

The resolve_address_test.cc which uses the iomgr DNSResolver API has been ported to EventEngine's dns_test.cc. That leaves only 2 tests which use iomgr's API, notably the dns_resolver_cooldown_test.cc and the goaway_server_test.cc which probably need to be restructured to use EventEngine DNSResolver (for one thing they override the original grpc_ares_wrapper's free functions). I will try to tackle these in the next step.

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really good! Remaining comments are mostly minor.

Please let me know if you have any questions. Thanks!

src/core/lib/event_engine/posix_engine/posix_engine.cc Outdated Show resolved Hide resolved
src/core/lib/event_engine/ares_resolver.h Outdated Show resolved Hide resolved
src/core/lib/event_engine/ares_resolver.cc Show resolved Hide resolved
src/core/lib/event_engine/ares_resolver.cc Outdated Show resolved Hide resolved
src/core/lib/event_engine/ares_resolver.cc Show resolved Hide resolved
test/core/event_engine/test_suite/tests/dns_test.cc Outdated Show resolved Hide resolved
test/core/event_engine/test_suite/tests/dns_test.cc Outdated Show resolved Hide resolved
kHealthCheckRecordName},
" ");
// TODO(yijiem): build a portable solution for Windows
FILE* f = popen(command.c_str(), "r");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we might still be duplicating some stuff unnecessarily here, but I might be wrong, and I don't have time to really dig in to understand the details. I'm going to defer to @apolcyn for review of the tests here, with particular focus on making sure that we are not needlessly reinventing any wheels.

Copy link
Member

@drfloob drfloob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, I found just a few small things to address.

#include "src/core/lib/iomgr/sockaddr.h"
#ifdef GRPC_POSIX_SOCKET_ARES_EV_DRIVER
#include "src/core/lib/event_engine/posix_engine/tcp_socket_utils.h"
#elif defined(GRPC_WINDOWS_SOCKET_ARES_EV_DRIVER)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe add a // TODO(yijiem) here for the next step, so the empty block doesn't look like a mistake.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually the #elif here doesn't really make sense. We would include something here for Windows when porting the Windows' version of IsIpv6LoopbackAvailable() from iomgr to EE. But there is no need to reserve it here now.

namespace grpc_event_engine {
namespace experimental {

grpc_core::TraceFlag grpc_trace_ares_resolver(false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: WDYT about just reusing cares_resolver for the trace flag now that we're landing it in production? Since one implementation will be replaced by the other, and no process will be running both ares versions concurrently, it might make sense to not introduce a separate trace flag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Done.

// Don't query for TXT records if the target is "localhost"
if (absl::EqualsIgnoreCase(host, "localhost")) {
event_engine_->Run([callback = std::move(callback)]() mutable {
callback(std::vector<std::string>());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this just be callback({});?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fd_node_list_.begin(), fd_node_list_.end(),
[sock = socks[i]](const auto& node) { return node->as == sock; });
if (iter == fd_node_list_.end()) {
new_list.emplace_back(std::make_unique<FdNode>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: since this is an std::list<std::unique_ptr<FdNode>>, you don't need emplace here, push_bach is preferred. I think we have linting for that internally, so you may get that same recommendation on import.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

BTW love the push_bach since JSBach is my favorite musician :).

@@ -487,9 +490,62 @@ EventEngine::TaskHandle PosixEventEngine::RunAfterInternal(
return handle;
}

#if GRPC_ARES == 1 && defined(GRPC_POSIX_SOCKET_TCP)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for the future - something you already know, but worth commenting - that this engine is expected to run on platforms that don't have Ares enabled (Android is the only supported platform I'm aware of anymore if iOS works with Ares, but possibly other platforms will use it well). Before the EE dns functionality can be enabled generally, a basic A/AAAA-only native resolver needs to be implemented as well.

It's probably worth having a TODO for that here or below next to the Crash("unimplemented") bit. If I missed a TODO for this somewhere already, apologies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Added a TODO below next to the Crash("unimplemented").

@@ -138,14 +140,23 @@ class PosixEventEngine final : public PosixEventEngineWithFdSupport,
public:
class PosixDNSResolver : public EventEngine::DNSResolver {
public:
~PosixDNSResolver() override;
#if GRPC_ARES == 1 && defined(GRPC_POSIX_SOCKET_TCP)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On platforms that don't support ares, this class will have an auto-generated public default constructor. Making a private, no-op default constructor, either in non-ares builds or just always having it, would prevent this from being instantiated on platforms that don't support ares. It's a bit of defensive programming, but I think it conveys the intention a bit better at compile time.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I explicitly deleted the default constructor which should convey the same meaning here.

@HannahShiSFB
Copy link
Collaborator

HannahShiSFB commented Jul 3, 2023

@drfloob @apolcyn @markdroth, continue the discussion on the concerns in using c-ares on iOS:

I'm drafting an implementation with DNSService API (#33233).

As I mentioned earlier, this API doesn't support specifying custom DNS servers but can only use which ever servers configured in the system. I took a look at the iomgr implementation, looks like name server is not actually used.

So I wonder what's the use case of specifying a different dns server in event engine? is it OK for iOS I always use the system DNS?

The other thing that's not supported is using a custom timeout, given the new DNS API removed the deadline parameter, I assume it's OK to rely on system default timeout (5s in my test)?

CC: @sampajano

@drfloob
Copy link
Member

drfloob commented Jul 10, 2023

Sorry for the delay, I just returned from vacation.

So I wonder what's the use case of specifying a different dns server in event engine? is it OK for iOS I always use the system DNS?

It is OK for gRPC on iOS to always use the built-in DNS system. Customizing the DNS authority is an advanced feature which has never been supported on iOS.

The other thing that's not supported is using a custom timeout, given the new DNS API removed the deadline parameter, I assume it's OK to rely on system default timeout (5s in my test)?

I think this is OK too. I'm not aware of any requests for advanced DNS resolution capabilities on iOS, so the default timeout should also be fine. The same is true for Android, to the best of my knowledge.

@markdroth
Copy link
Member

@drfloob @HannahShiSFB

So I wonder what's the use case of specifying a different dns server in event engine? is it OK for iOS I always use the system DNS?

It is OK for gRPC on iOS to always use the built-in DNS system. Customizing the DNS authority is an advanced feature which has never been supported on iOS.

I'm fine with iOS not supporting custom DNS servers. But we need to make sure that if a user provides a URI with a custom DNS server, we do not just silently ignore the custom DNS server; we need to reject such a URI if we cannot honor it.

I think this means that the EE impl should return a failure when creating the DNSResolver object if the caller passes in a custom DNS server.

@drfloob
Copy link
Member

drfloob commented Jul 18, 2023

@markdroth @HannahShiSFB

I think this means that the EE impl should return a failure when creating the DNSResolver object if the caller passes in a custom DNS server.

I agree. @yijiem can you add a bit to the documentation in your API update here: #33744. Something like "If the caller requests a custom DNS resolver, and the EventEngine implementation does not support it, this must return an error."

Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks really great! The only blocking issue is merging the changes from #33744 into this PR.

src/core/lib/event_engine/ares_resolver.cc Outdated Show resolved Hide resolved
src/core/lib/event_engine/posix_engine/posix_engine.cc Outdated Show resolved Hide resolved
Copy link
Member

@markdroth markdroth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great!

src/core/lib/event_engine/ares_resolver.cc Outdated Show resolved Hide resolved
Copy link
Member

@drfloob drfloob left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work, Yijie!

@yijiem yijiem merged commit a7bf07e into grpc:master Jul 21, 2023
64 of 65 checks passed
@copybara-service copybara-service bot added the imported Specifies if the PR has been imported to the internal repository label Jul 21, 2023
mario-vimal pushed a commit to mario-vimal/grpc that referenced this pull request Jul 27, 2023
This PR implements a c-ares based DNS resolver for EventEngine with the
reference from the original
[grpc_ares_wrapper.h](../blob/master/src/core/ext/filters/client_channel/resolver/dns/c_ares/grpc_ares_wrapper.h).
The PosixEventEngine DNSResolver is implemented on top of that. Tests
which use the client channel resolver API
([resolver.h](../blob/master/src/core/lib/resolver/resolver.h#L54)) are
ported, namely the
[resolver_component_test.cc](../blob/master/test/cpp/naming/resolver_component_test.cc)
and the
[cancel_ares_query_test.cc](../blob/master/test/cpp/naming/cancel_ares_query_test.cc).
The WindowsEventEngine DNSResolver will use the same EventEngine's
grpc_ares_wrapper and will be worked on next.

The
[resolve_address_test.cc](https://github.com/grpc/grpc/blob/master/test/core/iomgr/resolve_address_test.cc)
which uses the iomgr
[DNSResolver](../blob/master/src/core/lib/iomgr/resolve_address.h#L44)
API has been ported to EventEngine's dns_test.cc. That leaves only 2
tests which use iomgr's API, notably the
[dns_resolver_cooldown_test.cc](../blob/master/test/core/client_channel/resolvers/dns_resolver_cooldown_test.cc)
and the
[goaway_server_test.cc](../blob/master/test/core/end2end/goaway_server_test.cc)
which probably need to be restructured to use EventEngine DNSResolver
(for one thing they override the original grpc_ares_wrapper's free
functions). I will try to tackle these in the next step.
<!--

If you know who should review your pull request, please assign it to
that
person, otherwise the pull request would get assigned randomly.

If your pull request is for a specific language, please add the
appropriate
lang label.

-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants