Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: NO_PROXY should support IPv4, IPv6 and localhost #2659

Merged
merged 4 commits into from Apr 19, 2023

Conversation

HearyShen
Copy link
Contributor

@HearyShen HearyShen commented Apr 14, 2023

Closes #2660

The get_environment_proxies function in _utils.py is responsible for parsing and mounting proxy info from system environment.

httpx/httpx/_utils.py

Lines 229 to 264 in 4b5a92e

def get_environment_proxies() -> typing.Dict[str, typing.Optional[str]]:
"""Gets proxy information from the environment"""
# urllib.request.getproxies() falls back on System
# Registry and Config for proxies on Windows and macOS.
# We don't want to propagate non-HTTP proxies into
# our configuration such as 'TRAVIS_APT_PROXY'.
proxy_info = getproxies()
mounts: typing.Dict[str, typing.Optional[str]] = {}
for scheme in ("http", "https", "all"):
if proxy_info.get(scheme):
hostname = proxy_info[scheme]
mounts[f"{scheme}://"] = (
hostname if "://" in hostname else f"http://{hostname}"
)
no_proxy_hosts = [host.strip() for host in proxy_info.get("no", "").split(",")]
for hostname in no_proxy_hosts:
# See https://curl.haxx.se/libcurl/c/CURLOPT_NOPROXY.html for details
# on how names in `NO_PROXY` are handled.
if hostname == "*":
# If NO_PROXY=* is used or if "*" occurs as any one of the comma
# separated hostnames, then we should just bypass any information
# from HTTP_PROXY, HTTPS_PROXY, ALL_PROXY, and always ignore
# proxies.
return {}
elif hostname:
# NO_PROXY=.google.com is marked as "all://*.google.com,
# which disables "www.google.com" but not "google.com"
# NO_PROXY=google.com is marked as "all://*google.com,
# which disables "www.google.com" and "google.com".
# (But not "wwwgoogle.com")
mounts[f"all://*{hostname}"] = None
return mounts

For env no_proxy, according to CURLOPT_NOPROXY explained, it should support domains, IPv4, IPv6 and the localhost. However, current get_environment_proxies function implementation only supports domains correctly as it always adds wildcard * in front of the hostname.

I encountered error when my environment no_proxy includes IPv6 address like ::1. It is wrongly transformed into all://*::1 and causes urlparse error since the _urlparse.py parses the :1 as port.

image

To fix this issue, I looked into this repo and suggest handling the no_proxy hostnames as domains, IPv4, IPv6 and the localhost seperately. I have updated the get_environment_proxies function in _utils.py and tested it.

Replies and discussions are welcomed!

@tomchristie tomchristie added the bug Something isn't working label Apr 19, 2023
@tomchristie
Copy link
Member

Nicely done, yup.

@tomchristie tomchristie merged commit 15d09a3 into encode:master Apr 19, 2023
5 checks passed
@epenet epenet mentioned this pull request May 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

The get_environment_proxies function in _utils.py does not support IPv4, IPv6 correctly
2 participants