Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow urllib3 v2.0 #105

Closed
wants to merge 1 commit into from
Closed

Allow urllib3 v2.0 #105

wants to merge 1 commit into from

Conversation

sethmlarson
Copy link
Contributor

Closes #102

@pquentin
Copy link
Member

pquentin commented Jun 8, 2023

While most failures are unrelated and due to the test suite reaching out to httpbin.org, I would like to understand what this new warning is:

FAILED tests/node/test_urllib3_chain_certs.py::test_assert_fingerprint_in_cert_chain[8ECDE6884F3D87B1125BA31AC3FCB13D7016DE7F57CC904FE1CB97C6AE98196E-RequestsHttpNode] - assert [<warnings.WarningMessage object at 0x7f5499ec4bd0>] == []
  Left contains one more item: <warnings.WarningMessage object at 0x7f5499ec4bd0>
  Full diff:
  - []
  + [<warnings.WarningMessage object at 0x7f5499ec4bd0>]

@JoshMock
Copy link
Member

While most failures are unrelated and due to the test suite reaching out to httpbin.org, I would like to understand what this new warning is

I'm looking at that right now, too. The full warning message is:

InsecureRequestWarning: Unverified HTTPS request is being made to host 'httpbin.org'. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#tls-warnings

When I check the fingerprint it doesn't match the fingerprint on the test:

openssl s_client -connect httpbin.org:443 -showcerts </dev/null | openssl x509 -noout -sha256 -fingerprint
# FF:77:C6:EA:49:81:71:84:63:B3:45:12:2F:9B:A0:00:34:B4:78:00:12:E8:91:47:FB:EB:56:1F:9A:63:BB:CC

I changed the tests to use that fingerprint and ran them locally, but the same warnings persisted.

I don't know this library well enough at all (yet) to understand the issue here. Since the test is specifically for testing SSL I'm not comfortable squashing an InsecureRequestWarning.

@sethmlarson is, coincidentally, the maintainer of urllib3 as well as the author of this PR, so perhaps he can provide some feedback here. 😆

@aqeelat
Copy link

aqeelat commented Jun 26, 2023

I was able to resolve the warnings for Urllib3HttpNode by changing the value to "CERT_REQUIRED" in

I'm not a daily user of this package, and so I won't know the different use cases for it, but I'm not sure if we should enforce CERT_REQUIRED.

@pquentin
Copy link
Member

pquentin commented Jul 5, 2023

As it turns out I'm also an urllib3 maintainer, but this took quite some digging.

In urllib3 1.26.x, we considered a connection verified if assert_fingerprint was not None, but in urllib3 2.0 we changed it to check if it was truthy using bool(). The thing is that elastic-transport-python relies on the 1.26.x behavior:

if assert_fingerprint:
# Falsey but not None. This is a hack to skip fingerprinting by urllib3
# but still set 'is_verified=True' within HTTPSConnectionPool._validate_conn()
kwargs["assert_fingerprint"] = ""

elastic-transport-python hacks into the internals of urllib3 which is unsupported. I can see three options here:

  • We revert to using is not None in urllib3
  • We manually set conn.is_verified to True in elastic-transport-python, which will work with urllib3 1.26.x and 2.0.x
  • We try to understand why elastic-transport-python has its own fingerprint logic and see if we can offer an API to do the same thing in urllib3

@sethmlarson @JoshMock What do you think?

@ezimuel
Copy link

ezimuel commented Jul 6, 2023

@JoshMock I think we should investigate in the third option suggested by @pquentin:

We try to understand why elastic-transport-python has its own fingerprint logic and see if we can offer an API to do the same thing in urllib3

I'm also looking into this but definitely something to fix for security reason. We should not bypass this check.

@aqeelat
Copy link

aqeelat commented Jul 6, 2023

@pquentin I'm very confused about this block and the nested comment. Does this call happen before the validation from urllib2 or after it?

@pquentin
Copy link
Member

pquentin commented Jul 6, 2023

@aqeelat It happens before. If you look at more context:

class HTTPSConnectionPool(urllib3.HTTPSConnectionPool):
"""HTTPSConnectionPool implementation which supports ``assert_fingerprint``
on certificates within the chain instead of only the leaf cert using private
APIs in CPython 3.10+
"""
def __init__(
self, *args: Any, assert_fingerprint: Optional[str] = None, **kwargs: Any
) -> None:
self._elastic_assert_fingerprint = (
assert_fingerprint.replace(":", "").lower() if assert_fingerprint else None
)
# Complain about fingerprint length earlier than urllib3 does.
if (
self._elastic_assert_fingerprint
and len(self._elastic_assert_fingerprint) not in _HASHES_BY_LENGTH
):
valid_lengths = "', '".join(map(str, sorted(_HASHES_BY_LENGTH.keys())))
raise ValueError(
f"Fingerprint of invalid length '{len(self._elastic_assert_fingerprint)}'"
f", should be one of '{valid_lengths}'"
)
if assert_fingerprint:
# Falsey but not None. This is a hack to skip fingerprinting by urllib3
# but still set 'is_verified=True' within HTTPSConnectionPool._validate_conn()
kwargs["assert_fingerprint"] = ""
super().__init__(*args, **kwargs)

You can see that elastic-transport-python is inheriting from urllib3.HTTPSConnectionPool and does kwargs["assert_fingerprint"] = "" before calling urllib3.HTTPSConnectionPool.__init__ using super(). urllib3's validation happens when a request is made and there's an active connection to validate.

The reason urllib3's validation is disabled is that elastic-transport-python does its own validation:

# Give users all the fingerprints we checked against in
# order of peer -> root CA.
if not success:
raise urllib3.exceptions.SSLError(
'Fingerprints did not match. Expected "{0}", got "{1}".'.format(
self._elastic_assert_fingerprint,
'", "'.join([x.decode() for x in map(hexlify, fingerprints)]),
)
)
conn.is_verified = success

The equivalent in urllib3 is assert_fingerprint: https://github.com/urllib3/urllib3/blob/2ac40569acb464074bdc3f308124d781d6aa0860/src/urllib3/util/ssl_.py#L146-L173

@ezimuel @JoshMock The additions in elastic-transport-python allow pinning any certificate in the chain, not just the leaf certificate as in urllib3. Seth has blogged about this extensively at https://sethmlarson.dev/experimental-python-3.10-apis-and-trust-stores. Even if we move that code to urllib3, it will only be for 2.0.x, not 1.26.x which is still widely used, so I would actually suggest option 2 here: setting conn.is_verified to True somehow in elastic-transport-python.

@JoshMock
Copy link
Member

JoshMock commented Jul 6, 2023

Thanks for all the context @pquentin. 🙏

@JoshMock I think we should investigate in the third option suggested by @pquentin:

@ezimuel Whatever option means we depend on validated, well-tested upstream security practices (in urllib3 in this case) rather than writing and maintaining our own custom security validation schemes would be my preference. Unless there's some Elasticsearch-specific reasoning for it, and I really hope there isn't. 😄

@pquentin
Copy link
Member

pquentin commented Jul 6, 2023

Unless there's some Elasticsearch-specific reasoning for it, and I really hope there isn't. 😄

I also hoped that root CA pinning support was simply to add system certificate stores support! This is quite common in enterprise companies: to avoid paying for hundreds or thousands of public certificates, they have their own root certificates installed on every computer, and most software can use it transparently. But not Python as it uses OpenSSL which does not support this. If this was the reason for _urllib3_chain_certs.py then I would have recommended https://github.com/sethmlarson/truststore which uses the same hacks but nicely packaged to look at system certificate stores on Linux, macOS and Windows.

However, I discussed quickly with Seth today and _urllib3_chain_certs is not about system certificate stores but to help with the security by default introduced in Elasticsearch 8. While Elasticsearch can use any root CA, by default it generates its own root CA with each node having a different leaf server certificate. If you have Python 3.10+, it's much simpler to pin that root CA, because it will work with all nodes, not just a single node. This is documented here: https://www.elastic.co/guide/en/elasticsearch/client/python-api/current/config.html#_verifying_server_certificates.

Sorry to disappoint!

@pquentin
Copy link
Member

pquentin commented Jul 7, 2023

So httpbin.org keeps timing out for me which makes testing harder, but this diff is compatible with urllib3 1.26 and 2.0. It fixes the warning by using the public assert_fingerprint API correctly while monkeypatching a new urllib3 private function. Not sure you'll like it :)

diff --git a/elastic_transport/_node/_urllib3_chain_certs.py b/elastic_transport/_node/_urllib3_chain_certs.py
index 0d99896..e01f6a9 100644
--- a/elastic_transport/_node/_urllib3_chain_certs.py
+++ b/elastic_transport/_node/_urllib3_chain_certs.py
@@ -36,7 +36,17 @@ _HASHES_BY_LENGTH = {32: hashlib.md5, 40: hashlib.sha1, 64: hashlib.sha256}
 __all__ = ["HTTPSConnectionPool"]
 
 
+class HTTPSConnection(urllib3.connection.HTTPSConnection):
+    def connect(self) -> None:
+        super().connect()
+        # Hack to prevent a warning within HTTPSConnectionPool._validate_conn()
+        if self._elastic_assert_fingerprint:
+            self.is_verified = True
+
+
 class HTTPSConnectionPool(urllib3.HTTPSConnectionPool):
+    ConnectionCls = HTTPSConnection
+
     """HTTPSConnectionPool implementation which supports ``assert_fingerprint``
     on certificates within the chain instead of only the leaf cert using private
     APIs in CPython 3.10+
@@ -60,13 +70,21 @@ class HTTPSConnectionPool(urllib3.HTTPSConnectionPool):
                 f", should be one of '{valid_lengths}'"
             )
 
-        if assert_fingerprint:
-            # Falsey but not None. This is a hack to skip fingerprinting by urllib3
-            # but still set 'is_verified=True' within HTTPSConnectionPool._validate_conn()
-            kwargs["assert_fingerprint"] = ""
+        if self._elastic_assert_fingerprint:
+            # Skip fingerprinting by urllib3 as we'll do it ourselves
+            kwargs["assert_fingerprint"] = None
 
         super().__init__(*args, **kwargs)
 
+    def _new_conn(self) -> HTTPSConnection:
+        """
+        Return a fresh :class:`urllib3.connection.HTTPSConnection`.
+        """
+        conn = super()._new_conn()
+        # Tell our custom connection if we'll assert fingerprint ourselves
+        conn._elastic_assert_fingerprint = self._elastic_assert_fingerprint
+        return conn
+
     def _validate_conn(self, conn: urllib3.connection.HTTPSConnection) -> None:
         """
         Called right before a request is made, after the socket is created.

@sethmlarson
Copy link
Contributor Author

Going to close this PR since I won't be able to get it across the finish line. Good luck @pquentin and @JoshMock!

@ezimuel
Copy link

ezimuel commented Oct 18, 2023

This should be fixed in #121 , just merged.

@pquentin pquentin deleted the urllib3-2.0 branch December 13, 2023 07:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update the dependency on urllib3 to >2.0?
5 participants