[PATCH] botocore: Fix rejecting URLs with unsafe characters in is_valid_endpoint_url()


Wentao Zhang
 

The function is_valid_endpoint_url() in botocore is meant to validate
endpoint URLs, but it fails to detect unsafe characters on Python 3.9.5+
and other versions carrying the bpo-43882 fix. In those versions, urlsplit()
silently strips LF, CR, and HT characters while splitting the URL,
which disarms the validator in botocore.

This patch checks for unsafe characters early in is_valid_endpoint_url()
and is_valid_ipv6_endpoint_url(), so that URLs containing them are
rejected again.

Signed-off-by: Wentao Zhang <wentao.zhang@...>
---
...Ls-with-unsafe-characters-in-is_vali.patch | 58 +++++++++++++++++++
.../python/python3-botocore_1.20.51.bb | 2 +
2 files changed, 60 insertions(+)
create mode 100644 recipes-devtools/python/python3-botocore/0001-Fix-rejecting-URLs-with-unsafe-characters-in-is_vali.patch

diff --git a/recipes-devtools/python/python3-botocore/0001-Fix-rejecting-URLs-with-unsafe-characters-in-is_vali.patch b/recipes-devtools/python/python3-botocore/0001-Fix-rejecting-URLs-with-unsafe-characters-in-is_vali.patch
new file mode 100644
index 0000000..6a43608
--- /dev/null
+++ b/recipes-devtools/python/python3-botocore/0001-Fix-rejecting-URLs-with-unsafe-characters-in-is_vali.patch
@@ -0,0 +1,58 @@
+From 370cdf7d708c92bf21a42f15392f7be330cf8f80 Mon Sep 17 00:00:00 2001
+From: =?UTF-8?q?Micha=C5=82=20G=C3=B3rny?= <mgorny@...>
+Date: Fri, 7 May 2021 19:54:16 +0200
+Subject: [PATCH] Fix rejecting URLs with unsafe characters in
+ is_valid_endpoint_url() (#2381)
+
+Detect unsafe characters in is_valid_endpoint_url()
+and is_valid_ipv6_endpoint_url() early, in order to fix rejecting
+invalid URLs with Python 3.9.5+ and other versions carrying bpo-43882
+fix. In these versions, urlsplit() silently strips LF, CR and HT
+characters while splitting the URL, effectively disarming the validator
+in botocore.
+
+The solution is based on a similar fix in Django.
+
+Fixes #2377
+---
+ botocore/utils.py | 10 ++++++++++
+ 1 file changed, 10 insertions(+)
+
+diff --git a/botocore/utils.py b/botocore/utils.py
+index 378972248..d35dd64bb 100644
+--- a/botocore/utils.py
++++ b/botocore/utils.py
+@@ -173,6 +173,10 @@ ZONE_ID_PAT = "(?:%25|%)(?:[" + UNRESERVED_PAT + "]|%[a-fA-F0-9]{2})+"
+ IPV6_ADDRZ_PAT = r"\[" + IPV6_PAT + r"(?:" + ZONE_ID_PAT + r")?\]"
+ IPV6_ADDRZ_RE = re.compile("^" + IPV6_ADDRZ_PAT + "$")
+
++# These are the characters that are stripped by post-bpo-43882 urlparse().
++UNSAFE_URL_CHARS = frozenset('\t\r\n')
++
++
+ def ensure_boolean(val):
+     """Ensures a boolean value if a string or boolean is provided
+
+@@ -977,6 +981,8 @@ class ArgumentGenerator(object):
+
+
+ def is_valid_ipv6_endpoint_url(endpoint_url):
++    if UNSAFE_URL_CHARS.intersection(endpoint_url):
++        return False
+     netloc = urlparse(endpoint_url).netloc
+     return IPV6_ADDRZ_RE.match(netloc) is not None
+
+@@ -990,6 +996,10 @@ def is_valid_endpoint_url(endpoint_url):
+     :return: True if the endpoint url is valid. False otherwise.
+
+     """
++    # post-bpo-43882 urlsplit() strips unsafe characters from URL, causing
++    # it to pass hostname validation below. Detect them early to fix that.
++    if UNSAFE_URL_CHARS.intersection(endpoint_url):
++        return False
+     parts = urlsplit(endpoint_url)
+     hostname = parts.hostname
+     if hostname is None:
+--
+2.25.1
+
diff --git a/recipes-devtools/python/python3-botocore_1.20.51.bb b/recipes-devtools/python/python3-botocore_1.20.51.bb
index ca506f6..f71db1f 100644
--- a/recipes-devtools/python/python3-botocore_1.20.51.bb
+++ b/recipes-devtools/python/python3-botocore_1.20.51.bb
@@ -8,3 +8,5 @@ SRC_URI[sha256sum] = "c853d6c2321e2f2328282c7d49d7b1a06201826ba0e7049c6975ab5f22
inherit pypi setuptools3

RDEPENDS:${PN} += "python3-jmespath python3-dateutil python3-logging"
+
+SRC_URI += "file://0001-Fix-rejecting-URLs-with-unsafe-characters-in-is_vali.patch"
--
2.25.1
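
For readers who want to try the behaviour outside of botocore, the guard
can be sketched in isolation roughly as follows. This is only an
illustration under stated assumptions: has_unsafe_url_chars() and
check_endpoint() are hypothetical names that do not exist in botocore, and
check_endpoint() is a much simplified stand-in for the real validator.
Only UNSAFE_URL_CHARS and the intersection() test are taken from the patch.

```python
from urllib.parse import urlsplit

# Same set as in the patch: the characters stripped by post-bpo-43882
# urlsplit().
UNSAFE_URL_CHARS = frozenset('\t\r\n')


def has_unsafe_url_chars(url):
    """Return True if the raw URL contains a tab, CR, or LF anywhere."""
    # frozenset.intersection() accepts any iterable, so the string is
    # scanned character by character.
    return bool(UNSAFE_URL_CHARS.intersection(url))


def check_endpoint(url):
    """Hypothetical, simplified validator: guard first, then split."""
    if has_unsafe_url_chars(url):
        return False
    # The real function applies further hostname checks at this point.
    return urlsplit(url).hostname is not None


if __name__ == "__main__":
    bad = "https://exam\nple.com"
    # What urlsplit() reports here depends on the interpreter: with the
    # bpo-43882 fix the newline is dropped before the URL is split.
    print(repr(urlsplit(bad).hostname))
    print(check_endpoint(bad))
    print(check_endpoint("https://example.com"))
```

Because the check runs on the raw string before urlsplit() is called, its
result should not depend on whether the interpreter carries the bpo-43882
fix.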


Bruce Ashfield
 

merged.

Bruce

In message: [meta-virtualization] [[PATCH] botocore: Fix rejecting URLs with unsafe characters in is_valid_endpoint_url()
on 21/03/2023 Wentao Zhang wrote:




Peter Kjellerstedt
 

-----Original Message-----
From: meta-virtualization@... <meta-virtualization@...> On Behalf Of Bruce Ashfield
Sent: 24 March 2023 00:09
To: Wentao Zhang <wentao.zhang@...>
Cc: meta-virtualization@...
Subject: Re: [meta-virtualization] [[PATCH] botocore: Fix rejecting URLs with unsafe characters in is_valid_endpoint_url()

> merged.
>
> Bruce

Any reason not to update the version instead? The current version (1.20.51)
was released over two years ago. The latest release is 1.29.113.

//Peter




Bruce Ashfield
 

On Fri, Apr 14, 2023 at 12:21 AM Peter Kjellerstedt
<peter.kjellerstedt@...> wrote:

> Any reason not to update the version instead? The current version (1.20.51)
> was released over two years ago. The latest release is 1.29.113.

That was my plan post release.

You'll almost always see me asking for uprevs rather than patches, but I
don't have my own system-level tests for botocore, so I was playing
this one a bit more safely.

Bruce



--
- Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end
- "Use the force Harry" - Gandalf, Star Trek II