Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CODEC-314: Fix possible IndexOutOfBoundsException thrown by PercentCodec.insertAlwaysEncodeChars() method #222

Closed

Conversation

arthurscchan
Copy link
Contributor

@arthurscchan arthurscchan commented Nov 22, 2023

This fixes a possible IndexOutOfBoundsException in src/main/java/org/apache/commons/codec/net/PercentCodec.java thrown by PercentCodec.insertAlwaysEncodeChars() method if the initial byte array contains negative bytes which are used as index for the BitSet.

This PR adds a conditional check to ensure only valid bytes (positive or zero) are processed.

We found this bug using fuzzing by way of OSS-Fuzz. It is reported at https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=64362.

@garydgregory
Copy link
Member

Let's just have ONE PR instead of one-liner PRs. Especially since these all claim to fix the exact same kind of issue. Also, you MUST provide a matching test.

@arthurscchan
Copy link
Contributor Author

Hi, I have added unit test. I am not sure about combining the PRs since they are opened as 4 different issues in the JIRA.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @arthurscchan
Please see my comment.

@garydgregory
Copy link
Member

Hi, I have added unit test. I am not sure about combining the PRs since they are opened as 4 different issues in the JIRA.

In this case, I don't care about mapping PRs to JIRA 1-1. YMMV ;-) I care about making my life easier ;-)

@garydgregory
Copy link
Member

Hi @arthurscchan,

Please help me understand why we need this PR because at first glance this PR could be making the use case worse IMO:

  • Before PR: Grabage in -> exception, albeit not IllegalArgumentException.
  • After PR: Garbage in -> garbage out

Or am I missing something?
TY!

@arthurscchan
Copy link
Contributor Author

arthurscchan commented Nov 23, 2023

Hi @garydgregory,

In my understanding, the constructor of PecentCodec calls the insertAlwaysEncodeChars(byte[]) method in Line 83. The use of the insertAlwaysEncodeChars(byte[]) method is to record all bytes in the provided byte array to be some character that requires encoding. Thus if garbage bytes do exist in the byte array, it should be skipped and all other "valid" bytes should still be considered. For example, if someone thinks that the letter A should always be encoded, and accidentally provides byte[] encode = {(byte)'A', (byte) -1};, the logic could still work, skipping the -1 bytes and change the 'A' byte in the BitSet to true. Also, the byte array itself is never stored in the object. Thus the PR could make the running smoother without crashing even if some invalid byte does exist. The new logic simply skips them and continues. The most important part is that "valid" bytes are still processed and interpreted.
This is my understanding, am I missing something here?? Do let me know if I interpret the functionality wrongly. Thanks.

@arthurscchan
Copy link
Contributor Author

arthurscchan commented Nov 23, 2023

Hi, I have added unit test. I am not sure about combining the PRs since they are opened as 4 different issues in the JIRA.

In this case, I don't care about mapping PRs to JIRA 1-1. YMMV ;-) I care about making my life easier ;-)

If you want, I don't mind combining all the PRs together, not sure if it is more messy to handle. Although they are throwing similar exceptions, I would say the reason is not too similar IMO.

@garydgregory
Copy link
Member

@arthurscchan
Please use a better description in PRs and JIRA: Specify the class and method where the exception occurs.

@garydgregory
Copy link
Member

@arthurscchan Please use a better description in PRs and JIRA: Specify the class and method where the exception occurs.

Hi, I have added unit test. I am not sure about combining the PRs since they are opened as 4 different issues in the JIRA.

In this case, I don't care about mapping PRs to JIRA 1-1. YMMV ;-) I care about making my life easier ;-)

If you want, I don't mind combining all the PRs together, not sure if it is more messy to handle. Although they are throwing similar exceptions, I would say the reason is not too similar IMO.

Well, at this point, I commented on all PRs, so we might as well leave it as is.
You'll have to make sure to review all comments from all PRs as they apply to all PRs but I did not want to repeat myself all over the place ;-)

@arthurscchan arthurscchan changed the title CODEC-314: Fix possible IndexOutOfBoundsException CODEC-314: Fix possible IndexOutOfBoundsException thrown by PercentCodec.insertAlwaysEncodeChars() method Nov 24, 2023
@codecov-commenter
Copy link

codecov-commenter commented Nov 25, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (44e4c4d) 92.27% compared to head (e193cd6) 92.23%.
Report is 21 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #222      +/-   ##
============================================
- Coverage     92.27%   92.23%   -0.05%     
- Complexity     1742     1747       +5     
============================================
  Files            67       67              
  Lines          4584     4595      +11     
  Branches        709      712       +3     
============================================
+ Hits           4230     4238       +8     
- Misses          242      243       +1     
- Partials        112      114       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@garydgregory
Copy link
Member

garydgregory commented Nov 25, 2023

Hi @garydgregory,

In my understanding, the constructor of PecentCodec calls the insertAlwaysEncodeChars(byte[]) method in Line 83. The use of the insertAlwaysEncodeChars(byte[]) method is to record all bytes in the provided byte array to be some character that requires encoding. Thus if garbage bytes do exist in the byte array, it should be skipped and all other "valid" bytes should still be considered. For example, if someone thinks that the letter A should always be encoded, and accidentally provides byte[] encode = {(byte)'A', (byte) -1};, the logic could still work, skipping the -1 bytes and change the 'A' byte in the BitSet to true. Also, the byte array itself is never stored in the object. Thus the PR could make the running smoother without crashing even if some invalid byte does exist. The new logic simply skips them and continues. The most important part is that "valid" bytes are still processed and interpreted. This is my understanding, am I missing something here?? Do let me know if I interpret the functionality wrongly. Thanks.

Hi @arthurscchan
Hmm, so that means that one can roundtrip encode/decode, but when you start with bad data, you end up with nothing (if all the input is bad) or good encoded data. But when you decode that encoded data, you don't end up with what you started. If we really want all of that, I think we should update the Javadoc and describe that encoding also performs filtering and can never throw an exception for illegal input. WDYT?
Does that mean encode never throws EncoderException?

@arthurscchan
Copy link
Contributor Author

Hi @garydgregory,

I think it is a different case. The invalid bytes are provided through constructor, it is only used for the initial setting of the PercentCodec object to set which bytes are to be encoded during the encode or decode method. Thus if we initialise the PercentCodec with a byte array contains invalid bytes, it just mean that the encode method will not encode that byte. It does not affect the encode and decode process at all. Thus the encode / decode method round trip won't be affected and the result is the same as the original input. Maybe I could add a round trip call and ensure the result are the same in the test case.
Alternatively, instead of checking the length and skip the invalid byte during constructor phase, I could simply add a try catch block in the code, when any IndexOutOfBound (or more general, and kind of RuntimeException) is thrown, wrap it with EncoderException. This could at least make the method not throwing "unexpected" RuntimeException. It maybe a better approach. WDYT?

@garydgregory
Copy link
Member

Hi @arthurscchan
I'm not a fan of making the ctor overly clever in a way that was never documented and is not expected or obvious (IMO). This violates the principle of least surprise (IMO). We then risk some feature-creep, and setting expectations that other constructors should somehow sanitize their inputs. That is not the best approach for a lower-level library like Commons Codec I would offer. I'd rather the ctor throws an exception on garbage input.

@arthurscchan
Copy link
Contributor Author

arthurscchan commented Nov 27, 2023

Hi @garydgregory, OK. Sorry for not considering in that direction. I will change the code in this PR to wrap around the possible IndexOutOfBoundException in an EncoderException, to at least decrease the change of "unpexected" input thrown directly from the constructor. I will also change a bit on the javadoc comments, mentioning that EncoderException will be thrown by the ctor if invalid bytes exist in the provided byte array. That maybe a better approach. Thanks for your comments.

@garydgregory
Copy link
Member

TY @arthurscchan
Let's keep in mind that this is a library that needs to make sense (or try to) for users. As opposed to just addressing fuzzing issues in the most expedient manner ;-) TY again for your work :-)

Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
Signed-off-by: Arthur Chan <arthur.chan@adalogics.com>
@arthurscchan
Copy link
Contributor Author

Hi @garydgregory, thanks for your comments and suggestions. I will surely keep that in mind.

Copy link
Member

@garydgregory garydgregory left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arthurscchan
Rethought this a bit, please see my comment.

*/
public PercentCodec(final byte[] alwaysEncodeChars, final boolean plusForSpace) {
public PercentCodec(final byte[] alwaysEncodeChars, final boolean plusForSpace) throws EncoderException {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @arthurscchan
Now that I look at this again, it doesn't make sense to me, this seems like the exact use case for an IllegalArgumentException. In addition, the constructor does not encode anything, so an EncoderException really does not make sense IMO.

I propose:

diff --git a/src/main/java/org/apache/commons/codec/net/PercentCodec.java b/src/main/java/org/apache/commons/codec/net/PercentCodec.java
index 8e51538..df623df 100644
--- a/src/main/java/org/apache/commons/codec/net/PercentCodec.java
+++ b/src/main/java/org/apache/commons/codec/net/PercentCodec.java
@@ -231,6 +231,9 @@
      * @param b the byte that is candidate for min and max limit
      */
     private void insertAlwaysEncodeChar(final byte b) {
+        if (b < 0) {
+            throw new IllegalArgumentException("byte must be >= 0");
+        }
         this.alwaysEncodeChars.set(b);
         if (b < alwaysEncodeCharsMin) {
             alwaysEncodeCharsMin = b;
diff --git a/src/test/java/org/apache/commons/codec/net/PercentCodecTest.java b/src/test/java/org/apache/commons/codec/net/PercentCodecTest.java
index 34f8ecb..a537efb 100644
--- a/src/test/java/org/apache/commons/codec/net/PercentCodecTest.java
+++ b/src/test/java/org/apache/commons/codec/net/PercentCodecTest.java
@@ -106,6 +106,12 @@
     }
 
     @Test
+    public void testInvalidByte() throws Exception {
+        final byte[] invalid = { (byte) -1, (byte) 'A' };
+        assertThrows(IllegalArgumentException.class, () -> new PercentCodec(invalid, true));
+    }
+
+    @Test
     public void testPercentEncoderDecoderWithNullOrEmptyInput() throws Exception {
         final PercentCodec percentCodec = new PercentCodec(null, true);
         assertNull(percentCodec.encode(null), "Null input value encoding test");

WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think it is a better changes and interpretation of the exceptions. I will change it accordingly.
@garydgregory

@garydgregory
Copy link
Member

Closing: Done.

asfgit pushed a commit that referenced this pull request Dec 3, 2023
PercentCodec.insertAlwaysEncodeChars() method #222
@arthurscchan
Copy link
Contributor Author

@garydgregory Sorry I am bit busy and did not handle that last week. Thanks for fixing that for me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants