Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Logic to exclude object keys that are themselves objects is imperfect #771

Closed
eamonnmcmanus opened this issue Sep 20, 2023 · 11 comments · Fixed by #772
Closed

Logic to exclude object keys that are themselves objects is imperfect #771

eamonnmcmanus opened this issue Sep 20, 2023 · 11 comments · Fixed by #772

Comments

@eamonnmcmanus
Copy link
Contributor

JSONObject contains this logic to prevent a key in a JSON object from being another JSON object or an array:

            case '{':
            case '[':
                if(prev=='{') {
                    throw x.syntaxError("A JSON Object can not directly nest another JSON Object or JSON Array.");
                }

However this doesn't cover all cases. For example, JSONObject currently accepts this string:

{"a":1, [{"b": 2}]: 3}

which it interprets as an object with two keys, one that is the string a and one that is the string [{"b":2}].

Strict JSON, of course, only accepts string literals as keys in objects. JSON-java is more liberal, but it does balk at nested objects since the fix for #654. The example above suggests that that fix was incomplete. I think it would make sense to disallow nested objects or arrays as keys always. They're not valid JSON and it's unlikely that anyone is relying on them for legitimate purposes. Meanwhile they can straightforwardly be used for DoS attacks, similar to #654 and #758.

I have a proposed fix which I will send shortly.

@stleary
Copy link
Owner

stleary commented Sep 21, 2023

@eamonnmcmanus Thanks for bringing this up, however, I am not convinced yet this is a bug.
Using the above example results in a valid JSON doc:

{
  "a": 1,
  "[{\"b\":2}]": 3
}

You mentioned the current code allows potential DOS attacks. Can you explain how that would work?

@eamonnmcmanus
Copy link
Contributor Author

I'm reluctant to share precise exploit details here, but I have sent them privately.

@eamonnmcmanus
Copy link
Contributor Author

OK, after discussion with @stleary I'm sharing the details here.

The bug that allows an array-of-object as a key can be exploited recursively:

jshell> new JSONObject("{a:1,[{a:1,[{a:1,[{a:1}]:2}]:2}]:2}")
$5 ==> {"a":1,"[{\"a\":1,\"[{\\\"a\\\":1,\\\"[{\\\\\\\"a\\\\\\\":1}]\\\":2}]\":2}]":2}

Each level of recursion doubles the number of backslashes, so with only a couple of hundred characters an attacker can provoke an OutOfMemoryError. This OutOfMemoryError occurs during the parsing, because at each level the parser will call toString() on the array-of-object, which in turn will end up calling quote.

I dare say an attacker can also provoke a StackOverflowError similar to #654, but an OutOfMemoryError is more severe because it can affect everything running in the JVM, not just the thread that is parsing the input. Servers can't safely recover from OutOfMemoryError in general.

#660 already addressed problems like this to some extent. It adds both a catch for StackOverflowError and some logic to prevent keys that are themselves objects. Here I'm saying that that logic can be bypassed, and in #772 I'm proposing alternative logic that I think can't be.

@stleary
Copy link
Owner

stleary commented Sep 24, 2023

Now convinced there is a bug in how missing quotes are handled.
Updating the string for clarity.

How should this text be parsed?
"{a:1,[{b:2,[{c:3,[{d:4}]:5}]:6}]:7}"

20230618 release:
{"a":1,"[{"b":2,"[{\"c\":3,\"[{\\\"d\\\":4}]\":5}]":6}]":7}

#772 fix:
org.json.JSONException: Nested array not expected here. at 6 [character 7 line 1]
(probably should be referencing character 6)

I think the parser should terminate implicit strings by a colon for JSONObject,
and by a comma or closing bracket for JSONArray):

    {
        "a":1,
        "[{b":2,
        "[{c",3,
        "[{d", 4,
        "}]",5,
        "}]",6,     # error for duplicate key
        "}]",7      # error for duplicate key
     }

Post here if anyone has a different take on how the parser should work.

@johnjaylward
Copy link
Contributor

I think the parser should terminate implicit strings by a colon for JSONObject, and by a comma or closing bracket for JSONArray):

    {
        "a":1,
        "[{b":2,
        "[{c",3,
        "[{d", 4,
        "}]",5,
        "}]",6,     # error for duplicate key
        "}]",7      # error for duplicate key
     }

This makes the most sense to me

@eamonnmcmanus
Copy link
Contributor Author

It's certainly possible to introduce a clearer rule for unquoted keys. Those aren't standard JSON, but we can imagine that if people use them then they are expecting something like JavaScript semantics. Scanning to the first : is probably a fair approximation of that. For the example string {a:1,[{b:2,[{c:3,[{d:4}]:5}]:6}]:7}, I would expect that to be a syntax error. The first parsed keys would indeed be a, [{b, [{c, and [{d, with corresponding values 1, 2, 3, 4, but then the } after 4 would be interpreted as the end of the object. As far as I can see, the current code doesn't actually throw an exception if there is junk after the closing }, so in fact the result would be

{
    "a":1,
    "[{b":2,
    "[{c":3,
    "[{d":4
}

A similar rule for unquoted values in arrays (scanning to the first , or ]) would I think be straightforward to do at the same time.

In both cases this can change the parse of some existing strings. For example {a:1,[{b:2,[{c:3,[{d:4}]:5}]:6}]:7} is currently parsed as an object with two keys: a and [{"b":2,"[{\"c\":3,\"[{\\\"d\\\":4}]\":5}]":6}]. For JSONArray, I think it means that some strings that are not currently accepted would be. [a}] is currently rejected but with the change it would be a single-element array containing the string a}.

To be honest, I'm not sure that this would be solving a problem that anyone has. I don't think anyone has JSON strings like this that they expect to be able to parse with the results above, or to be able to parse at all. But the new rules are at least simpler to explain.

I will look into what's involved in making another PR implementing these rules.

@eamonnmcmanus
Copy link
Contributor Author

I've prototyped some of the above, but the more I think about it the less I like it. Some JSON implementations do indeed go beyond the spec to allow object keys that are not string literals. But I think that is to be more like JavaScript (which of course is the JS of JSON). JavaScript does allow "object literals" like {a:23} and {17:23}, which are equivalent to {"a":23}and {"17":23} respectively. But it doesn't allow nested arrays or objects there, and it particularly doesn't allow random characters so as to treat {[{b:2} as {"[{b":2}. So I really do think that the approach in #772 is best. Especially given that the code already does forbid nested arrays and objects in keys most of the time.

@stleary
Copy link
Owner

stleary commented Sep 26, 2023

Agreed that if the user wants a key, value, or list element to use chars that might be interpreted as JSON control chars like braces, brackets, commas, etc., then they need to use double quotes. So for example {[b:2} should be disallowed, but {b:2} or {"[b":"]2"} would be OK.
I think it is better in this case to modify the existing behavior so that the parser is more intuitive and predictable.

@eamonnmcmanus
Copy link
Contributor Author

I think it is better in this case to modify the existing behavior so that the parser is more intuitive and predictable.

Did you have something in mind? The main alternative I see to just bailing on [ and { would be to use the existing set of special characters and say that a key is any sequence of non-blank characters that doesn't contain one of those. Since [ and { are both in the set, that means you'll still get an exception for the examples we've been talking about, but it might be more like "expecting :" with a line and column number pointing to the [ or {, rather than something about nesting.

@stleary
Copy link
Owner

stleary commented Sep 27, 2023

It seems to me that you, @johnjaylward, and I have about the same idea, maybe differing in details. Your approach makes sense, please proceed, and let's see how it turns out.

@eamonnmcmanus
Copy link
Contributor Author

OK, well I did what I said in my last comment and updated #772 so it just skips past the part of nextValue() that would allow an object key to be another object or an array. That means that something like {a:1,[{b:2}]:3} will get an exception at the [ since that's not an allowed character in unquoted text. So you get the same "Missing value" exception as you would for something like {#:1}.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants