Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug Report: querylog feature can create invalid json #13417

Closed
timvaillancourt opened this issue Jun 29, 2023 · 3 comments
Closed

Bug Report: querylog feature can create invalid json #13417

timvaillancourt opened this issue Jun 29, 2023 · 3 comments
Labels
Needs Triage This issue needs to be correctly labelled and triaged Type: Bug

Comments

@timvaillancourt
Copy link
Contributor

timvaillancourt commented Jun 29, 2023

Overview of the Issue

When using -querylog-format json and querylog enabled, it is possible for invalid json (from the perspective of Golang's encoding/json library) to be written to the querylog when certain characters the json spec doesn't support exist in the bindvars of a query

The bindvar "value" of the query (a []byte) are logged verbatim to the querylog without encoding, which is the main cause of this problem. Golang's json.Marshal(...) avoids these risks by automatically encoding []byte as a base64-encoded string, however the querylog feature is using fmt.Fprintf(...) to marshal json, which of course bypasses this feature

Here are some ideas I can see to fix the problem. All are variants of the same approach really:

  1. Use json.Marshal(...) by default to marshal the json log entry
    • This will cause the value []byte from bindvars to be base64-encoded
    • Drawback: Go's encoding/json will handle the base64 transparently, other languages may not 👎
  2. Add a command-line flag to use json.Marshal(...) instead (backwards compatible)
  3. Add a new querylog format (json2 or json-golang for example) to toggle json.Marshal(...) (backwards compatible)

Although it's not backwards-compatible, I'm in favour of option 1 because continuing to log bindvar values without encoding opens users up to bugs

Reproduction Steps

Here is a Go Playground to reproduce the issue

Full repro steps:

  1. Enable querylog to disk with:
    • -querylog-format json
    • -querylog-filter-tag 'VTQUERYLOG=1'
    • -log_queries_to_file <file>
  2. Run this simple query: select * from something where id = '\x'
    • Include the VTQUERYLOG=1 querylog filter in query
  3. Read this log entry from the querylog file using json.Unmarshal(...) (Go's encoding/json)
  4. Notice the hard unmarshal failure: invalid character 'x' in string escape code
    • This is due to \x being considered a unicode character (or something like that) by json.Unmarshal

Binary Version

Vitess v12+

Operating System and Environment details

Linux

Log Fragments

No response

@timvaillancourt timvaillancourt added Needs Triage This issue needs to be correctly labelled and triaged Type: Bug labels Jun 29, 2023
@timvaillancourt
Copy link
Contributor Author

timvaillancourt commented Jun 29, 2023

A related gotcha: the type field of the bind vars is written as a string but it is an int32 in the protobuf. Example:

{"vtg1": {"type": "INT64", "value": 1162978204130}, "vtg2": {"type": "INT64", "value": 0}}

This means you can't unmarshal bindvars back into map[string]*querypb.BindVariable because you get this error:

 json: cannot unmarshal string into Go struct field BindVariable.BindVars.type of type query.Type

Instead you're stuck unmarshaling to map[string]interface{} and manually parsing the interface{} -> *querypb.BindVariable. This is another place where using json.Marshal(...) for marshalling would help, although this changes type to a less-human-friendly int32 in logs. A custom (un)marshaler could handle the string<=>int32 (*querypb.Type) conversion, however

@DeathBorn
Copy link
Contributor

same reported #12872

@timvaillancourt
Copy link
Contributor Author

@DeathBorn thanks! Closing as a dupe of #12872

I'll likely provide a fix PR for this soon! 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Needs Triage This issue needs to be correctly labelled and triaged Type: Bug
Projects
None yet
Development

No branches or pull requests

2 participants