Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Buffer underflow while deserializing with Collection type field removed (with reproducible test case) #834

Closed
BrotherJing opened this issue Jun 12, 2021 · 7 comments · Fixed by #850

Comments

@BrotherJing
Copy link

Describe the bug
Got KryoException: Buffer underflow when I try to deserialize data after remove a field with Collection type from the class.

To Reproduce

First, serialize the original object.

public static class A {
    public List<String> aaa = new ArrayList<>();
    public String bbb = "bbb";
    public String ddd = bbb;
}

@Test
public void test_kryo5() {
    A a = new A();
    a.aaa.add("a");
    byte[] bytes = new KryoSerializeUtil().serialize(a);
    String s = Base64.getEncoder().encodeToString(bytes);
    System.out.println(s);
}

Then, comment field aaa, and run

@Test
public void test_kryo5Deserialize() {
    // the serialized data from above
    String s = "AQBjb20ucGRkLmJpZ2RhdGEucmlzay5wdWJsaXNoLmNvbnRyYWN0LnV0aWxzLkNoZWNrc3VtVXRpbFRlc3QkwQEDYWHhYmLiZGTkGgEBamF2YS51dGlsLkFycmF5TGlz9AECAYJhAAUDAWJi4gACAwUA";
    A a = deserialize(Base64.getDecoder().decode(s));
    System.out.println(a.ddd);
}

Got

00:00 DEBUG: [kryo] Unable to read unknown data, type: java.util.ArrayList (com.pdd.bigdata.risk.publish.contract.utils.ChecksumUtilTest$A#null)
com.esotericsoftware.kryo.kryo5.KryoException: Buffer underflow.
	at com.esotericsoftware.kryo.kryo5.io.Input.require(Input.java:219)
	at com.esotericsoftware.kryo.kryo5.io.Input.readVarIntFlag(Input.java:480)
	at com.esotericsoftware.kryo.kryo5.io.Input.readString(Input.java:775)

FYI, The kryo I use:

private static Kryo kryo = new Kryo();
static {
    kryo.setReferences(true);
    kryo.setRegistrationRequired(false);
    CompatibleFieldSerializer.CompatibleFieldSerializerConfig config =
            new CompatibleFieldSerializer.CompatibleFieldSerializerConfig();
    config.setChunkedEncoding(true);
    config.setReadUnknownFieldData(true);
    kryo.setDefaultSerializer(new SerializerFactory.CompatibleFieldSerializerFactory(config));
    kryo.setInstantiatorStrategy(new StdInstantiatorStrategy());
    Log.set(LEVEL_DEBUG);
}

private <T> T deserialize(byte[] bytes) {
    ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(bytes);
    Input input = new Input(byteArrayInputStream);
    input.close();
    return (T) kryo.readClassAndObject(input);
}

private byte[] serialize(Object obj) {
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    Output output = new Output(byteArrayOutputStream);
    kryo.writeClassAndObject(output, obj);
    output.close();
    return byteArrayOutputStream.toByteArray();
}

Environment:

  • OS: Windows 10
  • JDK Version: 8
  • Kryo Version: 5.1.0

Additional context
Add any other context about the problem here.

@theigl
Copy link
Collaborator

theigl commented Jun 13, 2021

@BrotherJing: Thank you for the detailed bug report!

I'm currently on vacation and won't be able to look into this for a couple of weeks. If you want, you can submit a PR with a fix and I'll merge it.

@theigl
Copy link
Collaborator

theigl commented Jul 5, 2021

@BrotherJing: I was able to reproduce your problem and have debugged it for some time but was unable to come up with a solution. The problem seems to be related to references. If references are disabled, the data can be read without problems.

I currently don't have the capacity to spend a lot of time on this issue and would greatly appreciate it if someone could look into this and help with fixing the issue.

@krishna81m
Copy link

krishna81m commented Aug 3, 2021

@theigl, we came across a similar issue, is there more documentation and examples on CompatibleFieldSerializer as it is supposed to be the simplest and most compatible.

https://github.com/EsotericSoftware/kryo#compatiblefieldserializer wiki also has to be updated to readUnknownFieldData default: true. From what I gathered, if we set readUnknownFieldData:true it handle unknown fields and if reading fails due to class removed? chunkedEncoding:true will gracefully skip.

Assuming, most people will use the simplest configuration for CompatibleFieldSerializer to support graceful deserialization with backward/forward compatibility, any idea why this is not a major bug?

CompatibleFieldSerializer.CompatibleFieldSerializerConfig config =
            new CompatibleFieldSerializer.CompatibleFieldSerializerConfig();
    config.setChunkedEncoding(true);
    config.setReadUnknownFieldData(true);

@theigl
Copy link
Collaborator

theigl commented Aug 4, 2021

@theigl, we came across a similar issue, is there more documentation and examples on CompatibleFieldSerializer as it is supposed to be the simplest and most compatible.

Apart from the Readme, there is only the JavaDoc for the serializers.

https://github.com/EsotericSoftware/kryo#compatiblefieldserializer wiki also has to be updated to readUnknownFieldData default: true. From what I gathered, if we set readUnknownFieldData:true it handle unknown fields and if reading fails due to class removed? chunkedEncoding:true will gracefully skip.

I added the missing default values to the documentation. Your understanding is correct.

Assuming, most people will use the simplest configuration for CompatibleFieldSerializer to support graceful deserialization with backward/forward compatibility, any idea why this is not a major bug?

What do you mean by "this"? This ticket or that chunkedEncoding isn't enabled by default?

I'm not the original author of the serializers, but I am pretty sure that chunked encoding is disabled by default, because it has a significant performance overhead and users should opt-in if they really need it.

@theigl
Copy link
Collaborator

theigl commented Aug 4, 2021

@BrotherJing: After some more hours of debugging, I managed to get to the bottom of this. Chunked input failed to correctly skip a chunk after a buffer underflow. The underflow is caused by another issue that I cannot fix at this point, but my proposed changes will solve your problem.

@theigl
Copy link
Collaborator

theigl commented Aug 4, 2021

@BrotherJing: Please test your issue against the latest Kryo Snapshots.

@krishna81m
Copy link

krishna81m commented Sep 8, 2021

@BrotherJing , any luck with the latest snapshot?

@theigl, as a follow up from previous discussion, why would this config throw exception when deserializing bytes with a new registered class added? Shouldn't CompatibleFieldSerializer, chunkedEncoding and warnUnregisteredClasses(true) work gracefully?

                Kryo kryo = new Kryo();
                kryo.setReferences(false);

                CompatibleFieldSerializer.CompatibleFieldSerializerConfig config =
                        new CompatibleFieldSerializer.CompatibleFieldSerializerConfig();
                config.setReadUnknownFieldData(true);
                config.setChunkedEncoding(true);
                kryo.setDefaultSerializer(new SerializerFactory.CompatibleFieldSerializerFactory(config));

                kryo.setRegistrationRequired(false);    // each class has its own ID
                // registered classes must have exact same IDs during deserialization or use "warn" for backward compatibility?
                kryo.setWarnUnregisteredClasses(true);

                kryo.register(Foo.class, 11);
                kryo.register(Bar.class, 12);
                

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

3 participants