
RedisHashCommands along with its reactive/async/cluster counterparts should allow different types for keys and hash fields #2801

Open
rgohol opened this issue Mar 26, 2024 · 10 comments

Comments

@rgohol

rgohol commented Mar 26, 2024

Feature Request

It would be great to allow different types for keys and fields for hashes.
For example, RedisHashCommands could look like:

public interface RedisHashCommands<K, F, V> {
    Long hdel(K key, F... fields);
    ...
    V hget(K key, F field);
    ...
}

Naturally, the keys and the fields do not have to have the same type.

Is your feature request related to a problem? Please describe.

We use plain strings for keys and serialized binary data for fields in some of our hash entities.
With the current API we have to stick to a work-around: declare both keys and fields as strings and then additionally serialize/deserialize the field types in each call to the RedisHashCommands methods, which is not very efficient.
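
For illustration, a minimal sketch of that work-around could look like the following (MyField and FieldCodec are made-up names for our domain field type and its string codec):

String hgetByDomainField(RedisHashCommands<String, String> commands, String key, MyField field) {
    // extra (de)serialization step on top of the codec the connection already has
    String encodedField = FieldCodec.encodeToString(field);
    return commands.hget(key, encodedField);
}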

Describe the solution you'd like

  • Add one more generic parameter to RedisHashCommands and counterparts for fields.
  • Add an ability to add custom encoders/decoders for fields independently of keys.

Describe alternatives you've considered

Only the work-around described above.

Teachability, Documentation, Adoption, Migration Strategy

Actually, there could be a separate set of interfaces, like:

public interface RedisHashCommandsBase<K, F, V> { ... }
public interface RedisHashCommands<K, V> extends RedisHashCommandsBase<K, K, V> { ... }

That could help with migrations of existing clients.


See also an unanswered discussion: #2094
@mp911de
Collaborator

mp911de commented Apr 3, 2024

That isn't easily achievable as the entire architecture is based on key and value types. If you're looking for different data types, then using byte[] and a specific serialization mechanism on top of that is a viable approach. Alternatively, you can dispatch custom commands that use a different RedisCodec.
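
A rough sketch of the byte[] approach (the serializers and MyValue are application-provided placeholders, not part of Lettuce):

// connect with ByteArrayCodec and keep (de)serialization in application code
StatefulRedisConnection<byte[], byte[]> connection = client.connect(ByteArrayCodec.INSTANCE);
RedisCommands<byte[], byte[]> commands = connection.sync();

commands.hset(keySerializer.encode(key), fieldSerializer.encode(field), valueSerializer.encode(value));
MyValue result = valueSerializer.decode(commands.hget(keySerializer.encode(key), fieldSerializer.encode(field)));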

@rgohol
Author

rgohol commented Apr 7, 2024

@mp911de

If you're looking for different data types, then either using byte[] and a specific serialization mechanism on top of that is a viable approach.

Yes, this is the work-around we ended up with eventually. However, it seems like quite an awkward solution – building an encoder/decoder layer on top of another layer that already has its own encoder and decoder.

Alternatively, you can dispatch custom commands that use a different RedisCodec.

Could you elaborate a bit more on this approach please? I'm not sure I grasped what you mean. Thank you!

@mp911de
Collaborator

mp911de commented Apr 8, 2024

However, it seems like quite an awkward solution – building an encoder/decoder layer on top of another layer that already has its own encoder and decoder.

Actually, this is a good approach, also performance-wise. RedisCodec is invoked on the event loop. Moving complex serialization onto the event loop keeps a single thread busy (assuming a single Redis connection) while other threads of the application wait until serialization and command processing finish.

Having serialization as part of the application threads distributes the load and keeps the event loop less busy, resulting in a much better performance profile.

Alternatively, you can dispatch custom commands that use a different RedisCodec.
Could you elaborate a bit more on this approach please? I'm not sure I grasped what you mean. Thank you!

RedisConnection has a dispatch method that can be used to dispatch commands. See the following example:

// Build a SET command manually with an explicit codec, then dispatch it and await the result.
AsyncCommand<String, String, String> myCommand = new AsyncCommand<>(new Command<>(CommandType.SET,
                new StatusOutput<>(StringCodec.ASCII),
                new CommandArgs<>(StringCodec.ASCII).addKey("foo").addValue("bar")));

connection.dispatch((RedisCommand) myCommand);
myCommand.join();
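
Along the same lines, a dispatched command can mix encodings, e.g. a String key with raw byte[] hash fields and values. A sketch only; serializeField/serializeValue stand for your own application code:

byte[] field = serializeField(domainField);
byte[] value = serializeValue(domainValue);

// HSET my:hash <field> <value> with the key encoded via StringCodec and the rest passed as raw bytes
AsyncCommand<String, String, Long> hset = new AsyncCommand<>(new Command<>(CommandType.HSET,
                new IntegerOutput<>(StringCodec.UTF8),
                new CommandArgs<>(StringCodec.UTF8).addKey("my:hash").add(field).add(value)));

connection.dispatch((RedisCommand) hset);
Long fieldsAdded = hset.join();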

@rgohol
Author

rgohol commented Apr 14, 2024

Moving complex serialization onto the event loop keeps a single thread busy (assuming a single Redis connection) while other threads of the application wait until serialization and command processing finish.

Hmm... As far as I understand the API, a new connection has to be created for every new RedisCodec anyway:

public <K, V> StatefulRedisConnection<K, V> connect(RedisCodec<K, V> codec) {
    checkForRedisURI();
    return getConnection(connectStandaloneAsync(codec, this.redisURI, getDefaultTimeout()));
}

So multiple RedisCodecs imply that as many connections have to be created, at least one connection per codec, so it shouldn't entail a bottleneck in command processing – is that correct?
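
For example (just a sketch of what I mean, with client being a RedisClient instance):

// one connection per codec
StatefulRedisConnection<String, String> stringConnection = client.connect(StringCodec.UTF8);
StatefulRedisConnection<byte[], byte[]> binaryConnection = client.connect(ByteArrayCodec.INSTANCE);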

@mp911de
Collaborator

mp911de commented Apr 15, 2024

Codecs are used internally to decode push and pub/sub messages, so there is an internal association between codec and connection in addition to sending commands.

If you just want to send a command, then using dispatch while disregarding generics is fine.

@tishun
Collaborator

tishun commented Apr 21, 2024

@rgohol could we take a step back?

What is the direct problem we are trying to address - is it API usability of the driver, is it performance or is it something else?

As far as I understand, you already have a working solution using your own decoder, and you have no problems with the performance of that solution.

So is it only that you believe it should not be a part of your application, but rather part of the driver?
What concern do you want to address specifically?

I think answering this question would help us all understand the situation better.

@rgohol
Author

rgohol commented Apr 25, 2024

@tishun, thank you for looking into it. Our concern is the following:

We have a working system that makes use of Redis quite a bit. Initially there was mostly regular "string" type data in use (and some other types, but not hashes), and we also implemented a custom serialization schema for the keys. The schema is based on a domain-specific type that allows us to simplify managing Redis keys in the domain-layer code.

Now, as the system keeps growing, we realized at some point that it could benefit significantly from employing the "hash" type. In order to make it really performant, we would like to get hash fields (a.k.a. hash keys) and values serialized directly between source byte arrays and domain data models, back and forth (i.e. avoiding any intermediate transformations). The domain data models can be different for different hash fields and values. Moreover, they are always different from the domain type representation of Redis base keys.

There's no problem achieving that for hash values, apparently. But it suddenly does not work out for the base keys and the hash fields married together. Instead, we have to downgrade our key types to byte buffers, make the hash fields byte buffers as well, and then implement yet another layer of serialization on top of the Lettuce commands (which already pretend to have one). Therefore, the built-in serialization for keys and fields in Lettuce turns out to be useless for us.

The major problem here is that we have to modify an existing code base that has been working for years just in order to adopt one more Redis type. And the other problem is yet another layer appearing just because of a design-enforced restriction in Lettuce.

I still believe it is quite weird that Lettuce artificially restricts base keys and hash fields to the same type. There are no constraints on that in the Redis docs per se, are there? So basically, the bottom line is: yes, we can handle it, but should we?

Sorry for the long-read, I appreciate your patience.

@mp911de
Collaborator

mp911de commented Apr 25, 2024

For additional context, there's Spring Data Redis that builds on top of Lettuce and makes a distinction not only between keys and values, but also hash fields and hash values.

Effectively, it is a facade that accepts your domain types, applies (de)-serialization and sends the command to the driver.
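
As a sketch of what that looks like (MyField/MyValue being your domain types; the serializer choices are just examples):

// separate serializers for keys, hash fields ("hash keys") and hash values
RedisTemplate<String, Object> template = new RedisTemplate<>();
template.setConnectionFactory(lettuceConnectionFactory);
template.setKeySerializer(new StringRedisSerializer());
template.setHashKeySerializer(new JdkSerializationRedisSerializer());   // hash fields
template.setHashValueSerializer(new JdkSerializationRedisSerializer()); // hash values
template.afterPropertiesSet();

HashOperations<String, MyField, MyValue> hashOps = template.opsForHash();
hashOps.put("my:hash", field, value);
MyValue loaded = hashOps.get("my:hash", field);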

Using different serialization formats isn't uncommon; however, the concrete implementation differs across applications. If a driver or something on top of it serves your purpose, then great.

In many cases, however, requirements are tied too closely to a particular application, so it doesn't make sense to extend a library with application-specific requirements; instead, hosting that kind of custom code in your own code base is a good way to go.

So it always depends.

@tishun
Collaborator

tishun commented Apr 26, 2024

Hey @rgohol, thanks for explaining this in detail. We actually need that to figure out how best we can help. Don't hesitate to add more information if we are missing something. And thanks @mp911de for the discussion on the topic and the ideas provided.

Assuming I understand correctly, the major problems are:

  • keys were historically String-based and changing that would require rewriting a lot of existing code
  • hash fields are best represented as byte arrays (same as values, but we do not have a problem with them)
  • avoiding double serialization is important for performance (and probably maintenance) reasons
  • as long as keys and fields are of the same type (as per the API), we have a problem because of the previous points

I can see the predicament. Now, from the standpoint of the Lettuce driver, it would be close to impossible to support an arbitrary number of generics for each command (there are commands with even more parameters, and one could always argue they need to be able to have different types). Such a change would also be a nightmare from a compatibility perspective for all the existing users of the driver. So the proposed solution - unfortunately - goes against the core design decisions and, as such, is very unlikely to be implemented.

That does not mean that the use case is an invalid one, but we will have to spend more time thinking how we can solve it in a good way. One thing that @mp911de proposed during our discussion is to consider using dynamic Redis command interfaces if you haven't already. Have you thought about that?
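
For reference, such a command interface could look roughly like this (a sketch, not tested; BinaryFieldHashCommands is a made-up name, and parameters are passed in declaration order):

public interface BinaryFieldHashCommands extends Commands {

    @Command("HSET")
    Long hset(String key, byte[] field, byte[] value);

    @Command("HGET")
    byte[] hget(String key, byte[] field);
}

RedisCommandFactory factory = new RedisCommandFactory(connection);
BinaryFieldHashCommands hashCommands = factory.getCommands(BinaryFieldHashCommands.class);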

As usual - PRs on this topic are welcome, as long as they do not break existing use cases.

@tishun tishun added the status: waiting-for-feedback We need additional information before we can continue label Apr 26, 2024
@rgohol
Author

rgohol commented Apr 26, 2024

One thing that @mp911de proposed during our discussion is to consider using dynamic Redis command interfaces if you haven't already. Have you thought about that?

Yes, we have already. Which, by the way, reminded me of the good old aphorism:

"All problems in computer science can be solved by another level of indirection... except for the problem of too many levels of indirection." ©

However, regardless of all the details of our particular systems and the ways we can work the issue around, I would like to emphasize the primary point of the issue:

The Redis API itself does not imply any intersections or other correlations between base key and hash field namespaces. This leads to a rather simple conclusion: if there's a general-purpose system (like Lettuce) that generalizes Redis keys and values as abstract types (e.g. K and V), then hash field types could be represented as a different type, separate and independent from K (or V, for that matter). Therefore, enforcing such a restriction at the driver level looks very artificial and stems only from shortcomings in the design of the driver itself, not of Redis.

I mean, it is a very common-sense assumption, not even specific to Redis: if there are sets A and B that never intersect, then they can be encoded as completely different and unrelated types.

That said, I totally get the point that the current design of the driver does not easily allow embracing more than two types.

PRs on this topic are welcome, as long as they do not break existing use cases.

Yeah, I would probably give it a shot. Not sure, though, whether I can set aside enough time to start working on that right away.

Thank you for your time!

@tishun tishun removed the status: waiting-for-feedback We need additional information before we can continue label Apr 28, 2024