server: rdb loader big string loading in chunks #4623

Merged
merged 4 commits into main from rdb_loader_split_reading_string
Feb 20, 2025

Conversation

adiholden
Collaborator

In this PR we introduce loading of rdb strings iteratively in chunks, to prevent memory spikes when loading big strings.
In this PR:

  1. Add 2 APIs to compact object: reserve memory and append to string.
  2. Read string data from rdb in chunks in case of a big string.
  3. Disable the rdb save compression logic for big buffers, to prevent an additional spike during snapshot save (due to the compression buffer) and in the loader (to decompress the buffer).
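The chunked path described in point 2 can be sketched roughly as follows. This is a minimal illustration only, not Dragonfly's actual RdbLoader code: `LoadStringInChunks`, `kChunkSize`, and the use of `std::istream` as the sink are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Illustrative chunk size; the real loader would use a much larger value.
constexpr size_t kChunkSize = 4;

// Reserve the full destination size once, then fill it chunk by chunk.
// This avoids holding both a temporary full-size read buffer and the
// destination string at the same time (the memory spike the PR fixes).
std::string LoadStringInChunks(std::istream& sink, size_t len) {
  std::string out;
  out.reserve(len);                   // reserve capacity once up front
  std::string buf(kChunkSize, '\0');  // small reusable chunk buffer
  size_t remaining = len;
  while (remaining > 0) {
    size_t n = std::min(remaining, kChunkSize);
    sink.read(&buf[0], n);            // read one chunk from the sink
    out.append(buf.data(), n);        // append it to the destination
    remaining -= n;
  }
  return out;
}
```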

Signed-off-by: adi_holden <adi@dragonflydb.io>
@kostasrim
Contributor

@adiholden ready for review ?

@adiholden
Collaborator Author

> @adiholden ready for review ?

Yes, in general, but I want to see a green CI run.

@kostasrim
Contributor

> @adiholden ready for review ?
>
> Yes, in general, but I want to see a green CI run.

Alright, I'll take a look when it becomes green :)

fix bug
Signed-off-by: adi_holden <adi@dragonflydb.io>
type_ = OBJ_STRING;
encoding_ = OBJ_ENCODING_RAW;
MakeInnerRoom(0, size, mr);
sz_ = 0;
Collaborator

Why sz_ = 0;?
MakeInnerRoom copies the string if it existed before.

Collaborator

Now I see the use case. Maybe just check that inner_obj() == nullptr at line 560?

@@ -230,7 +230,7 @@ RdbSerializer::~RdbSerializer() {
VLOG(2) << "compression mode: " << uint32_t(compression_mode_);
if (compression_stats_) {
VLOG(2) << "compression not effective: " << compression_stats_->compression_no_effective;
VLOG(2) << "small string none compression applied: " << compression_stats_->small_str_count;
VLOG(2) << "string none compression applied: " << compression_stats_->size_skip_count;
Contributor

Suggested change:
- VLOG(2) << "string none compression applied: " << compression_stats_->size_skip_count;
+ VLOG(2) << "string compression skipped " << compression_stats_->size_skip_count;

@@ -1331,15 +1331,15 @@ error_code RdbLoaderBase::ReadObj(int rdbtype, OpaqueObj* dest) {
iores = ReadStreams(rdbtype);
break;
case RDB_TYPE_JSON:
- iores = ReadJson();
+ iores = ReadGeneric(rdbtype);
Contributor

Why don't we move this to line 1313 and let it fall through, just like RDB_TYPE_STRING? (You did the same there 😄)

if (ec)
return make_unexpected(ec);

- if (StrLen(str_obj) == 0) {
+ if (!is_string_type && StrLen(str_obj) == 0) {
Contributor

If is_string_type is true and pending_read_.remaining is false, we won't return Unexpected, and we will miss that the rdb file is corrupted.

Collaborator Author

Not sure I understood your comment correctly.
This logic is to enable empty strings: is_string_type=true and pending_read_.remaining is false, and therefore we do not return rdb file corrupted.

Contributor

Yeah, sorry, let me explain. If we call ReadStringObj above, a string can have:

  1. len < kMaxStringSize, upon which we read the whole string in one iteration -- the usual path
  2. len > kMaxStringSize, and therefore pending_read_.remaining > 0, so we load the string in chunks over multiple iterations

For case (1), if StrLen(str_obj) == 0, the rdb file should be corrupted because the string is an empty string. Isn't that right? This is similar to what we had before: we tried to load a string in one go and received an empty string -- isn't that a corrupted rdb file?

Maybe what I am missing here is the reason we want to support reading an empty string because to my mind:

  1. We read the whole string in one go -> if it's empty then rdb file corrupted
  2. We read the whole string in multiple chunks -> Why would a chunk be empty ?

Collaborator Author

An empty string is valid. You can add an empty string to dragonfly:
set x ""

Contributor

Oh, that's true. Then if pending_read_.remaining > 0 and StrLen(str_obj) == 0, we have a corrupted rdb file, since we know the string is not empty (because we got remaining) but we just read 0!

Collaborator Author

If remaining > 0 we enter ReadRemainingString, and it will read from the sink. If the length I am trying to read is not available, it will return an error.
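The behavior described here can be sketched as a stand-alone function. This is a hypothetical version for illustration only; the real ReadRemainingString reads from the loader's sink and uses Dragonfly's own error types.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <system_error>

// Read exactly `remaining` bytes from the sink into *dest. If the sink
// delivers fewer bytes than the length header announced, the stream is
// truncated/corrupt and we surface an error instead of a short read.
std::error_code ReadRemainingString(std::istream& sink, size_t remaining,
                                    std::string* dest) {
  std::string buf(remaining, '\0');
  sink.read(&buf[0], remaining);
  if (static_cast<size_t>(sink.gcount()) != remaining)
    return std::make_error_code(std::errc::io_error);  // short read: corrupt file
  dest->append(buf);
  return {};
}
```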

Contributor

gotcha cheers

@@ -2695,6 +2695,48 @@ async def test_replication_timeout_on_full_sync_heartbeat_expiry(
await assert_replica_reconnections(replica, 0)


@pytest.mark.exclude_epoll
@dfly_args({"proactor_threads": 1})
async def test_big_string(df_factory):
Contributor

We could just DEBUG POPULATE a big string, save to rdb, shut down df, and reload. Why bother with a replication test here? It's very similar to test_big_value_serialization_memory_limit.

Collaborator Author

The static seeder used in this test is debug populate, but I can change this to use debug populate directly.
I don't think that using replication is a bother; we want to make sure that we don't have a spike in memory when loading the data. From my perspective, using a replica or doing save and then load is similar.

CR fix
Signed-off-by: adi_holden <adi@dragonflydb.io>
CR fix
Signed-off-by: adi_holden <adi@dragonflydb.io>
@@ -1393,10 +1392,24 @@ error_code RdbLoaderBase::ReadStringObj(RdbVariant* dest) {
}
}

if (big_string_split && len > kMaxStringSize) {
Collaborator

do we split a compressed string as well?

Collaborator Author

We do not compress strings on save, only big blobs, e.g. multiple entries together.

Collaborator Author

Hmm, we actually do if we use CompressionMode::SINGLE_ENTRY, baaaaa

Collaborator Author

So in case the string is compressed, the opcode is RDB_ENC_LZF and we don't split.
We don't really use CompressionMode::SINGLE_ENTRY, so I don't think we need to optimise for this.
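A rough sketch of the resulting decision. The Opcode enum, the kMaxStringSize value, and the ShouldSplit helper are all illustrative stand-ins, not the actual loader code.

```cpp
#include <cassert>
#include <cstddef>

// kLzf stands in for the RDB_ENC_LZF opcode of a compressed string.
enum class Opcode { kPlain, kLzf };

constexpr size_t kMaxStringSize = 1u << 20;  // illustrative threshold only

// Compressed strings (RDB_ENC_LZF) are never split; only plain strings
// above the size threshold take the chunked path, and only when the
// big_string_split feature is enabled.
bool ShouldSplit(Opcode op, size_t len, bool big_string_split) {
  return big_string_split && op == Opcode::kPlain && len > kMaxStringSize;
}
```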

@adiholden adiholden merged commit 6016ae1 into main Feb 20, 2025
10 checks passed
@adiholden adiholden deleted the rdb_loader_split_reading_string branch February 20, 2025 14:56