EOFError: end of buffer reached #474

Closed
kuahyeow opened this issue Jan 31, 2024 · 9 comments · Fixed by #476

Comments


kuahyeow commented Jan 31, 2024

I made sure the issue is in bootsnap

After disabling bootsnap, the issue no longer appears

Steps to reproduce

Unfortunately, this only happens reliably with gdk update (GitLab tooling). I am working to find a simpler reproduction in https://gitlab.com/gitlab-org/gitlab/-/merge_requests/143230#note_1750589800

Expected behavior

Actual behavior

The previous version of bootsnap we used was 1.17.0. After upgrading to 1.18.1, bundle exec rails db:migrate started failing with:

rake aborted!
EOFError: end of buffer reached
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:62:in `initialize'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:32:in `new'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:32:in `block in entries'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:31:in `map'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:31:in `entries'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/gitlab_schema_info.rb:123:in `all_table_allows'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/gitlab_schema_info.rb:99:in `add_table_specific_allows'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/gitlab_schema_info.rb:18:in `initialize'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/gitlab_schema_info.rb:28:in `new'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/gitlab_schema_info.rb:28:in `load_file'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database.rb:81:in `block in all_gitlab_schemas'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database.rb:81:in `map'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database.rb:81:in `all_gitlab_schemas'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database.rb:122:in `schemas_to_base_models'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database.rb:262:in `gitlab_schemas_for_connection'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/shared_model.rb:25:in `using_connection'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/each_database.rb:74:in `with_shared_connection'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/each_database.rb:68:in `with_model_connection'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/each_database.rb:31:in `block in each_model_connection'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/each_database.rb:25:in `each_model_connection'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/partitioning.rb:36:in `sync_partitions'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/partitioning.rb:24:in `sync_partitions_ignore_db_error'
/Users/tkuah/code/gdk-ee/gitlab/config/initializers/postgres_partitioning.rb:58:in `<main>'
/Users/tkuah/code/gdk-ee/gitlab/config/environment.rb:7:in `<main>'
<internal:/Users/tkuah/.rbenv/versions/3.2.3/lib/ruby/site_ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:37:in `require'
<internal:/Users/tkuah/.rbenv/versions/3.2.3/lib/ruby/site_ruby/3.2.0/rubygems/core_ext/kernel_require.rb>:37:in `require'
/Users/tkuah/.rbenv/versions/3.2.3/bin/bundle:25:in `load'
/Users/tkuah/.rbenv/versions/3.2.3/bin/bundle:25:in `<main>'
Tasks: TOP => db:migrate => db:load_config => environment
(See full trace by running task with --trace)
ERROR: 'bundle exec rake db:migrate db:test:prepare' failed.

If I run it with --backtrace, I notice these lines at the top:

/Users/tkuah/.rbenv/versions/3.2.3/lib/ruby/gems/3.2.0/gems/bootsnap-1.18.1/lib/bootsnap/compile_cache/yaml.rb:210:in `read'
/Users/tkuah/.rbenv/versions/3.2.3/lib/ruby/gems/3.2.0/gems/bootsnap-1.18.1/lib/bootsnap/compile_cache/yaml.rb:210:in `storage_to_output'
/Users/tkuah/.rbenv/versions/3.2.3/lib/ruby/gems/3.2.0/gems/bootsnap-1.18.1/lib/bootsnap/compile_cache/yaml.rb:232:in `fetch'
/Users/tkuah/.rbenv/versions/3.2.3/lib/ruby/gems/3.2.0/gems/bootsnap-1.18.1/lib/bootsnap/compile_cache/yaml.rb:232:in `load_file'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:62:in `initialize'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:32:in `new'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:32:in `block in entries'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:31:in `map'
/Users/tkuah/code/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:31:in `entries'

I suspect something relating to YAML.load_file changed in 1.18.1. Looking at how Bootsnap monkey patches YAML, it's possible that the changes to the bs_fetch function in 1.18.0 might be related to this bug. df6267f
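
As a sanity check (plain Ruby introspection, nothing Bootsnap-specific), one way to confirm that the active YAML.load_file is indeed Bootsnap's patched version rather than plain Psych is to look at where the method is defined:

require "yaml"

# When Bootsnap's compile cache is active, YAML.load_file resolves to the
# patched method in bootsnap/compile_cache/yaml.rb; otherwise it points
# into Psych. The exact path and line number will vary per installation.
puts YAML.method(:load_file).source_location.inspect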

System configuration

Bootsnap version: 1.18.1

Ruby version: ruby 3.2.3 (2024-01-18 revision 52bb2ac0a6) [arm64-darwin22]

Rails version: 7.0.8

kuahyeow (Author)

I managed to grab a copy of a bad tmp/cache/bootsnap, and also a good copy. The full directory is too large, so I am uploading just the YAML cache entries:

  • cache from the good bootsnap cache
  • cache from the bad bootsnap cache
  • original YAML file

compile-cache-yaml.zip

Hope this helps

stanhu (Contributor) commented Jan 31, 2024

I work with @kuahyeow and reproduced the same issue on my end.

In our environment something appears to fail while loading /Users/stanhu/gdk-ee/gitlab/db/docs/ci_builds.yml:

---
table_name: ci_builds
classes:
- Ci::Bridge
- Ci::Build
- Ci::Processable
- CommitStatus
- GenericCommitStatus
feature_categories:
- continuous_integration
description: TODO
introduced_by_url: https://gitlab.com/gitlab-org/gitlab/-/commit/046b28312704f3131e72dcd2dbdacc5264d4aa62
milestone: '8.0'
gitlab_schema: gitlab_ci

The compiled YAML file looks right. UPDATE: the compiled YAML cache entry actually has the wrong data length of 1:

% hexdump -C tmp/cache/bootsnap/compile-cache-yaml/04/2405f43322f6be
00000000  05 00 00 00 0d ed 79 30  a7 f6 84 50 ea 73 d9 4a  |......y0...P.s.J|
00000010  46 01 00 00 00 00 00 00  75 dc b9 65 00 00 00 00  |F.......u..e....|
00000020  01 00 00 00 00 00 00 00  b5 54 d0 d0 e8 b9 55 5e  |.........T....U^|
00000030  01 b6 a4 6d 01 00 00 00  fc 42 06 03 01 00 00 00  |...m.....B......|
00000040  c3 87 aa 74 61 62 6c 65  5f 6e 61 6d 65 a9 63 69  |...table_name.ci|
00000050  5f 62 75 69 6c 64 73 a7  63 6c 61 73 73 65 73 95  |_builds.classes.|
00000060  aa 43 69 3a 3a 42 72 69  64 67 65 a9 43 69 3a 3a  |.Ci::Bridge.Ci::|
00000070  42 75 69 6c 64 af 43 69  3a 3a 50 72 6f 63 65 73  |Build.Ci::Proces|
00000080  73 61 62 6c 65 ac 43 6f  6d 6d 69 74 53 74 61 74  |sable.CommitStat|
00000090  75 73 b3 47 65 6e 65 72  69 63 43 6f 6d 6d 69 74  |us.GenericCommit|
000000a0  53 74 61 74 75 73 b2 66  65 61 74 75 72 65 5f 63  |Status.feature_c|
000000b0  61 74 65 67 6f 72 69 65  73 91 b6 63 6f 6e 74 69  |ategories..conti|
000000c0  6e 75 6f 75 73 5f 69 6e  74 65 67 72 61 74 69 6f  |nuous_integratio|
000000d0  6e ab 64 65 73 63 72 69  70 74 69 6f 6e a4 54 4f  |n.description.TO|
000000e0  44 4f b1 69 6e 74 72 6f  64 75 63 65 64 5f 62 79  |DO.introduced_by|
000000f0  5f 75 72 6c d9 56 68 74  74 70 73 3a 2f 2f 67 69  |_url.Vhttps://gi|
00000100  74 6c 61 62 2e 63 6f 6d  2f 67 69 74 6c 61 62 2d  |tlab.com/gitlab-|
00000110  6f 72 67 2f 67 69 74 6c  61 62 2f 2d 2f 63 6f 6d  |org/gitlab/-/com|
00000120  6d 69 74 2f 30 34 36 62  32 38 33 31 32 37 30 34  |mit/046b28312704|
00000130  66 33 31 33 31 65 37 32  64 63 64 32 64 62 64 61  |f3131e72dcd2dbda|
00000140  63 63 35 32 36 34 64 34  61 61 36 32 a9 6d 69 6c  |cc5264d4aa62.mil|
00000150  65 73 74 6f 6e 65 a3 38  2e 30 ad 67 69 74 6c 61  |estone.8.0.gitla|
00000160  62 5f 73 63 68 65 6d 61  a9 67 69 74 6c 61 62 5f  |b_schema.gitlab_|
00000170  63 69                                             |ci|
00000172

For some reason, Bootsnap is trying to load just a single byte: \xC3:

[201, 210] in /Users/stanhu/.asdf/installs/ruby/3.2.3/lib/ruby/gems/3.2.0/gems/bootsnap-1.18.1/lib/bootsnap/compile_cache/yaml.rb
   201:           def storage_to_output(data, kwargs)
   202:             if kwargs&.key?(:symbolize_names)
   203:               kwargs[:symbolize_keys] = kwargs.delete(:symbolize_names)
   204:             end
   205:
=> 206:             unpacker = CompileCache::YAML.msgpack_factory.unpacker(kwargs)
   207:             unpacker.feed(data)
   208:             safe_loaded = unpacker.unpack
   209:             if safe_loaded
   210:               unpacker.unpack
(byebug) data
"\xC3"

I added some debug log messages, and I see that the right file is being loaded. However, cached_key.data_size is 1:

valid cache, data size is 269!
opening cache file: /Users/stanhu/gdk-ee/gitlab/tmp/cache/bootsnap/compile-cache-yaml/07/e5864971fdf455
open cache_fd is 30
valid cache, data size is 247!
opening cache file: /Users/stanhu/gdk-ee/gitlab/tmp/cache/bootsnap/compile-cache-yaml/04/2405f43322f6be
open cache_fd is 30
valid cache, data size is 1!
rake aborted!
EOFError: end of buffer reached
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:62:in `initialize'
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:32:in `new'
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:32:in `block in entries'
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:31:in `map'
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/database/dictionary.rb:31:in `entries'
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/database/gitlab_schema_info.rb:123:in `all_table_allows'
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/database/gitlab_schema_info.rb:99:in `add_table_specific_allows'
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/database/gitlab_schema_info.rb:18:in `initialize'
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/database/gitlab_schema_info.rb:28:in `new'
/Users/stanhu/gdk-ee/gitlab/lib/gitlab/database/gitlab_schema_info.rb:28:in `load_file'

stanhu (Contributor) commented Jan 31, 2024

I'm also running into #471. It does seem like the header is getting corrupted somehow, because you can see in the hexdump that the length is indeed set to 1:

00000020  01 00 00 00 00 00 00 00

It appears that the first time the YAML cache is compiled, it is correct:

% hexdump -C tmp/cache/bootsnap/compile-cache-yaml/04/2405f43322f6be
00000000  05 00 00 00 0d ed 79 30  a7 f6 84 50 ea 73 d9 4a  |......y0...P.s.J|
00000010  46 01 00 00 00 00 00 00  bd f3 b9 65 00 00 00 00  |F..........e....|
00000020  32 01 00 00 00 00 00 00  b5 54 d0 d0 e8 b9 55 5e  |2........T....U^|
00000030  01 66 6f 6b 01 00 00 00  fc 82 3b 05 01 00 00 00  |.fok......;.....|
00000040  c3 87 aa 74 61 62 6c 65  5f 6e 61 6d 65 a9 63 69  |...table_name.ci|
00000050  5f 62 75 69 6c 64 73 a7  63 6c 61 73 73 65 73 95  |_builds.classes.|
00000060  aa 43 69 3a 3a 42 72 69  64 67 65 a9 43 69 3a 3a  |.Ci::Bridge.Ci::|
00000070  42 75 69 6c 64 af 43 69  3a 3a 50 72 6f 63 65 73  |Build.Ci::Proces|
00000080  73 61 62 6c 65 ac 43 6f  6d 6d 69 74 53 74 61 74  |sable.CommitStat|
00000090  75 73 b3 47 65 6e 65 72  69 63 43 6f 6d 6d 69 74  |us.GenericCommit|
000000a0  53 74 61 74 75 73 b2 66  65 61 74 75 72 65 5f 63  |Status.feature_c|
000000b0  61 74 65 67 6f 72 69 65  73 91 b6 63 6f 6e 74 69  |ategories..conti|
000000c0  6e 75 6f 75 73 5f 69 6e  74 65 67 72 61 74 69 6f  |nuous_integratio|
000000d0  6e ab 64 65 73 63 72 69  70 74 69 6f 6e a4 54 4f  |n.description.TO|
000000e0  44 4f b1 69 6e 74 72 6f  64 75 63 65 64 5f 62 79  |DO.introduced_by|
000000f0  5f 75 72 6c d9 56 68 74  74 70 73 3a 2f 2f 67 69  |_url.Vhttps://gi|
00000100  74 6c 61 62 2e 63 6f 6d  2f 67 69 74 6c 61 62 2d  |tlab.com/gitlab-|
00000110  6f 72 67 2f 67 69 74 6c  61 62 2f 2d 2f 63 6f 6d  |org/gitlab/-/com|
00000120  6d 69 74 2f 30 34 36 62  32 38 33 31 32 37 30 34  |mit/046b28312704|
00000130  66 33 31 33 31 65 37 32  64 63 64 32 64 62 64 61  |f3131e72dcd2dbda|
00000140  63 63 35 32 36 34 64 34  61 61 36 32 a9 6d 69 6c  |cc5264d4aa62.mil|
00000150  65 73 74 6f 6e 65 a3 38  2e 30 ad 67 69 74 6c 61  |estone.8.0.gitla|
00000160  62 5f 73 63 68 65 6d 61  a9 67 69 74 6c 61 62 5f  |b_schema.gitlab_|
00000170  63 69                                             |ci|
00000172

However, for some reason, the next time I run rails db:migrate, this file gets overwritten with the wrong length.
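
For anyone wanting to scan their own cache for entries corrupted this way, a rough check is to compare the length recorded in the header against the bytes actually present. This is only a sketch based on the layout visible in the dumps above (a 64-byte header with the payload length stored as a little-endian 64-bit integer at offset 0x20), not an official Bootsnap API:

# Hypothetical checker, assuming the header layout observed in the hexdumps.
HEADER_SIZE = 64

Dir.glob("tmp/cache/bootsnap/compile-cache-yaml/**/*").each do |path|
  next unless File.file?(path)

  bytes    = File.binread(path)
  recorded = bytes[0x20, 8].unpack1("Q<")   # data_size field in the header
  actual   = bytes.bytesize - HEADER_SIZE   # payload bytes actually stored
  next if recorded == actual

  puts format("%s: header says %d bytes, payload is %d bytes", path, recorded, actual)
end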

casperisfine (Contributor)

Thanks. Yes, this very likely has the same root cause as #471. The revalidation must corrupt the cache entry somehow.

If more people are running into it, I think I'll cut another patch release to disable revalidation (or put it behind a flag).

stanhu (Contributor) commented Jan 31, 2024

I see that atomic_write_cache_file always updates data_size:

key->data_size = RSTRING_LEN(data);

However, when update_cache_key runs, it doesn't do that, so data_size gets written back with whatever uninitialized value it happens to contain.

casperisfine pushed a commit that referenced this issue Jan 31, 2024
Ref: #471
Ref: #474

Since it is still unclear what causes the corruption, and more
people seem to be running into it, I'd rather make it an opt-in
feature until I get to the bottom of this.
casperisfine (Contributor)

Interesting catch. If that's indeed it, it shouldn't be too hard to produce a test case. 👀

BTW: I just pushed a 1.18.2 release that makes the feature opt-in; it should be out in a few minutes.

stanhu (Contributor) commented Jan 31, 2024

Thanks. This diff fixes the issue for me:

diff --git a/ext/bootsnap/bootsnap.c b/ext/bootsnap/bootsnap.c
index 326aade..5a7ddb0 100644
--- a/ext/bootsnap/bootsnap.c
+++ b/ext/bootsnap/bootsnap.c
@@ -115,7 +115,7 @@ static void bs_cache_path(const char * cachedir, const VALUE path, char (* cache
 static int bs_read_key(int fd, struct bs_cache_key * key);
 static enum cache_status cache_key_equal_fast_path(struct bs_cache_key * k1, struct bs_cache_key * k2);
 static int cache_key_equal_slow_path(struct bs_cache_key * current_key, struct bs_cache_key * cached_key, const VALUE input_data);
-static int update_cache_key(struct bs_cache_key *current_key, int cache_fd, const char ** errno_provenance);
+static int update_cache_key(struct bs_cache_key *current_key, uint64_t data_size, int cache_fd, const char ** errno_provenance);
 
 static void bs_cache_key_digest(struct bs_cache_key * key, const VALUE input_data);
 static VALUE bs_fetch(char * path, VALUE path_v, char * cache_path, VALUE handler, VALUE args);
@@ -331,9 +331,10 @@ static int cache_key_equal_slow_path(struct bs_cache_key *current_key,
   return current_key->digest == cached_key->digest;
 }
 
-static int update_cache_key(struct bs_cache_key *current_key, int cache_fd, const char ** errno_provenance)
+static int update_cache_key(struct bs_cache_key *current_key, uint64_t data_size, int cache_fd, const char ** errno_provenance)
 {
   lseek(cache_fd, 0, SEEK_SET);
+  current_key->data_size = data_size;
   ssize_t nwrite = write(cache_fd, current_key, KEY_SIZE);
   if (nwrite < 0) {
       *errno_provenance = "update_cache_key:write";
@@ -803,7 +804,7 @@ bs_fetch(char * path, VALUE path_v, char * cache_path, VALUE handler, VALUE args
       valid_cache = cache_key_equal_slow_path(&current_key, &cached_key, input_data);
       if (valid_cache) {
         if (!readonly) {
-          if (update_cache_key(&current_key, cache_fd, &errno_provenance)) {
+          if (update_cache_key(&current_key, RSTRING_LEN(input_data), cache_fd, &errno_provenance)) {
               exception_message = path_v;
               goto fail_errno;
           }
@@ -965,7 +966,7 @@ bs_precompile(char * path, VALUE path_v, char * cache_path, VALUE handler)
       }
       valid_cache = cache_key_equal_slow_path(&current_key, &cached_key, input_data);
        if (valid_cache) {
-         if (update_cache_key(&current_key, cache_fd, &errno_provenance)) {
+         if (update_cache_key(&current_key, RSTRING_LEN(input_data), cache_fd, &errno_provenance)) {
              goto fail;
          }
       }

I wonder if it's simpler to do:

current_key->data_size = current_key->size;

UPDATE: Probably not, since RSTRING_LEN may be different.

stanhu (Contributor) commented Jan 31, 2024

Hmm, maybe this is not exactly right. The data_size is actually the compiled size, not the raw string length. But at least the revalidation doesn't hit an EOF error now.

casperisfine (Contributor)

Yup, I have what I think is a cleaner patch, but you were right on the money. I'm just writing a proper test case and will open a PR in a few minutes.

Thanks a lot, that was a great catch.

casperisfine pushed a commit that referenced this issue Jan 31, 2024
Fix: #474
Fix: #471

As noticed by Stan Hu, upon revalidation, we'd corrupt the
`data_size` field, as it wasn't initialized at all on the
cache key we computed from the source file.

The only thing we actually want to update is the `mtime`.

Co-Authored-By: Stan Hu <stanhu@gmail.com>