
Reduce excessive CPU usage when serializing breadcrumbs to disk #4181

Merged (20 commits from rz/fix/persisting-scope-observer into main) on Mar 14, 2025

Conversation

@romtsn (Member) commented Feb 19, 2025

📜 Description

  • Introduce QueueFile (vendored from https://github.com/square/tape), which lets us write breadcrumbs atomically one-by-one using RandomAccessFile under the hood
    • It already works as a ring buffer, but based on file size: the file grows automatically up to a certain limit, after which writes wrap around to the beginning of the file
    • I also modified it to support maxElements, so it works as a ring buffer based on the number of elements in the queue
  • Add a new resetCache method that cleans up the disk cache on every init (but after ANR processing is done), ensuring a clean state so we don't enrich events with outdated scope values
  • Do not check for file existence before deleting/writing files; the check was redundant and caused unnecessary I/O
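The element-capped ring-buffer behavior described above can be sketched in memory. This is a hypothetical class for illustration, not the vendored QueueFile (which persists entries to a file via RandomAccessFile); it only shows the "evict oldest when full" logic that the maxElements modification adds:

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Minimal sketch of a FIFO capped by element count: once maxElements is
 * reached, adding a new entry evicts the oldest one, so the queue behaves
 * like a ring buffer over elements rather than over file bytes.
 */
final class ElementCappedQueue<T> {
  private final Deque<T> entries = new ArrayDeque<>();
  private final int maxElements;

  ElementCappedQueue(final int maxElements) {
    this.maxElements = maxElements;
  }

  void add(final T element) {
    if (entries.size() == maxElements) {
      entries.removeFirst(); // wrap around: drop the oldest element
    }
    entries.addLast(element);
  }

  int size() {
    return entries.size();
  }

  T oldest() {
    return entries.peekFirst();
  }
}
```

With a capacity of 2, adding "a", "b", "c" leaves ["b", "c"]: the queue never exceeds its element cap, matching the bounded-breadcrumbs behavior the PR wants.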

Before

[Screenshot 2025-02-17 at 16:51:15]

After

[Screenshot 2025-02-17 at 16:03:39]

Also posting a Perfetto trace query with ~50 addBreadcrumb calls to confirm that time spent in addBreadcrumb does not grow with the number of breadcrumbs added:

[Screenshot: Slack 2025-03-14 11:25:58]

💡 Motivation and Context

Closes #3168

💚 How did you test it?

manually + automated

📝 Checklist

  • I added tests to verify the changes.
  • No new PII added or SDK only sends newly added PII if sendDefaultPII is enabled.
  • I updated the docs if needed.
  • I updated the wizard if needed.
  • Review from the native team if needed.
  • No breaking change or entry added to the changelog.
  • No breaking change for hybrid SDKs or communicated to hybrid SDKs.

🔮 Next steps

Another potential improvement would be to batch breadcrumbs in memory first and flush them to disk every 1-2 seconds, reducing I/O even further. The trade-off is that we could lose some important breadcrumbs if an error occurs right within that window. For now I'd like to keep it as-is, but if we receive further reports about excessive I/O we can implement that improvement too. I'm leaving the code snippet here so we can come back to it later and just copy-paste it:

Code snippet for batched writes:

```java
  private @Nullable Future<?> breadcrumbsSerializerTask;
  private volatile @NotNull Queue<Breadcrumb> inMemoryCrumbs;

  public PersistingScopeObserver(final @NotNull SentryOptions options) {
    this.options = options;
    this.inMemoryCrumbs = createBreadcrumbsList(options.getMaxBreadcrumbs());
  }

  private static @NotNull Queue<Breadcrumb> createBreadcrumbsList(final int maxBreadcrumbs) {
    return maxBreadcrumbs > 0
        ? SynchronizedQueue.synchronizedQueue(new CircularFifoQueue<>(maxBreadcrumbs))
        : SynchronizedQueue.synchronizedQueue(new DisabledQueue<>());
  }

  @Override
  public void addBreadcrumb(final @NotNull Breadcrumb crumb) {
    inMemoryCrumbs.offer(crumb);
    // schedule a flush only if no flush is currently pending
    if (breadcrumbsSerializerTask == null
        || breadcrumbsSerializerTask.isCancelled()
        || breadcrumbsSerializerTask.isDone()) {
      breadcrumbsSerializerTask =
          serializeToDiskScheduled(
              () -> {
                try {
                  // drain the in-memory queue into the on-disk queue in one batch
                  Breadcrumb breadcrumb = inMemoryCrumbs.poll();
                  while (breadcrumb != null) {
                    diskCrumbs.getValue().add(breadcrumb);
                    breadcrumb = inMemoryCrumbs.poll();
                  }
                } catch (IOException e) {
                  options.getLogger().log(ERROR, "Failed to add breadcrumb to file queue", e);
                }
              });
    }
  }

  @Override
  public void setBreadcrumbs(final @NotNull Collection<Breadcrumb> breadcrumbs) {
    if (breadcrumbs.isEmpty()) {
      final Date now = DateUtils.getCurrentDateTime();
      serializeToDisk(
          () -> {
            try {
              // drop in-memory crumbs older than the clear request, then clear disk
              final Iterator<Breadcrumb> iterator = inMemoryCrumbs.iterator();
              while (iterator.hasNext()) {
                if (iterator.next().getTimestamp().before(now)) {
                  iterator.remove();
                }
              }
              diskCrumbs.getValue().clear();
            } catch (IOException e) {
              options.getLogger().log(ERROR, "Failed to clear breadcrumbs from file queue", e);
            }
          });
    }
  }

  private @Nullable Future<?> serializeToDiskScheduled(final @NotNull Runnable task) {
    if (!options.isEnableScopePersistence()) {
      return null;
    }
    if (Thread.currentThread().getName().contains("SentryExecutor")) {
      // we're already on the sentry executor thread, so we can just execute it directly
      try {
        task.run();
      } catch (Throwable e) {
        options.getLogger().log(ERROR, "Serialization task failed", e);
      }
      return null;
    }

    try {
      return options
          .getExecutorService()
          .schedule(
              () -> {
                try {
                  task.run();
                } catch (Throwable e) {
                  options.getLogger().log(ERROR, "Serialization task failed", e);
                }
              },
              1000);
    } catch (Throwable e) {
      options.getLogger().log(ERROR, "Serialization task could not be scheduled", e);
    }
    return null;
  }
```

As a result, we had just 2 I/O writes:
[Screenshot 2025-02-17 at 17:44:37]


@romtsn romtsn changed the title Rz/fix/persisting scope observer Reduce excessive CPU usage when serializing breadcrumbs to disk Feb 19, 2025
github-actions bot (Contributor) commented Feb 19, 2025

Performance metrics 🚀

|              | Plain     | With Sentry | Diff       |
|--------------|-----------|-------------|------------|
| Startup time | 399.11 ms | 422.08 ms   | 22.98 ms   |
| Size         | 1.58 MiB  | 2.21 MiB    | 646.66 KiB |

Previous results on branch: rz/fix/persisting-scope-observer

Startup times

| Revision | Plain     | With Sentry | Diff      |
|----------|-----------|-------------|-----------|
| 473dcc6  | 406.02 ms | 455.16 ms   | 49.14 ms  |
| d877760  | 672.94 ms | 788.46 ms   | 115.52 ms |
| ca480b4  | 389.82 ms | 456.52 ms   | 66.70 ms  |
| 247dd70  | 396.72 ms | 419.14 ms   | 22.42 ms  |

App size

| Revision | Plain    | With Sentry | Diff       |
|----------|----------|-------------|------------|
| 473dcc6  | 1.58 MiB | 2.21 MiB    | 645.19 KiB |
| d877760  | 1.58 MiB | 2.21 MiB    | 646.64 KiB |
| ca480b4  | 1.58 MiB | 2.21 MiB    | 645.20 KiB |
| 247dd70  | 1.58 MiB | 2.21 MiB    | 646.61 KiB |

@stefanosiano (Member) left a comment:

Just a question regarding setBreadcrumbs(), but it looks good.

@markushi (Member) left a comment:

Looks good to me! Am I right to assume that if someone upgrades the SDK and the device has pending ANRs, those ANRs won't be enriched, since the scope data is only available in the old format?

```diff
 if (breadcrumbs == null) {
   return;
 }
 if (event.getBreadcrumbs() == null) {
-  event.setBreadcrumbs(new ArrayList<>(breadcrumbs));
+  event.setBreadcrumbs(breadcrumbs);
```
Member:

Well spotted, nice!

@romtsn (Member, Author) commented Mar 14, 2025

> Looks good to me! Am I right to assume that if someone upgrades the SDK and the device has pending ANRs, those ANRs won't be enriched, since the scope data is only available in the old format?

Yes, that's right (only breadcrumbs would be missing, though)! But I think it's fine to accept that trade-off (and it's probably not a very common scenario to have an ANR in between app updates). I also left a comment:

```java
try {
  queueFile = new QueueFile.Builder(file).size(options.getMaxBreadcrumbs()).build();
} catch (IOException e) {
  // If the file is corrupted we simply delete it and try to create it again. We accept
  // the trade-off of losing breadcrumbs for ANRs that happened right before the app
  // received an update where the new format was introduced.
  file.delete();

  queueFile = new QueueFile.Builder(file).size(options.getMaxBreadcrumbs()).build();
}
```
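The delete-and-recreate recovery pattern above can be exercised generically. This is a hypothetical stand-in, not the real QueueFile: it treats any file missing a made-up "MAGIC" header as corrupted, which mimics what happens when a pre-upgrade cache file in the old format is opened by the new code:

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

final class RecreateOnCorruption {
  // Stand-in for QueueFile construction: "valid" files must start with a magic header.
  static String open(final File file) throws IOException {
    final String contents =
        new String(Files.readAllBytes(file.toPath()), StandardCharsets.UTF_8);
    if (!contents.startsWith("MAGIC")) {
      throw new IOException("corrupted header");
    }
    return contents;
  }

  // Stand-in for creating a fresh, empty queue file in the new format.
  static String create(final File file) throws IOException {
    Files.write(file.toPath(), "MAGIC".getBytes(StandardCharsets.UTF_8));
    return "MAGIC";
  }

  /** Try to open; on corruption (or old format), delete and recreate from scratch. */
  static String openOrRecreate(final File file) throws IOException {
    try {
      return open(file);
    } catch (IOException e) {
      // Accept losing the old contents in exchange for a working queue file.
      file.delete();
      return create(file);
    }
  }

  // Helper for demonstration: a temp file holding old-format (unreadable) data.
  static File corruptedTempFile() throws IOException {
    final File file = File.createTempFile("crumbs", ".bin");
    Files.write(file.toPath(), "old-format".getBytes(StandardCharsets.UTF_8));
    return file;
  }
}
```

Opening a file written in the old format fails the header check, so the file is deleted and recreated empty; this is the same trade-off the PR comment describes, applied to a toy format.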

@romtsn romtsn enabled auto-merge (squash) March 14, 2025 11:16
@romtsn romtsn merged commit b61429a into main Mar 14, 2025
34 checks passed
@romtsn romtsn deleted the rz/fix/persisting-scope-observer branch March 14, 2025 11:31
romtsn added a commit that referenced this pull request Mar 14, 2025
* WIP

* WIP

* Remove redundant line

* Add Tests

* api dump

* Formatting

* REset scope cache on new init

* Clean up

* Comment

* Changelog

* Workaround square/tape#173

* Add a comment to setBreadcrumbs

* Address PR review

* Update CHANGELOG.md
romtsn added a commit that referenced this pull request Mar 17, 2025
… (#4260)

Successfully merging this pull request may close these issues.

Reducing writes performed by the PersistingScopeObserver
3 participants