Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance drop using cloneDeep of lodash #471

Closed
1 task done
karol-bisztyga opened this issue Mar 17, 2021 · 13 comments
Closed
1 task done

Performance drop using cloneDeep of lodash #471

karol-bisztyga opened this issue Mar 17, 2021 · 13 comments
Labels
enhancement New feature or request performance

Comments

@karol-bisztyga
Copy link

karol-bisztyga commented Mar 17, 2021

Bug Description

There is a significant performance drop using cloneDeep of lodash, it takes approximately 25-45 times longer to execute when running with Hermes on android.

  • I have run gradle clean and confirmed this bug does not occur with JSC

Hermes version: hermes-engine: 0.5.1
React Native version (if any): 0.63.4
OS version (if any): Android 10 x86 Pixel 3 simulator
Platform (most likely one of arm64-v8a, armeabi-v7a, x86, x86_64): x86_64

Minimal reproducing repository

https://github.com/karol-bisztyga/hermes-repro
There are 2 commits:

  • init - use it to test without Hermes
  • with hermes - use it to test with Hermes

For reproducing this I used a randomly generated JSON file, which is here, it is a 1MB file(just FYI, you don't have to worry about this, it's hardcoded in the repro already).

Tests results

Running sample code from the repro provided I got following results(that stand for times of execution of doing JSON.parse, clondeDeep and both combined):

- no hermes:
{ parse: [ 32, 11, 16, 14, 17, 15, 14, 17, 14, 12 ],
      clone: [ 218, 209, 114, 122, 117, 114, 117, 123, 111, 118 ],
      parseClone: [ 235, 137, 141, 132, 125, 142, 131, 131, 141, 136 ] }
- with hermes:
{ parse: [ 24, 18, 22, 17, 17, 13, 18, 13, 18, 12 ],
      clone: [ 5667, 5318, 5500, 5395, 5576, 5760, 5553, 5349, 5669, 5455 ],
      parseClone: [ 5405, 5388, 5787, 5452, 5426, 5360, 5381, 5364, 5455, 5390 ] }

The Expected Behavior

Similar time of execution of clondeDeep with and without Hermes

Note

I didn't know how to categorize it so I went for the Bug, sorry if that's not the best way, I couldn't find a category for the performance drop.

@karol-bisztyga karol-bisztyga added the bug Something isn't working label Mar 17, 2021
@dulinriley dulinriley added enhancement New feature or request performance and removed bug Something isn't working labels Mar 18, 2021
@dulinriley
Copy link
Contributor

I think "enhancement" combined with "performance" is a better set of tags.
"bug" is reserved for Hermes producing incorrect output or crashing, which doesn't seem to be the case here (assuming the values you list are times, not the output).

@karol-bisztyga
Copy link
Author

Those are times of execution, I'll update the description as it might've been misleading, thanks for clarifying.

karol-bisztyga added a commit to CommE2E/comm that referenced this issue Mar 30, 2021
Summary: Assuming that the `cloneDeep` function has been introduced in order to avoid memory leaks/inefficient memory usage, an investigation checking whether it still occurs without cloning has been performed. Apparently this is not an issue anymore. The main reason for removing this function is our will to switch to Hermes. Using `cloneDeep` with this engine, a significant performance drop can be observed, the problem is described [here](facebook/hermes#471).

Test Plan: Test results are presented in [this](https://phabricator.ashoat.com/D884#21096) and two following comments. We also ran a test logging in to the comm app with an account that has a lot of data to be fetched and parsed(and, up to this point, clonned).

Reviewers: palys-swm, ashoat

Reviewed By: palys-swm, ashoat

Subscribers: ashoat, KatPo, zrebcu411, Adrian, atul, subnub

Differential Revision: https://phabricator.ashoat.com/D916
@andreialecu andreialecu mentioned this issue Apr 8, 2021
@andreialecu
Copy link

Sorry to hijack this thread, but I believe this could be related.

It appears that polyfilling Intl is extremely slow as well using Hermes. I forked @karol-bisztyga's repro to show an issue with formatting dates.

Hermes is 3 times slower than JSC, and it makes working with i18n/timezoned dates almost impossible. It takes around 5ms to format one single date(!).

Copying my comment from: #23 (comment)

Sample benchmark using luxon and @formatjs on Android via Hermes:

  for (let y = 0; y < 5; y++) {
    const start = Date.now();
    for (let i = 0; i < 100; i++) {
      DateTime.local().toLocaleString();
    }
    console.log(Date.now() - start);
  }

Output:

LOG  423
LOG  450
LOG  575
LOG  450
LOG  421

I ran the same benchmark using JSC instead of Hermes on Android and the times are 150-170ms per 100 runs.

So it seems to be a Hermes performance issue as well, probably same root cause as #471

Here's a repro on both engines:
https://github.com/andreialecu/hermes-repro/tree/jsc-intl
https://github.com/andreialecu/hermes-repro/tree/hermes-intl

The same benchmark on an iOS emulator (iOS has native Intl so it does not need a polyfill) using JavaScriptCore logs:

 LOG  8
 LOG  0
 LOG  1
 LOG  1
 LOG  0

@dulinriley
Copy link
Contributor

@andreialecu thanks for letting us know, but we don't know for certain yet what the root cause of the performance issue is with the original example from the post. So it's hard to say if they're related or not yet.

We're currently trying to isolate the particular lines of code that take the most time. If you'd like to know this for your own app, you can use https://reactnative.dev/docs/next/profile-hermes to get a profile of the JS that is run. That might help us figure out the root cause of slow performance

@andreialecu
Copy link

Hey @dulinriley, thanks for the reply.

I actually used the hermes profiler to track down the issue to formatjs's polyfills.

A profile session on https://github.com/andreialecu/hermes-repro/tree/hermes-intl would likely help troubleshoot this on your end.

@dulinriley
Copy link
Contributor

Ok so we spent some time profiling this. There are two main functions with lodash taking most of the time (> 70%):

  • assocIndexOf: this looks through a stack to find a key match. It's sort of like a hand-rolled Map object with a specific ordering
  • eq: This is literally just x === y || (x !== x && y !== y)

Given the latter is fast everywhere, we find that it's actually a function calling issue. Inlining the latter gives about a 40% speedup.

The former is used by a stack of circular references in cloneDeep. Basically it's about 4500 deep by the end of the clone, and it never has a circular reference so there's no match. It would be much faster to use a Map or Set for this, not sure why lodash doesn't use one.

We're not sure yet how to speed up searching through the medium size (4500) array of key/value pairs, nor are we sure why v8 is faster than us (in jitless mode).

What we can do is work on our inliner, since the eq function should be trivially inlinable even statically: it can't throw, has no strictness changes, no arguments usage, and no captured variables. We don't do it today because we only do this for functions called exactly once.

We still don't have any fixes ready to land yet, but in the meantime if this is blocking you, you can try copying the cloneDeep implementation and remove the stack checks. That brings the runtime on my machine from 500 ms down to 50 ms. It works fine for non-circular objects like a JSON object.

@avp
Copy link
Contributor

avp commented Apr 10, 2021

I spent some more time and I think I have a more detailed diagnosis.

lodash has a failover mechanism which allows it to turn ListCache into a Map specifically to avoid the bad case of iterating over all the entries in the Array. It triggers within the stackSet call.

So, what prevents that mechanism from triggering?

  function stackSet(key, value) {
    var data = this.__data__;
    if (data instanceof ListCache) {
      var pairs = data.__data__;
      if (!Map || (pairs.length < LARGE_ARRAY_SIZE - 1)) {
        pairs.push([key, value]);
        this.size = ++data.size;
        return this;
      }
      data = this.__data__ = new MapCache(pairs);
    }
    data.set(key, value);
    this.size = data.size;
    return this;
  }

Notice that we have a check for !Map in addition to the size check on LARGE_ARRAY_SIZE. Other engines are populating Map, while Hermes is not.

Here's how Map is populated:

      Map = getNative(root, 'Map'),

getNative eventually calls into baseIsNative:

  function baseIsNative(value) {
    if (!isObject(value) || isMasked(value)) {
      return false;
    }
    var pattern = isFunction(value) ? reIsNative : reIsHostCtor;
    return pattern.test(toSource(value));
  }

Here it effectively calls toString on value (which here is the Hermes-implemented native Map), and runs a pattern match on it. The pattern is going to be reIsNative (because Map is indeed a function).

Thus we come to reIsNative:

  /** Used to detect if a method is native. */
  var reIsNative = RegExp('^' +
    funcToString.call(hasOwnProperty).replace(reRegExpChar, '\\$&')
    .replace(/hasOwnProperty|(function).*?(?=\\\()| for .+?(?=\\\])/g, '$1.*?') + '$'
  );

The regexp which is being matched against is calling toString on Object.hasOwnProperty in particular, and replacing the word hasOwnProperty with some wildcards.

In other engines it ends up looking like this:

/^function.*?\(\) \{\n    \[native code\]\n\}$/

In Hermes, however, we show how many arguments the function's .length property expects:

/^function.*?\(a0\) \{ \[native code\] \}$/

This means that we fail the check for Map (which expects no arguments, and therefore does not match the regexp) and lodash never gets to use the Map implementation which Hermes provides.

I changed the line that sets up Map to simply set it unconditionally and the regression was mostly mitigated.

We'll look into what's necessary to fix this.

@tmikov
Copy link
Contributor

tmikov commented Apr 10, 2021

So, ideally this should be fixed with a patch to Lodash?

@andreialecu
Copy link

It seems Hermes could still be up to 3x slower than jsc in some tasks. The jsc vs Hermes benchmark I linked above shows a big performance gap between the two.

It could be because of inlining, so I'm hopeful things can improve.

@tmikov
Copy link
Contributor

tmikov commented Apr 12, 2021

@andreialecu can you please file a separate issue? The performance of Intl is most likely not related to this.

@andreialecu
Copy link

Well, it's not exactly about the performance of Intl, because that doesn't exist in Hermes. It's about the performance of some polyfills, so it's a JS based implementation that is slow.

I have opened #495 however. The findings in this thread related to function inlining are probably relevant.

@Huxpro
Copy link
Contributor

Huxpro commented Apr 15, 2021

Hey folks, we've changed Hermes to print any NativeFunction without formal parameters to fix this. You should get a perf boost on RN 0.65 with next Hermes release.

Ideally, this should be a fix on Lodash. However, the reality seems to suggest that this practice might be too common that it's not worthy for us to diverge with all other engines for the sake of practicality.

We extracted the benchmark from @karol-bisztyga's original one to make it easier to run on engines directly, and I was able to confirm that this change (582a529) improved Hermes performance on the benchmark from 787 to 43, which is in the same ballpark with other engines when interpreting (numbers are the median of 3 runs on my mac pro):

  • v8 --jitless: 30
  • jsc --useJIT=false: 44
  • qjs: 60

@karol-bisztyga
Copy link
Author

Thanks everyone for the attention, research, and solution 🙌 This is amazing! 🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request performance
Projects
None yet
Development

No branches or pull requests

6 participants