Improve results on the rustc-perf benchmark suite #878

bjorn3 · 2020-01-26T16:59:04Z

I only ran the debug benchmarks, as check should be identical and release will definitely be faster, since cg_clif performs far fewer optimizations.

Except for some stress tests, the clean and baseline incremental results are quite positive (~10-60% improvement, often ~40%). For clean incremental the results are much worse (easily ~200%), as compiled object files are not stored in the incremental cache (#760). For patched incremental the results are very mixed: sometimes the difference is only slightly smaller than for clean incremental, while in other cases cg_clif is up to ~70% faster than cg_llvm.
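To make the percentages above concrete, here is a minimal sketch of the arithmetic, assuming they are relative changes in wall time measured against cg_llvm. The function name and the timings are hypothetical and are not taken from the benchmark run.

```rust
/// Relative change of `candidate_secs` versus `baseline_secs`, in percent.
/// Negative values mean the candidate is faster (an improvement).
fn percent_change(baseline_secs: f64, candidate_secs: f64) -> f64 {
    (candidate_secs - baseline_secs) / baseline_secs * 100.0
}

fn main() {
    // Hypothetical timings in seconds, chosen only to illustrate the scale.
    let llvm = 10.0;
    let clif_clean = 6.0; // about 40% faster than the baseline
    let clif_clean_incremental = 30.0; // about 200% slower than the baseline

    println!("clean: {:+.0}%", percent_change(llvm, clif_clean));
    println!(
        "clean incremental: {:+.0}%",
        percent_change(llvm, clif_clean_incremental)
    );
}
```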

packed-simd failed due to a verifier error. Edit (2020-03-11): Opened #919.

~~hyper-2 failed due to unsized locals not being implemented (used for impl FnOnce for Box<FnOnce>).~~ Edit (2020-03-11): Fixed in #916.

style-servo failed due to running out of disk space.

Patch for rustc-perf

```diff
diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..4f577183 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,19 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Zcodegen-backend=~/Documents/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot ~/Documents/cg_clif/build_sysroot/sysroot");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }
```
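The core of the patch is the `RUSTFLAGS` value that points rustc at the cg_clif backend and its sysroot. Below is a standalone sketch of the same idea using only `std::process::Command`. The helper name `cg_clif_rustflags` and the `/opt/cg_clif/...` paths are hypothetical. The sketch uses absolute paths because `Command::env` passes the value to the child process as-is, without any shell tilde expansion.

```rust
use std::path::Path;
use std::process::Command;

/// Builds a RUSTFLAGS value that selects an out-of-tree codegen backend
/// and a matching sysroot (hypothetical helper, not part of rustc-perf).
fn cg_clif_rustflags(backend: &Path, sysroot: &Path) -> String {
    format!(
        "-Cpanic=abort -Zcodegen-backend={} --sysroot {}",
        backend.display(),
        sysroot.display()
    )
}

fn main() {
    // Hypothetical install locations; adjust to the local cg_clif checkout.
    let flags = cg_clif_rustflags(
        Path::new("/opt/cg_clif/target/release/librustc_codegen_cranelift.so"),
        Path::new("/opt/cg_clif/build_sysroot/sysroot"),
    );

    // Mirror the patch: set RUSTFLAGS and pass an explicit target.
    let mut cmd = Command::new("cargo");
    cmd.arg("rustc")
        .env("RUSTFLAGS", &flags)
        .arg("--target")
        .arg("x86_64-unknown-linux-gnu");

    // Only print the command here; spawning it requires a real cg_clif build.
    println!("{:?}", cmd);
}
```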

Results

bjorn3 · 2020-03-11T20:49:38Z

Results after #918:

There are still regressions compared to cg_llvm, but most of the incremental compilation times have improved compared to cg_llvm.

Results

bjorn3 · 2020-03-12T22:01:15Z

A lot of the reds are caused by the linker taking much more time. (Up to 90%!)

bjorn3 · 2020-03-14T16:41:34Z

5d516f9 is a 20%-50% improvement on the coercions-debug benchmark. Overall it is a ~2% improvement.

bjorn3 · 2020-03-14T19:35:35Z

Current results with lld:

Results

Patch for rustc-perf

```diff
diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..9787da13 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,21 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Clink-args=-fuse-ld=lld -Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot /home/bjorn/Documenten/cg_clif/build_sysroot/sysroot");
+            //cmd.env("RUSTFLAGS", "-Cpanic=abort -Clink-args=-fuse-ld=lld");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
+            cmd.env("CG_CLIF_INCR_CACHE", "1");
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }
```

vultix · 2020-03-14T20:04:25Z

Although there are still regressions, they are almost entirely found in the tiny stress-test benchmarks. Most real-world benchmarks are seeing fantastic improvements!

Wonderful work, @bjorn3!

bjorn3 · 2020-03-14T20:20:54Z

There are a few places where a non-stress-test benchmark regresses by a few percent in one of the incremental benchmarks. Other than that, many stress-test benchmarks regress because of slower linking. Improving this will benefit all other executable benchmarks too. For example, the helloworld-debug regression can be completely explained by longer linking times; the codegen part is in fact faster with cg_clif.

bjorn3 · 2020-03-15T22:07:56Z

Reran the benchmarks with Firefox and VS Code closed. Now only the regression-31157-debug patched incremental benchmark is a significant regression:

vultix · 2020-03-15T22:11:44Z

With such huge improvements, how much work would you say is left for MVP?

bjorn3 · 2020-03-15T22:36:38Z

There are still missing features as mentioned in https://hackmd.io/@bjorn3/HJL5ryFS8. I don't know how long it will take to implement most of them. Some are hard, while others are less hard.

NotAFile · 2021-10-11T23:58:33Z

Are there any recent rustc-perf runs? I'm especially curious about the JIT mode.

bjorn3 · 2021-10-12T05:48:08Z

Not recently. Don't expect the JIT mode to be faster than AOT compilation. The JIT mode currently doesn't support incremental compilation, which makes it slower.

jasonwilliams · 2021-12-08T21:32:11Z

Here are the latest results, using commit df7f020.

CG_CLIF

```diff
diff --git a/collector/src/execute.rs b/collector/src/execute.rs
index d816eaaf..ec71984f 100644
--- a/collector/src/execute.rs
+++ b/collector/src/execute.rs
@@ -399,14 +399,21 @@ impl<'a> CargoProcess<'a> {
                 };
 
             let mut cmd = self.base_command(self.cwd, subcommand);
+            cmd.env(
+                "RUSTFLAGS",
+                "-Zcodegen-backend=/home/jasew/workspace/rustc_codegen_cranelift/build/lib/librustc_codegen_cranelift.so",
+            );
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd)?);
             match self.profile_kind {
                 ProfileKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 ProfileKind::Debug => {}
                 ProfileKind::Doc => {}
                 ProfileKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }
```

LLVM

```diff
diff --git a/collector/src/execute.rs b/collector/src/execute.rs
index d816eaaf..ca34d0a3 100644
--- a/collector/src/execute.rs
+++ b/collector/src/execute.rs
@@ -399,14 +399,17 @@ impl<'a> CargoProcess<'a> {
                 };
 
             let mut cmd = self.base_command(self.cwd, subcommand);
+            cmd.arg("-j1");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd)?);
             match self.profile_kind {
                 ProfileKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 ProfileKind::Debug => {}
                 ProfileKind::Doc => {}
                 ProfileKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }
```

Notes:

I needed perf; https://gist.github.com/abel0b/b1881e41b9e1c4b16d84e5e083c38a13 worked fine.
rustc-perf: https://github.com/rust-lang/rustc-perf

Processor: AMD Ryzen 9 5950X 16-Core Processor, 3.40 GHz
Installed RAM: 32.0 GB

bjorn3 · 2022-08-25T17:56:29Z

cc #1271

bjorn3 added the compile-time ("How fast is the code compiled") label Jan 26, 2020

bjorn3 pinned this issue Feb 1, 2020

This was referenced Mar 11, 2020

Implement incremental caching of object files #918

Merged

Cranelift backend for rustc rust-lang/compiler-team#257

Closed

bjorn3 added a commit that referenced this issue Mar 14, 2020

Use Vec instead of HashSet for ccx.todo

5d516f9

This reduces runtime of ConstantCx::finalize for the coercions rustc bench by ~65% cc #878
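As a rough illustration of why a `Vec` can be the cheaper container for a to-do list, here is a generic worklist sketch. It is not the cg_clif code: `process_all`, the `u32` item type, and the dependency graph are hypothetical. The worklist is a plain stack with cheap `push` and `pop`, and deduplication is left to a separate `done` set, so an item is hashed once when it is popped rather than on every insertion into the worklist.

```rust
use std::collections::HashSet;

/// Visits every item reachable from `roots` exactly once and returns the
/// visiting order. `deps(i)` lists the items that item `i` refers to.
fn process_all(roots: &[u32], deps: impl Fn(u32) -> Vec<u32>) -> Vec<u32> {
    let mut todo: Vec<u32> = roots.to_vec(); // worklist used as a stack
    let mut done: HashSet<u32> = HashSet::new(); // items already handled
    let mut order = Vec::new();

    while let Some(item) = todo.pop() {
        // `insert` returns false if the item was already present.
        if !done.insert(item) {
            continue;
        }
        order.push(item);
        // Duplicates may be pushed here; they are filtered when popped.
        todo.extend(deps(item));
    }
    order
}

fn main() {
    // Hypothetical dependency graph: 0 -> {1, 2}, 1 -> {2}, 2 -> {}.
    let deps = |i: u32| match i {
        0 => vec![1, 2],
        1 => vec![2],
        _ => vec![],
    };
    println!("{:?}", process_all(&[0], deps));
}
```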

bjorn3 added a commit that referenced this issue Mar 14, 2020

Pre-allocate vec for rlib metadata reading

0c1dcb0

Reduces the time spent during the copy from ~9% to ~1% for helloworld cc #878
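The general technique named in this commit title can be sketched as follows. This is an assumption-laden illustration, not the actual change: `read_preallocated` and its `expected_len` parameter are hypothetical names. Reserving the buffer up front avoids repeated reallocation and copying while the vector grows during the read.

```rust
use std::io::{self, Cursor, Read};

/// Reads `reader` to the end into a buffer whose capacity is reserved
/// up front. `expected_len` is only a hint; the read stays correct if
/// the real length differs.
fn read_preallocated<R: Read>(mut reader: R, expected_len: usize) -> io::Result<Vec<u8>> {
    let mut buf = Vec::with_capacity(expected_len);
    reader.read_to_end(&mut buf)?;
    Ok(buf)
}

fn main() {
    // An in-memory stand-in for a metadata file of known size.
    let data = vec![7u8; 4096];
    let buf = read_preallocated(Cursor::new(data.clone()), data.len())
        .expect("reading from memory cannot fail");
    assert_eq!(buf, data);
    println!("read {} bytes", buf.len());
}
```

For a real file, the length hint could come from `std::fs::metadata(path)?.len()`.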
