Improve results on the rustc-perf benchmark suite #878

Open
bjorn3 opened this issue Jan 26, 2020 · 14 comments
Labels
compile-time How fast is the code compiled

Comments

@bjorn3
Member

bjorn3 commented Jan 26, 2020

I only ran the debug benchmarks, as check should be identical and release will definitely be faster because cg_clif performs far fewer optimizations.

Except for some stress tests, the clean and baseline incremental results are quite positive (~10-60% improvement, often ~40%). For clean incremental the results are much worse (easily ~200%), as compiled object files are not stored in the incremental cache (#760). For patched incremental the results are very mixed: sometimes the difference is just a little less than for clean incremental, while in other cases cg_clif is up to ~70% faster than cg_llvm.
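To make the percentages above concrete, here is a minimal sketch of how such a relative change can be computed from two wall-time measurements. It assumes the usual percent-change convention (negative means faster than the baseline); the function name `percent_change` is hypothetical and not part of rustc-perf.

```rust
/// Relative change of `new_secs` versus `baseline_secs`, in percent.
/// Negative values mean the new measurement is faster than the baseline;
/// +200.0 means the new measurement takes three times as long.
/// Assumption: this is the plain percent-change formula, not code from rustc-perf.
pub fn percent_change(baseline_secs: f64, new_secs: f64) -> f64 {
    (new_secs - baseline_secs) / baseline_secs * 100.0
}
```

Under this convention a "~40% improvement" is a value near -40, and the "~200%" clean incremental regressions are values near +200.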

packed-simd failed due to a verifier error. Edit(2020-03-11): Opened #919. hyper-2 failed due to unsized locals not being implemented (used for impl FnOnce for Box<FnOnce>). Edit(2020-03-11): Fixed in #916. style-servo failed due to running out of disk space.

Patch for rustc-perf
diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..4f577183 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,19 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Zcodegen-backend=~/Documents/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot ~/Documents/cg_clif/build_sysroot/sysroot");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }
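For readers who want to reproduce this outside of rustc-perf, the following is a rough standalone sketch of what the patch does: it sets `RUSTFLAGS` so that cargo uses the cg_clif backend and its sysroot. The helper names `cg_clif_rustflags` and `cargo_rustc_with_cg_clif` are hypothetical. The sketch takes an absolute checkout directory as a parameter, because `std::process::Command` does not go through a shell and so a literal `~` in the value would not be expanded.

```rust
use std::path::Path;
use std::process::Command;

/// Build the RUSTFLAGS value for a cg_clif checkout rooted at `cg_clif_dir`.
/// The relative paths mirror the ones used in the patch above.
pub fn cg_clif_rustflags(cg_clif_dir: &Path) -> String {
    let backend = cg_clif_dir.join("target/release/librustc_codegen_cranelift.so");
    let sysroot = cg_clif_dir.join("build_sysroot/sysroot");
    format!(
        "-Cpanic=abort -Zcodegen-backend={} --sysroot {}",
        backend.display(),
        sysroot.display()
    )
}

/// Prepare (but do not run) a `cargo rustc` invocation that uses cg_clif.
/// Hypothetical helper: rustc-perf builds its command via `base_command` instead.
pub fn cargo_rustc_with_cg_clif(cg_clif_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.arg("rustc");
    cmd.env("RUSTFLAGS", cg_clif_rustflags(cg_clif_dir));
    cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
    cmd
}
```

Calling `.status()` on the returned `Command` inside a crate directory would start the build; the sketch stops before that so it has no side effects.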
Results

image

@bjorn3 bjorn3 added the compile-time How fast is the code compiled label Jan 26, 2020
@bjorn3 bjorn3 pinned this issue Feb 1, 2020
@bjorn3
Member Author

bjorn3 commented Mar 11, 2020

Results after #918:

There are still regressions compared to cg_llvm, but most of the incremental compilation times are now better than with cg_llvm.

Results

image

@bjorn3
Member Author

bjorn3 commented Mar 12, 2020

Many of the red entries are caused by the linker taking much more time (up to 90%!).

bjorn3 added a commit that referenced this issue Mar 14, 2020
This reduces runtime of ConstantCx::finalize for the coercions rustc bench by ~65%

cc #878
@bjorn3
Member Author

bjorn3 commented Mar 14, 2020

5d516f9 is a 20%-50% improvement on the coercions-debug benchmark. Overall it is a ~2% improvement.

bjorn3 added a commit that referenced this issue Mar 14, 2020
Reduces the time spent during the copy from ~9% to ~1% for helloworld

cc #878
@bjorn3
Member Author

bjorn3 commented Mar 14, 2020

Current results with lld:

Results

image

Patch for rustc-perf
diff --git a/collector/src/bin/rustc-perf-collector/execute.rs b/collector/src/bin/rustc-perf-collector/execute.rs
index 9aa2cc48..9787da13 100644
--- a/collector/src/bin/rustc-perf-collector/execute.rs
+++ b/collector/src/bin/rustc-perf-collector/execute.rs
@@ -203,13 +203,21 @@ impl<'a> CargoProcess<'a> {
     fn run_rustc(&mut self) -> anyhow::Result<()> {
         loop {
             let mut cmd = self.base_command(self.cwd, "rustc");
+            cmd.env("RUSTFLAGS", "-Cpanic=abort \
+            -Clink-args=-fuse-ld=lld -Zcodegen-backend=/home/bjorn/Documenten/cg_clif/target/release/librustc_codegen_cranelift.so \
+            --sysroot /home/bjorn/Documenten/cg_clif/build_sysroot/sysroot");
+            //cmd.env("RUSTFLAGS", "-Cpanic=abort -Clink-args=-fuse-ld=lld");
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd));
+            cmd.env("CG_CLIF_INCR_CACHE", "1");
             match self.build_kind {
                 BuildKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 BuildKind::Debug => {}
                 BuildKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

@vultix

vultix commented Mar 14, 2020

Although there are still regressions, they are almost entirely found in the tiny stress-test benchmarks. Most real-world benchmarks are seeing fantastic improvements!

Wonderful work, @bjorn3!

@bjorn3
Member Author

bjorn3 commented Mar 14, 2020

There are a few places where a non-stress-test benchmark regresses by a few percent in one of the incremental benchmarks. Other than that, many stress-test benchmarks regress because of slower linking. Improving this will benefit all other executable benchmarks too. For example, the helloworld-debug regression can be completely explained by longer linking times. In fact, the codegen part is faster for cg_clif.
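The reasoning above can be sketched as a small check: split each measurement into a codegen part and a link part, then ask whether a total regression remains even though codegen itself is not slower. The `Timing` type and `regression_explained_by_linking` function are hypothetical illustrations, not part of rustc-perf or cg_clif, and no measured numbers from the benchmarks are used here.

```rust
/// Wall time of one build, split into two phases (hypothetical type).
pub struct Timing {
    pub codegen_secs: f64,
    pub link_secs: f64,
}

impl Timing {
    /// Total time is simply the sum of both phases in this simplified model.
    pub fn total(&self) -> f64 {
        self.codegen_secs + self.link_secs
    }
}

/// Returns true when `clif` is slower overall than `llvm` even though its
/// codegen phase is not slower, i.e. the regression comes from linking.
pub fn regression_explained_by_linking(llvm: &Timing, clif: &Timing) -> bool {
    clif.total() > llvm.total() && clif.codegen_secs <= llvm.codegen_secs
}
```

This model ignores everything outside codegen and linking (parsing, type checking and so on), which is the same for both backends.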

@bjorn3
Member Author

bjorn3 commented Mar 15, 2020

Reran the benchmarks with Firefox and VS Code closed. Now only the regression-31157-debug patched incremental benchmark shows a significant regression:

image

@vultix

vultix commented Mar 15, 2020

With such huge improvements, how much work would you say is left for MVP?

@bjorn3
Member Author

bjorn3 commented Mar 15, 2020

There are still missing features as mentioned in https://hackmd.io/@bjorn3/HJL5ryFS8. I don't know how long it will take to implement most of them. Some are hard, while others are less hard.

@NotAFile

Are there any recent rustc-perf runs? I'm especially curious about the JIT mode.

@bjorn3
Member Author

bjorn3 commented Oct 12, 2021

Not recently. Don't expect the JIT mode to be faster than AOT compilation. The JIT mode currently doesn't support incremental compilation, which makes it slower.

@jasonwilliams
Member

Here is the latest, using commit df7f020:

[image: rustc-perf compare page, start=LLVM, end=CG_CLIF, stat=wall-time]

CG_CLIF

diff --git a/collector/src/execute.rs b/collector/src/execute.rs
index d816eaaf..ec71984f 100644
--- a/collector/src/execute.rs
+++ b/collector/src/execute.rs
@@ -399,14 +399,21 @@ impl<'a> CargoProcess<'a> {
                 };
 
             let mut cmd = self.base_command(self.cwd, subcommand);
+            cmd.env(
+                "RUSTFLAGS",
+                "-Zcodegen-backend=/home/jasew/workspace/rustc_codegen_cranelift/build/lib/librustc_codegen_cranelift.so",
+            );
+            cmd.arg("--target").arg("x86_64-unknown-linux-gnu");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd)?);
             match self.profile_kind {
                 ProfileKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 ProfileKind::Debug => {}
                 ProfileKind::Doc => {}
                 ProfileKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

LLVM

diff --git a/collector/src/execute.rs b/collector/src/execute.rs
index d816eaaf..ca34d0a3 100644
--- a/collector/src/execute.rs
+++ b/collector/src/execute.rs
@@ -399,14 +399,17 @@ impl<'a> CargoProcess<'a> {
                 };
 
             let mut cmd = self.base_command(self.cwd, subcommand);
+            cmd.arg("-j1");
             cmd.arg("-p").arg(self.get_pkgid(self.cwd)?);
             match self.profile_kind {
                 ProfileKind::Check => {
+                    return Ok(());
                     cmd.arg("--profile").arg("check");
                 }
                 ProfileKind::Debug => {}
                 ProfileKind::Doc => {}
                 ProfileKind::Opt => {
+                    return Ok(());
                     cmd.arg("--release");
                 }
             }

Notes:

Processor: AMD Ryzen 9 5950X 16-Core Processor, 3.40 GHz
Installed RAM: 32.0 GB

@bjorn3
Member Author

bjorn3 commented Aug 25, 2022

cc #1271
