tab.find_elements() performance is 150x worse than JS #460

Open
bricef opened this issue Feb 13, 2024 · 4 comments

Comments

@bricef

bricef commented Feb 13, 2024

Issue

Running tab.find_elements() takes more than 150 times as long as running document.querySelectorAll() in JavaScript; in the measurements below, it is about 46.6 s versus 0.26 ms.

I'm not sure if I'm doing something wrong or if this is the expected performance, but it is prohibitive for my use case. Are there performance tricks I don't know about? The code seems to delegate to the browser's document.querySelectorAll(), so I'm not sure where the time is being spent.

On the same url:

Rust

let start = Instant::now();
let es = tab.find_elements("[href]")?;
let duration = start.elapsed();
println!("Time elapsed in find_elements() is: {:?}", duration);

Output

Time elapsed in find_elements() is: 46.614745743s

JS

console.time("test");
document.querySelectorAll("[href]");
console.timeEnd("test");

Output

test: 0.260009765625 ms
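
For scale, the two outputs above can be put into the same units. The sketch below only does that arithmetic with std::time::Duration; the helper name `slowdown` is made up for illustration and is not part of headless_chrome. With these two figures the factor comes out at roughly 179,000, far beyond the 150x in the title.

```rust
use std::time::Duration;

/// Ratio of two durations as a plain floating-point factor.
/// (Hypothetical helper, named here only for illustration.)
fn slowdown(slow: Duration, fast: Duration) -> f64 {
    slow.as_secs_f64() / fast.as_secs_f64()
}

fn main() {
    // Figures copied from the two outputs above.
    let rust_call = Duration::from_secs_f64(46.614745743);
    let js_call = Duration::from_secs_f64(0.260009765625e-3); // 0.26 ms in seconds

    let factor = slowdown(rust_call, js_call);
    println!("find_elements() took about {factor:.0}x as long");
}
```

The comparison is not quite like for like: the JS snippet only builds a NodeList inside the page, while the Rust call also has to carry its results across the DevTools connection.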

Investigations

Looking at the headless_chrome source leads me to browser::transport::Transport::call_method(), but it's not clear where the time is being spent. I'm still early in my Rust journey, so I'm not yet familiar enough with the profiling tools to dig into the details of this performance issue.
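
Short of a full profiler, one low-tech way to narrow down where the time goes is to wrap the individual steps in manual timers. The sketch below uses only the standard library; `StageTimer` and the stage names are hypothetical, and the sleeps stand in for the real calls being measured.

```rust
use std::time::{Duration, Instant};

/// Records how long each named stage of a larger operation takes.
/// (Hypothetical helper, not part of headless_chrome.)
struct StageTimer {
    last: Instant,
    laps: Vec<(String, Duration)>,
}

impl StageTimer {
    fn new() -> Self {
        Self { last: Instant::now(), laps: Vec::new() }
    }

    /// Close the current stage under `name` and start timing the next one.
    fn lap(&mut self, name: &str) {
        let now = Instant::now();
        self.laps.push((name.to_string(), now - self.last));
        self.last = now;
    }

    /// Name and duration of the slowest recorded stage, if any.
    fn slowest(&self) -> Option<&(String, Duration)> {
        self.laps.iter().max_by_key(|(_, d)| *d)
    }
}

fn main() {
    let mut timer = StageTimer::new();

    // Placeholder work; in a real program these would be the individual
    // calls whose cost is in question.
    std::thread::sleep(Duration::from_millis(5));
    timer.lap("stage_a");
    std::thread::sleep(Duration::from_millis(20));
    timer.lap("stage_b");

    for (name, took) in &timer.laps {
        println!("{name}: {took:?}");
    }
    if let Some((name, took)) = timer.slowest() {
        println!("slowest: {name} ({took:?})");
    }
}
```

This only measures wall-clock time on the Rust side, so it cannot tell waiting on the browser apart from work done locally; a profiler is still needed for that split.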

@bricef
Author

bricef commented Feb 13, 2024

Update

I created a minimum viable test case:

use std::time::Instant;
use headless_chrome::Browser;

#[tokio::main]
async fn main() -> Result<(), anyhow::Error> {
    // Launch a browser with default options and open a fresh tab.
    let browser = Browser::default()?;
    let tab = browser.new_tab()?;

    // Load the page and block until navigation has finished.
    tab.navigate_to("https://en.wikipedia.org")?;
    tab.wait_until_navigated()?;

    // Time only the element lookup itself.
    let start = Instant::now();
    let _es = tab.find_elements("[href]")?;
    let duration = start.elapsed();

    println!("Time elapsed in find_elements() is: {:?}", duration);
    Ok(())
}

Compiling it with debug symbols and running perf/flamegraph I get the following:

[Image: flamegraph]

This suggests that the majority of the time is spent in Chrome. I'm not sure why Chrome would take so long, and I'm running against a production build of Chromium (so no debug symbols). Will continue to investigate.

@bricef
Author

bricef commented Feb 13, 2024

On the headless_chrome crate side, the de/serialization of events from the websocket seems to be the hot spot, which is not unexpected.

[Image: Screenshot_2024-02-13_12-28-40]
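
If the de/serialization cost grows with the number of matched elements (an assumption; the profile does not break it down that far), one possible workaround is to do the extraction inside the page and bring back a single JSON string instead of one handle per element. The sketch below only builds such a JavaScript expression using the standard library. Both function names are hypothetical, and handing the string to the crate's JavaScript evaluation call (for example `Tab::evaluate`, assuming it is available in the version in use) is left out.

```rust
/// Quote a string as a double-quoted JavaScript string literal.
/// (Hypothetical helper; covers quotes, backslashes and line breaks only.)
fn js_string_literal(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            _ => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build a JavaScript expression that collects one attribute from every
/// element matching `selector` and returns the values as one JSON string.
fn collect_attribute_expression(selector: &str, attribute: &str) -> String {
    format!(
        "JSON.stringify(Array.from(document.querySelectorAll({}), e => e.getAttribute({})))",
        js_string_literal(selector),
        js_string_literal(attribute),
    )
}

fn main() {
    // Same selector as in the test case above.
    let expression = collect_attribute_expression("[href]", "href");
    println!("{expression}");
    // Assumption: this string would then be passed to the client's
    // "evaluate JavaScript" call and the returned JSON parsed on the Rust side.
}
```

This trades element handles for plain data, so it only helps when the attribute values are all that is needed and no further interaction with the elements is required.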

@chenshuiluke

Did you ever end up getting further in your investigation?

@juddbaguio

Any improvements on this?
