How to use this crate correctly in the tokio runtime #477

Open
kolapapa opened this issue Apr 10, 2024 · 1 comment

Comments

@kolapapa

I need to measure the load time of web pages, but I run into problems when processing a large number of pages in parallel. My entire program runs inside the tokio runtime, and I want to run the blocking browser calls via task::spawn_blocking.

OS where the problem occurs: Debian on aarch64

Below is a minimal example of my code:

// headless_chrome = "1.0.9"
// anyhow = "1.0.81"
// tokio = { version = "1.37.0", features = ["full"] }
// futures = "0.3.30"

use std::sync::Arc;
use std::time::{Duration, Instant};
use headless_chrome::Browser;

use tokio::task;
use futures::future::join_all;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let browser = Browser::default()?;
    let arc_browser = Arc::new(browser);

    let targets = vec![
        "https://www.rust-lang.org/".to_string(),
        "https://www.cloudflare.com/".to_string(),
        "https://aws.amazon.com/".to_string(),
    ];

    let mut tasks = vec![];
    let start = Instant::now();
    for target in targets {
        let browser = arc_browser.clone();
        let task_name = target.clone();
        let task = task::spawn(detect_target_5times(browser, target));
        tasks.push(async move { (task_name, task) });
    }

    let results = join_all(tasks).await;

    for (name, result) in results {
        match result.await {
            Ok(Ok(_)) => println!("{} done.", name),
            Ok(Err(e)) => println!("failed: {:?}", e),
            Err(e) => println!("join failed: {:?}", e),
        }
    }
    println!("Total time: {:?}", start.elapsed());

    Ok(())
}

async fn detect_target_5times(browser: Arc<Browser>, target: String) -> anyhow::Result<()> {
    println!("start browser: {}", target);
    let mut tasks = vec![];
    for _ in 0..5 {
        let arc_browser = browser.clone();
        let target = target.clone();
        let task = task::spawn_blocking(move || detect_once(arc_browser, target));
        tasks.push(task);
    }

    let results = join_all(tasks).await;
    let mut durations = vec![];
    for result in results {
        match result {
            Ok(Ok(d)) => durations.push(d),
            Ok(Err(e)) => println!("failed: {}", e),
            Err(e) => println!("join failed: {}", e),
        }
    }
    println!("target: {} total duration: {:?}", target, durations.iter().sum::<Duration>());

    Ok(())
}

fn detect_once(browser: Arc<Browser>, target: String) -> anyhow::Result<Duration> {
    let tab = browser.new_context()?.new_tab()?;
    let start = Instant::now();
    tab.navigate_to(&target)?;
    tab.wait_until_navigated()?;
    let duration = start.elapsed();
    println!("{}: {:?}", target, duration);
    tab.close(true)?;
    Ok(duration)
}

This is the log output of one run:

start browser: https://www.rust-lang.org/
start browser: https://aws.amazon.com/
start browser: https://www.cloudflare.com/
https://www.rust-lang.org/: 3.854869329s
https://www.rust-lang.org/: 4.119861796s
https://www.rust-lang.org/: 5.435828585s
https://www.rust-lang.org/: 8.424771699s
https://www.rust-lang.org/: 9.711593305s
target: https://www.rust-lang.org/ total duration: 31.546924714s
https://www.rust-lang.org/ done.
https://aws.amazon.com/: 18.602800925s
https://aws.amazon.com/: 19.842338577s
failed: The event waited for never came
failed: The event waited for never came
failed: The event waited for never came
target: https://aws.amazon.com/ total duration: 38.445139502s
failed: The event waited for never came
failed: The event waited for never came
failed: The event waited for never came
failed: The event waited for never came
failed: The event waited for never came
target: https://www.cloudflare.com/ total duration: 0ns
https://www.cloudflare.com/ done.
https://aws.amazon.com/ done.
Total time: 40.207492793s

This code runs without problems on my M2 MacBook, but the failures shown above occur when I run it on an aarch64 Debian box. Is there something wrong with how I am using the crate? Can anyone help me?

@kingsleyh

I also had problems when spawning multiple browsers in parallel. I don't know whether it was my hardware or something else. I've also tried the other project, chromiumoxide, but I find it extremely unreliable compared to this one. In the end I went with a more sequential approach. Since it works fine on your M2 MacBook, I imagine it's some kind of hardware-related issue, but I have no clue. For me it runs great in Docker on a Linux host, but it gets stuck or hangs in Docker on my Mac host.
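
One way to act on the "more sequential" idea is to cap how many pages load at the same time instead of starting all 15 navigations at once. The following is only a minimal, std-only sketch under that assumption: `run_bounded` and `timed` are hypothetical helpers written for illustration, not part of headless_chrome or tokio, and nothing here has been verified against the aarch64 failure.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `job` once per input on at most `limit`
/// worker threads and returns the results in input order.
fn run_bounded<T, R, F>(inputs: Vec<T>, limit: usize, job: F) -> Vec<R>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Sync + 'static,
{
    let total = inputs.len();
    // Shared queue of (index, input) pairs that the workers drain.
    let queue = Arc::new(Mutex::new(
        inputs.into_iter().enumerate().collect::<VecDeque<(usize, T)>>(),
    ));
    let job = Arc::new(job);
    // Never start more workers than there are inputs, and at least one.
    let worker_count = limit.max(1).min(total.max(1));

    let workers: Vec<_> = (0..worker_count)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let job = Arc::clone(&job);
            thread::spawn(move || {
                let mut done = Vec::new();
                loop {
                    // The lock is held only while popping, not while the job runs.
                    let next = queue.lock().unwrap().pop_front();
                    match next {
                        Some((idx, input)) => done.push((idx, job(input))),
                        None => break,
                    }
                }
                done
            })
        })
        .collect();

    // Collect every worker's results and restore the original order.
    let mut indexed: Vec<(usize, R)> = workers
        .into_iter()
        .flat_map(|w| w.join().expect("worker thread panicked"))
        .collect();
    indexed.sort_by_key(|(idx, _)| *idx);
    indexed.into_iter().map(|(_, r)| r).collect()
}

/// Hypothetical stand-in for the timing part of `detect_once`:
/// measures how long an arbitrary blocking closure takes.
fn timed<F: FnOnce()>(f: F) -> Duration {
    let start = Instant::now();
    f();
    start.elapsed()
}
```

In the program from the issue, a call such as `run_bounded(targets, 2, move |t| detect_once(browser.clone(), t))` could be wrapped in a single task::spawn_blocking so that at most two tabs navigate at once. Whether a lower limit avoids "The event waited for never came" on the aarch64 box is an open question.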
