Hijack navigation, return custom HTML and setup the page interactions (on_clicks, on_submits...) #841
Comments
Please fix the golang code in your markdown: @@ golang markdown block 1 @@
2:1: expected declaration, found 'go'
4:24: expected 'IDENT', found ')'
4:26: expected type, found '{'
5:34: expected ';', found ':'
6:5: expected declaration, found 'go'
43:3: expected declaration, found 'if'
generated by check-issue |
You can do the same thing with rod:
proto.FetchEnable{
	Patterns: []*proto.FetchRequestPattern{
		{URLPattern: "*"},
	},
}.Call(page)
go page.EachEvent(func(e *proto.FetchRequestPaused) {
	fmt.Println("request", e.Request.URL)
	// Fulfill the paused request; RequestID ties the response to it.
	proto.FetchFulfillRequest{
		RequestID:       e.RequestID,
		ResponseCode:    200,
		ResponseHeaders: []*proto.FetchHeaderEntry{},
		Body:            []byte("Hello World!"),
	}.Call(page)
})() |
That's beautiful, makes it so much easier. Thank you. |
@ysmood Sorry to bother again, I think I am misunderstanding how
func main() {
	cfg := getConfig()
	browser := rod.New().MustConnect()
	defer browser.MustClose()
	page := browser.MustPage("")
	defer page.MustClose()
	Setup(page, &cfg)
	done, err := StartCrawler(context.Background(), page)
	if err != nil {
		log.Fatalf("failed to run crawler: %v", err)
	}
	<-done
	fmt.Println("success!")
}
func Setup(page *rod.Page, config *Config) {
	fetchEnable := proto.FetchEnable{
		Patterns: []*proto.FetchRequestPattern{
			{URLPattern: "*"},
		},
	}
	if err := fetchEnable.Call(page); err != nil {
		log.Fatalf("failed to enable fetch: %v", err)
	}
	listenEvents(page, config)
}
func listenEvents(page *rod.Page, config *Config) {
	go page.EachEvent(func(e *proto.FetchRequestPaused) {
		stubbedPageInfo := stubPage(page, e, config)
		switch {
		case stubbedPageInfo != nil:
			// Panics here
			html, err := page.HTML()
			if err != nil {
				log.Fatalf("failed to get html: %v", err)
			}
			fmt.Println(html)
			....
func stubPage(page *rod.Page, ev *proto.FetchRequestPaused, config *Config) *Page {
	for _, pageConf := range config.Pages {
		if pageConf.URL == ev.Request.URL && ev.Request.Method == "GET" {
			fmt.Println("stubbing navigation: " + ev.Request.URL)
			headers := []*proto.FetchHeaderEntry{{
				Name: "Content-Type", Value: "text/html",
			}}
			file, err := ioutil.ReadFile(pageConf.File)
			if err != nil {
				log.Fatalf("failed to load file %v", err)
			}
			err = proto.FetchFulfillRequest{
				RequestID:       ev.RequestID,
				ResponseHeaders: headers,
				Body:            file,
				ResponsePhrase:  "OK",
				ResponseCode:    200,
			}.Call(page)
			if err != nil {
				log.Fatalf("failed to stub navigation request %v", err)
			}
			return &pageConf
		}
	}
	return nil
}
Line 65 is
I'd expect it to complete the request, and the page should load my stubbed HTML. Maybe some race condition is occurring.
func StartCrawler(ctx context.Context, page *rod.Page) (chan bool, error) {
	done := make(chan bool)
	go func() {
		fmt.Println("navigating")
		page.MustNavigate(baseUrl + "/login").WaitLoad()
		done <- true
	}()
	return done, nil
} |
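One way to make the matching step in stubPage above easier to check in isolation is to separate it from the CDP call. The following is a minimal sketch under assumptions: `PageStub`, `Config` and `findStub` are hypothetical stand-ins (the thread's real `Page` and `Config` types are not shown), and it uses only the standard library, not rod. It indexes into the slice so the returned pointer refers to the config entry itself rather than to a per-iteration copy.

```go
package main

import "fmt"

// PageStub and Config are hypothetical stand-ins for the thread's
// Page and Config types, which are not shown in full.
type PageStub struct {
	URL  string
	File string
}

type Config struct {
	Pages []PageStub
}

// findStub returns the configured stub for a GET request to url,
// or nil when the request should not be stubbed.
func findStub(cfg *Config, method, url string) *PageStub {
	if method != "GET" {
		return nil
	}
	for i := range cfg.Pages {
		if cfg.Pages[i].URL == url {
			// Point at the slice element, not at a loop-variable copy.
			return &cfg.Pages[i]
		}
	}
	return nil
}

func main() {
	cfg := &Config{Pages: []PageStub{
		{URL: "http://example.com/login", File: "testdata/login.html"},
	}}
	if s := findStub(cfg, "GET", "http://example.com/login"); s != nil {
		fmt.Println("stub with", s.File)
	}
	if findStub(cfg, "POST", "http://example.com/login") == nil {
		fmt.Println("POST is not stubbed")
	}
}
```

With `URLPattern: "*"` every request is paused, so a request for which the lookup returns nil still has to be resumed, for example with `proto.FetchContinueRequest`; the elided part of the handler above may already do this.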
It's a limitation of CDP. Also, to run JS you have to wait for the navigation to complete. Code like before works fine for me:
func main() {
	browser := rod.New().MustConnect()
	page := browser.MustPage("")
	utils.E(proto.FetchEnable{
		Patterns: []*proto.FetchRequestPattern{
			{URLPattern: "*"},
		},
	}.Call(page))
	go page.EachEvent(func(e *proto.FetchRequestPaused) {
		utils.E(proto.FetchFulfillRequest{
			RequestID:    e.RequestID,
			ResponseCode: 200,
			Body:         []byte("<html>test</html>"),
		}.Call(page))
	})()
	page.MustNavigate("http://example.com")
	fmt.Println(page.MustHTML())
} |
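The ordering described above (fulfill inside the event handler, touch the page only after navigation returns) can be sketched without a browser. This is a simulation under stated assumptions, not rod code: `fakePage`, `pausedRequest` and `crawl` are hypothetical names, and the goroutine merely plays the role of `go page.EachEvent(...)()`. The handler only fulfills and records which URL it stubbed; reading the HTML and setting up interactions happen in the main flow once the simulated navigation has finished.

```go
package main

import (
	"fmt"
	"sync"
)

// pausedRequest is a hypothetical stand-in for *proto.FetchRequestPaused.
type pausedRequest struct {
	URL string
}

// fakePage simulates a page whose document is whatever was fulfilled.
type fakePage struct {
	html string
}

func (p *fakePage) fulfill(body string) { p.html = body }
func (p *fakePage) HTML() string        { return p.html }

// crawl "navigates" to url and returns the resulting HTML together with
// the URLs that were stubbed. The page is read only after the handler
// goroutine has finished, which stands in for MustNavigate returning.
func crawl(url string, stubs map[string]string) (string, []string) {
	page := &fakePage{}
	events := make(chan pausedRequest)
	served := make(chan string, 16) // buffered so the handler never blocks

	var wg sync.WaitGroup
	wg.Add(1)
	go func() { // plays the role of the EachEvent handler goroutine
		defer wg.Done()
		for ev := range events {
			if body, ok := stubs[ev.URL]; ok {
				page.fulfill(body)
				served <- ev.URL // record only; do not read the page here
			}
		}
	}()

	events <- pausedRequest{URL: url} // navigation pauses the request
	close(events)
	wg.Wait() // navigation is complete from here on
	close(served)

	var stubbed []string
	for u := range served {
		stubbed = append(stubbed, u)
	}
	return page.HTML(), stubbed
}

func main() {
	html, stubbed := crawl("http://example.com/login", map[string]string{
		"http://example.com/login": "<html>stub</html>",
	})
	fmt.Println(html, stubbed)
}
```

In the real setup, the place after `wg.Wait()` corresponds to the line after `page.MustNavigate(...)`, which is where a function such as setupPageInteractions could be called for each stubbed URL.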
Rod Version: v0.112.6
Description
I am building a testing library for my web crawler using go-rod, and I want to stub the real pages with custom HTML that I control. My goal is to hook into a page, listen for navigation events, and then substitute the real page with my own copy of the page. After substituting the page, I want to call the setupPageInteractions function that hooks into the elements on the page and sets up onClick events, onSubmit and other interactions.
Problem
I am facing an issue while trying to achieve this. When I stub the page using HijackRequests, I cannot block the request, and I have to let it finish. However, this means that I am not able to inject my custom code and call the setupPageInteractions function at the right time. If I call the setupPageInteractions function directly after stubbing the page, the request won't complete, and the execution is blocked indefinitely. I need to set up the page and the interactions before the WaitLoad triggers on the crawler side.
crawler.Navigate("desired_page") -> crawler.WaitLoad() -> stop the request -> inject custom HTML -> setup page interactions -> crawler.WaitLoad() and the crawler.Page is now our custom page
Example Code
Here is the code snippet showcasing my issue:
Expected Behavior
I expect to be able to inject my custom HTML and set up interactions after hijacking the request without blocking the request or the execution of the code.
Actual Behavior
I am not able to inject my custom code and call the setupPageInteractions function without blocking the request or the execution of the code.
Any guidance or suggestions on how to resolve this issue would be greatly appreciated.
TestRod reproduction code
rod_test.go
I achieved this in chromedp using fetch.FulfillRequest.