
Progress reporting like in clickhouse-cli #1127

Closed
vmihailenco opened this issue Oct 23, 2023 · 11 comments · Fixed by #1242

@vmihailenco

I'd like to have progress reporting like in clickhouse-cli, but the following Go program only prints the error without any Progress or Profile events:

package main

import (
	"context"
	"fmt"

	"github.com/ClickHouse/clickhouse-go/v2"
)

func main() {
	if err := ProgressProfileLogs(); err != nil {
		panic(err)
	}
}

func ProgressProfileLogs() error {
	conn, err := clickhouse.Open(&clickhouse.Options{
		Addr: []string{"127.0.0.1:9000"},
		Settings: clickhouse.Settings{
			"send_logs_level": "trace",
		},
	})
	if err != nil {
		return err
	}
	totalRows := uint64(0)
	// use context to pass a call back for progress and profile info
	ctx := clickhouse.Context(context.Background(), clickhouse.WithProgress(func(p *clickhouse.Progress) {
		fmt.Println("progress: ", p)
		totalRows += p.Rows
	}), clickhouse.WithProfileInfo(func(p *clickhouse.ProfileInfo) {
		fmt.Println("profile info: ", p)
	}), clickhouse.WithLogs(func(log *clickhouse.Log) {
		fmt.Println("log info: ", log)
	}))

	rows, err := conn.Query(ctx, "SELECT number from numbers(10000000) LIMIT 10000000 settings max_rows_to_read = 100000, max_block_size = 100")
	if err != nil {
		return err
	}
	// Drain the result set; the progress, profile, and log callbacks
	// fire while blocks are being read.
	for rows.Next() {
	}

	fmt.Printf("Total Rows: %d\n", totalRows)
	rows.Close()
	return rows.Err()
}

The desired behavior is to get the number of rows that were processed before ClickHouse returned the error. Is that possible?

@jkaflik
Contributor

jkaflik commented Oct 23, 2023

@vmihailenco I guess the error you refer to is:

Code: 158. DB::Exception: Limit for rows or bytes to read exceeded, max rows: 100.00 thousand, current rows: 10.00 million: While executing Numbers. (TOO_MANY_ROWS)

This is due to the max_rows_to_read setting, which (I haven't confirmed this) presumably makes the query fail immediately, since LIMIT 10000000 requests far more than the 100,000 rows allowed.

If you remove this setting, the query is executed, and the server should report back progress.

Important: the progress callback is guaranteed to be invoked for each block of processed data. In your case, max_block_size helps you control this.

@vmihailenco
Author

Yes, it works for successful queries, but I'd like to show the progress for unsuccessful queries as well, for example, slow queries that take more than 30 seconds. In such cases the progress is not reported at all.

So I've added max_rows_to_read to make the query fail.

Important: the progress callback is guaranteed to be invoked for each block of processed data. In your case, max_block_size helps you control this.

That is how it works now, but clickhouse-cli is able to show the number of processed rows/bytes without receiving any data blocks. I wonder how that is achieved...

@jkaflik
Contributor

jkaflik commented Oct 23, 2023

@vmihailenco Ok, I see. Reproduced it:

package issues

import (
	"context"
	"fmt"
	"testing"

	"github.com/ClickHouse/clickhouse-go/v2"
	clickhouse_tests "github.com/ClickHouse/clickhouse-go/v2/tests"
	"github.com/stretchr/testify/require"
)

func Test1127(t *testing.T) {
	var (
		conn, err = clickhouse_tests.GetConnection("issues", clickhouse.Settings{
			"max_execution_time":             60,
			"allow_experimental_object_type": true,
		}, nil, &clickhouse.Compression{
			Method: clickhouse.CompressionLZ4,
		})
	)
	require.NoError(t, err)

	totalRows := uint64(0)
	ctx := clickhouse.Context(context.Background(), clickhouse.WithProgress(func(p *clickhouse.Progress) {
		fmt.Println("progress: ", p)
		totalRows += p.Rows
	}), clickhouse.WithProfileInfo(func(p *clickhouse.ProfileInfo) {
		fmt.Println("profile info: ", p)
	}), clickhouse.WithLogs(func(log *clickhouse.Log) {
		fmt.Println("log info: ", log)
	}))

	rows, err := conn.Query(ctx, "SELECT number from numbers(10000000) LIMIT 10000000 settings max_rows_to_read = 100000, max_block_size = 100")
	require.NoError(t, err)

	defer rows.Close()
	// Drain the result set so progress packets are processed.
	for rows.Next() {
	}

	require.Error(t, rows.Err())
}

This is the same code you posted. The driver receives the progress packet; however, it's not properly handled due to a lost reference to the progress callback function. I will investigate it.

@jkaflik jkaflik added the stale requires a follow-up label Nov 15, 2023
@jkaflik jkaflik self-assigned this Nov 15, 2023
@jkaflik jkaflik added the bug label Nov 15, 2023
@jkaflik jkaflik removed the stale requires a follow-up label Nov 28, 2023
jkaflik added a commit that referenced this issue Mar 19, 2024
@vmihailenco
Author

@jkaflik it still does not work for timed out queries. In #1242 you removed max_rows_to_read = 100000, max_block_size = 100 so the query succeeds and the events are reported.

@jkaflik
Contributor

jkaflik commented Mar 19, 2024

You are right @vmihailenco. I just confused things, as this issue has been open for a while. Let me reopen it.

@jkaflik jkaflik reopened this Mar 19, 2024
@jkaflik
Contributor

jkaflik commented Mar 19, 2024

@vmihailenco OK, there was a little bit of confusion. I was wrong when I wrote:

This is the same code you posted. The driver receives the progress packet; however, it's not properly handled due to a lost reference to the progress callback function. I will investigate it.

The debugger stopped on a progress packet, but it came from a helper query call, not from the query:

SELECT number from numbers(10000000) LIMIT 10000000 settings max_rows_to_read = 100000, max_block_size = 100

Apologies for the confusion. I should have spent more time on it back then.

The thing is, the ClickHouse server returns an error in the very first packet. The query is interrupted, thus no progress data is received. It's also reproducible in the CLI client.

I don't have any query that can reproduce your use case. Do you have any?

@vmihailenco
Author

I don't have any query that can reproduce your use case. Do you have any?

The first message says that ClickHouse does not report the number of processed rows if the query is interrupted, but clickhouse-cli somehow still has access to that information. How is that possible?

I've tried to ask this question in the ClickHouse repo, but was redirected here.

The thing is, the ClickHouse server returns an error in the very first packet. The query is interrupted, thus no progress data is received.

That's what I observe as well.

It's also reproducible in the CLI client.

No, clickhouse-cli somehow is able to show the progress. You can try any query with a timeout short enough that the query fails. clickhouse-cli will show you a progress bar and the number of processed rows and bytes.

With the same query, clickhouse-go does not receive any profile/progress events. Just an error.

@jkaflik
Contributor

jkaflik commented Mar 19, 2024

@vmihailenco

No, clickhouse-cli somehow is able to show the progress. You can try any query with a timeout short enough that the query fails. clickhouse-cli will show you a progress bar and the number of processed rows and bytes.

What is your ClickHouse version?

@vmihailenco
Author

What is your ClickHouse version?

I am currently using 23.11, but AFAIK clickhouse-cli has always been able to show the progress.

@jkaflik
Contributor

jkaflik commented Mar 27, 2024

@vmihailenco what is your patch version?

$ docker run --rm -it clickhouse/clickhouse-server:23.11 clickhouse local           

ClickHouse local version 23.11.4.24 (official build).

dcf64eeac489 :) SELECT number from numbers(10000000) LIMIT 10000000 settings max_rows_to_read = 100000, max_block_size = 100


SELECT number
FROM numbers(10000000)
LIMIT 10000000
SETTINGS max_rows_to_read = 100000, max_block_size = 100

Query id: 1c1cbb8d-9c2e-4b2c-b658-bcd6313eff48


Elapsed: 0.018 sec.

Received exception:
Code: 158. DB::Exception: Limit for rows (controlled by 'max_rows_to_read' setting) exceeded, max rows: 100.00 thousand, current rows: 10.00 million. (TOO_MANY_ROWS)

I don't think we can reproduce your case with this query. I will check internally if we can have another query to repro.

@jkaflik
Contributor

jkaflik commented Mar 27, 2024

@vmihailenco I managed to reproduce this behaviour with the query:

select throwIf(number = 1e6) from system.numbers format Null settings max_block_size = 100

Here is a PR that confirms progress is tracked with an exception being thrown: #1259
