Feature: Temporal Aggregating Index #15553

Open
Tracked by #126
Max-Meldrum opened this issue May 16, 2024 · 2 comments
Labels
C-feature Category: feature

Comments

@Max-Meldrum

I recently wrote a blog post about speeding up temporal aggregation queries significantly in DataFusion by using µWheel.

µWheel could potentially also be used by Databend to implement a temporal version of the Aggregating Index that pre-materializes aggregates across time.

I'd be happy to help if there is interest.

@Max-Meldrum Max-Meldrum added the C-feature Category: feature label May 16, 2024
@sundy-li
Member

I wondered if it works with the distributed warehouse?

@Max-Meldrum
Author

> I wondered if it works with the distributed warehouse?

I would say that µWheel can be used in two different modes:

Stream Mode:

This mode assumes that the wheel will be incrementally updated by a streaming system.
µWheel is designed around low watermarking, meaning it is up to the user/system to advance the internal time
to cause aggregates to roll up over time.

A low watermark w indicates that all records with timestamps t where t <= w have been ingested. This means
a wheel will start rejecting data with timestamps below the watermark. This assumption may not be fully compatible
with non-streaming systems.
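
To make the watermark semantics above concrete, here is a minimal toy sketch. It is not µWheel's API: `WatermarkedSum`, `InsertError`, `insert`, and `advance_to` are hypothetical names invented for illustration, and the choice to reject a record whose timestamp equals the watermark is an assumption derived from the definition "all records with t <= w have been ingested".

```rust
/// Hypothetical error type; not part of µWheel.
#[derive(Debug, PartialEq)]
pub enum InsertError {
    /// The record's timestamp is at or below the current low watermark.
    Late { timestamp: u64, watermark: u64 },
}

/// Toy model of a watermark-driven SUM aggregate (illustration only).
#[derive(Debug, Default)]
pub struct WatermarkedSum {
    /// `None` until the caller advances time for the first time.
    watermark: Option<u64>,
    /// Records accepted but not yet rolled up, as (timestamp, value).
    pending: Vec<(u64, u64)>,
    /// Sum of all values whose timestamps are covered by the watermark.
    rolled_up: u64,
}

impl WatermarkedSum {
    pub fn new() -> Self {
        Self::default()
    }

    /// Accepts a record only if it lies strictly above the watermark
    /// (assumed boundary handling, see the lead-in).
    pub fn insert(&mut self, timestamp: u64, value: u64) -> Result<(), InsertError> {
        if let Some(w) = self.watermark {
            if timestamp <= w {
                return Err(InsertError::Late { timestamp, watermark: w });
            }
        }
        self.pending.push((timestamp, value));
        Ok(())
    }

    /// The caller (user or system) advances time; records with
    /// timestamps <= `new_watermark` are rolled up into the aggregate.
    /// A watermark that does not move forward is ignored.
    pub fn advance_to(&mut self, new_watermark: u64) {
        if self.watermark.map_or(false, |w| new_watermark <= w) {
            return;
        }
        let pending = std::mem::take(&mut self.pending);
        let (ready, later): (Vec<_>, Vec<_>) = pending
            .into_iter()
            .partition(|(ts, _)| *ts <= new_watermark);
        self.rolled_up += ready.iter().map(|(_, v)| v).sum::<u64>();
        self.pending = later;
        self.watermark = Some(new_watermark);
    }

    /// Sum of everything rolled up so far.
    pub fn rolled_up(&self) -> u64 {
        self.rolled_up
    }
}
```

In this sketch nothing is aggregated until `advance_to` is called, which mirrors the point that advancing time is the caller's responsibility. A batch loader that sees timestamps in arbitrary order would hit `InsertError::Late` as soon as the watermark had passed them.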

Index Mode:

However, if you are working with static read-only datasets that are time partitioned, then µWheel is ideal as an
index on top of this data.

So, to answer the question: if the distributed warehouse does not adopt low watermarking, it is still possible to use µWheel
in Index mode. If the data is sharded, the results from the different µWheel instances can be merged together.
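
The merge step for sharded data can be sketched as follows. This is a toy model under stated assumptions, not µWheel's API: `Partial`, `ShardIndex`, `build_index`, `merge`, and `query` are hypothetical names, each shard is modelled as a map from a fixed-width time bucket to a partial aggregate, and the aggregate is assumed to be combinable (here a sum and a count), which is what makes merging per-shard results equivalent to indexing all rows in one place.

```rust
use std::collections::BTreeMap;

/// Hypothetical partial aggregate for one time bucket.
#[derive(Debug, Clone, Copy, PartialEq, Default)]
pub struct Partial {
    pub sum: u64,
    pub count: u64,
}

impl Partial {
    /// Combining is associative and commutative, so shard order is irrelevant.
    pub fn combine(self, other: Partial) -> Partial {
        Partial {
            sum: self.sum + other.sum,
            count: self.count + other.count,
        }
    }
}

/// Hypothetical per-shard index: bucket number -> partial aggregate.
pub type ShardIndex = BTreeMap<u64, Partial>;

/// Builds an index over static rows of (timestamp_secs, value).
/// Panics if `bucket_secs` is zero.
pub fn build_index(rows: &[(u64, u64)], bucket_secs: u64) -> ShardIndex {
    assert!(bucket_secs > 0, "bucket width must be positive");
    let mut index = ShardIndex::new();
    for &(ts, value) in rows {
        let entry = index.entry(ts / bucket_secs).or_default();
        *entry = (*entry).combine(Partial { sum: value, count: 1 });
    }
    index
}

/// Merges per-shard indexes bucket by bucket.
pub fn merge(shards: &[ShardIndex]) -> ShardIndex {
    let mut out = ShardIndex::new();
    for shard in shards {
        for (bucket, partial) in shard {
            let entry = out.entry(*bucket).or_default();
            *entry = (*entry).combine(*partial);
        }
    }
    out
}

/// Aggregates the half-open bucket range [start, end).
pub fn query(index: &ShardIndex, start: u64, end: u64) -> Partial {
    if start >= end {
        return Partial::default();
    }
    index
        .range(start..end)
        .map(|(_, p)| *p)
        .fold(Partial::default(), Partial::combine)
}
```

With hourly buckets (`bucket_secs = 3600`), merging the indexes of two shards yields the same map as building one index over the union of their rows, so a coordinator could answer a time-range aggregate from the merged partials without rescanning the underlying data.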
