I was wondering whether it works with the distributed warehouse.
I would say that µWheel can be used in two different modes:
Stream Mode:
This mode assumes that the wheel is updated incrementally by a streaming system.
µWheel is designed around low watermarking: it is up to the user or system to advance the wheel's internal time, and advancing it is what causes aggregates to roll up over time.
A low watermark w indicates that all records with timestamps t where t <= w have been ingested. As a consequence, a wheel rejects data whose timestamp falls below the watermark. This assumption may not hold in non-streaming systems.
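To make the Stream Mode contract concrete, here is a minimal sketch of a watermark-driven aggregator. It is not µWheel's API: `ToyWheel`, `insert`, `advance_to`, and `minute_sum` are hypothetical names, and the sketch only sums `u64` values per second and rolls completed seconds up into per-minute totals when the watermark advances. As in the text above, inserts with a timestamp below the watermark are rejected.

```rust
use std::collections::BTreeMap;

/// Hypothetical toy wheel (NOT the µWheel API). Timestamps are in seconds.
struct ToyWheel {
    watermark: u64,              // data with t < watermark is rejected
    pending: BTreeMap<u64, u64>, // second -> sum, for seconds >= watermark
    minutes: BTreeMap<u64, u64>, // minute -> rolled-up sum
}

impl ToyWheel {
    fn new(watermark: u64) -> Self {
        ToyWheel { watermark, pending: BTreeMap::new(), minutes: BTreeMap::new() }
    }

    /// Accepts a record only if it is not below the watermark; returns the
    /// rejected timestamp otherwise.
    fn insert(&mut self, t: u64, value: u64) -> Result<(), u64> {
        if t < self.watermark {
            return Err(t);
        }
        *self.pending.entry(t).or_insert(0) += value;
        Ok(())
    }

    /// Advancing time is what triggers the roll-up: every second below the
    /// new watermark is considered complete and is merged into its minute.
    fn advance_to(&mut self, new_watermark: u64) {
        if new_watermark <= self.watermark {
            return;
        }
        // split_off keeps keys < new_watermark in `self.pending` and returns the rest.
        let still_pending = self.pending.split_off(&new_watermark);
        let complete = std::mem::replace(&mut self.pending, still_pending);
        for (sec, sum) in complete {
            *self.minutes.entry(sec / 60).or_insert(0) += sum;
        }
        self.watermark = new_watermark;
    }

    fn minute_sum(&self, minute: u64) -> u64 {
        self.minutes.get(&minute).copied().unwrap_or(0)
    }
}

fn main() {
    let mut wheel = ToyWheel::new(0);
    wheel.insert(5, 10).unwrap();
    wheel.insert(59, 1).unwrap();
    wheel.insert(61, 7).unwrap();

    wheel.advance_to(60); // seconds 0..=59 roll up into minute 0
    assert_eq!(wheel.minute_sum(0), 11);
    assert_eq!(wheel.minute_sum(1), 0); // second 61 is still pending

    assert!(wheel.insert(30, 3).is_err()); // 30 < watermark 60: late data rejected

    wheel.advance_to(120);
    assert_eq!(wheel.minute_sum(1), 7);
    println!("minute 0 = {}, minute 1 = {}", wheel.minute_sum(0), wheel.minute_sum(1));
}
```

The rejection of late data is exactly the property that a batch-oriented system may not be able to guarantee, which is why the second mode exists.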
Index Mode:
However, if you are working with static, read-only datasets that are time-partitioned, µWheel is ideal as an index on top of that data.
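The idea behind Index Mode can be sketched with a simplified, hypothetical structure (`TwoLevelIndex` is not µWheel's or Databend's API): for a static partition, aggregates are pre-materialized at two granularities, per minute and per hour, so a range query reads whole hours where the range is hour-aligned and falls back to minutes only at the edges. This is a simplification of the multi-granularity idea, assuming plain `u64` sums over `(timestamp_secs, value)` rows.

```rust
use std::collections::HashMap;

/// Hypothetical two-level pre-materialized index over one static partition.
struct TwoLevelIndex {
    minutes: HashMap<u64, u64>, // minute number -> sum of values
    hours: HashMap<u64, u64>,   // hour number -> sum of values
}

impl TwoLevelIndex {
    /// Built once, since the partition is read-only.
    fn build(rows: &[(u64, u64)]) -> Self {
        let mut minutes = HashMap::new();
        let mut hours = HashMap::new();
        for &(t, v) in rows {
            *minutes.entry(t / 60).or_insert(0) += v;
            *hours.entry(t / 3600).or_insert(0) += v;
        }
        TwoLevelIndex { minutes, hours }
    }

    /// Sum over minutes [from, to), plus the number of buckets that were read.
    fn range_sum(&self, from: u64, to: u64) -> (u64, usize) {
        let (mut sum, mut reads, mut m) = (0, 0, from);
        while m < to {
            if m % 60 == 0 && m + 60 <= to {
                // A whole hour fits in the range: read one coarse bucket.
                sum += self.hours.get(&(m / 60)).copied().unwrap_or(0);
                m += 60;
            } else {
                // Unaligned edge: read a single minute bucket.
                sum += self.minutes.get(&m).copied().unwrap_or(0);
                m += 1;
            }
            reads += 1;
        }
        (sum, reads)
    }
}

fn main() {
    // (timestamp in seconds, value) rows from one time partition.
    let rows = [(30, 1), (3_610, 2), (10_740, 3)];
    let index = TwoLevelIndex::build(&rows);
    let (sum, reads) = index.range_sum(59, 181);
    // 2 hour buckets + 2 edge minutes instead of 122 minute buckets.
    assert_eq!((sum, reads), (5, 4));
    println!("sum = {sum}, buckets read = {reads}");
}
```

Because the data never changes, the index only has to be built once, and no watermark is needed.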
So, to answer the question: even if the distributed warehouse does not adopt low watermarking, µWheel can still be used in Index mode. If the data is sharded, the results from the per-shard µWheel instances can be merged together.
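Merging per-shard results works when each shard returns a mergeable partial aggregate for the same time range. The sketch below assumes such a partial (the `Partial` type and its methods are hypothetical, not µWheel's API) and merges it with an associative, commutative operation, so shards can be combined in any order.

```rust
/// Hypothetical partial aggregate returned by one shard for a query range.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Partial {
    sum: u64,
    count: u64,
    min: Option<u64>,
    max: Option<u64>,
}

impl Partial {
    const EMPTY: Partial = Partial { sum: 0, count: 0, min: None, max: None };

    fn from_values(values: &[u64]) -> Self {
        Partial {
            sum: values.iter().sum(),
            count: values.len() as u64,
            min: values.iter().copied().min(),
            max: values.iter().copied().max(),
        }
    }

    /// Associative and commutative, so the merge order across shards does not matter.
    fn merge(self, other: Partial) -> Partial {
        Partial {
            sum: self.sum + other.sum,
            count: self.count + other.count,
            min: match (self.min, other.min) {
                (Some(a), Some(b)) => Some(a.min(b)),
                (a, b) => a.or(b), // an empty shard contributes nothing
            },
            max: match (self.max, other.max) {
                (Some(a), Some(b)) => Some(a.max(b)),
                (a, b) => a.or(b),
            },
        }
    }

    /// The average is derived only after merging, from the global sum and count.
    fn avg(&self) -> Option<f64> {
        if self.count == 0 { None } else { Some(self.sum as f64 / self.count as f64) }
    }
}

fn main() {
    // Partial results for the same time range from three shards (one is empty).
    let shards = [
        Partial::from_values(&[1, 5]),
        Partial::from_values(&[]),
        Partial::from_values(&[10]),
    ];
    let merged = shards.iter().fold(Partial::EMPTY, |acc, p| acc.merge(*p));
    assert_eq!(merged, Partial { sum: 16, count: 3, min: Some(1), max: Some(10) });
    println!("{:?}, avg = {:?}", merged, merged.avg());
}
```

Shards ship `(sum, count)` rather than per-shard averages because averaging averages gives the wrong answer when shards hold different numbers of rows.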
I recently wrote a blog post about using µWheel to significantly speed up temporal aggregation queries in DataFusion.
µWheel could also potentially be used by Databend to implement a temporal version of its Aggregating Index that pre-materializes aggregates across time.
I'd be happy to help if there is interest.