I'm currently using polars to process real-time stock market data. As new market data is made available, I currently have to recompute the entire historical dataset. With millions of rows of historical data, this is hugely inefficient.
This issue proposes a mechanism to append new data into an existing streaming lazyframe, only computing the new values.
Example use case
import polars as pl

# Start with simple pricing data
price_data = pl.DataFrame({"price": [1.0, 2.0, 3.0, 4.0]})

# Add some computation that can be done in a streaming, append-only manner
lazy_frame = price_data.lazy().with_columns(pl.col('price').pct_change().alias('pct_change'))

# Process the "historical" pricing data
current_data = lazy_frame.collect(streaming=True)
# shape: (4, 2)
# ┌───────┬────────────┐
# │ price ┆ pct_change │
# │ ---   ┆ ---        │
# │ f64   ┆ f64        │
# ╞═══════╪════════════╡
# │ 1.0   ┆ null       │
# │ 2.0   ┆ 1.0        │
# │ 3.0   ┆ 0.5        │
# │ 4.0   ┆ 0.333333   │
# └───────┴────────────┘

# Later, new pricing information is made available
new_price_data = pl.DataFrame({"price": [5.0]})

# Calculate the output dataframe for the new price, no need to recompute all old data
# (stream_append is the method proposed in this issue; it does not exist yet)
lazy_frame.stream_append(new_price_data)
# shape: (1, 2)
# ┌───────┬────────────┐
# │ price ┆ pct_change │
# │ ---   ┆ ---        │
# │ f64   ┆ f64        │
# ╞═══════╪════════════╡
# │ 5.0   ┆ 0.25       │
# └───────┴────────────┘
Proposed new functionality
Streaming LazyFrame Wrapper: Introduce a wrapper that maintains the state of the streaming engine within a LazyFrame.
Incremental Processing: Allow collection of the next n rows, processing/consuming only the data required to output those rows.
Append New Data: Enable appending new rows/dataframes directly to the streaming engine, so that only the appended rows are computed.
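To make the three points concrete, here is a minimal pure-Python model of the intended behaviour for the pct_change example. It is only an illustration of the semantics, not a proposal for how the streaming engine should implement it; the class StreamingPctChange and its methods append and collect_next are hypothetical names.

```python
from typing import Iterable, List, Optional, Tuple


class StreamingPctChange:
    """Hypothetical stateful wrapper that computes pct_change incrementally."""

    def __init__(self) -> None:
        # State kept between calls: the last price that was processed.
        self._last_price: Optional[float] = None
        # Rows that were appended but not yet consumed.
        self._pending: List[float] = []

    def append(self, prices: Iterable[float]) -> None:
        """Queue new rows without computing anything yet."""
        self._pending.extend(prices)

    def collect_next(self, n: int) -> List[Tuple[float, Optional[float]]]:
        """Consume at most n pending rows and return (price, pct_change) pairs."""
        batch, self._pending = self._pending[:n], self._pending[n:]
        rows: List[Tuple[float, Optional[float]]] = []
        for price in batch:
            if self._last_price is None:
                change = None  # first row ever seen has no predecessor
            else:
                change = (price - self._last_price) / self._last_price
            rows.append((price, change))
            self._last_price = price
        return rows


stream = StreamingPctChange()
stream.append([1.0, 2.0, 3.0, 4.0])
historical = stream.collect_next(4)  # processes the historical rows once

stream.append([5.0])
latest = stream.collect_next(1)  # touches only the new row
print(latest)  # [(5.0, 0.25)]
```

The state here is a single float because pct_change only needs the previous row; a real implementation would have to keep whatever state each operator in the query plan requires. Zero prices are not handled in this sketch.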
vultix changed the title from "Implement Append Functionality for Streaming Data in LazyFrames" to "Add Append Functionality for Streaming Data in LazyFrames" on May 13, 2024.