semi-endless loop around add_candle call #447

Open
movy opened this issue May 19, 2024 · 17 comments
Labels
bug Something isn't working

Comments

@movy
Contributor

movy commented May 19, 2024

Describe the bug
Until this week I'd been running 0.44 without any problems, but 0.48 added some welcome changes, so I switched my backtesting to 0.48 and ran into a very long loop around the https://github.com/jesse-ai/jesse/blob/master/jesse/store/state_candles.py#L102 call, which sometimes causes a backtest to hang for 10-15 minutes. This happens sporadically, in maybe one out of 100-200 backtest runs, but it effectively blocks the whole pipeline until the problematic test gets through this loop.

This happens on all symbols and timeframes.

When I say 'around', I mean that I could not track down the exact line that hangs, but here is what I could gather via py-spy:

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 48.00% 100.00%   627.62s    652.15s   add_candle (jesse/store/state_candles.py)
 52.00%  52.00%   624.50s    624.50s   __getitem__ (jesse/libs/dynamic_numpy_array/__init__.py)
  0.00%   0.00%   0.310s    0.320s   _var (numpy/core/_methods.py)
  0.00%   0.00%   0.070s    0.180s   mean (numpy/core/fromnumeric.py)
  0.00%   0.00%   0.070s    0.110s   _mean (numpy/core/_methods.py)
  0.00%   0.00%   0.070s    0.390s   _std (numpy/core/_methods.py)
  0.00%   0.00%   0.060s    0.060s   timeframe_to_one_minutes (jesse/helpers.py)
  0.00%   0.00%   0.050s    0.050s   _count_reduce_items (numpy/core/_methods.py)
  0.00% 100.00%   0.040s    53.00s   _step_simulator (jesse/modes/backtest_mode.py)
  0.00%   0.00%   0.030s    0.630s   peaks (jesse-bot/custom_indicators/peaks.py)
  0.00%  52.00%   0.020s    25.74s   _simulate_price_change_effect (jesse/modes/backtest_mode.py)
  0.00%   0.00%   0.020s    0.020s   stochrsi (tulipy/__init__.py)
  0.00%   0.00%   0.020s    0.020s   is_collecting_data (jesse/helpers.py)
  0.00%   0.00%   0.020s    0.020s   wrapper (talib/__init__.py)
  0.00%   0.00%   0.010s    0.190s   mean (<__array_function__ internals>)
  0.00%   0.00%   0.010s    0.010s   _sum (numpy/core/_methods.py)
  0.00%   0.00%   0.010s    0.010s   is_live (jesse/helpers.py)
  0.00%   0.00%   0.010s    0.400s   std (numpy/core/fromnumeric.py)
  0.00%   0.00%   0.010s    0.010s   get_candles (jesse/store/state_candles.py)
  0.00%   0.00%   0.010s    0.010s   get_candle_source (jesse/helpers.py)
  0.00%   0.00%   0.010s    0.410s   std (<__array_function__ internals>)
  0.00%   0.00%   0.010s    0.010s   is_debuggable (jesse/helpers.py)
  0.00%   0.00%   0.010s    0.010s   _get_fixed_jumped_candle (jesse/modes/backtest_mode.py)
  0.00%   0.00%   0.010s    0.700s   _check (jesse/strategies/Strategy.py)
  0.00% 100.00%   0.000s    53.00s   backtest_rungs (jesse-bot/backtest_new.py)
  0.00% 100.00%   0.000s    53.00s   _resume_span (ray/util/tracing/tracing_helper.py)
  0.00%   0.00%   0.000s    0.690s   decorated (jesse/services/cache.py)
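
One possible way to get the exact line of a hang, since the py-spy summary above aggregates time per function: the standard library's `faulthandler` can dump every thread's traceback, with line numbers, if a call is still running after a timeout. The sketch below is an assumption about how that could be wired up, not something used in this thread; `run_with_hang_dump` and `hang_traceback.log` are hypothetical names.

```python
import faulthandler


def run_with_hang_dump(fn, timeout_s=60.0, dump_path="hang_traceback.log"):
    """Run fn(); while it is still running, dump every thread's traceback
    to dump_path once per timeout_s seconds.

    Hypothetical helper for illustration, not part of Jesse.
    """
    with open(dump_path, "w") as dump_file:
        # A watchdog thread writes the tracebacks (including line numbers).
        faulthandler.dump_traceback_later(timeout_s, repeat=True, file=dump_file)
        try:
            return fn()
        finally:
            # Always disarm the timer so unrelated code is not dumped later.
            faulthandler.cancel_dump_traceback_later()
```

Wrapping a single backtest call, for example `run_with_hang_dump(lambda: backtest(...), timeout_s=120)`, would leave the log empty for normal runs and fill it with stack dumps only for runs that stall.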

I've added some logging:

    def get_storage(self, exchange: str, symbol: str, timeframe: str) -> DynamicNumpyArray:
        key = jh.key(exchange, symbol, timeframe)
        print(key)  # <-- added logging
[...]


    def add_candle(
        [...]
        print('getting arr')  # <-- added logging
        arr: DynamicNumpyArray = self.get_storage(exchange, symbol, timeframe)
        print('got arr', len(arr))  # <-- added logging

And from these prints we can see this method is called millions of times by a single backtest process:
image

Eventually the number of calls falls:
image

Which makes me think there's a loop inside a loop somewhere.
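
Printing on every call produces millions of lines and slows the run further. A lighter way to test the loop-inside-loop suspicion is to tally calls per key and inspect the totals once. This is a minimal sketch under stated assumptions: `count_calls` and `call_counts` are hypothetical names, and `get_storage` here is a toy stand-in whose signature and key format are assumed for illustration, not Jesse's actual method.

```python
import functools
from collections import Counter

# Global tally of calls, keyed by a label derived from the call arguments.
call_counts = Counter()


def count_calls(label_fn):
    """Decorator: count calls per label instead of printing on every call."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            call_counts[label_fn(*args, **kwargs)] += 1
            return fn(*args, **kwargs)
        return wrapper
    return decorator


# Toy stand-in for the real method (assumed signature and key format).
@count_calls(lambda exchange, symbol, timeframe: f"{exchange}-{symbol}-{timeframe}")
def get_storage(exchange, symbol, timeframe):
    return []


for _ in range(3):
    get_storage("Binance", "BTC-USDT", "1m")
get_storage("Binance", "BTC-USDT", "4h")

# One summary line instead of one print per call.
print(call_counts.most_common())
```

If the count for one key grows far faster than the number of simulated candles, that points at a nested loop rather than at a single slow call.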

I could not immediately pinpoint the recent changes responsible for this. I will roll back to 0.44 for now and slowly work my way up to 0.48; maybe I can find where the regression first appeared.

Any help from those familiar with recent changes is appreciated @yakir4123

@movy movy added the bug Something isn't working label May 19, 2024
@yakir4123
Contributor

yakir4123 commented May 19, 2024

@movy
Yesterday I pushed a bug fix; it may be related to this problem, but I can't be sure.
#446

Please pull the new master version and try it out.
Let me know if it fixes it. If not, we need to investigate 👍
Thank you for the issue.

Can you also tell me what timeframe your example used, the number of routes, and the simulation start/end time?

@movy
Contributor Author

movy commented May 19, 2024

Thanks for the prompt response.
I use one route plus one extra_route, and I'm not using the new fast_mode for now, as it fails for different reasons with my strategies (I will investigate this later, as it's only barely faster for strategies with complex, computation-heavy indicators).

I import candles for ~6 months, then split them into 40-day periods and get candles for each period. Basic code:

for index, date_range in enumerate(backtest_dates, start=0):
    print(f"Importing candles for {date_range[0]} - {date_range[1]}...")
    try:
        candles_warmup, candles_period = research.get_candles(
            exchange,
            symbol,
            tf,
            int(datetime.strptime(date_range[0], '%Y-%m-%d').timestamp() * 1000),
            int(datetime.strptime(date_range[1], '%Y-%m-%d').timestamp() * 1000),
            warmup_candles_num=config["warm_up_candles"],
            caching=True,
            is_for_jesse=True,
        )
        period_candles = {
            jh.key(exchange, symbol): {
                'exchange': exchange,
                'symbol': symbol,
                # backtest candles tf is always 1m
                'candles': candles_period
            }
        }
        warmup_candles = {
            jh.key(exchange, symbol): {
                'exchange': exchange,
                'symbol': symbol,
                'candles': candles_warmup
            }
        }
        candles_refs.append(ray.put(period_candles))
        # print(candles_refs)
        warmup_candles_refs.append(ray.put(warmup_candles))
[...]

        for candles_ref in candles_refs:
            result = backtest(config=ray.get(config_ref),  
                            routes=ray.get(routes_refs[params["tf"]]),  
                            extra_routes=ray.get(extra_routes_refs[anchor_timeframe(params["tf"])]), 
                            candles=ray.get(candles_ref),  # candles within backtest date range
                            warmup_candles=ray.get(warmup_candles_refs[candles_refs.index(candles_ref)]), 
                            generate_csv=False,
                            generate_json=True,
                            hyperparameters=params,
                            fast_mode=False)
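
The snippet above does not show how `backtest_dates` is built. A minimal sketch of one way to produce such 40-day windows follows; `split_date_range` is a hypothetical helper, and the dates are made-up examples rather than the ranges actually used.

```python
from datetime import datetime, timedelta


def split_date_range(start: str, end: str, period_days: int = 40):
    """Split [start, end) into consecutive (start, end) date-string pairs,
    each covering at most period_days days."""
    fmt = "%Y-%m-%d"
    cursor = datetime.strptime(start, fmt)
    stop = datetime.strptime(end, fmt)
    step = timedelta(days=period_days)
    ranges = []
    while cursor < stop:
        # The last window is clipped so it never runs past the end date.
        window_end = min(cursor + step, stop)
        ranges.append((cursor.strftime(fmt), window_end.strftime(fmt)))
        cursor = window_end
    return ranges


# Example dates only; the real import covers ~6 months.
backtest_dates = split_date_range("2024-01-01", "2024-04-01")
```

Each pair can then be indexed as `date_range[0]` and `date_range[1]`, matching the loop above.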

Will try master branch now.

@yakir4123
Contributor

The higher the timeframe you use, the better the performance of fast_mode.
If you use 5m or 15m, the change will be noticeable (as long as the indicator calculations are not too heavy), but not as much as with 1h or 4h.

@saleh-mir
Member

Hey Movy,

I pushed the patch that Yakir is talking about. It is version 0.48.1. Please give it a try and see if it fixes your issue.

@movy
Contributor Author

movy commented May 19, 2024

So, good news and bad news: this regression was not caused by 0.48 (i.e. 0.48.1 did not fix it). The last version that does not hang is 0.45, so I'm sticking with it for now.

I guess the cause is some change introduced in 0.46 related to warmup_candles being passed as a parameter to the backtest() call. It is also worth looking into why add_candle() is called millions of times with an ever-increasing arr length: even if this is intended behaviour, it is highly suboptimal (especially in Python) in my view and should be replaced with a single call.
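
To illustrate the "single call" point on the storage side only, here is a toy growable buffer that supports both per-row appends and one batched write. It is a sketch under stated assumptions: `GrowableArray` is a made-up class, not Jesse's DynamicNumpyArray, and whether add_candle() can actually be batched depends on per-candle side effects in the simulator that this example does not model.

```python
import numpy as np


class GrowableArray:
    """Toy growable 2-D buffer for illustration; NOT Jesse's DynamicNumpyArray."""

    def __init__(self, columns: int, capacity: int = 1024):
        self._data = np.zeros((capacity, columns))
        self._size = 0

    def __len__(self):
        return self._size

    def _reserve(self, extra: int):
        # Grow geometrically so repeated appends stay amortized O(1).
        needed = self._size + extra
        if needed > len(self._data):
            new_capacity = max(needed, 2 * len(self._data))
            grown = np.zeros((new_capacity, self._data.shape[1]))
            grown[: self._size] = self._data[: self._size]
            self._data = grown

    def append(self, row):
        """Add one row: one Python-level call per candle."""
        self._reserve(1)
        self._data[self._size] = row
        self._size += 1

    def extend(self, rows):
        """Add many rows with a single vectorized slice assignment."""
        rows = np.asarray(rows)
        self._reserve(len(rows))
        self._data[self._size : self._size + len(rows)] = rows
        self._size += len(rows)

    def view(self):
        return self._data[: self._size]


# Made-up candle data: 5000 rows of 6 columns.
candles = np.arange(6 * 5000, dtype=float).reshape(5000, 6)

one_by_one = GrowableArray(columns=6)
for candle in candles:  # 5000 separate calls
    one_by_one.append(candle)

batched = GrowableArray(columns=6)
batched.extend(candles)  # one call, same stored result
```

Both buffers end up holding identical data; the batched path simply avoids the per-call Python overhead that dominates when the call count reaches the millions.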

P.S. The inconsistent parameter names 'warm_up_candles' vs 'warmup_candles' gave me some avoidable pain while I was refactoring backtest() calls. This can be improved as well; I think warm_up_* was used from the beginning, so maybe we can stick with it?

@yakir4123
Contributor

@movy
do you maybe cancel a lot of orders in your strategy?

@movy
Contributor Author

movy commented May 20, 2024

Yes, I cancel on every candle if the order was not executed. This was never a problem prior to 0.46.

@yakir4123
Contributor

@movy
Cool, I didn't know that v0.46 didn't have this problem.

Please try this PR.
I made some fixes for exactly these cases!

#450

@movy
Contributor Author

movy commented May 22, 2024

Thank you, @yakir4123. I tried your branch; unfortunately, same result. I will try to create a small script that consistently reproduces the problematic behaviour and will post it here later.

@yakir4123
Contributor

@movy, yeah, I'm facing the same problem and I'm trying to solve it. I know the root of the problem. I guess I need to go back to v0.46 to see what was there.

@movy
Contributor Author

movy commented May 22, 2024

0.45 is the last version that works without hanging; 0.46 introduced this bug.

@yakir4123
Contributor

yakir4123 commented May 24, 2024

@movy
Do you mind trying this branch? It has fixes for these cases, which I believe you have too.
#455

https://github.com/yakir4123/jesse/tree/yakir/feat/delete-orders

@movy
Contributor Author

movy commented May 25, 2024

@yakir4123, it still hangs, unfortunately.
But I think you're moving in the right direction by identifying orders that are updated on each candle. My strategy logic is such that I update take_profit on each def update_position(self) call and cancel non-executed orders on each candle via def should_cancel_entry(self). I really will try to find time next week and post a simple repro.

@yakir4123
Contributor

@movy
I updated the branch 1 hour ago; if you fetched it before that update, please give it another try.
I do something very similar to you and now it runs really fast for me.

@movy
Contributor Author

movy commented May 25, 2024

The last commit I tested was 5ed48de (2 hours ago).

@yakir4123
Contributor

:/ I wonder what it is. Please give me a simple strategy that reproduces it and I'll probably find the problem.

@movy
Contributor Author

movy commented May 26, 2024

DUAL_THRUST.py.zip

Here's one of Jesse's default strategies. I added a trailing take-profit and offset entries (i.e. entering not at the current price but at price - offset %, so as not to enter at the candle start). These changes do not necessarily make this particular strategy more profitable, but they offer an example that hangs backtests after 70-80 iterations with Jesse versions after v0.45, including @yakir4123's fixes.

Tested on multiple coins and 4h candles for speed, with the same result. I run tests using Ray.tune (to enable more complex hyperparameter perturbations and schedulers), but this particular code should run with Jesse out of the box.
