Freqai's Reinforcement Learning: Incorporating Non-Feature Columns in Reward Calculation #9859
Comments
Here is another question: would the current state of the model, such as the current position type, trade duration, and floating profit and loss, be added as features and used for training? When we evaluate the model's behavior with the reward function, can the model observe its own state? Are aspects like the current position and floating profit and loss part of the features visible to the model? Clarifying these questions would help with building a good model.

If real-time state information is part of the features, the model could adjust its next actions based on the current investment situation. For example, when position adjustment is not allowed, the model might learn not to issue opening signals while a position is already open. If other open positions and the floating profit and loss of different coins could also be included as features, the model might even learn to control the allocation across different directions.
Hello,

For your first question, regarding carrying non-features into the environment: this is certainly possible, but it requires development. We are open to accepting a PR on this if you feel it is important to you. Please go ahead and submit the PR for review.

Regarding your second question: yes, you can include state info as features in your model during dry/live runs. Please have a look at the documentation, where we outline all the functionality of Reinforcement Learning and the associated parameters: https://www.freqtrade.io/en/stable/freqai-parameter-table/#reinforcement-learning-parameters

cheers,
rob
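For reference, the state information mentioned above is controlled by the `add_state_info` flag in the `rl_config` section of the configuration (see the parameter table linked above). A minimal sketch of where that flag lives; the neighboring keys and all values here are illustrative only, not recommendations:

```json
{
  "freqai": {
    "rl_config": {
      "train_cycles": 25,
      "max_trade_duration_candles": 300,
      "add_state_info": true
    }
  }
}
```

With `add_state_info` enabled, state values such as current profit and trade duration are appended to the observation space during dry/live runs.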
The developer suggests adding functionality that would allow using values from the DataFrame in the `calculate_reward` function without including them as features. This would enhance model training, since the reward evaluation would be more accurate, utilizing additional information about local price extremes without the bias of peeking into the future.
Hey there! Could you possibly include a code snippet about the issue you're facing? It would help make the problem clearer.
The environment does already append that data (like unrealized profit and trade duration) to the features; it is already there in the code, you just need to read it. The idea is good and already possible: the `self.prices` table holds the raw prices (close, open, high, low), and you are free to use them and calculate whatever indicators you need, even if they are not in the features table (I've done it many times). This is also easy to do by reading the code. :)

From an RL perspective, it might not be a good idea to reward the agent based on future information, as you introduce more partial observability for your own agent.
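The comment above suggests computing indicators on the fly from the raw price table rather than from the feature table. A standalone sketch of that idea: inside the environment the data would come from `self.prices` (per the comment), but here a small price table is fabricated so the example runs on its own, and the indicator and tick index are purely illustrative.

```python
import pandas as pd

# Fabricated stand-in for the environment's raw price table
# (in freqtrade's RL environment this would be self.prices).
prices = pd.DataFrame({
    "open":  [100.0, 101.0, 102.0, 101.5, 103.0, 104.0],
    "high":  [101.0, 102.0, 103.0, 102.5, 104.5, 105.0],
    "low":   [99.5, 100.5, 101.0, 101.0, 102.5, 103.5],
    "close": [100.5, 101.8, 102.2, 102.0, 104.0, 104.5],
})

# Example indicator computed from raw closes: a 3-candle simple
# moving average. It never has to appear in the feature table.
sma = prices["close"].rolling(window=3).mean()

# In a reward function you would typically look up the value at the
# current tick index rather than keep the whole series around.
current_tick = 4  # hypothetical position inside the episode
value_at_tick = sma.iloc[current_tick]
print(round(value_at_tick, 4))
```

Because the series is derived only from the raw price table, it stays out of the model's observation space while remaining available for reward shaping.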
Describe your environment

* Operating system: ____
* Python Version: _____ (`python -V`)
* CCXT version: _____ (`pip freeze | grep ccxt`)
* Freqtrade Version: ____ (`freqtrade -V` or `docker compose run --rm freqtrade -V` for Freqtrade running in docker)

Describe the enhancement
I'm exploring the reinforcement learning aspect of FreqAI and have a suggestion regarding the `calculate_reward` function. I believe that utilizing columns in the DataFrame which are not designated as features might offer some benefits for training the model.

I'm currently experimenting with creating a new column in the `feature_engineering_standard` method to mark the positions of local highest and lowest prices without including them as features. My intention is to leverage this information in the `calculate_reward` function.

Clearly, directly incorporating this result as a feature would introduce lookahead bias: determining the highest and lowest prices requires data from several candles into the future, which is unacceptable for features.
Therefore, I propose to use this information in the reward calculation without including it as a feature. During the initial training process, where the model solely relies on historical data, these local optima values could potentially enhance training efficiency by reinforcing the correlation between correct actions and current features. This could accelerate early-stage training to some extent.
Although this mechanism would be ineffective in subsequent real-time runs due to the lack of future price data, it could still serve as a valuable aid during model training.
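The extrema-marking step described above can be sketched with a centered rolling window. This is a standalone illustration, not the actual `feature_engineering_standard` code: the function name, `span` parameter, and price series are all hypothetical. The centered window looks `span` candles into the future, which is precisely why such a column must stay out of the feature set and be used only for reward calculation during training.

```python
import pandas as pd

def mark_local_extrema(close: pd.Series, span: int = 2) -> pd.DataFrame:
    """Flag candles whose close is the max/min of a centered window.

    The window covers `span` candles on each side, so each flag uses
    future data: fine for reward shaping in training, lookahead bias
    if used as a feature.
    """
    window = 2 * span + 1
    rolling_max = close.rolling(window, center=True).max()
    rolling_min = close.rolling(window, center=True).min()
    return pd.DataFrame({
        "close": close,
        "is_local_high": close.eq(rolling_max),
        "is_local_low": close.eq(rolling_min),
    })

# Illustrative price series with a peak, a dip, and a second peak.
close = pd.Series([1.0, 2.0, 3.0, 2.5, 2.0, 2.8, 3.5, 3.0, 2.0, 1.5, 2.2])
extrema = mark_local_extrema(close, span=2)
print(extrema[extrema["is_local_high"]].index.tolist())  # local highs
print(extrema[extrema["is_local_low"]].index.tolist())   # local lows
```

The resulting boolean columns could then be consulted inside the reward calculation at the current tick, rewarding entries near marked lows and exits near marked highs during backtest-style training.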
Unfortunately, the DataFrame I encounter in the `MyRLEnv` class only contains the columns designated as features, preventing me from implementing this idea in practice.

Is it feasible in the current version of FreqAI to utilize values from the DataFrame in the `calculate_reward` function without including them as features? I believe this would greatly benefit model training: the features would still be derived solely from past data, while the reward function provides a more accurate evaluation.