Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recording of robotics nodes #517

Merged
merged 28 commits into from
Oct 11, 2023
Merged

Recording of robotics nodes #517

merged 28 commits into from
Oct 11, 2023

Conversation

h3ndrk
Copy link
Member

@h3ndrk h3ndrk commented Jul 16, 2023

Introduced Changes

  • PersistentState is renamed to CyclerState because it stores state in a cycler
  • Node states, setup main outputs, and cross-inputs (from other cyclers) are recorded by serializing to Bincode and writing it to disk
  • hardware.json is split into framework.json and hardware.json
  • Recording can be enabled by changing the framework.json

ToDo / Known Issues

  • Naming of the structs?
  • Fix Pepsi parameter setting

Ideas for Next Iterations (Not This PR)

  • Delete localization recording
  • Change pixel storage
  • Reuce frequency
  • Same seconds in launchHULK log files and recording file names
  • Reduce recording interleavement
  • Add channel directly from robotics domain to "runtime" framework

How to Test

Deploy to a NAO, check that the cyclers are enabled for recording. When the NAO is in playing primary state, the NAO records to logs directory in Bincode format.

crates/context_attribute/src/lib.rs Outdated Show resolved Hide resolved
@knoellle knoellle self-assigned this Jul 19, 2023
@h3ndrk h3ndrk assigned h3ndrk and unassigned knoellle Jul 25, 2023
@h3ndrk h3ndrk disabled auto-merge August 13, 2023 09:21
@h3ndrk h3ndrk force-pushed the states branch 3 times, most recently from 7864390 to e9f160f Compare August 13, 2023 18:44
Copy link
Contributor

@knoellle knoellle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good so far. I have left some comments.
From what I can see, the main things left to implement are:

  • serializing node state
  • writing frames to a file

crates/code_generation/src/cyclers.rs Outdated Show resolved Hide resolved
crates/code_generation/src/cyclers.rs Outdated Show resolved Hide resolved
crates/code_generation/src/cyclers.rs Show resolved Hide resolved
tools/behavior_simulator/src/interfake.rs Outdated Show resolved Hide resolved
@schmidma
Copy link
Member

Did I miss something? I can't find the NodeState anywhere?

@knoellle
Copy link
Contributor

There is no NodeState identifier.
All member field of the node structs are part of the node state.
However, fields that are skipped during (de)serialization must be considered ephemeral and should only be used for caching.

@schmidma
Copy link
Member

As you are implementing Serialize and Deserialize for nearly all nodes, is it possible to expose these to communication now, such that they can be subscribed via twix?

@knoellle
Copy link
Contributor

Not without further changes, no.
I played around with it for a bit and I don't see a way to construct an AdditionalOutput with a type path that is valid both in the module itself as well as in the generated structs module for the database types.
The only way I see for this to work is to move the state to a separate crate (as we do with types), but that's no different from the state on the main branch.

@schmidma
Copy link
Member

Sorry, my previous statement was phrased in a confusing manner. I was expecting further necessary changes. The question was meant in general. We introduce a new (very strong) requirement to nodes: implementing Serialize and Deserialize. We previously decided against such requirements. But introducing these implementations, a node can directly serialized and thus also exposed via communication.

To realize such a thing, I imagine the generated framework to expose not only databases containing the main and additional outputs, but also the list of nodes to communication. I think they do not necessarily need to exist in a separate crate for that, do they?

Maybe we should discuss further details after the upcoming dev meeting or at another occasion.

@knoellle
Copy link
Contributor

Yes, introducing this trait bound certainly opens up the possibility of exposing all nodes state via communication.
Although we also have to consider SerializeHierarchy.

@h3ndrk
Copy link
Member Author

h3ndrk commented Aug 14, 2023

Did I miss something? I can't find the NodeState anywhere?

Sorry, please ignore outdated the description... 😅

@h3ndrk
Copy link
Member Author

h3ndrk commented Aug 14, 2023

As you are implementing Serialize and Deserialize for nearly all nodes, is it possible to expose these to communication now, such that they can be subscribed via twix?

In theory yes but out-of-scope for this PR.

@h3ndrk
Copy link
Member Author

h3ndrk commented Aug 14, 2023

Not without further changes, no.

I played around with it for a bit and I don't see a way to construct an AdditionalOutput with a type path that is valid both in the module itself as well as in the generated structs module for the database types.

The only way I see for this to work is to move the state to a separate crate (as we do with types), but that's no different from the state on the main branch.

If I would implement it, I would build a separate Communication infrastructure and don't rely on e.g. additional outputs. But this needs more design work.

@h3ndrk
Copy link
Member Author

h3ndrk commented Aug 14, 2023

Yes, introducing this trait bound certainly opens up the possibility of exposing all nodes state via communication.

Although we also have to consider SerializeHierarchy.

Exactly 100% what Schmiddy wrote above. But I think we don't need a trait, Code generation should be enough. But it needs more thinking.

@h3ndrk
Copy link
Member Author

h3ndrk commented Aug 20, 2023

The current approach has been tested on a real NAO. Here are the findings and conclusions:

Size of recorded data

1 - 3 GiB data is written per minute. 10 minutes of one half time would be 10 - 30 GiB. The NAO has 32 GiB internal memory. USB sticks can be used for external storage, our usual sticks have 32 GiB storage. That means we might run into storage size problems during games.

One possibility is to optimize the amount of data further but it is unlikely that this will reduce data usage by orders of magnitude. Only some cyclers accumulate a lot of data (vision) and the serialization only has 30 % overhead compared to the raw data. Compression is unlikely to help because this is mostly image data which has high entropy and is hard to compress losslessy. Another possibility is to reduce the frequency of recorded frames, maybe even for some cyclers while recording other cyclers at full frequency.

Execution time impact

The biggest impact has vision recording with 20 ms Bincode serialization and writing to BufWriter<File>. This leaves 13 ms headroom for the actual calculation which is nearly the time our current vision pipeline needs for calculation. Control cycler recording takes 0.3 ms and the whole cycle takes 2 ms (10 ms headroom). The other cyclers are even faster. Again, vision is the critical path and needs special care. If we would reduce the frequency, only every other frame would take longer. Maybe we would skip images but we control that via the frequency. One idea is to change the internal buffer format of YCbCr422Image to Vec<u8> to decrease one layer of serialization hopefully resulting in simple generated code. All accesses must be changed to use the different buffer format. Maybe we need to introduce some unsafe stuff to efficiently cast pointers around.

@h3ndrk
Copy link
Member Author

h3ndrk commented Aug 20, 2023

Some ideas to support different frequencies and dynamic recording:

Dynamic recording may be triggered by different parties. For example a trigger might be the current game state (robotics domain) or someone requesting a recording from Twix (debug tooling, framework domain via communication) or it has been requested from the deployed configuration (framework domain via communication).

Communication and the hardware interface will both provide a quadruple state: Force to not record, force to record, intend to record, intend to not record. This is also the order of descending precendence. Communication will provide a state for each cycler enabling fine-grained control of the recording. The hardware interface may just provide a single state for the moment, but this can be extended later to one for each cycler.

If the cycler derives from all states that a recording of the current frame should happen, it records everything and sends the data to its instance of a channel to communication. Communication will receive the data and store it in the file system. We might think about, reducing computation overhead in the cyclers even more by moving the serialization into communication.

Communication will also provide recording frequency information, for example whether every frame should be recorded, none at all or at which frequency. One approach is to have this fixed over the whole process runtime by storing this information in e.g. the hardware.json configuration and loading it once at startup. Another approach is to also provide this information via communication and be able to control it from a connected debug client at runtime. Dynamic frequencies have the benefit of more fine-grained analysis during testing without restarting the HULK process.

Communication will serve an additional API endpoint for recording s.t. clients are able to control the recording state and maybe also the frequency.

@h3ndrk h3ndrk force-pushed the states branch 3 times, most recently from 4f7a719 to 2b3d884 Compare October 1, 2023 13:49
crates/code_generation/src/cyclers.rs Outdated Show resolved Hide resolved
crates/code_generation/src/cyclers.rs Outdated Show resolved Hide resolved
crates/code_generation/src/cyclers.rs Outdated Show resolved Hide resolved
crates/code_generation/src/cyclers.rs Outdated Show resolved Hide resolved
crates/control/src/localization_recorder.rs Show resolved Hide resolved
@h3ndrk h3ndrk enabled auto-merge October 1, 2023 17:37
@schmidma schmidma self-assigned this Oct 2, 2023
Copy link
Member

@schmidma schmidma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only minor remarks. This looks good to me in general. Let's see a test game before merging this?

crates/code_generation/src/cyclers.rs Outdated Show resolved Hide resolved
crates/code_generation/src/cyclers.rs Outdated Show resolved Hide resolved
crates/code_generation/src/cyclers.rs Show resolved Hide resolved
crates/code_generation/src/run.rs Outdated Show resolved Hide resolved
crates/code_generation/src/run.rs Outdated Show resolved Hide resolved
crates/code_generation/src/run.rs Show resolved Hide resolved
crates/audio/src/whistle_detection.rs Show resolved Hide resolved
crates/control/src/localization_recorder.rs Show resolved Hide resolved
crates/hardware/src/lib.rs Outdated Show resolved Hide resolved
crates/code_generation/src/run.rs Outdated Show resolved Hide resolved
@h3ndrk h3ndrk changed the title Rework state handling in our framework Recording of robotics nodes Oct 3, 2023
@h3ndrk h3ndrk added this pull request to the merge queue Oct 11, 2023
Merged via the queue into HULKs:main with commit 0c64498 Oct 11, 2023
16 checks passed
@h3ndrk h3ndrk deleted the states branch October 11, 2023 16:42
This was referenced Mar 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

3 participants