Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove use of usize from the VM core struct #1498

Open
tdelabro opened this issue Nov 28, 2023 · 12 comments
Open

Remove use of usize from the VM core struct #1498

tdelabro opened this issue Nov 28, 2023 · 12 comments
Labels
enhancement New feature or request

Comments

@tdelabro
Copy link
Contributor

tdelabro commented Nov 28, 2023

Problem

The VM execution should be deterministic regardless of if it is executed on a 16, 32 or 64 bit processor.
Having usize being used in order to define some of the core VM types (segment offset, ap, fp) is problematic because it could lead to different execution on different machines. Which, in a decentralized blockchain architecture will lead to hard forks.

Recomandation

  1. Get rid of all the usize in the cairo-vm stuct. I think it's still okay to use usize when we call language method like len() but we should convert those as soon as possible to types that have a fixed behaviour regardless of the machine.

  2. Ban the use of as. as silence the value overflows. This is not something we want at all in a blockchain context. We should use the very explicit try_from/into instead and handle errors accordingly.

Impl

I would reccomand we introduce a new wrapper struct: SegmentOffset(u64) that will be used instead of usize in all Relocatable, ap and fp. This will require impl a lot of conversion methods on it, but it will give us the level of security we need for such a core component of our infrastructure.

@tdelabro tdelabro added the enhancement New feature or request label Nov 28, 2023
@Oppen
Copy link
Member

Oppen commented Nov 28, 2023

I would reccomand we introduce a new wrapper struct: SegmentOffset(u64) that will be used instead of usize in all Relocatable, ap and fp. This will require impl a lot of conversion methods on it, but it will give us the level of security we need for such a core component of our infrastructure.

I don't really understand this part. Why not just make the offset u64? If it's just to avoid as, maybe there's a clippy lint for that?

@Oppen
Copy link
Member

Oppen commented Nov 28, 2023

Re: execution on different architectures: AFAICT, in the cases we could observe overflows in the offsets execution would fail due to exceeding the addressable space anyway, so usize should work well for any program that can in fact run correctly in smaller architectures. Not that it wouldn't be more correct to use explicit sizes anyway (I also have a more compact u64 representation for Relocatable in mind for traces).

@tdelabro
Copy link
Contributor Author

tdelabro commented Nov 29, 2023

Why not just make the offset u64?

It would work, but next time you want to change, lets say to u32, it will be hell again. With the correct encapsulation it is easier to change in the future.
I would at least use an alias type, for code readability, rather than u64.
There could also be some operation that make sense if we talk about u64 but don't if we consider memory offset. So, we could want to impl different, more specific, behaviour for the type that represent an segment offset.

execution on different architectures: AFAICT, in the cases we could observe overflows in the offsets execution would fail due to exceeding the addressable space anyway, so usize should work well for any program that can in fact run correctly in smaller architectures.

I think you are correct on this one. But it mean that it work "by luck" and it will be okay until it isn't. So yes I would prefer the more restrictive architecture.

Also some traits like parity-scale-codec::{Encode, Decode} are not implemented on usize. So getting rid of it would be more handy..

@tdelabro
Copy link
Contributor Author

Same goes with isize used as segment_index in Relocatable

@Oppen
Copy link
Member

Oppen commented Nov 29, 2023

Sounds reasonable.

@tdelabro
Copy link
Contributor Author

tdelabro commented Feb 5, 2024

@Oppen @igaray any new on this? Is it planned? It will become a requirement for Madara soon as it can break consensus super easily

@tdelabro
Copy link
Contributor Author

tdelabro commented Mar 5, 2024

@Oppen @igaray 👋
Could be nice to fix that before the 1.0.0 release, as it is a big breaking change

@fmoletta
Copy link
Member

Hello!
I understand the need to have deterministic execution which doesn't change between different architectures but I don't think changing the type of the address' offsets to u64 will fully solve this.
The memory structure of the VM is made up of vectors that each represent a memory segment, the maximum capacity of a memory segment is given by usize::MAX, therefore depending on the architecture on which the VM is run. Even if addresses are changed to represent offsets up to u64::MAX, the memory they address is still constrained to usize::MAX. A program which fails to run on a 32bit architecture due to exceeding the addressable memory will still fail after this change and it will still run correctly on a 64bit architecture

@tdelabro
Copy link
Contributor Author

the maximum capacity of a memory segment is given by usize::MAX

Yeah that sound like a bad design. It should be a u64 or a u128 or whatever. But it should be defined by the starknet spec, not by the machine the program runs on. Otherwise the virtual machine is not that virtual, it end up being tied up to the actual hardware.

When are you planning to solve all of this @fmoletta? As I say, it could be great to have it before 1.0.0 release

@fmoletta
Copy link
Member

the maximum capacity of a memory segment is given by usize::MAX

Yeah that sound like a bad design. It should be a u64 or a u128 or whatever. But it should be defined by the starknet spec, not by the machine the program runs on. Otherwise the virtual machine is not that virtual, it end up being tied up to the actual hardware.

When are you planning to solve all of this @fmoletta? As I say, it could be great to have it before 1.0.0 release

What I meant by that line is that the maximum capacity of a vector in rust is usize::MAX, therefore the maximum capacity for a memory segment is usize::MAX. It's not something we can change without changing the whole structure of the memory

@tdelabro
Copy link
Contributor Author

tdelabro commented Mar 12, 2024

Well, that is a bad design. The max size of a segment should be decided by the VM specs. Not by some implementation details which makes it depend on the hardware it is running on.

Once you decide that the max size of a segment is x (maybe 2^251-1, or something else), you implement it in a way that makes it respect the specs, regardless of the hardware. Not the other way around.

@tdelabro
Copy link
Contributor Author

tdelabro commented Mar 12, 2024

Regardless, this is a very specific edge case that most likely will never happen in an actual run.
Removing usize from everywhere else is still something relevant

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: Backlog
Development

No branches or pull requests

3 participants