ctor/dtor to be made always unsafe in 1.0 #159

Open
mmastrac opened this issue Sep 1, 2021 · 9 comments

@mmastrac
Owner

mmastrac commented Sep 1, 2021

This library requires you to know what you're doing, so making ctor/dtor unsafe is the right way to go. Most users are probably already using unsafe anyway, since the crate is often used to interface with C code.

@mmastrac mmastrac changed the title ctor/dtor to be made always unsafe in a 1.0 ctor/dtor to be made always unsafe in 1.0 Sep 1, 2021
@mmastrac mmastrac added the 1.0 label Sep 1, 2021
@asomers
Contributor

asomers commented Dec 29, 2022

I disagree. unsafe doesn't mean "this could have bugs". It only means a few specific things, like "this could access invalid memory" or "this is prone to data races". Unless running before main inherently creates a risk of accessing invalid global variables or something like that, it shouldn't require unsafe.

@notgull

notgull commented Mar 18, 2023

> Unless running before main inherently creates a risk of accessing invalid global variables or something like that, it shouldn't require unsafe.

Can't it cause this, in some environments? My knowledge of libstd is rusty at best, but I think that there are at least some global variables that can't be accessed early like this. In addition, I don't think libstd should accommodate the use case where stuff happens before main().

@asomers
Contributor

asomers commented Mar 20, 2023

> Can't it cause this, in some environments? My knowledge of libstd is rusty at best, but I think that there are at least some global variables that can't be accessed early like this.

Maybe? If so that would be a good argument for being unsafe. But I don't think you should assume such a thing without finding any specific examples.

@mmastrac
Owner Author

The big issue is that something as simple and fundamental as println! can cause UB, because there is no guarantee that Rust has correctly initialized any part of std by the time a ctor function runs.

I've been pondering whether it is possible to allow a reduced subset of code that can run without unsafe, but most uses of ctor are just calling extern "C" functions.

@asomers
Contributor

asomers commented Mar 26, 2023

So what would be the safety advice to the user? "Don't use anything from the standard library?" I notice that some of the examples in the README do access the standard library.

@oscartbeaumont

It's also worth noting that if you overflow the stack in a ctor you get a segmentation fault. Being able to cause a segmentation fault in purely safe Rust seems expressly against Rust's unsafety rules as I understand them.

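#[ctor::ctor]
fn foo() {
    // Runs before main(); the unbounded recursion below overflows the stack.
    demo();
}

// Deliberately recurses forever to exhaust the stack.
fn demo() {
    demo();
}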

@Kixunil

Kixunil commented Oct 25, 2023

@oscartbeaumont A segmentation fault is not the same thing as UB. Programs deliberately use segfaults to avoid UB (a guard page at the end of the stack, for example). So your demo could be that protection working as intended, but I don't know whether it actually is.

Anyway, I think assuming unsafe is better since there really aren't any guarantees.

@SteveLauC
Contributor

> So what would be the safety advice to the user?

It would be really nice to have safety advice. I plan to replace Lazy/lazy_static with this crate to avoid the runtime check, but I got a memory leak (though a memory leak is not in itself memory-unsafe).

@simonask

simonask commented Mar 6, 2024

I second the desire to add the requirement that functions annotated with #[ctor] must be unsafe.

The point is not that the function itself is inherently unsafe, but that a library may want to perform initialization within #[ctor] functions that must run for other safe abstractions inside main() to be sound. But because the order of ctors cannot be globally guaranteed, such abstractions would be unsafe to use in other ctors.

Hence the soundness invariant of any ctor function is at minimum that it doesn't rely on safe abstractions that require another ctor to have run. This is in line with the philosophy that "nothing happens before and after main()", in the sense that it would be nice to be able to say that anything that does happen before main() may be a prerequisite for the soundness of code inside main(). The standard library seems to be making at least somewhat similar assumptions.

This invariant would be very, very useful in conjunction with crates such as linkme.
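The ordering hazard described above can be simulated in plain Rust without any real pre-main code. This is only a sketch: `library_ctor`, `library_api`, `other_ctor`, and `simulate` are hypothetical names, ordinary function calls stand in for `#[ctor]` functions, and the API is written in a checked form so the failure shows up as an `Err` rather than as undefined behavior.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Stand-in for state that a library sets up in its own #[ctor].
static LIBRARY_READY: AtomicBool = AtomicBool::new(false);

// Hypothetical library constructor; under #[ctor] it would run before main().
fn library_ctor() {
    LIBRARY_READY.store(true, Ordering::Release);
}

// Checked stand-in for a "safe" API whose soundness depends on library_ctor().
// An unchecked version would skip this test and could hit UB instead.
fn library_api() -> Result<&'static str, &'static str> {
    if LIBRARY_READY.load(Ordering::Acquire) {
        Ok("initialized")
    } else {
        Err("used before library_ctor")
    }
}

// Hypothetical constructor in some other crate that calls the library.
fn other_ctor() -> Result<&'static str, &'static str> {
    library_api()
}

// Simulates one of the two orders in which the constructors could be
// scheduled and returns what other_ctor() observed.
fn simulate(library_first: bool) -> Result<&'static str, &'static str> {
    LIBRARY_READY.store(false, Ordering::Release);
    if library_first {
        library_ctor();
        other_ctor()
    } else {
        let seen = other_ctor();
        library_ctor();
        seen
    }
}
```

Calling `simulate(true)` yields `Ok("initialized")`, while `simulate(false)` yields `Err("used before library_ctor")`. With the check removed, the second order is exactly the case an `unsafe` requirement on ctor functions would make the caller responsible for.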

Use case

My use case is a string interning library, where interned string "literals" are frequently present in the code. I need to guarantee that all identical strings in the program are unified before the user sees them. Without the above invariant, this is not possible to achieve without some runtime check or indirection at the point-of-use.
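For contrast, here is a minimal sketch of the kind of runtime-checked interning that paragraph refers to. It is not the author's library: `INTERNER` and `intern` are hypothetical names, and the sketch assumes a global table behind `std::sync::OnceLock` and a `Mutex`, with interned strings leaked to obtain `'static` lifetimes.

```rust
use std::collections::HashSet;
use std::sync::{Mutex, OnceLock};

// Global table of unified strings, created lazily on first use.
static INTERNER: OnceLock<Mutex<HashSet<&'static str>>> = OnceLock::new();

// Returns the canonical `'static` copy of `s`; equal strings share one address.
fn intern(s: &str) -> &'static str {
    // Runtime check at every point of use: is the table initialized yet?
    let table = INTERNER.get_or_init(|| Mutex::new(HashSet::new()));
    let mut set = table.lock().unwrap();
    if let Some(&existing) = set.get(s) {
        return existing;
    }
    // First occurrence: leak a copy so it lives for the rest of the program.
    let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
    set.insert(leaked);
    leaked
}
```

Every call pays for the `OnceLock` check, a lock, and a hash lookup, which is the cost the single-load design is meant to avoid.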

Ideally, I would like runtime use (i.e. within main()) to be a single load of a particular location in a linkme distributed slice, without any branches at all, or even atomics.

Example to illustrate the general idea, with many details omitted:

#[linkme::distributed_slice]
static LOCATIONS: [UnsafeCell<&'static str>] = [..];

#[ctor]
unsafe fn unify() {
    // MUST RUN BEFORE ANY CALL TO sym!() IS REACHED!
    for location in LOCATIONS {
        // Unify duplicate strings in-place.
    }
}

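macro_rules! sym {
    // Double braces make the expansion a block expression, so the item
    // and the `unsafe` read can appear wherever an expression is expected.
    ($string:literal) => {{
        #[linkme::distributed_slice(LOCATIONS)]
        static LOCATION: UnsafeCell<&'static str> = UnsafeCell::new($string);
        unsafe {
            // MUST RUN AFTER unify()!
            *LOCATION.get()
        }
    }};
}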

Currently I'm solving the problem without requiring a #[ctor], and the fastest possible solution requires an indirect function call with an initial trampoline at the point of use. This is more than fast enough, but it isn't the theoretically fastest possible solution, because there is no way to introduce the invariant that calls to sym!() must not occur in other ctors.
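One way an indirect call with an initial trampoline could look is sketched below. This is an assumption about the general technique, not the author's actual code: `LOOKUP`, `trampoline`, `fast_path`, `sym`, and `INIT_CALLS` are hypothetical names, and the one-time unification work is reduced to a counter.

```rust
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

type Lookup = fn() -> &'static str;

// Counts how often the one-time setup ran (for illustration only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Holds the function to call at the point of use; starts as the trampoline.
static LOOKUP: AtomicPtr<()> = AtomicPtr::new(trampoline as Lookup as *mut ());

// First call lands here: do the setup, then swap in the fast path.
fn trampoline() -> &'static str {
    INIT_CALLS.fetch_add(1, Ordering::Relaxed);
    // One-time work (such as unifying duplicate strings) would go here.
    LOOKUP.store(fast_path as Lookup as *mut (), Ordering::Release);
    fast_path()
}

// Later calls go straight here, with no initialization branch.
fn fast_path() -> &'static str {
    "interned"
}

// Point of use: one atomic load plus one indirect call.
fn sym() -> &'static str {
    let raw = LOOKUP.load(Ordering::Acquire);
    // SAFETY: LOOKUP only ever stores pointers that were cast from `Lookup` values.
    let f: Lookup = unsafe { std::mem::transmute::<*mut (), Lookup>(raw) };
    f()
}
```

In a single-threaded run the trampoline executes once and every later `sym()` call reaches `fast_path` directly. With several threads the trampoline may run more than once before the store becomes visible, so the setup would have to be idempotent.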

I realize that the ctor crate is not able to guarantee that static constructors installed by other means (like linking to a C++ library) uphold the same unsafety requirement, but I would think the above argument applies to any solution that adds static constructors to Rust. They become much more useful if we're allowed to rely on them for soundness in main(), at the cost of not having that soundness in other ctors.

7 participants