Suggestion: CStr literals #75401

Closed
LunarLambda opened this issue Aug 11, 2020 · 18 comments
Labels
A-ffi Area: Foreign Function Interface (FFI) C-feature-request Category: A feature request, i.e: not implemented / a PR. T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@LunarLambda

LunarLambda commented Aug 11, 2020

Why?

Currently, creating a CStr, even from a bytestring literal, is quite noisy.

// NOTE: Don't forget to add \0 at the end or this is unsound!
let cstr = unsafe { CStr::from_bytes_with_nul_unchecked(b"Hello, World\0") };

Furthermore, there's no way to ensure the well-formedness of that literal at compile time¹, despite hardcoded C-strings being fairly common when creating bindings for C libraries.
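For comparison, a minimal sketch of the checked runtime constructor, which rejects both a missing terminator and an interior nul. The helper name `is_well_formed` is made up for illustration; only `CStr::from_bytes_with_nul` comes from std.

```rust
use std::ffi::CStr;

/// Hypothetical helper: true when `bytes` contains exactly one nul byte
/// and that byte is the last one, i.e. when it is a well-formed C string.
fn is_well_formed(bytes: &[u8]) -> bool {
    // The checked constructor returns Err for a missing terminator
    // and for any nul byte before the end.
    CStr::from_bytes_with_nul(bytes).is_ok()
}
```

The check only happens when the code runs, which is the gap the proposal is about.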

How?

To address this, I would like to propose a new string literal type:

let cstr = c"Hello, World!";

It would function nearly identically to byte-string literals, with the following key differences:

  1. Its type is &CStr, not &[u8]
  2. It may not contain any nul bytes (\0, \x00 as well as a 'physical' nul byte are forbidden)
  3. The compiler automatically adds a nul byte at the end of the string.

See also "Alternatives?" below.
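As a rough runtime analogue of rules 2 and 3, `CString::new` already rejects interior nul bytes and appends the terminator itself. The function name `c_literal_at_runtime` is hypothetical and only wraps that std call; the proposed literal would do the same work at compile time and without an allocation.

```rust
use std::ffi::{CString, NulError};

/// Hypothetical helper mirroring the proposed literal's rules at runtime:
/// the input must not contain a nul byte, and the terminator is appended.
fn c_literal_at_runtime(body: &[u8]) -> Result<CString, NulError> {
    // CString::new fails on any interior nul and adds the trailing nul.
    CString::new(body)
}
```

A `CString` derefs to `&CStr`, so the result can be used wherever rule 1's `&CStr` is expected.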

Pros?

  • Fills a niche in the language people are forced to build around²
  • Allows creation of const CStr items (currently not possible on stable)

Cons?

  • Have to commit to adding a new literal (sub)type to the language, which may require small tweaks to the parser

Alternatives?

  1. Make associated functions on CStr const. This would still leave the burden of checking on the user.
  2. Add a cstr! macro taking a byte-string literal and applying the needed checks and transformations.

¹ It is possible using proc-macros, though this poses different issues regarding ergonomics and stability.
² GitHub code search for CStr::from_bytes_with_nul_unchecked

@tesuji
Contributor

tesuji commented Aug 11, 2020

Alternative: Make CStr associated methods const. That is a library-only change rather than a language change.

@jonas-schievink
Contributor

Another alternative: Provide a cstr! macro that turns a regular string literal into a &'static CStr (and checks that it doesn't contain null bytes)

@jonas-schievink jonas-schievink added A-ffi Area: Foreign Function Interface (FFI) C-feature-request Category: A feature request, i.e: not implemented / a PR. T-lang Relevant to the language team, which will review and decide on the PR/issue. labels Aug 11, 2020
@LunarLambda
Author

Alternative: Make CStr associated methods const. That is a library-only change rather than a language change.

This does not resolve the ergonomics / potential unsoundness issue.

Another alternative: Provide a cstr! macro that turns a regular string literal into a &'static CStr (and checks that it doesn't contain null bytes)

A macro would work (and is in fact probably better as CStr is specific to std and not a built-in type), but would have to take a byte string, as C-strings need not conform to any encoding, including str's UTF-8.

@jonas-schievink
Contributor

It could take either, it just has to check that there are no 0 bytes contained within.

@Lokathor
Contributor

Yeah this can just be a proc macro in a crate that takes a string literal or byte string literal and then produces a &[u8].

@LunarLambda
Author

LunarLambda commented Aug 11, 2020 via email

@Lokathor
Contributor

Probably something similar to this https://github.com/Lokathor/utf16_lit/blob/main/src/lib.rs#L62, but instead of recoding the bytes from utf8 into utf16, you'd just spit them out directly as u8 values.

@Nemo157
Member

Nemo157 commented Aug 14, 2020

Combining the suggestions, cstr! could soon just expand to:

const { CStr::from_bytes_with_nul($input).unwrap() }

no proc-macros necessary.
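A rough sketch of what such a macro can look like on a toolchain where `CStr::from_bytes_with_nul` is a const fn (Rust 1.72 or later, as far as I know). This is not the exact expansion above: it uses a named const item and a `match` instead of inline const and `unwrap`, and it takes a regular string literal because `concat!` does not accept byte strings. The macro body and the panic message are illustrative assumptions.

```rust
/// Illustrative cstr! macro: appends a nul to a string literal and
/// validates the result during constant evaluation.
macro_rules! cstr {
    ($s:literal) => {{
        // Evaluated at compile time, so an interior nul in `$s`
        // becomes a compile error instead of a runtime failure.
        const C: &::std::ffi::CStr =
            match ::std::ffi::CStr::from_bytes_with_nul(concat!($s, "\0").as_bytes()) {
                Ok(c) => c,
                Err(_) => panic!("string literal contains an interior nul byte"),
            };
        C
    }};
}
```

With this sketch, `cstr!("Hello, World")` yields a `&'static CStr`, while `cstr!("a\0b")` should fail to compile.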

@tesuji
Contributor

tesuji commented Aug 14, 2020

Is inline const implemented?

@Nemo157
Member

Nemo157 commented Aug 14, 2020

No, but it is RFC-accepted and doesn't seem terribly hard to implement (he says as a non compiler developer).

@Lokathor
Contributor

IIRC, concat! cannot currently work with bytestring values, so that is one limit to the macro_rules version.

Still, regardless of the macro details, I think we can all agree that this ability, while nice, doesn't need to exist directly within the compiler or standard library. It can be done as a standard user crate.

So people should make the helper macro they want as a crates.io crate, and then worry about it maybe being moved into the standard library at a later date.

@Nemo157
Member

Nemo157 commented Aug 14, 2020

The const features required to make that work are not near stabilization, so only std could feasibly implement it that way in the near future (it would be possible to do without inline const, but even making from_bytes_with_nul a stable const fn seems quite far away given it needing reference transmutes, and I'm unsure of the const-panic status).

@Lokathor
Contributor

Well, I should clarify: you can get a const &[u8] using various macro setups. Which is basically good enough.

I sure don't think you'd get a new language construct into Stable faster than a proc-macro that spits out &[u8] values could go up on crates.io.

@LunarLambda
Author

Haha, perhaps I should've looked on crates.io before opening this :D

I think I will close this, since there doesn't seem to be any particular interest in introducing something like this to core or std.

@thomcc
Member

thomcc commented Aug 16, 2020

Note that you can't directly pass those to C, as &CStr is a fat pointer. I sometimes do

macro_rules! lit_cstr {
    // Takes a string literal and appends the trailing NUL at compile time.
    ($s:literal) => {
        (concat!($s, "\0").as_bytes().as_ptr() as *const ::libc::c_char)
    };
}

which produces a *const c_char that I can use directly in calls to C functions like

let errc = libc::sysctlbyname(
    lit_cstr!("hw.cpufrequency"),
    &mut out as *mut _,
    &mut msize,
    null_mut(),
    0,
);

and the like. Of course, this doesn't defend against an interior NUL, but it also takes a literal (reducing the likelihood), and is still safe if it happens since it would just see the end of the str sooner.
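To make the interior-NUL behaviour concrete, here is a small sketch that avoids the libc crate by using `std::ffi::c_char` (in std since Rust 1.64). The macro mirrors the `lit_cstr!` idea above; `read_back` is a hypothetical helper that stands in for a C function reading the string.

```rust
use std::ffi::{c_char, CStr};

/// Same idea as lit_cstr! above, but with std's c_char instead of libc's.
macro_rules! lit_cstr {
    ($s:literal) => {
        concat!($s, "\0").as_ptr() as *const ::std::ffi::c_char
    };
}

/// Hypothetical stand-in for a C callee: copies the bytes up to the first NUL.
///
/// # Safety
/// `ptr` must point to a valid, NUL-terminated string.
unsafe fn read_back(ptr: *const c_char) -> Vec<u8> {
    // SAFETY: the caller guarantees a NUL-terminated string behind `ptr`.
    unsafe { CStr::from_ptr(ptr) }.to_bytes().to_vec()
}
```

Passing `lit_cstr!("ab\0cd")` through `read_back` gives only `ab`: the callee stops at the first NUL, so the string is cut short, but nothing is read past the terminator the macro appended.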

IMO CStr's primary value is when dealing with:

  1. non-literal Rust strings, which either need checking or are non-'static and thus need to carry the lifetime of whatever they're derived from, etc.
  2. converting from *const c_char to Rust strs using std::ffi::CStr::from_ptr and such.
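Both cases can be sketched with std alone. The helper names `to_c` and `from_c` are made up for illustration; `CString::new`, `CStr::from_ptr` and `CStr::to_str` are the std calls doing the work.

```rust
use std::ffi::{c_char, CStr, CString};

/// Case 1: a runtime Rust string has to be checked for interior NULs,
/// and the result owns its buffer (hypothetical helper).
fn to_c(s: &str) -> Option<CString> {
    CString::new(s).ok()
}

/// Case 2: a pointer coming back from C is turned into a Rust string
/// (hypothetical helper).
///
/// # Safety
/// `ptr` must point to a valid, NUL-terminated string.
unsafe fn from_c(ptr: *const c_char) -> Option<String> {
    // SAFETY: the caller guarantees a NUL-terminated string behind `ptr`.
    let c = unsafe { CStr::from_ptr(ptr) };
    // C strings need not be UTF-8, so this conversion can fail.
    c.to_str().ok().map(str::to_owned)
}
```

The pointer from `CString::as_ptr` is only valid while the `CString` is alive, which is the lifetime concern mentioned in item 1.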

But for literals I've been using lit_cstr-alike macros for a bit now and haven't hit a downside. (Okay, sometimes I end up adding a bit more type safety to the pointers that come out of the macro, but nothing crazy).

@MoSal

MoSal commented Apr 22, 2022

Is there a parallel issue for OsStr?

@Nugine
Contributor

Nugine commented Apr 28, 2022

It is now possible to convert a const Rust string to a const C string at compile time (&str -> *const c_char).
https://github.com/Nugine/const-str/blob/master/const-str/src/__ctfe/cstr.rs
https://docs.rs/const-str/0.3.1/const_str/macro.raw_cstr.html

@Nilstrieb
Member

An RFC has been accepted. Tracking issue: #105723
