Tracking Issue for c"…" string literals #105723

Closed
7 of 12 tasks
tmandry opened this issue Dec 14, 2022 · 21 comments
Labels
B-RFC-approved Feature: Approved by a merged RFC but not yet implemented. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. F-c_str_literals `#![feature(c_str_literals)]` T-lang Relevant to the language team, which will review and decide on the PR/issue.

Comments

@tmandry
Member

tmandry commented Dec 14, 2022

This is a tracking issue for the RFC "c"…" string literals" (rust-lang/rfcs#3348).
The feature gate for the issue is #![feature(c_str_literals)].

Steps / History

Unresolved Questions

  • Also add c'…' C character literals? (u8, i8, c_char, or something more flexible?)
  • Should we make &CStr a thin pointer before stabilizing this? (If so, how?)
  • Should the (unstable) concat_bytes macro accept C string literals? (If so, should it evaluate to a C string or byte string?)
  • Should there be a valid UTF-8 C string type that c"..." string literals support? (comment 1, comment 2)
@tmandry tmandry added B-RFC-approved Feature: Approved by a merged RFC but not yet implemented. T-lang Relevant to the language team, which will review and decide on the PR/issue. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. F-c_str_literals `#![feature(c_str_literals)]` labels Dec 14, 2022
@CAD97
Contributor

CAD97 commented Dec 15, 2022

w.r.t.

c'…' character literals

IIRC, C23 accepted N2653, which defines char8_t as a typedef for unsigned char. (C++20's char8_t is a distinct type.)

Assuming I'm correct that N2653 was accepted, it probably makes sense to define Rust's c"…" as being compatible with C's u8"…" (char8_t[]) and Rust's c'…' as being compatible with C's u8'…' (char8_t).

To be explicit, this is not intended to argue that Rust's c"…" should be usable as *const c_uchar, just that Rust's c"…".as_ptr() should have the same memory content as C's u8"…". It may argue for Rust's c"…" requiring UTF-8 well-formedness; I do not know whether C's u8"…" allows constructing invalid UTF-8 via escapes.

Note that &CStr's pointer representation uses Rust's c_char, not Rust's c_uchar (which would match C's char8_t). This serves as a counterargument that C strings in Rust should match bare string/character literals in C, which use C's unqualified char, rather than the C17 (u8'…') and C23 [1] (u8"…") prefixed literals.

The fact that UTF-8 literals are unsigned char in C may add weak motivation for a UTF-8 C string type in Rust, separate from CStr, which corresponds to the unqualified c_char type.

Footnotes

  1. To be fully accurate to my understanding, C17 added u8"…" literals of type array of char, and C23 changed their type to array of char8_t. Similarly, C17 added u8'…' literals of type unsigned char, and C23 changed their type to char8_t (for no semantic change, as char8_t is a typedef of unsigned char).
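
As a rough illustration of the memory-content point above, the sketch below reads back the bytes behind the pointer of a C string literal. The names `literal_ptr` and `bytes_behind_ptr` are invented for this example, and it assumes a toolchain where C string literals are stable (Rust 1.77 or later, edition 2021).

```rust
use core::ffi::{c_char, CStr};

/// Returns the pointer a C API would receive for the literal `c"hi"`.
/// The pointee type is `c_char`, whose signedness depends on the target.
fn literal_ptr() -> *const c_char {
    const GREETING: &CStr = c"hi";
    GREETING.as_ptr()
}

/// Reads the three bytes stored behind the pointer: 'h', 'i', and the NUL.
fn bytes_behind_ptr() -> [u8; 3] {
    let p = literal_ptr().cast::<u8>();
    // SAFETY: the literal is a 'static allocation of exactly 3 bytes
    // (two content bytes plus the terminating NUL).
    unsafe { [*p, *p.add(1), *p.add(2)] }
}
```

Whatever the signedness of `c_char` on a given target, the stored bytes are the same; only the pointer's element type differs.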

@ojeda
Contributor

ojeda commented Dec 15, 2022

Assuming I'm correct that N2653 was accepted

Yeah, char8_t was voted into C23.

It may argue for Rust's c"…" requiring UTF-8 well-formedness; I do not know whether C's u8"…" allows constructing invalid UTF-8 via escapes.

I think it was possible before, and will still be in C23:

"Any hexadecimal escape sequence or octal escape sequence specified in a u8, u, or U string specifies a single char8_t, char16_t, or char32_t value and may result in the full character sequence not being valid UTF-8, UTF-16, or UTF-32."
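
For comparison, Rust's C string literals as eventually stabilized behave similarly: byte escapes can produce content that is not valid UTF-8, while Unicode escapes are stored as UTF-8. The following is a minimal sketch of that behavior, assuming Rust 1.77 or later; the constant and function names are illustrative only.

```rust
use core::ffi::CStr;

// Byte escapes may yield a sequence that is not valid UTF-8.
const NOT_UTF8: &CStr = c"\xff\xfe";
// A Unicode escape is encoded as UTF-8 (U+00F5 becomes 0xC3 0xB5).
const UTF8: &CStr = c"\u{f5}";

/// Reports whether the C string's bytes (without the NUL) are valid UTF-8.
fn is_valid_utf8(s: &CStr) -> bool {
    s.to_str().is_ok()
}
```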

@dead-claudia

I do want to note a couple things in the alternatives:

  1. While I like the idea of a FromStringLiteral, error messages would need at least a character offset to be actually useful beyond a mere compile-time panic!. Also, &CStrs are really more like byte string literals, and that offset would have to be a byte offset, not a char offset, too.

  2. If some platforms use a u8 for c_char while others use an i8, you need the c'...' literal to be assignable to c_char regardless of which it is in order to be portable. Allowing it to coerce to either i8 or u8 is probably better than making it exactly what c_char is, though. (Ideally, c_char should've been #[repr(transparent)] struct c_char(pub i8/u8); to force people to differentiate the two, but the time to change to that is long past.)

@tmandry
Member Author

tmandry commented Jan 5, 2023

@CAD97 @dead-claudia Would you be willing to open issues for these points and add the F-c_str_literals label (or tag me and I can add it)? Otherwise they are likely to get lost in the discussion. Tracking issues are meant to serve as a hub to point to other threads, not to be a place for technical discussion themselves.

@dead-claudia

Created #106479 for my first point. The second point's not likely to be relevant before a PR, and there's really not much to discuss there.

@tgross35
Contributor

Should we make &CStr a thin pointer before stabilizing this? (If so, how?)

I don't think anything about that change should block this; the literal would just have to be updated whenever CStr becomes thin. The change itself is blocked. I think #81513 would solve it, but it is still quite young. There doesn't seem to have been much discussion on the topic recently anyway: #59905 is the only issue I could find, but it's pretty unofficial and stale.

@fee1-dead fee1-dead self-assigned this Mar 5, 2023
Manishearth added a commit to Manishearth/rust that referenced this issue May 4, 2023
…r-errors

Implement RFC 3348, `c"foo"` literals

RFC: rust-lang/rfcs#3348
Tracking issue: rust-lang#105723
@fee1-dead fee1-dead removed their assignment May 18, 2023
@dtolnay
Member

dtolnay commented Jun 20, 2023

c"…" literals don't work right in proc macros — #112820.

@dtolnay
Member

dtolnay commented Jul 7, 2023

I updated the summary comment with recent history.

@jmillikin
Contributor

jmillikin commented Oct 31, 2023

I'll take a crack at offering non-official but opinion-ful answers for the unanswered questions.

Also add c'…' C character literals? (u8, i8, c_char, or something more flexible?)

Answer: no

Byte b'.' literals make sense because b"..." and &[u8; N] are isomorphic -- there are no requirements about byte content, so b"abc" and &[b'a', b'b', b'c'] are clearly equivalent for all purposes.

C string literals c"..." don't have this property, and it doesn't make sense to try to write something like &[c'a', c'b', c'c'] because that's not a valid C string (no terminal NUL). The only thing c'.' literals could offer over today's b'.' is implicit coercion to c_char on platforms where that type is signed -- which has some value, but it's not really related to C string literals.

Presumably the parser is flexible enough (given b"..." and b'.' co-existing today) that c'.' literals could be added in the future should the need prove sufficient.
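
The asymmetry described above can be checked directly. This is a small sketch with illustrative function names: the byte-string forms compare equal, whereas an array of the same bytes is rejected as a C string because it lacks the terminating NUL.

```rust
use core::ffi::CStr;

/// `b"abc"` and an array of byte literals denote the same `&[u8; 3]` value.
fn byte_forms_agree() -> bool {
    let from_string: &[u8; 3] = b"abc";
    let from_chars: &[u8; 3] = &[b'a', b'b', b'c'];
    from_string == from_chars
}

/// The same three bytes do not form a C string: there is no trailing NUL,
/// so the checked constructor returns an error.
fn chars_form_c_string() -> bool {
    CStr::from_bytes_with_nul(&[b'a', b'b', b'c']).is_ok()
}
```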

Should we make &CStr a thin pointer before stabilizing this? (If so, how?)

Answer: no

It's already possible to create a &CStr in a const context (as of Rust v1.59), so this feature isn't providing capabilities or enabling functionality that would need to be walked back if the internal representation of CStr changes. The following block of code does the same thing as a c"..." literal and it's valid stable Rust:

const fn c(bytes: &[u8]) -> &core::ffi::CStr {
    unsafe { core::ffi::CStr::from_bytes_with_nul_unchecked(bytes) }
}

const HELLO: &core::ffi::CStr = c(b"hello, world!\x00");
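
To make the equivalence concrete, the sketch below places the helper next to the literal syntax and lets a caller compare the two constants. It assumes Rust 1.77 or later for the `c"..."` form; the constant names `VIA_HELPER` and `VIA_LITERAL` are invented for the example.

```rust
use core::ffi::CStr;

/// The helper from the comment above, usable in const contexts.
const fn c(bytes: &[u8]) -> &CStr {
    // SAFETY: callers must pass bytes that end in NUL and contain no other NUL.
    unsafe { CStr::from_bytes_with_nul_unchecked(bytes) }
}

// Pre-literal spelling: the NUL terminator is written by hand.
const VIA_HELPER: &CStr = c(b"hello, world!\x00");
// Literal spelling: the compiler appends the NUL and rejects interior NULs.
const VIA_LITERAL: &CStr = c"hello, world!";
```

The practical difference is that the literal form needs no `unsafe` and cannot silently omit the terminator.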

Should the (unstable) concat_bytes macro accept C string literals? (If so, should it evaluate to a C string or byte string?)

Answer: no

C strings have two equally valid &[u8] representations (with or without terminal NUL), so allowing them to be used in concat_bytes! would be ambiguous.

If people really want to concatenate c"..." literals for some reason, a concat_cstrs! macro would avoid the ambiguity because there's only one possible answer for "NUL or no?" to each &CStr being concatenated.
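
The two byte representations mentioned above correspond to two existing `CStr` methods. A minimal sketch, with illustrative names and assuming Rust 1.77 or later for the literal:

```rust
use core::ffi::CStr;

const NAME: &CStr = c"abc";

/// Content bytes only; the terminating NUL is excluded.
fn without_nul() -> &'static [u8] {
    NAME.to_bytes()
}

/// Content bytes followed by the terminating NUL.
fn with_nul() -> &'static [u8] {
    NAME.to_bytes_with_nul()
}
```

A `concat_bytes!` that accepted `c"abc"` would have to pick one of these silently, which is the ambiguity the answer refers to.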

Should there be a valid UTF-8 C string type that c"..." string literals support?

Answer: sounds useful but unrelated to this feature

Since the &CStr type represents C strings in an undefined encoding the c"..." literal syntax should do the same. The ability to have values that assert dual properties of (1) valid C string and (2) valid UTF-8 seems useful, but that would need to be its own type (core::ffi::Utf8CStr ?), and a separate literal syntax (cu"..." ?) to avoid type ambiguity.

After all, every &Utf8CStr would also be a valid &CStr, but the opposite would not be true, so if c"..." could be either one then the following code would be a type error:

impl AsRef<CStr> for Utf8CStr;

fn print_c_str(s: &impl AsRef<CStr>) { ... }

print_c_str(c"\xc3\xb5"); // ambiguous AsRef

Alternatively, if the compiler prioritized &CStr when inferring the type of c"..." literals, you'd end up with confusing and spooky behavior:

print_c_str(c"\xc3\xb5"); // does this print "õ" or "Ãµ"? can't tell from call site
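
For readers who want something concrete, here is one way such a type could look. Everything in it is hypothetical: `Utf8CStr` does not exist in `core::ffi`, and this borrowed-wrapper design, its methods, and the `byte_len` function are assumptions made purely for illustration.

```rust
use core::ffi::CStr;

/// HYPOTHETICAL: a C string that has been checked to be valid UTF-8.
#[derive(Debug, Clone, Copy)]
struct Utf8CStr<'a>(&'a CStr);

impl<'a> Utf8CStr<'a> {
    /// Returns `None` if the bytes are not valid UTF-8.
    fn new(s: &'a CStr) -> Option<Self> {
        s.to_str().ok().map(|_| Utf8CStr(s))
    }

    /// Cannot fail in practice, because `new` already validated the bytes.
    fn as_str(&self) -> &'a str {
        self.0.to_str().expect("validated in new")
    }
}

// Every UTF-8 C string is also a plain C string, but not the reverse.
impl AsRef<CStr> for Utf8CStr<'_> {
    fn as_ref(&self) -> &CStr {
        self.0
    }
}

/// Accepts anything viewable as a C string and counts its content bytes.
fn byte_len(s: &impl AsRef<CStr>) -> usize {
    s.as_ref().to_bytes().len()
}
```

With an explicit constructor like this, the caller states which interpretation is intended, which avoids the inference ambiguity described above.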

@tgross35
Contributor

tgross35 commented Oct 31, 2023

I'll take a crack at offering non-official but opinion-ful answers for the unanswered questions.

Also add c'…' C character literals? (u8, i8, c_char, or something more flexible?)

Answer: no

I actually wouldn't mind having this since I occasionally come across FFI places where I need to compare a single c_char to a literal, and I can't just use 'y' or b'y' because c_char is signed on most platforms. If c_char: PartialEq<u8> worked then that would be nice, but type aliases....

Anyway, I think this is something that doesn't need a decision at this time.
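
The workaround available today is an explicit cast. This is a small sketch of that pattern; the function name `is_yes` is made up for the example.

```rust
use core::ffi::c_char;

/// Compares a C `char` received over FFI against the ASCII letter 'y'.
/// The cast is needed because `c_char` is an alias for `i8` on some
/// targets and `u8` on others, while `b'y'` is always a `u8`.
fn is_yes(answer: c_char) -> bool {
    answer == b'y' as c_char
}
```

For ASCII values the cast never changes the numeric value, so the comparison behaves the same on both kinds of target.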

Other answers seem reasonable to me, at least I don't think there are any blockers.


One minor concern I have is that the reimplementation in #113476 has only been on stable for less than a month, and #116124 is not yet in stable. But I think that is fine assuming the stabilization of this feature won't land until 1.76 at the earliest.

I do also kind of like the RFC's proposed idea that let x: &CStr = "foo"; should work instead of adding the c"..." syntax, especially so macros can accept a CStr literal where needed and macro users can write an unprefixed string literal (common case for plugin-type things). But that can likely be a separate discussion.

dtolnay/syn#1502 is a big ecosystem thing but it shouldn't block anything

I think that it would be reasonable to open a stabilization PR if you are up for it, should be pretty minimal and the lang team can discuss / FCP there.

@jmillikin
Contributor

Concerns about soak time make sense. I'll hope for the best in regard to timing, but if it has to wait an extra release or two to verify ecosystem compatibility then that's just life.

I think that it would be reasonable to open a stabilization PR if you are up for it, should be pretty minimal and the lang team can discuss / FCP there.

OK, PR #117472 created -- this is my first time trying to stabilize a rustc feature, so hopefully I didn't mangle it too badly.

@madsmtm
Contributor

madsmtm commented Nov 15, 2023

Just noting that this interacts with the recently accepted RFC 3349 (tracking issue), which modifies the allowed escape codes in strings.

I checked the table in that RFC, and I think the rows for c"..."/cr"..." would be exactly the same as for b"..."/br"...".

bors added a commit to rust-lang-ci/rust that referenced this issue Dec 1, 2023
…ilstrieb

Stabilize C string literals

RFC: https://rust-lang.github.io/rfcs/3348-c-str-literal.html

Tracking issue: rust-lang#105723

Documentation PR (reference manual): rust-lang/reference#1423

# Stabilization report

Stabilizes C string and raw C string literals (`c"..."` and `cr#"..."#`), which are expressions of type [`&CStr`](https://doc.rust-lang.org/stable/core/ffi/struct.CStr.html). Both new literals require Rust edition 2021 or later.

```rust
const HELLO: &core::ffi::CStr = c"Hello, world!";
```

C strings may contain any byte other than `NUL` (`b'\x00'`), and their in-memory representation is guaranteed to end with `NUL`.
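
A brief sketch of how the two literal forms differ, assuming Rust 1.77 or later and edition 2021 (the constant names are illustrative): escapes are processed in `c"..."` but not in `cr"..."`, and both end with `NUL` in memory.

```rust
use core::ffi::CStr;

// Escapes are processed: `\t` becomes a single tab byte.
const ESCAPED: &CStr = c"tab\there";
// Raw form: the backslash and the `t` are kept as two separate bytes.
const RAW: &CStr = cr"tab\there";
// Hashes allow double quotes inside a raw literal.
const RAW_HASHED: &CStr = cr#"say "hi""#;
```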

## Implementation

Originally implemented by PR rust-lang#108801, which was reverted due to unintentional changes to lexer behavior in Rust editions < 2021.

The current implementation landed in PR rust-lang#113476, which restricts C string literals to Rust edition >= 2021.

## Resolutions to open questions from the RFC

* Adding C character literals (`c'.'`) of type `c_char` is not part of this feature.
  * Support for `c"..."` literals does not prevent `c'.'` literals from being added in the future.
* C string literals should not be blocked on making `&CStr` a thin pointer.
  * It's possible to declare constant expressions of type `&'static CStr` in stable Rust (as of v1.59), so C string literals are not adding additional coupling on the internal representation of `CStr`.
* The unstable `concat_bytes!` macro should not accept `c"..."` literals.
  * C strings have two equally valid `&[u8]` representations (with or without terminal `NUL`), so allowing them to be used in `concat_bytes!` would be ambiguous.
* Adding a type to represent C strings containing valid UTF-8 is not part of this feature.
  * Support for a hypothetical `&Utf8CStr` may be explored in the future, should such a type be added to Rust.
bors added a commit to rust-lang/miri that referenced this issue Dec 2, 2023
Stabilize C string literals

@WaffleLapkin
Member

Closing this as per the merged stabilization PR: #117472

flip1995 pushed a commit to flip1995/rust-clippy that referenced this issue Dec 5, 2023
Stabilize C string literals

@GoldsteinE
Contributor

GoldsteinE commented Jan 8, 2024

Stabilization is reverted in #119528, so probably this should be open again?

@traviscross
Contributor

@rustbot labels +I-lang-nominated

Nominating to discuss and ensure we're OK with the restabilization that will occur automatically if we take no action.

@rustbot rustbot added the I-lang-nominated The issue / PR has been nominated for discussion during a lang team meeting. label Jan 29, 2024
@traviscross
Contributor

@rustbot labels -I-lang-nominated

We discussed this in the triage call today and agreed, given that #119172 has landed, that we were happy to see this restabilize in Rust 1.77.

@rustbot rustbot removed the I-lang-nominated The issue / PR has been nominated for discussion during a lang team meeting. label Jan 31, 2024
@stefson

stefson commented Feb 6, 2024

Was this fully stabilized for rust-1.77.0_beta? I am failing to bootstrap the beta from the beta tarball with: error[E0658]: c".." literals are experimental

@ehuss
Contributor

ehuss commented Feb 6, 2024

@stefson Yes, it was fully stabilized in 1.77. If you are unable to determine your issue, I would suggest opening a new issue with the exact reproduction steps and commands to run.

I'm going to close this issue as the revert only happened on the old beta branch (1.76), and is on track as stabilized in the new beta branch (1.77).

@ehuss ehuss closed this as completed Feb 6, 2024
freebsd-git pushed a commit to freebsd/freebsd-ports that referenced this issue Apr 4, 2024
error[E0658]: `c".."` literals are experimental
  --> src/bar.rs:61:13
   |
61 |             c"i3bar-river".into(),
   |             ^^^^^^^^^^^^^^
   |
   = note: see issue #105723 <rust-lang/rust#105723> for more information

error[E0658]: `c".."` literals are experimental
   --> src/wm_info_provider/river.rs:118:33
    |
118 |             PointerBtn::Left => c"set-focused-tags",
    |                                 ^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #105723 <rust-lang/rust#105723> for more information

error[E0658]: `c".."` literals are experimental
   --> src/wm_info_provider/river.rs:119:34
    |
119 |             PointerBtn::Right => c"toggle-focused-tags",
    |                                  ^^^^^^^^^^^^^^^^^^^^^^
    |
    = note: see issue #105723 <rust-lang/rust#105723> for more information

Reported by:	pkg-fallout
(direct commit to 2024Q1 as 73941e6 is missing on the branch)
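
For code that must also build on toolchains older than 1.77, where the errors above appear, one possible fallback is the checked constructor with a hand-written terminator. This is only a sketch: the function name `focus_command` is invented here and is not taken from the referenced project.

```rust
use std::ffi::CStr;

/// Builds the same value as `c"set-focused-tags"` without the literal syntax.
/// The trailing `\0` must be written by hand; the constructor verifies that
/// it is present and that no other NUL byte occurs.
fn focus_command() -> &'static CStr {
    CStr::from_bytes_with_nul(b"set-focused-tags\0")
        .expect("byte string ends with exactly one NUL")
}
```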
lnicola pushed a commit to lnicola/rust-analyzer that referenced this issue Apr 7, 2024
Stabilize C string literals

RalfJung pushed a commit to RalfJung/rust-analyzer that referenced this issue Apr 27, 2024
Stabilize C string literals
