Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse '-' in numeric values used in test case #19

Open
frondeus opened this issue Oct 1, 2019 · 12 comments
Open

Parse '-' in numeric values used in test case #19

frondeus opened this issue Oct 1, 2019 · 12 comments
Labels
enhancement New feature or request spike Need to research topic more to come up with a solution
Projects

Comments

@frondeus
Copy link
Owner

frondeus commented Oct 1, 2019

For:

#[test_case(15, 15)]
#[test_case(-15, 15)]

fn test_abs(value: f64, expected: f64) {
   assert_eq!(value.abs(), expected)
}

We get error:

error[E0428]: the name _15_15 is defined multiple times
--> src/test.rs
|
| #[test_case(15, 15)]
| ^^^^^^^^^^^^^^^^^^^^
| |
| previous definition of the value _15_15 here
| _15_15 redefined here
|
= note: _15_15 must be defined only once in the value namespace of this mod

@frondeus
Copy link
Owner Author

frondeus commented Oct 1, 2019

@luke-biel luke-biel added enhancement New feature or request spike Need to research topic more to come up with a solution labels Jan 28, 2022
@luke-biel luke-biel added this to To do in 2.1.0 Apr 22, 2022
@luke-biel luke-biel moved this from To do to Backlog in 2.1.0 May 12, 2022
@AugustoFKL
Copy link
Contributor

@luke-biel @frondeus I'm working on this issue, FYI.

@AugustoFKL
Copy link
Contributor

Sorry the delay

@frondeus, @luke-biel, after some research, I figured that the main problem is how test names are handled by the escape_test_name method:

pub fn escape_test_name(input: impl AsRef<str>) -> Ident {
    if input.as_ref().is_empty() {
        return Ident::new("_empty", Span::call_site());
    }

    let mut last_under = false;
    let mut ident: String = input
        .as_ref()
        .to_ascii_lowercase()
        .chars()
        .filter_map(|c| match c {
            c if c.is_alphanumeric() => {
                last_under = false;
                Some(c.to_ascii_lowercase())
            }
            _ if !last_under => {
                last_under = true;
                Some('_')
            }
            _ => None,
        })
        .collect();

    if !ident.starts_with(|c: char| c == '_' || c.is_ascii_alphabetic()) {
        ident = format!("_{ident}");
    }

    Ident::new(&ident, Span::call_site())
}

There are two things that can cause problems, IMHO: The "None" of the match inside the filter_map and the if at before returning.

I imagine that the _ is added before the test name to avoid test names starting with invalid characters for test names, is that correct?

To be honest I only found two possible solutions to the general problem of conflicting names:

  1. The library start using additional context information to generate test names.
  2. We add an prefix/postfix to all tests to guarantee the uniqueness of names.

My proposal is to start adding an UUID to the end of all test names. This would be the less disruptive change when it comes to the current codebase. Adding contextual information would take a lot more effort and time, so the UUID would work as a bandaid to the problem in the meantime.

@luke-biel
Copy link
Collaborator

luke-biel commented Jan 25, 2024

Hmm, I'm not sold on idea of adding an UUID, mainly because it's not stable, it'd change for every compilation and tools would not understand it (cargo would report to rerun a test with --test test_name_my_old_uuid, but next run would require "new_uuid"). We need something stable. May I suggest span or argument hashing (eg.: SHA1)?

@AugustoFKL
Copy link
Contributor

@luke-biel I agree.

Actually, instead of hashing, I thought about using enconding:

fn encode_test_name(name: &str) -> String {
    // Encode the normalized string to match the allowed characters in the test name
    let mut encoded = BASE64_URL_SAFE_NO_PAD.encode(name.as_bytes());

    // Replace not acceptable characters with an identifiable string
    encoded = encoded.replace('-', "__dash__");

    // Add an underscore to the beginning of the string to make it a valid Rust
    // identifier when starting with a digit.
    encoded.insert(0, '_');

    encoded
}

fn decode_test_name(name: &str) -> Result<String> {
    // Remove the underscore from the beginning of the string
    let name = name.trim_start_matches('_');

    // Reverse the replacement of not acceptable characters
    let name = name.replace("__dash__", "-");

    // Decode the test name
    let decoded = BASE64_URL_SAFE_NO_PAD.decode(name.as_bytes())?;

    // Convert the decoded bytes to a string
    let decoded_str = String::from_utf8(decoded)?;

    Ok(decoded_str)
}

This code generates names based on the BASE64 encoding. I was able to verify and it supports the whole unicode specification (from 0x0000 to 0xFFFF), besides surrogates. Additionally, the encoded values are guaranteed to be unique, which means that will be no more problems with 'A' or 'a' or anything like that.

My only concern is that the names being encoded makes the tracing from the test results to actual tests pretty difficult. Therefore, I am looking into having a way to inform the 'name' (the one given by the user, if any, or the arguments that were used).

Does that sound reasonable?

Does that sound acceptable?

@luke-biel
Copy link
Collaborator

I'm a bit worried with ergonomics of this. How will this look like in the test output? If we were dealing with custom test harness we could write there anything, but in our scenario we have to rely on rust's mechanisms - and these display module path. Appending something at the end (or beginning) of the name would be a bit less confusing, even if it was that encoded string (though probably a bit long), test_case line number or hashed "something". Basically, out of all these options we have to choose something which will give most information to the user while avoiding name conflicts.

@olson-sean-k
Copy link

Just wanted to mention that I've also encountered this issue, though with &str arguments:

#[test_case("a*")]
#[test_case("*a")]
fn test(expression: &str) { ... }

It's very easy for this to occur with text, since it isn't too unlikely that it won't constitute a valid Rust identifier. A minimal tag derived from parameters sounds like a great idea, especially if it's stable. Maybe a truncated hash could provide enough disambiguation. From the example above, I think names like this could work well:

tests::test::_f8c61a7__a_expects
tests::test::_0b548fa__a_expects

That isn't too noisy in my opinion and the original format remains intact. Of course, it may be difficult to differentiate tests like this at a glance, but I think it's reasonable to lean on meaningful output when such tests fail to make the distinction sufficiently clear.

Thanks for looking into this!

@IohannRabeson
Copy link

Why not just adding an arbitrary index after the name?

tests::test::__a_0
tests::test::__a_1
tests::test::__b_2
tests::test::__b_3

It seems enough to make a distinction. I don't think there is a need for a hash.

@luke-biel
Copy link
Collaborator

luke-biel commented Feb 21, 2024

Hmm, thinking more about it, encoding as per your suggestion @AugustoFKL would work, one caveat as @olson-sean-k mentioned, we'd have to cover a lot more than just dash. Most likely all non-ascii characters. Maybe use urlencode then? But we'd need to do something with % characters then.
Number appending works fine too, we can just add index in test_case vector to the function name.

To be honest, I'm fine with any solution as long as it's consistent between runs. Be it hashing, truncated hashes, indexes...

@frondeus
Copy link
Owner Author

Silly idea - what about line numbers :)?

@IohannRabeson
Copy link

Silly idea - what about line numbers :)?

Line numbers are not very stable, any change in the same file above the test can change them.

@olson-sean-k
Copy link

Why not just adding an arbitrary index after the name?

Assuming the cases can be stably enumerated in the procedural macro implementation, I think a case index is the best suggestion. By default, a very simple but reliable scheme is probably best. Anything else should probably be opt-in.

Sometimes the identifiers constructed from case parameters work pretty well... but often they aren't very legible and sometimes they break! It doesn't generalize and so doesn't make for a good default, I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request spike Need to research topic more to come up with a solution
Projects
No open projects
2.1.0
Backlog
Development

No branches or pull requests

5 participants