Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Will fail on large files #6

Open
Emerentius opened this issue Mar 28, 2017 · 1 comment
Open

Will fail on large files #6

Emerentius opened this issue Mar 28, 2017 · 1 comment
Labels
enhancement Improve the expected

Comments

@Emerentius
Copy link

The code below loads both files completely into memory before comparing. For files that differ early that can be quite wasteful and it can also result in out of memory errors.

// file a is guaranteed to exist...
let a_text = read_to_vec(a)?;

// but file b is not. If we have any kind of error when loading
// it up, that's a positive result, not an actual error.
let b_text = match read_to_vec(b) {
    Ok(contents) => contents,
    Err(_) => return Ok(true),
};

if a_text != b_text {
    return Ok(true);
}

I'd propose something like

let a_reader = BufReader::new( File::open(a)? );
let b_reader = match File::open(a) {
    Ok(f) => BufReader::new(f),
    Err(_) => return Ok(true),
}

if a_reader.bytes().ne( b_reader.bytes() ) {
    return Ok(true)
}

for similar functionality. The above won't compile when I'm reading the docs right, because the Bytes iterator returns Result<u8, io::Error> and io::Error doesn't implement PartialEq but one should be able to manually iterate in lockstep like so

let mut a_bytes = a_reader.bytes();
let mut b_bytes = b_reader.bytes();
loop {
    match (a_bytes().next(), b_bytes().next()) {
        (Some(Ok(a)), Some(Ok(b))) => if a != b { return Ok(true) },
        (None, None) => break,
        _ => return Ok(true),
    }
}

Now, you're also treating any kind of error as if that would indicate a difference in the folders but there's so much that can go wrong in the world that you can't really make that decision. It's certainly true for FileNotFound but what about some network storage that loses connection for example?
I don't know how much of that ugly mess you're willing to deal with of course.

Note: All code untested. I'm not at a full fledged PC at the moment

@steveklabnik steveklabnik added the enhancement Improve the expected label Mar 28, 2017
@steveklabnik
Copy link
Collaborator

Yup, this is a great enhancement. I'm mostly using this on small files, so I'm not sure if and when I'll get to it, but this would be great to have 👍

Now, you're also treating any kind of error as if that would indicate a difference in the folders but there's so much that can go wrong in the world that you can't really make that decision. It's certainly true for FileNotFound but what about some network storage that loses connection for example?

Yeah, so, at first I went "let me do it only for FileNotFound" and then I looked at it and was like "well....... many of these apply and I'm mostly trying to ship this so let's just throw them all."

I think being more strict about this would be good as well 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improve the expected
Projects
None yet
Development

No branches or pull requests

2 participants