-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
End of buffer indistinguishable from EOF #2
Comments
Note this test case breaks the build. I'm not actually sure how to handle this case, because with, e.g. The only solution I can see is an arbitrary cutoff to the delimiter size. Note the implementation found in JDK 1.7: |
As of the above commit, I can match any delimiter totally contained within a single buffer. Currently, when we fail to find a prefixed delimiter in a buffer, we assume there is no prefixed delimiter, but note the test case below. /// This test will fail if we cannot detect partial matches of the delimiter
/// when skipping over prefixed delimiters. Because the buffer size is 4, it
/// will read "aaaa", which is not in the language of /a+b/, however the
/// automaton is not in a dead state either: reading a "b" would put us in
/// an accepting state, thus we must read more input to know if the regex is
/// satisfied. Reading an additional character will result in "aaaab", which
/// is a valid delimiter in this language and should therefore be skipped.
#[test]
fn buffer_ends_within_start_delim() {
let string: &[u8] = b"aaaabfoo";
let mut br = BufReader::with_capacity(4, string);
let mut test = Scanner::new(&mut br);
test.set_delim(Regex::new(r"a+b").unwrap());
assert_eq!(test.next(), Some(String::from("foo")));
} Further, we currently assume that if no delimiters are found in a given buffer, it is safe to /// This test will fail if we do not solve the above problem in a way that
/// preserves the tail of the original buffer, because in this test case the
/// terminating delimiter begins within the first buffer size and
/// ends within the second.
#[test]
fn buffer_ends_within_end_delim() {
let string: &[u8] = b"foo bar";
let mut br = BufReader::with_capacity(4, string);
let mut test = Scanner::new(&mut br);
test.set_delim_str(" ");
assert_eq!(test.next(), Some(String::from("foo")));
// currently this produces "foo bar" because
// we fail to detect the delimiter in two separate buffers.
} |
When we search for the second occurrence of the delimiter, we only check until the end of the buffer. If the buffer does not contain the entire file, it is possible that there is a match that we simply cannot discern.
In fact, it is possible that a match begins inside our buffer but continues past it, for delimiters that match more than one character. e.g.,:
I think I can fix this, but I am not sure how to write a test case. Ideally I could coerce an arbitrary, small buffer size, but frankly I don't know how to do that in a self-contained test.
The text was updated successfully, but these errors were encountered: