blob: update New[Range]Reader to take ReaderOptions #765
IMHO the first part of this is pretty clearly better (one function instead of two, and an …).
I moved the part about chunking to #778. This issue is now just about converting to a single …
I'm on board with the exception of the zero …
@zombiezen So are you suggesting that the zero/default value should provide reasonable default functionality? For reading, reading the whole blob is the most reasonable default behavior. Is there a better name than …
That is my suggestion, yes. I understand why that's not ideal for the Options struct, but that's precisely why the GCP libraries picked the approach of having the two methods (and why we emulated it).
Is there a use case for a zero-length read? I'm not sure what it would be. If the answer is "no", I guess I don't see the problem with interpreting 0 as "read all" instead of -1. I don't think updating the proposed change to make the default …
True; with this change it would be "any positive value…" instead.
I see allowing zero-length reads as a way of avoiding special cases. As a contrived example, let's say you had some naive code that did something like this:

```go
// limitedRead reads all the bytes from n to a predefined, arbitrary limit in the blob.
func limitedRead(ctx context.Context, bucket *blob.Bucket, n int) ([]byte, error) {
	const limit = 1024
	if n > limit {
		return nil, errors.New("too far")
	}
	r, err := bucket.NewRangeReader(ctx, "foo", int64(n), int64(limit-n))
	if err != nil {
		return nil, err
	}
	data, err := io.ReadAll(r)
	closeErr := r.Close()
	if err != nil {
		return data, err
	}
	if closeErr != nil {
		return data, closeErr
	}
	return data, nil
}
```

Not allowing zero-length reads would require another branch in the caller to maintain correctness.
Wouldn't that code just change to …? I guess we could add a …
The zero case above is not an error. The code change would be:

```go
func limitedRead(ctx context.Context, bucket *blob.Bucket, n int) ([]byte, error) {
	const limit = 1024
	if n > limit {
		return nil, errors.New("too far")
	}
	if n == limit {
		return nil, nil
	}
	// ...
}
```

The point that I'm trying to illustrate is that while zero is a valid length for a read, a negative number is not. I think it would be a shame to treat zero differently instead of treating an invalid length as the sentinel. The solution you state would work, but I wonder if there's a better approach. The primary constraint here is that we want the zero value of `ReaderOptions` to …

```go
type ReaderOptions struct {
	Offset int64
}

// NewReader opens a new reader that reads at most size
// bytes from the blob with the given key. If size is negative,
// then the entire blob is read. [...]
func (b *Bucket) NewReader(ctx context.Context, key string, size int64, opts *ReaderOptions) (*Reader, error)
```
So now the normal case (read everything) looks like …, which seems a bit much.
To summarize, the options we have are: …

I think we've rejected #3 and #4. I think I would vote for #5. Thoughts?
I agree with the summary. I vote for the first option, since I feel the usages are different.
I think the question now becomes whether 0 should be a valid length. The GCS client library says it is, and treats 0 as reading the metadata only. S3 only has the auto-generated HTTP client, which means the Range header can only express lengths > 0; there is a separate HEAD request for a 0-length "read". Since we already have a separate "Attributes" method, I think we could make 0-length reads invalid. With that said, comparing 1) and 5), I still think 1) is a better API overall. But if we can agree that 0 is an invalid length, then I would vote for 3).
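To illustrate the S3 constraint mentioned above: an HTTP Range header names an inclusive byte range, so a positive length maps to `bytes=offset-(offset+length-1)`, and a zero-length range has no representation. A minimal sketch (the helper name `rangeHeader` is hypothetical, for illustration only):

```go
package main

import (
	"errors"
	"fmt"
)

// rangeHeader builds the HTTP Range header value for a read of length
// bytes starting at offset. The bytes=start-end form is inclusive on
// both ends, so length must be positive: a zero-length range simply
// cannot be expressed this way.
func rangeHeader(offset, length int64) (string, error) {
	if offset < 0 || length <= 0 {
		return "", errors.New("length must be positive")
	}
	return fmt.Sprintf("bytes=%d-%d", offset, offset+length-1), nil
}

func main() {
	h, _ := rangeHeader(10, 5)
	fmt.Println(h) // bytes=10-14
}
```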
Yeah, if we could declare a 0-length read invalid, I agree that 3) would be best. That's what I have in the associated PR. I'm not sure there's a valid use case for a 0-length read (especially given that there is a separate `Attributes` method).

I'll update the PR to go with 1). 5) would work but seems clunky; I haven't seen that pattern much before, and I agree with @jba that 2) is a bit much. This whole thing is an unfortunate side effect of zero values that don't correspond to the common/default usage; it comes up a lot with proto3 inside Google as well.
I'm okay with 1). |
Currently `blob` exposes:

- `NewRangeReader`: takes offset, len and calls `driver.NewRangeReader`.
- `NewReader`: calls `NewRangeReader` with offset=0, len=-1 (all).

The implementations of `NewRangeReader` generally pass through their arguments to the providers, as expected.

Imagine a user trying to read a huge blob and stream it somewhere. They would have to implement their own chunking of the blob: determine the size, read the first chunk via `NewRangeReader`, etc. If they just use `NewReader`, we'll try to download the whole blob.

I propose the following signature as a replacement for both functions: …