
blob: update New[Range]Reader to take ReaderOptions #765

Closed
vangent opened this issue Nov 19, 2018 · 15 comments

@vangent
Contributor

vangent commented Nov 19, 2018

Currently blob exposes:

- NewRangeReader: takes offset, len and calls driver.NewRangeReader.
- NewReader: calls NewRangeReader with offset=0, len=-1 (read all).

The implementations of NewRangeReader generally pass through their arguments to the providers, as expected.

Imagine a user trying to read a huge blob and stream it somewhere. They would have to implement their own chunking of the blob: determine the size, read first chunk via NewRangeReader, etc. If they just use NewReader, we'll try to download the whole blob.

I propose the following signature as a replacement for both functions:

type ReaderOptions struct {
  Offset    int64
  ReadLen   int64 // 0 means whole blob
  ChunkSize int64 // default = 1GB or some such
}

func (b *Bucket) NewReader(ctx context.Context, key string, opts *ReaderOptions) (*Reader, error)
@vangent vangent self-assigned this Nov 20, 2018
@vangent vangent added this to the Sprint 20 milestone Nov 20, 2018
@vangent vangent changed the title blob: should NewReader take Options and support chunking? blob: unify New[Range]Reader to take Options and support chunking Nov 20, 2018
@vangent
Contributor Author

vangent commented Nov 20, 2018

IMHO the first part of this is pretty clearly better (one function instead of 2, and an Options struct for more flexibility in the future). I will send a PR to make that change, and we can add the new chunking option separately.

@vangent vangent changed the title blob: unify New[Range]Reader to take Options and support chunking blob: unify New[Range]Reader to take Options Nov 20, 2018
@vangent
Contributor Author

vangent commented Nov 20, 2018

I moved the part about chunking to #778. This issue is now just about converting to a single NewReader with options.

@vangent vangent added the in progress This is being actively worked on label Nov 20, 2018
@zombiezen
Contributor

zombiezen commented Nov 27, 2018

I'm on board, with the exception of zero ReadLen meaning "whole blob": it gives an incongruous meaning to that parameter. While zero-length reads are not generally useful, allowing them means that any non-negative value results in a read of exactly that size.

@vangent
Contributor Author

vangent commented Nov 27, 2018

@zombiezen So are you suggesting that 0 (the default value) should in fact be a 0 length read? That doesn't seem ideal to me, as anyone who doesn't provide a ReaderOptions would get a zero-length read, which essentially nobody wants, meaning that everyone would always need to supply a ReaderOptions.

IMHO the zero/default value should provide reasonable default functionality; for reading, reading the whole blob is the most reasonable default behavior.

Is there a better name than ReadLen? In the PR I've called it Length, but maybe MaxLength?

@zombiezen
Contributor

That is my suggestion, yes. I understand why that's not ideal for the Options struct, but that's precisely why the GCP libraries picked the approach of having the two methods (and why we emulated it).

@vangent
Contributor Author

vangent commented Nov 28, 2018

Is there a use case for a zero-length read? I'm not sure what it would be. If the answer is "no", I guess I don't see the problem with interpreting 0 as "read all" instead of -1.

I don't think it's feasible to change the proposal so that the default ReaderOptions does something nobody ever wants. The alternative would be to keep the two separate methods, add an empty ReaderOptions struct, and add it to both of them. I'm not convinced that two methods are better than one here.

it makes it such that any non-negative value results in a read of that size

True; with this change it would be "any positive value..." instead.

@zombiezen
Contributor

I see allowing zero-length reads as a way of avoiding special cases. As a contrived example, let's say you had some naive code that did something like this:

// limitedRead reads all the bytes from n to a predefined, arbitrary limit in the blob.
func limitedRead(ctx context.Context, bucket *blob.Bucket, n int64) ([]byte, error) {
  const limit = 1024
  if n > limit {
    return nil, errors.New("too far")
  }
  r, err := bucket.NewRangeReader(ctx, "foo", n, limit-n)
  if err != nil {
    return nil, err
  }
  data, err := io.ReadAll(r)
  closeErr := r.Close()
  if err != nil {
    return data, err
  }
  if closeErr != nil {
    return data, closeErr
  }
  return data, nil
}

Not allowing zero-length reads would require another branch in the caller to maintain correctness.

@vangent
Contributor Author

vangent commented Nov 28, 2018

Wouldn't that code just change to if n >= limit to be correct?
I see what you're saying, but isn't there a similar argument for using -1 as a sentinel? A pretty similar math error could pass -1 by accident.

I guess we could add a LimitedLengthRead bool (or similar) that you'd have to set explicitly: setting Length to a non-zero value without setting LimitedLengthRead to true would be an error; Length=0 with LimitedLengthRead=false would read the full blob; Length=0 with LimitedLengthRead=true would read 0 bytes.

@zombiezen
Contributor

The zero case above is not an error. The code change would be:

func limitedRead(ctx context.Context, bucket *blob.Bucket, n int64) ([]byte, error) {
  const limit = 1024
  if n > limit {
    return nil, errors.New("too far")
  }
  if n == limit {
    return nil, nil
  }
  // ...
}

The point that I'm trying to illustrate is that while zero is a valid length for a read, a negative number is not. I think it would be a shame to treat zero differently instead of treating an invalid length as the sentinel.

The solution you state would work, but I wonder if there's a better approach. The primary constraint here is that we want the zero value of ReaderOptions to be useful, so I wonder if we could just pull out the length as an explicit argument:

type ReaderOptions struct {
  Offset int64
}

// NewReader opens a new reader that reads at most size
// bytes from the blob with the given key. If size is negative,
// then the entire blob is read. [...]
func (b *Bucket) NewReader(ctx context.Context, key string, size int64, opts *ReaderOptions) (*Reader, error)

@jba
Contributor

jba commented Nov 28, 2018

So now the normal case (read everything) looks like

b.NewReader(ctx, "key", -1, nil)

which seems a bit much.

@vangent
Contributor Author

vangent commented Nov 28, 2018

To summarize, the options we have are:

  1. Keep both NewReader and NewRangeReader, just add (empty) ReaderOptions to both.
  2. Merge them, keeping length as a top-level arg and moving offset into ReaderOptions (@zombiezen suggested this above, @jba says it "seems a bit much").
  3. Merge them, putting both length and offset in ReaderOptions; length=0 is interpreted as "read it all" (@zombiezen is objecting to this as 0 is a legal read length).
  4. Merge them, putting both length and offset in ReaderOptions; length=0 is interpreted as "0 length read" (@vangent objects to this).
  5. Merge them, putting both length and offset in ReaderOptions, plus a LimitLength boolean to distinguish whether length=0 actually means "0-length read" or "whole blob" (the default).

I think we've rejected options 3 and 4. I would vote for option 5. Thoughts?

@zombiezen
Contributor

I agree with the summary. I vote for the first option, since I feel the usages are different.

@shantuo
Contributor

shantuo commented Nov 29, 2018

I think the question now becomes whether 0 should be a valid length. The GCS client library says it is, and treats 0 as reading the metadata only. S3 only has the auto-generated HTTP client, which means the Range header can only take values > 0; there is a separate HEAD request for a 0-length "read". Since we already have a separate Attributes method, I think we could make 0-length reads invalid.

With that said, comparing 1) and 5), I still think 1) is a better API overall. But if we can agree that 0 is an invalid length, then I would vote for 3).

@vangent
Contributor Author

vangent commented Nov 29, 2018

Yeah, if we could declare a 0-length read invalid, I agree that 3) would be best; that's what I have in the associated PR. I'm not sure there's a valid use case for a 0-length read (especially given the separate Attributes call, as you point out), but @zombiezen is arguing that 0-length reads should be considered valid, and after some web searching I'm inclined to agree. This leaves us with 1), 2), or 5).

I'll update the PR to go with 1). 5) would work but seems clunky; I haven't seen that pattern much before, and I agree with @jba that 2) is a bit much.

This whole thing is an unfortunate side effect of zero values that don't correspond to the common/default usage; it comes up a lot with proto3 inside Google as well.

@jba
Contributor

jba commented Nov 29, 2018

I'm okay with 1).

@vangent vangent changed the title blob: unify New[Range]Reader to take Options blob: update New[Range]Reader to take ReaderOptions Nov 29, 2018
@go-cloud-bot go-cloud-bot bot removed the in progress This is being actively worked on label Nov 29, 2018