[gpt2pre 4] GPT2Preprocessor Layer #7814
Conversation
Looks good. I just have some more tests to add.
LGTM
@Linchenn Please take a look when you get a chance. Thanks!
LGTM!
This PR implements the `GPT2Preprocessor` class with its `call` method.

Dependencies: #7791 #7806
[UPDATE 7-11-23]
After experimenting with adding generic input/output types to `Layer`, I noticed the refactoring might be more involved than expected, since many util methods expect `Tensor|Tensor[]` or `SymbolicTensor|SymbolicTensor[]` as their inputs. I have documented this at go/TFJS-layer-generics and will be using the original design outlined below.

[UPDATE 7-10-23]
After discussion with @mattsoulanille today (7-10-23), I will be exploring changes to the base `Layer` class to allow different input and output types for the `call` method. This will allow us to support string inputs for `Tokenizers` and any other special i/o needed by the other NLP layers.
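As a rough illustration of the direction discussed, here is a minimal sketch of what such generics might look like. The `GenericLayer` name, its type parameters, and the `Tokenizer` signature below are all hypothetical, not an actual tfjs-layers API:

```ts
import {Tensor} from '@tensorflow/tfjs';

// Hypothetical sketch only: a Layer parameterized over its call() input
// and output types, defaulting to today's Tensor|Tensor[] behavior.
declare abstract class GenericLayer<
    InputT = Tensor|Tensor[], OutputT = Tensor|Tensor[]> {
  call(inputs: InputT, kwargs: {[key: string]: any}): OutputT;
}

// A tokenizer could then accept raw strings and return token-id tensors
// without widening the base class's Tensor-only signature.
declare class Tokenizer extends GenericLayer<string[], Tensor> {}
```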
[ORIGINAL]
Here is the original plan I had, as implemented in this PR:
I had to deviate from the reference implementation a bit more here because the types can get tricky. Since `GPT2Preprocessor` extends the `Layer` class (it really extends `Preprocessor`, which extends `Layer`), its overridden `call()` method must return `Tensor|Tensor[]`. However, the Keras implementation allows a bit more flexibility by returning the result of `pack_x_y_sample_weight()`, which returns a packed tuple of x, y, and sample_weight.
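Concretely, the constraint comes from the base class contract, which is roughly the following (a simplified sketch, not the exact tfjs-layers source):

```ts
import {Tensor} from '@tensorflow/tfjs';

// Simplified shape of the tfjs-layers Layer contract: call() must return
// Tensor|Tensor[], unlike Keras, where __call__ can forward the tuple
// produced by pack_x_y_sample_weight().
declare abstract class Layer {
  call(inputs: Tensor|Tensor[], kwargs: {[key: string]: any}):
      Tensor|Tensor[];
}
```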
Designs Considered:

1. Having `packXYSampleWeight()` simply return a `Tensor` of x, y, and sample_weight so that it can be called and returned directly from `call()`, like Keras does. But since x is often used as an object of type `{tokenIds: Tensor|Tensor[], paddingMask: Tensor[]}`, it cannot be wrapped in a tensor. `Tensor[]` is also an option, but since tokenIds will most likely be a `Tensor[]` itself (it is the result of calling `tokenize()` on a tensor of strings), `call` would then have to support a return type of `Tensor|Tensor[]|(Tensor[]|Tensor)[]`, which starts to get confusing. It would also not be clear which output sits at which position of the array.
2. Adding a `callAndReturnPaddingMask` method that returns a tuple of the `tokenIds` and `paddingMask`, with `call` returning the first element (tokenIds) and not attempting to return y and sample_weight, since they aren't mutated by this class. This option is more feasible; however, looking ahead, the `GPT2CausalLMPreprocessor` class does eventually change y and sample_weight, so it's probably a good idea to figure out how to support them now.
3. Adding a `callAndPackArgs` method that returns an array of the arguments, packed similarly to the Keras version, while `call` simply returns a subset of these outputs to keep the class `Layer`-compatible (see the sketch after this list).

Since GPT2 layers always pass in x as a type of `{tokenIds: Tensor|Tensor[], paddingMask: Tensor[]}`, I've declared this as a `PreprocessorOutput` interface and removed the Array checking in `packXYSampleWeight`.
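To make the chosen design concrete, here is a minimal sketch of the shapes involved. `PreprocessorOutput` and `callAndPackArgs` come from this PR; the class body, the placeholder `tokenize` helper, and the padding-mask logic are illustrative stand-ins, not the shipped implementation:

```ts
import * as tf from '@tensorflow/tfjs';
import {Tensor} from '@tensorflow/tfjs';

type Kwargs = {[key: string]: any};

// The x structure that GPT2 layers always pass around.
interface PreprocessorOutput {
  tokenIds: Tensor|Tensor[];
  paddingMask: Tensor[];
}

class GPT2PreprocessorSketch extends tf.layers.Layer {
  // Placeholder for the tokenizer dependency (#7791); a real tokenizer
  // would map a tensor of strings to token-id tensors.
  private tokenize(inputs: Tensor|Tensor[]): Tensor[] {
    return Array.isArray(inputs) ? inputs : [inputs];
  }

  // Design 3: pack (x, y, sampleWeight) like Keras'
  // pack_x_y_sample_weight(), but as a plain tuple, since x is an object
  // and cannot be wrapped in a tensor.
  callAndPackArgs(inputs: Tensor|Tensor[], kwargs: Kwargs):
      [PreprocessorOutput, Tensor?, Tensor?] {
    const tokenIds = this.tokenize(inputs);
    const paddingMask = tokenIds.map(t => tf.onesLike(t));
    return [{tokenIds, paddingMask}];
  }

  // The Layer-compatible override returns only the Tensor-typed subset.
  override call(inputs: Tensor|Tensor[], kwargs: Kwargs): Tensor|Tensor[] {
    const [x] = this.callAndPackArgs(inputs, kwargs);
    return x.tokenIds;
  }
}
```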