New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify collapse_whitespace
implementation
#8293
Conversation
Signed-off-by: harupy <hkawamura0130@gmail.com>
Documentation preview for 385d7b0 will be available here when this CircleCI job completes successfully. More info
|
@@ -1675,8 +1674,7 @@ def trim_input(data_in, data_out): | |||
# If the user has indicated to remove newlines and extra spaces from the generated | |||
# text, replace them with a single space. | |||
if collapse_whitespace: | |||
for to_replace, replace in replacements.items(): | |||
data_out = re.sub(to_replace, replace, data_out).strip() | |||
data_out = re.sub(r"\s+", " ", data_out).strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@BenWilson2 We collapse whitespace characters if data_out.startswith(data_in + "\n\n")
. Is this intended?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original intention with nesting that was to only apply the whitespace collapsing to the InstructionTextGenerationPipeline (Dolly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could just replace that function with this:
def trim_input(data_in, data_out):
# NB: the '\n\n' pattern is exclusive to specific InstructionalTextGenerationPipeline
# types that have been loaded as a plain TextGenerator. The structure of these
# pipelines will precisely repeat the input question immediately followed by 2 carriage
# return statements, followed by the start of the response to the prompt. We only
# want to left-trim these types of pipelines output values if the user has indicated
# the removal action of the input prompt in the returned str or List[str] by applying
# the optional inference_config entry of `{"include_prompt": False}`.
# By default, the prompt is included in the response.
# Stripping out additional carriage returns (\n) is another additional optional flag
# that can be set for these generator pipelines. It is off by default (False).
if (
data_out.startswith(data_in + "\n\n")
and flavor_config[_INSTANCE_TYPE_KEY] in self._supported_custom_generator_types
and not include_prompt
):
# If the user has indicated to not preserve the prompt input in the response,
# split the response output and trim the input prompt from the response.
data_out = data_out[len(data_in) :].lstrip()
if data_out.startswith("A:"):
data_out = data_out[2:].lstrip()
# If the user has indicated to remove newlines and extra spaces from the generated
# text, replace them with a single space.
if collapse_whitespace:
data_out = re.sub(r"\s+", " ", data_out).strip()
return data_out
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I'll merge this PR. We can make that change in a separate PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
running full test suite locally to validate this logic change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll file the follow-on change in a few minutes after merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Signed-off-by: harupy <hkawamura0130@gmail.com>
Signed-off-by: harupy <hkawamura0130@gmail.com> Signed-off-by: Larry O’Brien <larry.obrien@databricks.com>
Related Issues/PRs
#xxxWhat changes are proposed in this pull request?
Simplifies
collapse_whitespace
implementation.\s
matches\n
.How is this patch tested?
Does this PR change the documentation?
Release Notes
Is this a user-facing change?
(Details in 1-2 sentences. You can just refer to another PR with a description if this PR is part of a larger change.)
What component(s), interfaces, languages, and integrations does this PR affect?
Components
area/artifacts
: Artifact stores and artifact loggingarea/build
: Build and test infrastructure for MLflowarea/docs
: MLflow documentation pagesarea/examples
: Example codearea/model-registry
: Model Registry service, APIs, and the fluent client calls for Model Registryarea/models
: MLmodel format, model serialization/deserialization, flavorsarea/recipes
: Recipes, Recipe APIs, Recipe configs, Recipe Templatesarea/projects
: MLproject format, project running backendsarea/scoring
: MLflow Model server, model deployment tools, Spark UDFsarea/server-infra
: MLflow Tracking server backendarea/tracking
: Tracking Service, tracking client APIs, autologgingInterface
area/uiux
: Front-end, user experience, plotting, JavaScript, JavaScript dev serverarea/docker
: Docker use across MLflow's components, such as MLflow Projects and MLflow Modelsarea/sqlalchemy
: Use of SQLAlchemy in the Tracking Service or Model Registryarea/windows
: Windows supportLanguage
language/r
: R APIs and clientslanguage/java
: Java APIs and clientslanguage/new
: Proposals for new client languagesIntegrations
integrations/azure
: Azure and Azure ML integrationsintegrations/sagemaker
: SageMaker integrationsintegrations/databricks
: Databricks integrationsHow should the PR be classified in the release notes? Choose one:
rn/breaking-change
- The PR will be mentioned in the "Breaking Changes" sectionrn/none
- No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" sectionrn/feature
- A new user-facing feature worth mentioning in the release notesrn/bug-fix
- A user-facing bug fix worth mentioning in the release notesrn/documentation
- A user-facing documentation change worth mentioning in the release notes