Add Content-Type headers to HTTP artifact upload calls #8048

Merged

WillEngler
Contributor

Related Issues/PRs

Resolves #8026

What changes are proposed in this pull request?

When the MLflow client uploads artifacts via the HttpArtifactRepository, it now sends the proper HTTP Content-Type header. This PR pulls the logic from #7827 that maps file extensions to Content-Types into a util module, then uses that util inside HttpArtifactRepository to add the right Content-Type header to artifact upload POST requests.
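
As a rough illustration of the extension-to-Content-Type mapping described above, here is a minimal sketch. It is not the code from this PR: the names `guess_content_type` and `build_upload_headers` and the short `_TEXT_EXTENSIONS` set are assumptions made for the example, and only the standard-library `mimetypes.guess_type` is a real API.

```python
import os
from mimetypes import guess_type

# Hypothetical, deliberately short set of extensions (or whole filenames)
# that should be served as plain text. The list in the PR may differ.
_TEXT_EXTENSIONS = {"txt", "log", "yaml", "yml", "MLmodel", "mlproject"}


def guess_content_type(file_path: str) -> str:
    """Return a Content-Type value for the given artifact path."""
    filename = os.path.basename(file_path)
    extension = os.path.splitext(filename)[-1].replace(".", "")
    # Files without an extension (for example "MLmodel") are matched by name.
    extension = extension if extension else filename
    if extension in _TEXT_EXTENSIONS:
        return "text/plain"
    mime_type, _ = guess_type(filename)
    # Fall back to a generic binary type when nothing is known.
    return mime_type or "application/octet-stream"


def build_upload_headers(file_path: str) -> dict:
    """Build the extra headers an upload request could carry (illustrative)."""
    return {"Content-Type": guess_content_type(file_path)}
```

An upload call could then pass `build_upload_headers(local_file)` as additional request headers; how those headers reach the HTTP layer depends on the client's request helper.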

How is this patch tested?

  • Existing unit/integration tests
  • New unit/integration tests
  • Manual tests (describe details, including test results, below)

Does this PR change the documentation?

  • No. You can skip the rest of this section.
  • Yes. Make sure the changed pages / sections render correctly in the documentation preview.

Release Notes

Is this a user-facing change?

  • No. You can skip the rest of this section.
  • Yes. Give a description of this change to be included in the release notes for MLflow users.

Adds Content-Type headers to outbound artifact upload calls made by HttpArtifactRepository.

What component(s), interfaces, languages, and integrations does this PR affect?

Components

  • area/artifacts: Artifact stores and artifact logging
  • area/build: Build and test infrastructure for MLflow
  • area/docs: MLflow documentation pages
  • area/examples: Example code
  • area/model-registry: Model Registry service, APIs, and the fluent client calls for Model Registry
  • area/models: MLmodel format, model serialization/deserialization, flavors
  • area/recipes: Recipes, Recipe APIs, Recipe configs, Recipe Templates
  • area/projects: MLproject format, project running backends
  • area/scoring: MLflow Model server, model deployment tools, Spark UDFs
  • area/server-infra: MLflow Tracking server backend
  • area/tracking: Tracking Service, tracking client APIs, autologging

Interface

  • area/uiux: Front-end, user experience, plotting, JavaScript, JavaScript dev server
  • area/docker: Docker use across MLflow's components, such as MLflow Projects and MLflow Models
  • area/sqlalchemy: Use of SQLAlchemy in the Tracking Service or Model Registry
  • area/windows: Windows support

Language

  • language/r: R APIs and clients
  • language/java: Java APIs and clients
  • language/new: Proposals for new client languages

Integrations

  • integrations/azure: Azure and Azure ML integrations
  • integrations/sagemaker: SageMaker integrations
  • integrations/databricks: Databricks integrations

How should the PR be classified in the release notes? Choose one:

  • rn/breaking-change - The PR will be mentioned in the "Breaking Changes" section
  • rn/none - No description will be included. The PR will be mentioned only by the PR number in the "Small Bugfixes and Documentation Updates" section
  • rn/feature - A new user-facing feature worth mentioning in the release notes
  • rn/bug-fix - A user-facing bug fix worth mentioning in the release notes
  • rn/documentation - A user-facing documentation change worth mentioning in the release notes

from mimetypes import guess_type

# from mlflow.models.model import MLMODEL_FILE_NAME
# from mlflow.projects._project_spec import MLPROJECT_FILE_NAME
Contributor Author

Importing these constants from here creates a circular import. (artifacts -> utils -> models -> artifacts). Is there a good place to pull out these constants so that this util can reference them and it won't cause a circular import?

Member

> Is there a good place to pull out these constants so that this util can reference

We currently don't have such a module. I think rewriting the constants in this file is ok for now.

Member

@harupy harupy Mar 20, 2023

Does the following code work?

# TODO: Create a module that defines constants, to avoid circular imports,
# and move MLMODEL_FILE_NAME and MLPROJECT_FILE_NAME into it.
def get_mlmodel_and_mlproject_filenames():
    # Import inside the function so the imports run at call time.
    from mlflow.models.model import MLMODEL_FILE_NAME
    from mlflow.projects._project_spec import MLPROJECT_FILE_NAME

    return [MLMODEL_FILE_NAME, MLPROJECT_FILE_NAME]

_TEXT_EXTENSIONS = [
    ...,
    *get_mlmodel_and_mlproject_filenames()
]

Contributor Author

That still doesn't quite work because _TEXT_EXTENSIONS gets evaluated at import time and it triggers the circular dependency. But tweaking the idea a little, I can put the whole list in a function and that gets around it.
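
The workaround described above (building the whole list inside a function so nothing is evaluated at import time) might look roughly like the sketch below. The function names and the example extensions are assumptions, and a stand-in helper returns the two constants so the snippet runs without MLflow installed; in the real module the helper body would be the two deferred `from mlflow... import ...` statements shown earlier in the thread.

```python
from functools import lru_cache


def _load_reserved_filenames():
    # Stand-in for the deferred imports discussed in the thread:
    #   from mlflow.models.model import MLMODEL_FILE_NAME
    #   from mlflow.projects._project_spec import MLPROJECT_FILE_NAME
    # The literal values are the ones quoted in this conversation.
    return ["MLmodel", "mlproject"]


@lru_cache(maxsize=1)
def get_text_extensions():
    """Build the list on first call instead of at module import time."""
    return (
        "txt",
        "log",
        "yaml",
        "yml",
        *_load_reserved_filenames(),
    )
```

Because the list is only assembled when `get_text_extensions()` is first called, importing the util module no longer pulls in the modules that import it back; the cache keeps later calls cheap.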

Contributor Author

# from mlflow.projects._project_spec import MLPROJECT_FILE_NAME

MLMODEL_FILE_NAME = "MLmodel"
MLPROJECT_FILE_NAME = "mlproject"
Contributor Author

(Just rewriting the constants here is a kludge while I wait for advice on the import issue)

@github-actions

@WillEngler Thank you for the contribution! Could you fix the following issue(s)?

⚠ DCO check

The DCO check failed. Please sign off your commit(s) by following the instructions here. See https://github.com/mlflow/mlflow/blob/master/CONTRIBUTING.md#sign-your-work for more details.

@github-actions github-actions bot added area/artifacts Artifact stores and artifact logging rn/feature Mention under Features in Changelogs. labels Mar 16, 2023
@mlflow-automation
Collaborator

mlflow-automation commented Mar 16, 2023

Documentation preview for 2616349 will be available here when this CircleCI job completes successfully.

@harupy harupy self-requested a review March 17, 2023 02:58
@WillEngler WillEngler force-pushed the 8026/artifact-upload-content-headers branch 2 times, most recently from f0da35b to 33d4973 Compare March 21, 2023 17:31
@WillEngler
Contributor Author

WillEngler commented Mar 21, 2023

Was doing a pre-review and I think I still need to add a unit test to test_rest_utils to verify the extra_headers behavior. Once that's in and everything's passing I'll flip this out of draft state.

Update: this is ready for review.
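
A unit test of the kind mentioned above could be sketched as follows. This is not the test added in the PR: `send_request`, `_DEFAULT_HEADERS`, and the injected `transport` callable are hypothetical stand-ins for the client's request helper, used so the example runs with only the standard library.

```python
from unittest import mock

# Hypothetical default headers that the request helper always sends.
_DEFAULT_HEADERS = {"User-Agent": "example-client"}


def send_request(transport, method, url, extra_headers=None, **kwargs):
    """Merge extra_headers over the defaults and forward the call."""
    headers = {**_DEFAULT_HEADERS, **(extra_headers or {})}
    return transport(method, url, headers=headers, **kwargs)


def test_extra_headers_are_forwarded():
    transport = mock.Mock()
    send_request(
        transport,
        "PUT",
        "http://localhost/artifact",
        extra_headers={"Content-Type": "text/plain"},
    )
    # The transport should see both the defaults and the extra header.
    transport.assert_called_once_with(
        "PUT",
        "http://localhost/artifact",
        headers={"User-Agent": "example-client", "Content-Type": "text/plain"},
    )
```

Injecting the transport keeps the test free of network access; a test against a real helper would more likely patch the underlying HTTP call with `mock.patch`.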

@WillEngler WillEngler force-pushed the 8026/artifact-upload-content-headers branch from 33d4973 to 1409230 Compare March 21, 2023 18:41
@WillEngler WillEngler marked this pull request as ready for review March 21, 2023 18:42
@WillEngler WillEngler force-pushed the 8026/artifact-upload-content-headers branch from 1409230 to a16beba Compare March 22, 2023 13:56
@WillEngler
Contributor Author

Fixed the failing tests. Ready for another CI run.

Member

@harupy harupy left a comment

LGTM!

@harupy
Member

harupy commented Mar 23, 2023

@WillEngler Can you resolve the conflict?

@WillEngler WillEngler force-pushed the 8026/artifact-upload-content-headers branch 2 times, most recently from 75072c4 to 107d9bd Compare March 23, 2023 19:58
…actRepository._log_artifact.

Signed-off-by: Will Engler <Engler.Will@gmail.com>
@WillEngler WillEngler force-pushed the 8026/artifact-upload-content-headers branch from 107d9bd to 2616349 Compare March 23, 2023 20:03
@WillEngler
Contributor Author

Thanks for the review @harupy! Resolved the conflicts

@harupy harupy merged commit d55451f into mlflow:master Mar 24, 2023
22 checks passed
@harupy
Member

harupy commented Mar 24, 2023

@WillEngler Merged, thanks for the contribution!

Successfully merging this pull request may close these issues.

[FR] Support content-type headers when uploading artifacts via proxied artifact storage