
[OpenVINO]: Updated documentation about weight compression #529

Merged
merged 11 commits into main from ak/weight_compression_docs
Jan 24, 2024

Conversation

AlexKoff88
Collaborator

No description provided.

@AlexKoff88 AlexKoff88 requested a review from echarlaix January 22, 2024 11:41
@AlexKoff88
Collaborator Author

@ljaljushkin, please take a look as well.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

AlexKoff88 and others added 2 commits January 22, 2024 18:00

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>

Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>
AlexKoff88 and others added 6 commits January 22, 2024 19:19

Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>

Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

> **NOTE:** `load_in_8bit` is enabled by default for models larger than 1 billion parameters.

For 4-bit weight quantization, we recommend using the NNCF API as shown below:
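The NNCF snippet the doc refers to is not reproduced in this thread. As a rough, self-contained illustration of the underlying scheme (not the actual NNCF API), symmetric int4 weight quantization of the kind `nncf.compress_weights` performs can be sketched in plain Python:

```python
# Sketch of symmetric int4 weight quantization (illustrative only; the real
# implementation lives in NNCF and operates per-channel on model tensors).

def quantize_int4_sym(weights):
    """Quantize a row of float weights to signed 4-bit integers [-8, 7]
    with a single per-row scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 7 if max_abs else 1.0  # map the largest magnitude to 7
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate float weights from int4 values and a scale."""
    return [v * scale for v in q]

row = [0.12, -0.45, 0.33, 0.07, -0.29]
q, scale = quantize_int4_sym(row)
restored = dequantize(q, scale)

# Every quantized value fits in 4 signed bits, and the reconstruction
# error is bounded by half a quantization step (scale / 2).
assert all(-8 <= v <= 7 for v in q)
assert all(abs(a - b) <= scale / 2 + 1e-9 for a, b in zip(row, restored))
```

This only shows why 4-bit storage is lossy but bounded; NNCF additionally handles grouping, mixed precision, and integration with OpenVINO IR.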
Collaborator
I would strongly recommend optimum-cli over the NNCF API for this; it's such a quick and easy method. And (unless it has been fixed very recently) NNCF fails on SPR/EMR with a BF16 error, and it's not easy to figure out how to work around that.
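For context, the optimum-cli route mentioned here looks roughly like the following (the model ID and output directory are placeholders, not part of this PR, and flag names may differ between optimum-intel releases):

```shell
# Export a model to OpenVINO IR with 8-bit weight compression via optimum-cli.
# "gpt2" and "ov_model/" are placeholder names for illustration only.
optimum-cli export openvino --model gpt2 --weight-format int8 ov_model/
```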

Collaborator Author

I believe we fixed this issue in the latest version of NNCF. @alexsu52, please confirm.

Contributor

Yes, you are right. Models with float16 and float32 weights work on SPR/EMR.

@AlexKoff88
Collaborator Author

I think we can merge this. @echarlaix

@echarlaix echarlaix merged commit 5e9c1b7 into main Jan 24, 2024
@echarlaix echarlaix deleted the ak/weight_compression_docs branch January 24, 2024 09:49