[OpenVINO]: Updated documentation about weight compression #529

AlexKoff88 · 2024-01-22T11:41:05Z

No description provided.

AlexKoff88 · 2024-01-22T11:41:28Z

@ljaljushkin, please take a look as well.

HuggingFaceDocBuilderDev · 2024-01-22T11:46:08Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

docs/source/inference.mdx

docs/source/optimization_ov.mdx

Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>

docs/source/optimization_ov.mdx

docs/source/inference.mdx

Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

helena-intel · 2024-01-22T19:00:32Z

docs/source/optimization_ov.mdx

+
+> **NOTE:** `load_in_8bit` is enabled by default for models larger than 1 billion parameters.
+
+For the 4-bit weight quantization we recommend using the NNCF API like below:


I would much rather recommend optimum-cli over the NNCF API for this. It's such a quick and easy method. Also, unless that has been fixed very recently, NNCF fails on SPR/EMR with a BF16 error, and it's not easy to know how to work around that.

AlexKoff88 · Jan 23, 2024

I guess we fixed this issue in the recent version of NNCF. @alexsu52, please confirm.

alexsu52 · Jan 24, 2024

Yes, you are right. Models with float16 and float32 weights work on SPR/EMR.
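
For readers following this thread, here is a minimal Python sketch of the two behaviours under discussion. It is an illustration under stated assumptions, not the code from the PR. `resolve_load_in_8bit` is a hypothetical helper that only mirrors the quoted note (8-bit by default above 1 billion parameters, with an explicit value taking precedence, which is an assumption). `compress_weights_int4` shows one way the NNCF route could look; the `ratio` and `group_size` values are illustrative, and the function assumes `optimum-intel` and `nncf` are installed and that `model_id` points to a checkpoint that still needs exporting to OpenVINO.

```python
ONE_BILLION = 1_000_000_000


def resolve_load_in_8bit(num_parameters, load_in_8bit=None):
    """Hypothetical helper mirroring the quoted note, not library code.

    An explicit True/False wins (assumption); otherwise 8-bit weight
    compression is chosen only for models larger than 1 billion parameters.
    """
    if load_in_8bit is not None:
        return load_in_8bit
    return num_parameters > ONE_BILLION


def compress_weights_int4(model_id, save_dir):
    """Sketch of 4-bit weight compression through the NNCF API.

    Imports are deferred so the module loads without the optional
    dependencies; the hyperparameters below are illustrative only.
    """
    import nncf
    from optimum.intel import OVModelForCausalLM

    # Disable the default 8-bit path so NNCF receives uncompressed weights.
    model = OVModelForCausalLM.from_pretrained(
        model_id, export=True, load_in_8bit=False
    )
    # Replace the underlying OpenVINO model with its 4-bit compressed version.
    model.model = nncf.compress_weights(
        model.model,
        mode=nncf.CompressWeightsMode.INT4_SYM,
        ratio=0.8,
        group_size=128,
    )
    model.save_pretrained(save_dir)
    return model
```

The `optimum-cli` route preferred in the comment above avoids writing this code at all; the sketch is only meant to make the NNCF alternative concrete.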

docs/source/inference.mdx

AlexKoff88 · 2024-01-24T09:26:41Z

I think we can merge this. @echarlaix

Updated documentation about weight compression

4aaf179

AlexKoff88 requested a review from echarlaix January 22, 2024 11:41

Fixed doc style issues

f932e93

ljaljushkin suggested changes Jan 22, 2024

docs/source/inference.mdx (outdated, resolved)

docs/source/inference.mdx (outdated, resolved)

docs/source/inference.mdx (outdated, resolved)

docs/source/optimization_ov.mdx (outdated, resolved)

AlexKoff88 and others added 2 commits January 22, 2024 18:00

Update docs/source/inference.mdx

035be73

Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>

Update docs/source/inference.mdx

9524e07

Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>

echarlaix reviewed Jan 22, 2024

docs/source/optimization_ov.mdx (resolved)

docs/source/optimization_ov.mdx (outdated, resolved)

docs/source/inference.mdx (outdated, resolved)

docs/source/inference.mdx (outdated, resolved)

AlexKoff88 and others added 6 commits January 22, 2024 19:19

Update docs/source/inference.mdx

89b250d

Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>

Update docs/source/optimization_ov.mdx

3600757

Co-authored-by: Lyalyushkin Nikolay <nikolay.lyalyushkin@intel.com>

Update docs/source/optimization_ov.mdx

16d6785

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

Update docs/source/inference.mdx

f1f9217

Co-authored-by: Ella Charlaix <80481427+echarlaix@users.noreply.github.com>

Applied comments

720b84d

Applied more comments

f0255be

helena-intel reviewed Jan 22, 2024

Applied additional comments

47f102b

echarlaix merged commit 5e9c1b7 into main Jan 24, 2024

echarlaix deleted the ak/weight_compression_docs branch January 24, 2024 09:49

helena-intel Jan 22, 2024

AlexKoff88 Jan 23, 2024

alexsu52 Jan 24, 2024
