ImportError: Using bitsandbytes 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes` #13569
Open
AnandUgale opened this issue on May 18, 2024 · 3 comments
Bug Description

ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`

Environment
Packages installed with CUDA 11.8:

Version
0.10.37

Steps to Reproduce

import torch
from llama_index.llms.huggingface import HuggingFaceLLM

# Optional quantization to 4bit
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_kwargs={
        "token": hf_token,
        "torch_dtype": torch.bfloat16,  # comment this line and uncomment below to use 4bit
        # "quantization_config": quantization_config
    },
    generate_kwargs={
        "do_sample": True,
        "temperature": 0.6,
        "top_p": 0.9,
    },
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
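Note: `hf_token` and `stopping_ids` are defined earlier in the original notebook and are not shown above. A minimal sketch of how they are typically set up for Llama 3 (the placeholder token and the `<|eot_id|>` stop token are assumptions, not part of the report):

```python
from transformers import AutoTokenizer

hf_token = "hf_..."  # hypothetical placeholder; use your own Hugging Face access token

# Stop generation at the end-of-sequence / end-of-turn tokens (assumed Llama 3 special tokens)
tokenizer = AutoTokenizer.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct", token=hf_token
)
stopping_ids = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("<|eot_id|>"),
]
```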
Relevant Logs/Tracebacks
ImportError: Using `bitsandbytes` 8-bit quantization requires Accelerate: `pip install accelerate` and the latest version of bitsandbytes: `pip install -i https://pypi.org/simple/ bitsandbytes`
Hey @AnandUgale, great to run into you again on here! 🚀 It looks like you've stumbled upon an intriguing challenge. I'm diving into the details now and will circle back with a more comprehensive response soon. Stay tuned!
To address the ImportError you're encountering with bitsandbytes and 8-bit quantization, here are targeted steps to resolve the issue:
Reinstall bitsandbytes for CUDA 11.8: Ensure bitsandbytes is correctly installed for your CUDA version by running:
pip install bitsandbytes-cuda118
This command installs the version compiled specifically for CUDA 11.8, which might resolve compatibility issues.
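After reinstalling, a quick way to confirm that both libraries import cleanly in the same interpreter (a minimal check, not part of the original report):

```python
# Verify that bitsandbytes and accelerate are importable and report their versions
import bitsandbytes
import accelerate

print("bitsandbytes:", bitsandbytes.__version__)
print("accelerate:", accelerate.__version__)
```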
Check Environment Variables: Verify that CUDA_HOME and LD_LIBRARY_PATH are correctly set to reflect your CUDA 11.8 installation. Misconfigured environment variables can lead to import errors.
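One way to inspect these from Python (a sketch; the exact paths depend on your CUDA 11.8 install):

```python
# Check the CUDA-related environment variables and what PyTorch was built against
import os
import torch

print("CUDA_HOME:", os.environ.get("CUDA_HOME"))
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH"))
print("torch CUDA version:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
```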
Isolate the ImportError: Determine exactly where the ImportError occurs. Is it during the import of a specific module from bitsandbytes or another package? Pinpointing the location can help identify the cause.
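For example, you can check the availability flags that transformers consults before enabling bitsandbytes quantization (assuming a recent transformers release that exposes these helpers):

```python
# transformers checks these helpers before allowing 8-bit/4-bit loading
from transformers.utils import is_accelerate_available, is_bitsandbytes_available

print("accelerate available:", is_accelerate_available())
print("bitsandbytes available:", is_bitsandbytes_available())
```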
Consult Documentation and Issues: Look into the GitHub repositories and documentation for bitsandbytes, llama_index, and HuggingFace's transformers for similar issues. Sometimes, specific version conflicts or bugs are documented.
Environment Clean-Up: If necessary, create a new virtual environment and reinstall your dependencies. This can help eliminate conflicts or issues from previous installations.
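After reinstalling into a fresh environment, you can confirm which versions actually ended up installed (a small sketch using only the standard library):

```python
# Print the installed versions of the packages involved in this stack
from importlib.metadata import version, PackageNotFoundError

for pkg in ["torch", "transformers", "accelerate", "bitsandbytes", "llama-index"]:
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```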
For integrating bitsandbytes with HuggingFace's transformers, ensure your BitsAndBytesConfig and model initialization align with the provided example. This includes setting the quantization_config in model_kwargs when creating the HuggingFaceLLM instance. If these steps don't resolve the issue, please provide more details about the error and the code causing it for further assistance.
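Once accelerate and bitsandbytes import correctly, the 4-bit path from the bug report should work by passing the quantization config through model_kwargs, for example (a sketch based on the code in the report; hf_token and stopping_ids are assumed to be defined as above):

```python
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms.huggingface import HuggingFaceLLM

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

llm = HuggingFaceLLM(
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    model_kwargs={
        "token": hf_token,
        # pass the 4-bit config instead of torch_dtype to enable quantized loading
        "quantization_config": quantization_config,
    },
    generate_kwargs={"do_sample": True, "temperature": 0.6, "top_p": 0.9},
    tokenizer_name="meta-llama/Meta-Llama-3-8B-Instruct",
    tokenizer_kwargs={"token": hf_token},
    stopping_ids=stopping_ids,
)
```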
About Dosu
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.
Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.