[xinference] Error: Failed to rerank documents, detail: [address=0.0.0.0:46625, pid=418] CUDA error: CUDA-capable device(s) is/are busy or unavailable CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect. For debugging consider passing CUDA_LAUNCH_BLOCKING=1. Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
After I reloaded the bce-reranker-base_v1 model, it started working again.
What could be causing this?
Full error log:
xinference-1 | 2024-05-16 03:55:49,229 xinference.core.model 418 DEBUG Request rerank, current serve request count: 0, request limit: None for the model bce-reranker-base_v1-1-0
xinference-1 | 2024-05-16 03:55:49,289 xinference.core.model 418 DEBUG After request rerank, current serve request count: 0 for the model bce-reranker-base_v1-1-0
xinference-1 | 2024-05-16 03:55:49,298 xinference.api.restful_api 1 ERROR [address=0.0.0.0:46625, pid=418] CUDA error: CUDA-capable device(s) is/are busy or unavailable
xinference-1 | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
xinference-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
xinference-1 | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
xinference-1 | Traceback (most recent call last):
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1036, in rerank
xinference-1 | scores = await model.rerank(
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 227, in send
xinference-1 | return self._process_result_message(result)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
xinference-1 | raise message.as_instanceof_cause()
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 659, in send
xinference-1 | result = await self._run_coro(message.message_id, coro)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
xinference-1 | return await coro
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xoscar/api.py", line 384, in __on_receive__
xinference-1 | return await super().__on_receive__(message) # type: ignore
xinference-1 | File "xoscar/core.pyx", line 558, in __on_receive__
xinference-1 | raise ex
xinference-1 | File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.__on_receive__
xinference-1 | async with self._lock:
xinference-1 | File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.__on_receive__
xinference-1 | with debug_async_timeout('actor_lock_timeout',
xinference-1 | File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.__on_receive__
xinference-1 | result = await result
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xinference/core/utils.py", line 45, in wrapped
xinference-1 | ret = await func(*args, **kwargs)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 80, in wrapped_func
xinference-1 | ret = await fn(self, *args, **kwargs)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 423, in rerank
xinference-1 | return await self._call_wrapper(
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 104, in _async_wrapper
xinference-1 | return await fn(*args, **kwargs)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xinference/core/model.py", line 333, in _call_wrapper
xinference-1 | ret = await asyncio.to_thread(fn, *args, **kwargs)
xinference-1 | File "/opt/conda/lib/python3.10/asyncio/threads.py", line 25, in to_thread
xinference-1 | return await loop.run_in_executor(None, func_call)
xinference-1 | File "/opt/conda/lib/python3.10/concurrent/futures/thread.py", line 58, in run
xinference-1 | result = self.fn(*self.args, **self.kwargs)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/xinference/model/rerank/core.py", line 180, in rerank
xinference-1 | similarity_scores = self._model.predict(sentence_combinations)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/sentence_transformers/cross_encoder/CrossEncoder.py", line 336, in predict
xinference-1 | self.model.to(self._target_device)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2692, in to
xinference-1 | return super().to(*args, **kwargs)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1152, in to
xinference-1 | return self._apply(convert)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
xinference-1 | module._apply(fn)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
xinference-1 | module._apply(fn)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 802, in _apply
xinference-1 | module._apply(fn)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 825, in _apply
xinference-1 | param_applied = fn(param)
xinference-1 | File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1150, in convert
xinference-1 | return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
xinference-1 | RuntimeError: [address=0.0.0.0:46625, pid=418] CUDA error: CUDA-capable device(s) is/are busy or unavailable
xinference-1 | CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
xinference-1 | For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
xinference-1 | Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
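Before resorting to a reload, it can help to probe whether the GPU is actually reachable from inside the container when the error occurs. A minimal, best-effort sketch (assumes torch is installed; `CUDA_LAUNCH_BLOCKING=1` is the debugging flag the error message itself suggests, and it slows inference down, so remove it afterwards):

```python
import os

# Per the error message, enable synchronous kernel launches so the
# stack trace points at the real failing CUDA call (debugging aid only).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

def cuda_devices_reachable():
    """Best-effort probe: True if torch can see and actually touch a GPU.

    A 'device(s) busy or unavailable' RuntimeError here usually means
    another process holds the GPU (e.g. exclusive compute mode), or the
    driver/container lost the device and the worker must be restarted.
    """
    try:
        import torch
        if not torch.cuda.is_available():
            return False
        # Allocating a tensor forces the lazy CUDA context to initialize,
        # which is exactly where the 'busy or unavailable' error surfaces.
        torch.zeros(1, device="cuda")
        return True
    except Exception:
        return False
```

If this returns False while `nvidia-smi` on the host looks healthy, the CUDA context inside the worker process is broken, which matches the symptom that reloading the model (a fresh process/context) fixes it.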
Environment: a server with two 4090 GPUs; xinference v0.11.0, deployed with docker and started yesterday with the models loaded.
Today, when calling from dify, qwen and bce-embedding worked fine, but the rerank call failed with the error shown above.
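Since reloading the model fixed it, the same recovery can be scripted with the xinference client instead of clicking through the Web UI. A sketch, assuming the default endpoint and the model UID that appears in the log above (the endpoint and UID are assumptions; use `client.list_models()` to find the real values, and note the client API may differ slightly across xinference versions):

```python
# Reload a stuck rerank model by terminating its worker and launching a
# fresh one, which re-initializes the CUDA context in a new process.
def reload_rerank_model(endpoint="http://127.0.0.1:9997",
                        model_uid="bce-reranker-base_v1-1-0"):
    from xinference.client import Client

    client = Client(endpoint)
    # Tear down the worker whose CUDA context went bad...
    client.terminate_model(model_uid)
    # ...then launch a replacement.
    return client.launch_model(
        model_name="bce-reranker-base_v1",
        model_type="rerank",
    )
```

This is a workaround, not a root-cause fix: if the GPU keeps becoming "busy or unavailable", check for competing processes or an exclusive compute mode with `nvidia-smi`.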