
Fix error showing time spent in llama perf context print #1898

Conversation

shakalaca (Contributor)

This PR addresses the issue reported in #1830: after the 0.3.0 update, `llama_perf_context_print()` no longer correctly displayed inference time, tokens per second, and other related data.

After some investigation, I found that f8fcb3e introduced the problem, as a side effect of a commit in the upstream llama.cpp repo: ggml-org/llama.cpp@0abc6a2. That commit added the `no_perf` parameter; although the struct field itself defaults to false, `llama_context_default_params()` sets it to true for external callers. With `no_perf` enabled, `llama_synchronize()` does not accumulate the performance counters, so llama-cpp-python ends up printing incorrect information from `llama_perf_context_print()`.
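For reference, a minimal sketch of where the field lives on the Python side: llama_cpp.py mirrors llama.cpp's `llama_context_params` as a ctypes structure, and the field order must match the C struct exactly. The neighbouring field names below are illustrative assumptions, not the exact upstream layout; only `no_perf` is the addition that matters here.

```python
import ctypes

# Sketch of the ctypes mirror of llama.cpp's `llama_context_params`.
# Upstream commit 0abc6a2 added `no_perf` to the C struct, so the
# Python-side declaration must include it in the same position to
# keep the two memory layouts in sync.
class llama_context_params(ctypes.Structure):
    _fields_ = [
        # ... earlier fields elided ...
        ("embeddings", ctypes.c_bool),
        ("offload_kqv", ctypes.c_bool),
        ("flash_attn", ctypes.c_bool),
        ("no_perf", ctypes.c_bool),  # new: disable performance timing measurements
        # ... later fields elided ...
    ]
```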

In addition to adding the `no_perf` field in llama_cpp.py, we should also set `no_perf` to false in llama.py; a sketch of that second change follows. Since llama-cpp-python always calls `llama_perf_context_print()` during normal usage, I see no reason not to collect this information. Of course, if we want to stay consistent with llama.cpp's defaults, we could add an API that lets users toggle `no_perf`, providing a way to turn performance statistics on or off.
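A minimal sketch of that override, simplified from the context-construction pattern in llama.py (`llama_context_default_params()` is the real binding; the surrounding lines are illustrative):

```python
import llama_cpp

# Sketch: what llama.py does when building context params (simplified).
params = llama_cpp.llama_context_default_params()
# Upstream sets no_perf = True here for external callers, which stops
# llama_synchronize() from accumulating timing counters; override it so
# llama_perf_context_print() reports real numbers.
params.no_perf = False
```

If the user-facing toggle suggested above were added, it could be as simple as a hypothetical `no_perf: bool = False` keyword argument on `Llama.__init__` that is copied into `context_params.no_perf`; that name is an illustration, not an existing API.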

shakalaca and others added 3 commits January 18, 2025 10:37
Add `no_perf` field to `llama_context_params` to optionally disable performance timing measurements.

Fix error showing time spent in llama_perf_context_print
abetlen merged commit 4442ff8 into abetlen:main on Jan 29, 2025
14 checks passed