.Net: Unicode escape sequences consume too many tokens when using WebSearchEngineSkill (BingConnector) in StepwisePlanner #2820
Comments
My concern in this issue is that Bing search result strings are passed to the chat API still escaped, so many tokens are consumed.
Hi, is there any update on this issue? I'm also facing it now. Is there any option to get Bing Search responses without Unicode escapes for languages such as Vietnamese, Finnish, or Arabic? The output is only plain text; how can it be HTML or Markdown so I can render it on a page? If you have any ideas, please share them. Thank you so much!
@huyqta
…consistent encoding Fixes microsoft#2820
…e encoding (#4327)

Fixes #2820

### Motivation and Context

This PR resolves an issue where search results containing non-ASCII UTF-8 characters were automatically converted to Unicode escape sequences, which hurt readability and wasted tokens.

### Description

[This is the second submission, addressing feedback from the initial review.](#4128) Changes have been made to explicitly use `UnsafeRelaxedJsonEscaping` for all serialization cases, as suggested by @stephentoub. Please review the modifications and provide further feedback. Thank you!

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: SergeyMenshykh <68852919+SergeyMenshykh@users.noreply.github.com>
Co-authored-by: Roger Barreto <19890735+RogerBarreto@users.noreply.github.com>
Describe the bug
I'm using WebSearchEngineSkill (BingConnector) in StepwisePlanner.
It returns an unexpected "token limit exceeded" error caused by Unicode escape sequences.
This becomes a serious problem when using multi-byte characters like Japanese.
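To illustrate why the escaping matters for multi-byte text, here is a minimal sketch using Python's standard `json` module rather than the .NET serializer that BingConnector actually uses. With the default `ensure_ascii=True`, every non-ASCII code point becomes a six-character `\uXXXX` sequence. The sample strings are assumptions, and character count is only a rough proxy for tokens: the real token count depends on the model's tokenizer.

```python
import json


def escaped_vs_relaxed(text: str) -> tuple[str, str]:
    """Return the JSON encoding of `text` with and without Unicode escape sequences."""
    # ensure_ascii=True (the default) escapes every non-ASCII code point.
    escaped = json.dumps(text)
    # ensure_ascii=False keeps the characters as they are.
    relaxed = json.dumps(text, ensure_ascii=False)
    return escaped, relaxed


# Hypothetical sample snippets in the two languages from the report.
for language, text in [("Japanese", "こんにちは世界"), ("Telugu", "నమస్కారం")]:
    escaped, relaxed = escaped_vs_relaxed(text)
    print(f"{language}: {len(relaxed)} chars relaxed, {len(escaped)} chars escaped")
    print(f"  escaped: {escaped}")
```

For a string of n non-ASCII characters from the Basic Multilingual Plane, the escaped form is 6n + 2 characters versus n + 2, so the 7-character Japanese sample grows from 9 to 44 characters.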
To Reproduce

The following example does not generate an error, but it does show that the returned text consumes more tokens than expected.
The console log is as follows.




The escaped strings are Telugu and Japanese text.
Even more Unicode escape sequences are returned when I use a Japanese prompt, as shown below.


:
With this much escaping, the token limit is exceeded in no time.
Expected behavior
Search results should be passed through without being converted to Unicode escape sequences.
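The fix linked to this issue (#4327) makes the serializer use `UnsafeRelaxedJsonEscaping`. As a rough, language-neutral analogue, the Python sketch below serializes some hypothetical search results with `ensure_ascii=False` and checks that the relaxed output is still valid JSON that decodes to the same data as the fully escaped form. The result fields and the `serialize_results` helper are illustrative assumptions, not the Bing response schema or the Semantic Kernel code.

```python
import json

# Hypothetical search results; these field names are illustrative only.
results = [
    {"name": "東京の天気", "snippet": "今日は晴れ、最高気温25℃"},
    {"name": "Sää Helsingissä", "snippet": "Aurinkoista & lämmintä"},
]


def serialize_results(items: list[dict], relaxed: bool = True) -> str:
    """Serialize search results to JSON text for inclusion in a prompt.

    relaxed=True leaves non-ASCII characters unescaped, which is the part of
    UnsafeRelaxedJsonEscaping's behavior that matters for this issue.
    relaxed=False escapes every non-ASCII code point, like System.Text.Json's
    default encoder (note: Python's json never escapes <, > or &).
    """
    return json.dumps(items, ensure_ascii=not relaxed)


relaxed = serialize_results(results)
escaped = serialize_results(results, relaxed=False)

# Both forms are valid JSON and decode to the same data...
assert json.loads(relaxed) == json.loads(escaped) == results
# ...but only the relaxed form is readable and compact.
print(f"relaxed: {len(relaxed)} chars, escaped: {len(escaped)} chars")
print(relaxed)
```

According to the .NET documentation, the "Unsafe" in `UnsafeRelaxedJsonEscaping` refers to HTML-sensitive characters such as `<`, `>`, and `&` being left unescaped, which is a concern when the JSON is embedded in a web page rather than sent to a chat model.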
Platform