
.Net: Unicode escape sequences consume too many tokens when using WebSearchEngineSkill (BingConnector) in StepwisePlanner #2820

Closed
takeo-iw opened this issue Sep 14, 2023 · 4 comments · Fixed by #4327
Labels
core plugin Anything related to core plugins .NET Issue or Pull requests regarding .NET code

@takeo-iw

Describe the bug
I'm using WebSearchEngineSkill (BingConnector) in StepwisePlanner.
It returns an unexpected token-limit error caused by Unicode escape sequences.
This becomes a serious problem with multi-byte text such as Japanese.

To Reproduce
The following example does not generate an error, but it does show that more tokens are returned than expected.
image

The console log is as follows.
image
image
image
image

The escaped strings are Telugu and Japanese text.

It returns even more Unicode escape sequences when I use a Japanese prompt, as shown below.
image
  :
image

The token limit is exceeded in no time.

Expected behavior
Search results should be passed through without being converted to Unicode escape sequences.

Platform

  • OS: Mac
  • IDE: VS Code
  • Language: C#
  • Source:
image
@shawncal added the .NET and triage labels on Sep 14, 2023
@lemillermicrosoft
Member

The MaxTokens configuration parameter sets the maximum number of tokens allowed in generated prompts and completions within the planner. It does not restrict the output of the plan execution.

@takeo-iw
Author

takeo-iw commented Sep 15, 2023

@lemillermicrosoft

My concern is that the Bing search result strings are passed to the chat API still escaped, so many tokens are consumed.
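
To make the size difference concrete, here is a minimal sketch. It assumes the search results pass through `System.Text.Json` with default options somewhere before they reach the chat API, which this thread does not confirm; the snippet text is a made-up stand-in for a real Bing result. Character count is only a rough proxy for token count.

```csharp
using System;
using System.Text.Json;

// Hypothetical snippet standing in for a Japanese Bing search result.
string snippet = "東京の天気は晴れです";

// With default options, System.Text.Json writes each non-ASCII character
// as a six-character \uXXXX escape sequence.
string escaped = JsonSerializer.Serialize(snippet);

Console.WriteLine(escaped);
Console.WriteLine($"raw: {snippet.Length} chars, escaped: {escaped.Length} chars");
```

The escaped form is several times longer than the original text, and every extra character is sent to the model along with the prompt.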

@nacharya1 added the planner label and removed the triage label on Sep 19, 2023
@nacharya1 moved this to Backlog - Planner in Semantic Kernel on Sep 19, 2023
@nacharya1 added this to the R3: Cycle 3 milestone on Sep 19, 2023
@huyqta

huyqta commented Oct 27, 2023

Hi, is there any update on this issue? I'm facing it now as well. Is there an option to get the Bing Search response without Unicode escapes for languages such as Vietnamese, Finnish, or Arabic? Also, the output is plain text only; how can I get HTML or Markdown so I can render it on a page? If you have any ideas, please share them. Thank you so much!

@matthewbolanos added the core plugin label and removed the planner and prune labels on Nov 27, 2023
@yuezhishun
Contributor

@huyqta
I've encountered the same issue. If you trust the search results, you can skip encoding them.
Check here

yuezhishun added a commit to yuezhishun/semantic-kernel that referenced this issue Dec 16, 2023
github-merge-queue bot pushed a commit that referenced this issue Dec 20, 2023

…e encoding (#4327)

Fixes #2820
### Motivation and Context
This PR resolves an issue where search results containing non-ASCII
characters were automatically converted to Unicode escape sequences,
adversely affecting readability and wasting tokens.

### Description
[This is the second submission addressing feedback from the initial
review.](#4128) Changes
have been made to explicitly use `UnsafeRelaxedJsonEscaping` for all
serialization cases, as suggested by @stephentoub. Please review the
modifications and provide further feedback. Thank you!
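
For readers unfamiliar with the encoder named above, the following is a rough sketch of the idea, not the code from this PR. `JavaScriptEncoder.UnsafeRelaxedJsonEscaping` and `JsonSerializerOptions.Encoder` are real `System.Text.Json` APIs; the `results` array and its contents are hypothetical, and the plugin's actual data shape may differ.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical search results; the real plugin's data shape may differ.
string[] results = { "東京の天気", "こんにちは" };

// Assumption: options like these are passed wherever results are serialized.
var options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};

// The relaxed encoder writes the Japanese text as-is.
string relaxed = JsonSerializer.Serialize(results, options);

// The default encoder writes it as \uXXXX escape sequences.
string escapedByDefault = JsonSerializer.Serialize(results);

Console.WriteLine(relaxed);
Console.WriteLine(escapedByDefault);
```

The relaxed encoder also leaves HTML-sensitive characters such as `<` and `>` unescaped, which is why it carries "Unsafe" in its name. That matters little for text sent to a chat API, but the output should not be embedded in a web page without further encoding.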

### Contribution Checklist

- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution
Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md)
and the [pre-submission formatting
script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts)
raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: SergeyMenshykh <68852919+SergeyMenshykh@users.noreply.github.com>
Co-authored-by: Roger Barreto <19890735+RogerBarreto@users.noreply.github.com>
@github-project-automation bot moved this from Backlog to Sprint: Done in Semantic Kernel on Dec 20, 2023
Bryan-Roe pushed a commit to Bryan-Roe-ai/semantic-kernel that referenced this issue Oct 6, 2024
…to Unicode encoding (microsoft#4327)

8 participants