.Net: Unicode escape sequences consume too many tokens when using WebSearchEngineSkill (BingConnector) in StepwisePlanner #2820

takeo-iw · 2023-09-14T14:59:34Z

Describe the bug
I'm using WebSearchEngineSkill (BingConnector) in StepwisePlanner.
It returns an unexpected token-limit-exceeded error caused by Unicode escape sequences.
This becomes a serious problem with multi-byte characters such as Japanese.

To Reproduce
The following example does not generate an error, but it does show that more tokens are returned than expected.

The console log is as follows.

The escaped strings are Telugu and Japanese text.

Even more Unicode escape sequences are returned when I use a Japanese prompt, as shown below.

　　:

The token limit is exceeded very quickly.
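The original console output was not preserved. As a minimal sketch of the effect, assuming the search results are serialized with System.Text.Json's default options (an assumption; the connector's code is not shown in this issue), the following shows how a short Japanese string grows once escaped. The snippet text is illustrative, not real search output.

```csharp
using System;
using System.Text.Json;

// Illustrative Japanese text standing in for a Bing search snippet (not real search output).
string snippet = "東京の天気は晴れです";

// With default options, System.Text.Json uses JavaScriptEncoder.Default,
// which escapes every non-ASCII character as a six-character \uXXXX sequence.
string json = JsonSerializer.Serialize(snippet);

Console.WriteLine(json);
Console.WriteLine($"Input: {snippet.Length} chars, serialized: {json.Length} chars");
```

Each of the ten input characters becomes six ASCII characters, so the serialized string is 62 characters long including the two quotes. Tokenizers generally split such escape sequences into several tokens, which is why the budget runs out quickly; the exact count depends on the model's tokenizer.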

Expected behavior
Search results should be passed through without being converted to Unicode escape sequences.

Platform

OS: Mac
IDE: VS Code
Language: C#
Source:

lemillermicrosoft · 2023-09-14T17:47:27Z

The MaxTokens configuration parameter sets the maximum number of tokens allowed in the prompts and completions the planner generates. It does not restrict the output of plan execution.

takeo-iw · 2023-09-15T00:09:20Z

@lemillermicrosoft

My concern is that the Bing search result strings are passed to the chat API still escaped, so many tokens are consumed.

huyqta · 2023-10-27T03:05:45Z

Hi, any update on this issue? I'm facing it too. Is there an option to get Bing Search responses without Unicode escapes for languages such as Vietnamese, Finnish, or Arabic? The output is plain text only; how can I get it as HTML or Markdown so I can render it on a page? If you have any ideas, please share them. Thank you so much!

yuezhishun · 2023-12-10T10:40:16Z

@huyqta
I've encountered the same issue. If you trust the search results, you can skip encoding them.
Check here
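The fix that was eventually merged (#4327) switched serialization to `UnsafeRelaxedJsonEscaping`. The sketch below compares the default encoder with that one; the list of result strings is hypothetical and stands in for the connector's actual data. As the name warns, the relaxed encoder also stops escaping HTML-sensitive characters such as `<`, `>` and `&`, which is the trade-off behind "if you trust the search results".

```csharp
using System;
using System.Collections.Generic;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical search results; in the real connector these would come from the Bing API.
var results = new List<string> { "東京の天気は晴れです", "Hello, world" };

// One shared options instance so every serialization call escapes the same way.
// UnsafeRelaxedJsonEscaping leaves non-ASCII text (Japanese, Telugu, Vietnamese, ...) unescaped,
// but it also stops escaping HTML-sensitive characters such as < and >, so the output
// should not be embedded in HTML or script without further encoding.
var relaxed = new JsonSerializerOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };

string defaultJson = JsonSerializer.Serialize(results);
string relaxedJson = JsonSerializer.Serialize(results, relaxed);

Console.WriteLine(defaultJson);  // Japanese text appears as \uXXXX escapes
Console.WriteLine(relaxedJson);  // Japanese text appears as-is
```

Sharing one options instance across calls also helps performance, since `JsonSerializerOptions` caches type metadata and creating a new instance per call is comparatively expensive.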


@stephentoub

…e encoding (#4327) Fixes #2820

### Motivation and Context
This PR resolves an issue where search results containing UTF-8 characters were automatically converted to Unicode, adversely affecting readability and resulting in token wastage.

### Description
[ This is the second submission addressing feedback from the initial review.](#4128) Changes have been made to explicitly use `UnsafeRelaxedJsonEscaping` for all serialization cases, as suggested by @stephentoub. Please review the modifications and provide further feedback. Thank you!

### Contribution Checklist
- [x] The code builds clean without any errors or warnings
- [x] The PR follows the [SK Contribution Guidelines](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md) and the [pre-submission formatting script](https://github.com/microsoft/semantic-kernel/blob/main/CONTRIBUTING.md#development-scripts) raises no violations
- [x] All unit tests pass, and I have added new tests where possible
- [x] I didn't break anyone 😄

---------

Co-authored-by: SergeyMenshykh <68852919+SergeyMenshykh@users.noreply.github.com>
Co-authored-by: Roger Barreto <19890735+RogerBarreto@users.noreply.github.com>


shawncal added .NET triage labels Sep 14, 2023

nacharya1 added planner and removed triage labels Sep 19, 2023

nacharya1 added this to Semantic Kernel Sep 19, 2023

nacharya1 added the enhancement label Sep 19, 2023

nacharya1 moved this to Backlog - Planner in Semantic Kernel Sep 19, 2023

nacharya1 added this to the R3: Cycle 3 milestone Sep 19, 2023

markwallace-microsoft added the prune label Oct 10, 2023

matthewbolanos added core plugin and removed planner prune labels Nov 27, 2023

yuezhishun added a commit to yuezhishun/semantic-kernel that referenced this issue Dec 16, 2023

Use UnsafeRelaxedJsonEscaping for JsonSerializer, This change ensures consistent encoding Fixes microsoft#2820

d4dc1cd

yuezhishun mentioned this issue Dec 16, 2023

.Net: Fix for #2820 Avoid unnecessary conversion of strings to Unicode encoding #4327

Merged


RogerBarreto closed this as completed in #4327 Dec 20, 2023

github-project-automation bot moved this from Backlog to Sprint: Done in Semantic Kernel Dec 20, 2023
