Support a "data"-like field on CompletionList that is also returned to the server in completionItem/resolve to avoid duplication in CompletionItem.data #1802
Comments
Just remember that if we are going to specify a merge operation the client must perform, then the exact merge behaviour and semantics must be fully specified by the protocol. Saying 'what JavaScript does' is not helpful if the client isn't written in JavaScript.
I do see the need for something like this for performance reasons. But before we go down that path, can you collect some numbers that show how much this will speed things up (e.g. the time it takes to show the completion in the UI)? It will complicate the protocol and add effort for clients to implement, and I want to ensure it is worth the effort.
It's difficult to give concrete numbers because it varies a lot across environments and payloads, and I can't currently measure what difference it would make without implementing it. My main motivation is to reduce completion payload sizes, because currently they can be a few megabytes (more on that below) and in Codespaces this can be really slow (like 5-6 seconds). I'm not sure why it's so slow, but I see the data over the websocket is batched into 256 KB chunks and the timestamps seem much further apart than I'd expect. There are a number of contributing factors to the payload being so large:
There are many other things making the payload large which I'm working on, but being able to round-trip some context without duplicating it on every item will definitely help. The whole purpose of Another possibility would be to allow the server to keep some state between If you have other ideas that would be better, I'm all ears :-)
Have you ever tried not sending the data at all, but instead doing the following:
I know that this holds onto memory on the server; however, code complete requests are so frequent that you can always free the data held for the last request when a new one arrives. This would benefit not only the communication between servers and the extension host but also VS Code in a remote setup. ItemDefaults are an LSP concept and don't exist in VS Code, so the data is basically inflated again when VS Code talks from the extension host to the renderer.
I've thought about this a few times (I think we actually did this in the past) but I wasn't sure how reasonable it was for a server to only support resolve for the "last completion request". I can imagine some situations where this might have issues:
I don't think this would affect VS Code (I don't think it filters client-side when it's sending another request?) and it's likely that the new completion request would complete and a new item be resolved anyway, so perhaps that's not an issue. I'll think about this a bit more and maybe try it out and see how it works.
I ran a test with a whole bunch of completions: something that returned over 900 items in a completion list. With the data not being passed (I was only sending the
@dbaeumer if we decide to do this, could we add something to the spec about it? Right now the spec doesn't say anything about whether calling Ideally, the spec would say that clients should only call it for the last one, because that would avoid needing to even round-trip some identifier (which would still be a bit of junk to duplicate on every single completion item in a large list).
@DanTup only allowing a completion item from the last completion request to be resolved makes total sense to me.
I think per filename would make more sense. You could have a completion running in one editor and switch away to make a change in another editor. Same with signature help.
@dbaeumer great, I'll open a PR (and post a link back here in case others in this thread have feedback/suggestions).
This wouldn't be possible without But it also means the server would have to keep a lot more state (the last completion for every file). I think if you switch to another editor and back, it's probably reasonable to just re-invoke code completion (VS Code already does not keep the completion widget open if you switch editors).
I opened a PR here that says
I agree with @DanTup here. When switching editors, the client should actually cancel the last completion request, since its result might not be correct anymore anyway.
@dbaeumer are you happy with #1834? (There's a "Community PR Approvals" check that seems stuck.) I'd prefer to have that merged before I start making server changes, assuming it's good 🙂 (If we merge that, I think we can close this, as keeping the state on the server provides the ability to do everything that this would.)
We had a longer discussion about that problem, and due to the fact that code can talk directly to the server, we can't specify that a resolve request can only be sent for items from the last code complete request (although this is the case 99% of the time). The only way I can see to tackle this is to have an explicit release call that the client can send to the server. Something like this:
This will allow servers to keep state for a completion item on the server instead of attaching all state to the completion item itself. This approach, however, has to go behind a capability flag, but it is implementable for VS Code.
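The original snippet for the release call did not survive here, so the following is only a hypothetical sketch of the server side of the idea: each item carries a small ID in `data`, the server keeps the real state in a map, and a release message from the client deletes entries. The release message, the capability that would gate it, and the names `CompletionStateStore`, `itemId`, and `elementId` are all assumptions, not part of the LSP specification.

```typescript
// Hypothetical shapes: none of these names come from the LSP specification.
interface ItemState {
  uri: string;       // document the completion was requested in
  elementId: string; // server-side handle for the suggested element
}

interface SlimItem {
  label: string;
  data: { itemId: number }; // only a small ID travels to the client
}

class CompletionStateStore {
  private nextId = 0;
  private readonly states = new Map<number, ItemState>();

  // Called while building a completion response: stash state, return a slim item.
  register(label: string, state: ItemState): SlimItem {
    const itemId = this.nextId++;
    this.states.set(itemId, state);
    return { label, data: { itemId } };
  }

  // Called from completionItem/resolve; undefined means the state was released.
  lookup(item: SlimItem): ItemState | undefined {
    return this.states.get(item.data.itemId);
  }

  // Called when the (hypothetical) release message arrives from the client.
  release(itemIds: number[]): void {
    for (const id of itemIds) {
      this.states.delete(id);
    }
  }

  get size(): number {
    return this.states.size;
  }
}
```

State for items that are never resolved stays in memory until the client releases it, so this design only works if clients reliably send the release message.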
@dbaeumer do you mean releasing each completion item? Adding a unique ID to each completion item feels like it's going to add more to the payload that the goal was to remove. I wonder if we'd be better off trying the original plan here (a mergeable data) instead? Or, how about a new field ( Something like that seems way simpler, both for LSP/spec and for servers (no need to keep state, worry about it not being cleaned up, or track extra per-item IDs).
Yes. But I doubt that this will add more data, since a single ID field / property would make the whole data property go away on these completion items. I am pretty sure that in total that will be a smaller payload in the cases where servers add state to the data property. The problem with the merge is that users will ask for more and more complicated merging algorithms. The next thing I can already see users asking for is the ability to template paths, since the majority of the paths only differ in small parts. I do understand the need to lower the payload, but I am not convinced that merging is the right solution.
It would definitely be smaller than it is today, but it still feels needlessly verbose. My current goal is to strip everything I don't need from the payload, so trading a large property for a small property on potentially a large number of items is not as good as removing it :-)
To be clear, my last suggestion above involved no merge. I was asking that we have a second field (in addition to

`textDocument/completion` result:

```json
{
  "context": {
    "foo": "bar"
  },
  "items": [
    {
      "label": "...",
      "data": { "a": "b" }
    }
  ]
}
```

`completionItem/resolve`:

```json
{
  "label": "...",
  "data": { "a": "b" },
  "context": {
    "foo": "bar"
  }
}
```

This seems like a much simpler solution than having to release completion items (something I'm not sure clients would bother to implement), has no merge complexity, and has no restrictions on the ability to call

Edit: For my specific case, even just sending the original completion arguments as
For what it's worth, from a client implementation perspective I'd much rather have an additional property in the
@dbaeumer any thoughts/opinions on the above?
I am still not convinced that this will drastically reduce the amount of data servers add to the data field of a completion item. @DanTup could you provide an example of before and after? @dibarbet and @jdneo, do you have any insights / numbers you can share about the payload C# / Java encodes into the data field, and whether such a context on the completion list would help lower the payload significantly?
@dbaeumer it's difficult to give specifics because it depends very much on the specific context (for example, how many completion items there are, how many of them need context to provide auto-imports, etc.), but as an example, I just created a new file in a Flutter project which has no dependencies other than Flutter, and invoking completion at the top level has the file path repeated 6220 times. That's 78 × 6220 = 485,160 characters just to provide the server with the filename of where it will be inserting This number will go down if some of the items are already imported (because they won't need this in Of course, this can be reduced with (I'm aware there are other savings to be made in the screenshot above; I've made some that haven't shipped in the SDK I'm using, and I still have some to make :) )
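The arithmetic in the comment above, as a small TypeScript helper (the function names are illustrative). It only counts the characters of the path string itself; JSON keys, quotes, and any escaping are ignored, so the bytes saved on the wire would likely be somewhat higher.

```typescript
// Characters spent on a value repeated in every item.
function duplicatedChars(valueLength: number, itemCount: number): number {
  return valueLength * itemCount;
}

// Characters saved by sending the value once per list instead of once per item
// (one copy still has to be sent).
function savedChars(valueLength: number, itemCount: number): number {
  return duplicatedChars(valueLength, itemCount) - valueLength;
}

const pathLength = 78;  // length of the file path in the example above
const itemCount = 6220; // number of items repeating it

console.log(duplicatedChars(pathLength, itemCount)); // 485160
console.log(savedChars(pathLength, itemCount));      // 485082
```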
From the Java side, we don't have such a request for now. But just a few months ago, we did some refactoring to remove some unnecessary data fields from the completion items, which helped to improve completion performance. One example: we previously stored the document URI in the data field of each completion item (similar to what @DanTup mentioned for Dart), and then we found that it was not necessary, so we removed it from the data field. After removing that URI string, when triggering completion via 'S' in the Spring Petclinic project, the response (textDocument/completion) payload size was reduced from 3.05 MB to 2.63 MB (measured by copying the trace string directly into a text file), and completion became a little bit faster. More details: eclipse-jdtls/eclipse.jdt.ls#2614.
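For comparison with the Dart numbers, the relative saving from the Java measurement above works out as follows (a trivial helper; the name is illustrative):

```typescript
// Relative payload reduction, given sizes before and after (same unit, here MB).
function relativeReduction(before: number, after: number): number {
  return (before - after) / before;
}

const reduction = relativeReduction(3.05, 2.63);
console.log(`${(reduction * 100).toFixed(1)}%`); // 13.8%
```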
@jdneo do I understand correctly that you're stitching this data back in in the LSP client (e.g., it won't work for other generic LSP clients)? I was also considering something like that for Dart if LSP doesn't support it, but it seems silly not to include it in LSP if clients and servers are going to build custom support for exactly this anyway. If both Dart and Java benefit significantly from extracting this from
In our case, the URI is used during the resolve stage. The way we removed that URI is: for every completion item, we generate a unique ID, and we add a cache on the server side that maps ID -> the context of the completion item (things like the URI, etc.). Then we only put that ID in the data field. At the completion resolve stage, the server can recover the context via the ID and do further work.
@jdneo what happens for items that are never resolved? When do you clean up the context? There were some suggestions about this above, but it seemed complicated to manage releasing the context, which is why I was advocating for just adding a new field to
We clear the cache when a new completion request arrives, because once a new request comes in, the contexts of the old completion request are all out of date.
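The ID-to-context scheme described in the last few comments can be sketched as below. This is an illustration of the described behaviour, not JDT LS code; `LastRequestContextCache` and the `ResolveContext` fields are hypothetical.

```typescript
// Context a server needs at resolve time (fields are illustrative).
interface ResolveContext {
  uri: string;
  elementKey: string;
}

// "ID -> context" cache where each new completion request
// discards the contexts of the previous one.
class LastRequestContextCache {
  private nextId = 0;
  private contexts = new Map<string, ResolveContext>();

  // Start a new completion request: everything from earlier requests is dropped.
  beginRequest(): void {
    this.contexts = new Map();
  }

  // Store the context for one item and return the ID to put in item.data.
  add(context: ResolveContext): string {
    const id = String(this.nextId++);
    this.contexts.set(id, context);
    return id;
  }

  // Resolve-time lookup; undefined means the context was already cleared,
  // in which case the server skips the resolve.
  get(id: string): ResolveContext | undefined {
    return this.contexts.get(id);
  }
}
```

Because IDs keep increasing across requests, an ID from an older list never matches a newer item; it simply misses the cache. The flip side is that any caller issuing a new request clears the contexts of every other caller's list.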
Ahh, we discussed that above and it was decided that this was not safe; see #1802 (comment). The issue is that extensions can request completions in VS Code, so each caller might have its own idea of what the "last" completion request was (and they may clear each other's contexts).
Sorry, I'm a little bit lost here. Would you mind giving an example of why it won't work?
I wasn't involved in the discussions @dbaeumer mentioned, but my understanding was that because completions are available via the API, you could in theory have the following situation:
Today, C# encodes a single resultId which is associated with a cache entry on the server side that contains the text document and other information needed to compute the resolve information. This information is unique to the list, not per item, so we already use the So in our case, a merged context wouldn't save us a lot, since we already only have the data item once in the payload. I did want to touch on caching and resolve (in general) a bit, though: as mentioned above, if anyone tries to resolve a completion item older than 3 requests, we will fail. In general, I don't think we can resolve an arbitrary item (even with storing more context in the data object) unless we also specify how resolve should behave if there are document changes between the original request and the resolve request. On the server side we can't hold onto arbitrarily old snapshots; it's too much information. Specifying that a resolve can only happen if there are no changes would allow us to store enough context on the item to handle any number of completion and resolve requests (as long as the state doesn't change). Or something like
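The C# behaviour described above (one `resultId` per list, resolve failing for items older than three requests) can be sketched as a small bounded cache. This illustrates the described behaviour only; it is not Roslyn's implementation, and `RecentListCache` and the `ListContext` fields are hypothetical.

```typescript
// Per-list state kept on the server (illustrative fields only).
interface ListContext {
  uri: string;
  documentVersion: number;
}

// Keeps the contexts of the most recent `capacity` completion lists,
// keyed by a per-list resultId; the oldest list is evicted first.
class RecentListCache {
  private nextResultId = 0;
  private readonly entries = new Map<number, ListContext>();
  private readonly capacity: number;

  constructor(capacity = 3) {
    this.capacity = capacity;
  }

  // Store context for a new list and return the resultId sent once per list.
  add(context: ListContext): number {
    const resultId = this.nextResultId++;
    this.entries.set(resultId, context);
    if (this.entries.size > this.capacity) {
      // Map iterates in insertion order, so the first key is the oldest list.
      const oldest = this.entries.keys().next();
      if (!oldest.done) {
        this.entries.delete(oldest.value);
      }
    }
    return resultId;
  }

  // undefined means the list is too old and the resolve has to fail.
  get(resultId: number): ListContext | undefined {
    return this.entries.get(resultId);
  }
}
```

The payload cost is one small ID per list rather than per item; the trade-off is that correctness now depends on how many other completion requests happen before a resolve.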
Thank you @DanTup. I'm thinking about this from the user's perspective. No matter which extension triggers the completion, or how, the user can only see one active completion list, which is the last one. When the user is selecting among the items in the list, the items all come from the last completion request; older completions become out of date. So, in our Java LS, if the context has been cleared, the LS just skips the resolve (fails quietly). There may be some situations that we have not considered. If there are, please let me know. Thanks!
@jdneo in my (hypothetical) scenario above, the completions requested by extensions are not shown in the editor. VS Code exposes an API that allows extensions to just request code completion results (without triggering their display in the editor). I don't know what this might be used for (I use it only for testing), but @dbaeumer might have some more insight into that.
It sounds like you have the same issue that @dbaeumer expressed concern about above (clearing after some number of requests may not be sound). If I understand correctly, fixing this would result in the same problem: that you can't provide both per-list and per-item
Not quite: even if we didn't use a cache, we would still only be storing per-list data; we don't need different data on items currently.
Ah, gotcha. But I noticed you included the word "currently", so this could be an issue in the future? I guess my point is that this doesn't seem like a Dart-specific problem to me. I really feel there is a need to have something standard here, because currently some languages are doing things that are discounted as unsound above, and the only other way is to use non-standard LSP (which is what I've recently been considering). @dbaeumer do the comments (and payload sizes!) above convince you that this is worth supporting in LSP?
@dibarbet @jdneo because something like completions can also be requested from code, the following can happen:
The likelihood that this happens is very low, but it can happen. To avoid clearing the context cache on the server, my proposal was to add a capability that lets a client inform the server when items are released, so that the server can manage the context cache correctly. @DanTup, you are talking about a data property on the item list, not about the original request of merging.
Yep! I'll change the title of this issue to make it clearer. My suggestion was to call it something like "context", but I see there's already something called context for completion. I've called it "listData" here (to avoid confusion with the existing
Yeah, having the client tell us when it's 'done' with the original completion request would give us a more deterministic way of clearing the cache. We'd just have to be careful about how many of those we're holding onto at once; the context for us can be quite expensive to keep (the entire snapshot of the solution). The same issue applies to pretty much all the Absolutely seems like a reasonable feature to me, but the priority for us on the C# side isn't super high.
To put it another way, I can't think of any existing or planned features where we'd use it in C#. |
@dbaeumer I'm still keen to progress this to reduce my payloads :-) My suggestion is above (#1802 (comment)). I think having the client tell the server to "release" the cached items adds a lot of unnecessary complexity on both sides, and this provides the same functionality in a much simpler way. I'm happy to look at PRs for both the spec and the VS Code LSP client if we can agree this is a reasonable way forward.
I am still of the opinion that keeping as much state as possible on the server is the right thing to do. We do the same in remote, where we don't send all state to the renderer but keep as much as possible in the remote extension host. But since we do have
I understand, but doing so here requires lots more complexity, and adding more data to every item (which is the very thing I'm trying to reduce).
This isn't quite what I was proposing above. I'm happy for there to be no merging, but we need both fields on the list and fields on the item. I wanted to have two distinctly named fields and have them both sent back to the server, like this:

Completion response:

```jsonc
{
  "listData": {
    // Properties here are the same for every item and are currently duplicated a lot
    "filename": "/long/path/required/for/resolve/foo.dart"
  },
  "items": [
    {
      "label": "Foo",
      "detail": "Auto-Import from dart:core",
      "data": {
        // Properties here are per-item
        "import-uri": "dart:core"
      }
    }
  ]
}
```

Resolve request:

```jsonc
{
  "label": "Foo",
  "detail": "Auto-Import from dart:core",
  // Original data from the item
  "data": {
    "import-uri": "dart:core"
  },
  // From the list
  "listData": {
    "filename": "/long/path/required/for/resolve/foo.dart"
  }
}
```

I don't know if you saw, but I provided numbers above showing that we can sometimes see over half a MB of payload that is just data duplicated in every item.
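The client-side half of the proposal above can be sketched in TypeScript. `listData` is the name proposed in this thread and is not part of the LSP specification; the interface and function names are hypothetical.

```typescript
// Shapes for the proposal above; `listData` is not part of the LSP specification.
interface ProposedCompletionItem {
  label: string;
  detail?: string;
  data?: unknown;
}

interface ProposedCompletionList {
  isIncomplete: boolean;
  listData?: unknown;
  items: ProposedCompletionItem[];
}

// Params a client would send for completionItem/resolve under the proposal:
// the item unchanged, plus the list-level data copied back from its list.
type ProposedResolveParams = ProposedCompletionItem & { listData?: unknown };

function toResolveParams(
  list: ProposedCompletionList,
  item: ProposedCompletionItem,
): ProposedResolveParams {
  return list.listData === undefined ? { ...item } : { ...item, listData: list.listData };
}
```

No merging happens: `data` and `listData` stay separate, so the server decides how to combine them. A client would need to remember which list each item came from when it later sends the resolve request; how it tracks that is left open here.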
What I have in mind is the following:
On the resolve call, the data field of the item to resolve is `Object.assign({}, list.data, item.data)`.
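Since the concern earlier in the thread was that merge semantics must be fully specified, here is how that `Object.assign` rule behaves in the cases most likely to matter. `Object.assign` copies own enumerable properties from left to right and skips `undefined`/`null` sources, so the merge is shallow: an item value of `null` overrides the list value (the key is kept with value `null`, not removed), and a nested object on the item replaces the list's nested object entirely. The function name and sample fields are illustrative.

```typescript
// List-level data first, item data on top; shallow merge via Object.assign.
function mergeResolveData(
  listData: Record<string, unknown> | undefined,
  itemData: Record<string, unknown> | undefined,
): Record<string, unknown> {
  // The fresh {} target keeps the shared list data from being mutated per item.
  return Object.assign({}, listData, itemData);
}

const listData = { file: "/path/foo.dart", options: { a: 1, b: 2 } };

// Item adds a field.
console.log(JSON.stringify(mergeResolveData(listData, { id: 7 })));
// {"file":"/path/foo.dart","options":{"a":1,"b":2},"id":7}

// null overrides the list value rather than deleting the key.
console.log(JSON.stringify(mergeResolveData(listData, { file: null })));
// {"file":null,"options":{"a":1,"b":2}}

// Nested objects are replaced, not merged.
console.log(JSON.stringify(mergeResolveData(listData, { options: { a: 9 } })));
// {"file":"/path/foo.dart","options":{"a":9}}
```

A client in another language would need to reproduce exactly these rules (top-level keys only, item wins, `null` is a value) for servers to see the same result everywhere.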
Oh, that would be absolutely fine by me. I wasn't sure if this is what you meant you didn't want by "no merging" (or if it's what @puremourning had concerns about in the second comment above). Although, do we need to think about how this interacts with
@dbaeumer the issue of payload sizes came up again and I'd like to make some progress on this. Would you accept PRs based on the above? To summarize, my understanding is: Add to
(Edit 2023-12-04: This request has changed slightly throughout the discussion - see #1802 (comment) below)
(from microsoft/vscode-languageserver-node#1237 (comment))
There's an `itemDefaults.data` field for completion that allows `data` to be included once in a completion response rather than duplicated across all items.

For Dart, the `data` field contains a mix of data that is the same for all items (e.g. the file the completion is being inserted into, so we can compute edits for adding `import`s where required, since `/resolve` doesn't get any context) and data that is different (an ID to get back to the element being inserted, so we can resolve things like documentation). Since `itemDefaults.data` replaces the whole of `data`, we can't use it here, so we end up with a large payload with a lot of duplicated info.

It would be very helpful to have an option to merge `data` from items over the default (for example `Object.assign(itemDefaults.data, item.data)`?).

I'm happy to send PRs for this, but I want to agree on an approach first:

- … `data` or all fields?
- … `itemDefaults.mergedData`?)
- … `Object.assign(itemDefaults.data, item.data)` flexible enough? (You can use `null` to erase something from the defaults for a given item?)
- … `mergedData`) in the `completionList.itemDefaults` set?

@dbaeumer WDYT?
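Assuming `itemDefaults.data` follows the general `itemDefaults` rule (a default is used only when the item does not specify the value itself), the difference between today's behaviour and the proposed merge can be sketched as below. The `elementId` field and function names are illustrative. The sketch uses a fresh `{}` target: `Object.assign(itemDefaults.data, item.data)` as written above returns the defaults object itself after copying the item's fields into it, so a shared defaults object would accumulate fields from earlier items.

```typescript
type Data = Record<string, unknown>;

// Today (assumed): the default is used only when the item has no data at all,
// so per-item data hides the shared fields completely.
function effectiveDataToday(defaults: Data | undefined, itemData: Data | undefined): Data | undefined {
  return itemData === undefined ? defaults : itemData;
}

// Proposed: item fields are layered over the defaults.
// A fresh {} target means the shared defaults object is never modified.
function effectiveDataMerged(defaults: Data | undefined, itemData: Data | undefined): Data {
  return Object.assign({}, defaults, itemData);
}

// The Dart case: one field shared by every item, one field per item.
const defaults: Data = { file: "/path/to/foo.dart" };
const itemData: Data = { elementId: "e42" };

console.log(JSON.stringify(effectiveDataToday(defaults, itemData)));  // {"elementId":"e42"}
console.log(JSON.stringify(effectiveDataMerged(defaults, itemData))); // {"file":"/path/to/foo.dart","elementId":"e42"}
```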