Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove syntax serialization logic. #70277

Merged
merged 31 commits into from
Nov 7, 2023

Conversation

CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi commented Oct 6, 2023

Fixes #70275.

Reviewed and approved by API review. We have shipped already with this functionality deprecated. This PR now removes the functionality from our next release.

--

The linked issue explains the motivation here. But the TLDR is: this functionality only existed (and was only usable) for exactly one scenario. A single host (like VS) dumping trees out of memory to disk, and then loading them back into memory in the same session. The data dumped to disk contains references to live memory objects, and as such could not work across multiple processes (like between VS and OOP), or even within the same host across different sessions (like if you closed/reopened VS). We only had this for the 32bit days when memory was scarce, and dumping trees to disk allowed us to free up the 32bit address space. Those days are long behind us, and we've already removed the codepaths that actually dump trees to disk. So nothing in roslyn itself is even utilizing this functionality anymore.

This saves 20k of code, and allows us to now make interesting changes to the serialization subsystem that will benefit our current architecture. For example, moving it to sit on pipes, instead of streams, or allowing it to be replaced with something like messagepack, instead of our home-rolled serialization logic (which was only necesary to do wacky stuff like encode pointers to live memory objects stored in static locations).

@dotnet-issue-labeler dotnet-issue-labeler bot added Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead labels Oct 6, 2023
@CyrusNajmabadi CyrusNajmabadi changed the title WIP - Remove IObjectWritable Remove syntax serialization logic. Oct 27, 2023
@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review October 27, 2023 19:41
@CyrusNajmabadi CyrusNajmabadi requested review from a team as code owners October 27, 2023 19:41
@@ -276,14 +276,11 @@ static string EncodeIndices(IEnumerable<int> indices, int additionalLocationsLen
return $"https://learn.microsoft.com/dotnet/fundamentals/code-analysis/style-rules/{id.ToLowerInvariant()}";
}

public sealed class LocalizableStringWithArguments : LocalizableString, IObjectWritable
public sealed class LocalizableStringWithArguments : LocalizableString
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IObjectWritable as a concept goes away. Instead of the serialization system taking in arbitrary objects, querying for that, and having that object then decide what to do, serialization is owned entirely on the outside by the objects, who just use our serialization system as a dumb stream wrapper to do things like write ints, bools, and strings to. Only basic constructs are supported (like primitive types, and arrays of types). Arbitrary object graphs are now handled by the caller (note: we have no such cases in roslyn anyways. it was only 'syntax' with its heterogenous graph structure).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LocalizableStirng used to have to be serializable, because it was stored in diagnostics as the messages, and diagnostics were stored inside a tree for syntax errors, and as such needed to be serializable if we serialized the tree. Now that we no longer serialize trees, neither diagnostics nor locstrings need this functionality.

{
private readonly LocalizableString _messageFormat;
private readonly string[] _formatArguments;

static LocalizableStringWithArguments()
=> ObjectBinder.RegisterTypeReader(typeof(LocalizableStringWithArguments), reader => new LocalizableStringWithArguments(reader));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ObjectBinder.RegisterTypeReader was how types specified how they could be deseriealized. But it worked by registering a callback function at an index in an array, then emitting that array index into the serialized stream of bytes. this, of course, coudl not work across processes or across sessions as that index woudl never be the same. this is why this functionality is so practically useless.

@@ -14,26 +14,14 @@

namespace Microsoft.CodeAnalysis.CSharp
{
internal sealed class MessageProvider : CommonMessageProvider, IObjectWritable
internal sealed class MessageProvider : CommonMessageProvider
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MessageProvider is part of loc strings. Also doesn't need serialization now.

@@ -10,11 +10,6 @@ namespace Microsoft.CodeAnalysis.CSharp
{
internal class SyntaxDiagnosticInfo : DiagnosticInfo
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

diagnostic-infos are part of diagnostics. which no longer need to be serialized now.

}

#endregion

public override string GetMessage(IFormatProvider? formatProvider = null)
{
var culture = formatProvider as CultureInfo;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

writing this comment here above the generated file. THe generated file is where the buld of the deletion comes from as we no longer need to emit all that goop for syntax serialization.

@@ -53,11 +53,6 @@ internal CSharpSyntaxNode(SyntaxKind kind, DiagnosticInfo[] diagnostics, SyntaxA
GreenStats.NoteGreen(this);
}

internal CSharpSyntaxNode(ObjectReader reader)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a lot of the core syntax types that follow will just see their internal serialization support removed.

new PrimitiveArrayMemberTest();
new PrimitiveMemberTest();
new PrimitiveValueTest();
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i deleted all the tests related to object graph serialization. all the tests testing things like writing out primitives/arrays/strings/etc. all remain and are still relevant.

/// <summary>
/// Map from a <see cref="Type"/> to the corresponding index in <see cref="s_types"/> and
/// <see cref="s_typeReaders"/>. <see cref="ObjectWriter"/> will write out the index into
/// the stream, and <see cref="ObjectReader"/> will use that index to get the reader used
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was the part that was truly nasty.

}

return array;
throw ExceptionUtilities.UnexpectedValue(elementKind);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't support arrays of aribtrary objects. it always needs to be an array of some primitive type (which is all we ever do in roslyn).

Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done with review pass (iteration 18). Only looked at Compiler and Tools folders

@CyrusNajmabadi
Copy link
Member Author

PR feedback responded to.

Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks (iteration 26). Looked at Compilers and Tools changes only

@jcouv jcouv self-assigned this Nov 6, 2023
@@ -266,6 +259,9 @@
<Compile Include="$(MSBuildThisFileDirectory)Options\IPublicOption.cs" />
<Compile Include="$(MSBuildThisFileDirectory)Options\PublicOptionFactory.cs" />
<Compile Include="$(MSBuildThisFileDirectory)Progress\NullProgress.cs" />
<Compile Include="$(MSBuildThisFileDirectory)Serialization\ObjectReader.cs" />
<Compile Include="$(MSBuildThisFileDirectory)Serialization\ObjectWriter.cs" />
<Compile Include="$(MSBuildThisFileDirectory)Serialization\TextEncodingKind.cs" />
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these types moved out of hte compielr entirely. it's now just pulled into the IDE layers and nothing else.

@CyrusNajmabadi
Copy link
Member Author

@RikkiGibson @333fred @jjonescz for another set of eyes when available.

@ToddGrun for IDE side. Thanks!

@jaredpar jaredpar self-assigned this Nov 6, 2023
Copy link
Member

@jaredpar jaredpar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Signing off on compiler side

@@ -39,23 +39,9 @@ private CSharpTestSource(object value)
{
var stringText = SourceText.From(text, encoding ?? Encoding.UTF8, checksumAlgorithm);
var tree = SyntaxFactory.ParseSyntaxTree(stringText, options ?? TestOptions.RegularPreview, path);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could simplify this to just return SyntaxFactory...

@@ -23,23 +23,9 @@ Public Structure BasicTestSource

Dim sourceTest = SourceText.From(text, If(encoding, Encoding.UTF8), checksumAlgorithm)
Dim tree = SyntaxFactory.ParseSyntaxTree(sourceTest, If(options, TestOptions.RegularLatest), path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could simplify this to Return SyntaxFactory...

private static Exception NoSerializationWriterException(string typeName)
{
return new InvalidOperationException(string.Format(Resources.Cannot_serialize_type_0, typeName));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

}

I was going to make a comment that the comment below is no longer in compiler and there is a bidirectional map available. But then I noticed that it uses ImmutableDictionarys for it's mapping, which is definitely not perf friendly. As the usage here doesn't look like it needs modifications, would it make sense for there to be a NonMutableBidirectionalMap?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. that would make a ton of sense.

Copy link
Contributor

@ToddGrun ToddGrun left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@CyrusNajmabadi CyrusNajmabadi merged commit 545c540 into dotnet:main Nov 7, 2023
27 checks passed
@CyrusNajmabadi CyrusNajmabadi deleted the nodeSerialization2 branch November 7, 2023 17:38
@ghost ghost added this to the Next milestone Nov 7, 2023
@RikkiGibson RikkiGibson modified the milestones: Next, 17.9 P2 Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers untriaged Issues and PRs which have not yet been triaged by a lead
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Deprecate syntax serialization logic.
5 participants