Add type mapping APIs to customize JSON value serialization/deserialization #30677

roji · 2023-04-12T13:48:36Z

When reading and writing JSON documents, we need to control the precise conversion from a .NET value to a JSON element, to be integrated in the larger JSON document. This is somewhat similar to GenerateNonNullSqlLiteral, but for values inside JSON. In simple cases (int, string), no special logic is needed; but in certain cases, we need to precisely control the representation. For example, a NetTopologySuite Geometry would likely need to be serialized as WKT, which isn't necessarily what ToString returns.

Note the interaction between this issue and #30604, which is about switching to the lower-level Utf8JsonReader/Utf8JsonWriter APIs rather than passing through DOM. If/when we do that, the type mappings would be the ones calling specific APIs e.g. on Utf8JsonWriter.
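As a rough sketch of what "calling specific APIs on Utf8JsonWriter" could look like for a single value, the following writes a point as a WKT string and reads it back. `Point2D` and `PointJson` are hypothetical names used only for illustration (they are not NetTopologySuite or EF Core types), and the parsing handles only the simple `POINT (x y)` form.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Text;
using System.Text.Json;

// Hypothetical value type standing in for a spatial type.
public readonly struct Point2D
{
    public Point2D(double x, double y) { X = x; Y = y; }
    public double X { get; }
    public double Y { get; }
}

// Hypothetical reader/writer pair for a single JSON value.
public static class PointJson
{
    // Writes the point as a JSON string containing WKT, not whatever ToString() returns.
    public static void ToJson(Utf8JsonWriter writer, Point2D value)
        => writer.WriteStringValue(
            string.Format(CultureInfo.InvariantCulture, "POINT ({0} {1})", value.X, value.Y));

    // Assumes the reader is already positioned on a string token of the form "POINT (x y)".
    public static Point2D FromJson(ref Utf8JsonReader reader)
    {
        var wkt = reader.GetString();
        // Strip the 7-character "POINT (" prefix and the trailing ")".
        var parts = wkt.Substring(7, wkt.Length - 8).Split(' ');
        return new Point2D(
            double.Parse(parts[0], CultureInfo.InvariantCulture),
            double.Parse(parts[1], CultureInfo.InvariantCulture));
    }

    // Helper that serializes one value to a JSON text fragment.
    public static string Serialize(Point2D value)
    {
        using var stream = new MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            ToJson(writer, value);
        }
        return Encoding.UTF8.GetString(stream.ToArray());
    }
}
```

The writer-side method produces a single JSON value that the surrounding document writer can embed; the reader-side method consumes only the current token.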

In addition to serialization/deserialization, we also need a hook to perform a SQL conversion from the JSON (text) value to the actual store type; this is needed when extracting a value from a JSON document and e.g. comparing it to a regular database column. For example, the JSON representation of a timestamp is ISO8601 (with a T), but in SQLite, a timestamp is typically represented as a timestamp text without the T (and so datetime() must be invoked). Similarly, to convert a geometry WKT representation to an actual SQL Server geometry type, a function to parse the WKT needs to be invoked. Moving the server-side conversion part to #30727.

Tests affected:

NorthwindJoinQueryTestBase.Join_local_string_closure_is_cached_correctly (string treated as enumerable of chars)
NorthwindJoinQueryTestBase.Join_local_bytes_closure_is_cached_correctly (same)

AndriySvyryd · 2023-04-21T23:34:26Z

Consider using an abstraction that doesn't depend on a particular implementation, see CosmosSerializer.
For example, CoreTypeMapping could have T FromJson<T>(Stream) and void ToJson<T>(T, Stream) (and async versions) that would only need to work for a T supported by a given type mapping and would instantiate (without allocating) a Utf8JsonReader/Utf8JsonWriter in the implementation.
To replace the underlying serializer a provider-specific ITypeMappingSourcePlugin implementation could be used (#28043)

The base implementation could call JsonSerializer.Serialize/Deserialize

We could allow JsonReaderOptions and JsonWriterOptions or a custom type in context options to control the default implementation.

AndriySvyryd · 2023-04-22T07:20:23Z

Looking into this a bit more, I don't think it's possible to create a performant abstraction that would allow replacing the implementation without massive amounts of code.

Some things to consider:

We can't control in what order the properties will come out of the query, so we can't get the correct type mapping until we read the property name.
When reading from a stream we need a callback to increase the buffer size in case a token doesn't fit.

So here's another proposal:

T FromJson<T>(ref Utf8JsonReader, Action<bool>?)
void ToJson<T>(Utf8JsonWriter, T)

For the server-side conversion consider #30729.

roji · 2023-04-22T07:41:27Z

We could allow JsonReaderOptions and JsonWriterOptions or a custom type in context options to control the default implementation.

Opened #30744 to track that.

We can't control in what order the properties will come out of the query, so we can't get the correct type mapping until we read the property name.

Yeah; our core deserialization logic would be the one iterating over the properties, and calling the type mapping API only for the values of those properties based on the names.
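A minimal sketch of that split, assuming a hypothetical `ValueReader` delegate in place of the real type mapping API: the outer loop owns the property iteration, and the per-property reader is only selected once the name has been read, so the order in which properties arrive does not matter.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class PropertyDispatchSketch
{
    // Hypothetical stand-in for the type mapping hook: reads exactly one value
    // from a reader that is already positioned on that value.
    public delegate object ValueReader(ref Utf8JsonReader reader);

    public static Dictionary<string, object> ReadObject(
        ReadOnlySpan<byte> utf8Json,
        IReadOnlyDictionary<string, ValueReader> readersByProperty)
    {
        var result = new Dictionary<string, object>();
        var reader = new Utf8JsonReader(utf8Json);

        if (!reader.Read() || reader.TokenType != JsonTokenType.StartObject)
        {
            throw new JsonException("Expected a JSON object.");
        }

        // The core logic iterates over the properties in whatever order they arrive.
        while (reader.Read() && reader.TokenType == JsonTokenType.PropertyName)
        {
            var name = reader.GetString();
            reader.Read(); // Position the reader on the property value.

            if (readersByProperty.TryGetValue(name, out var readValue))
            {
                // Only now is the correct per-property reader known.
                result[name] = readValue(ref reader);
            }
            else
            {
                // Unknown property: skip the value, including nested arrays/objects.
                reader.Skip();
            }
        }

        return result;
    }
}
```

Because the whole document is passed in as a single final block here, `Skip()` is usable; a chunked reader would need a different way to skip unknown values.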

When reading from a stream we need a callback to increase the buffer size in case a token doesn't fit.

Unless I'm mistaken, EF currently always reads the entire row into memory at the DbDataReader level (i.e. we don't specify CommandBehavior.SequentialAccess); one reason for that is that expression trees aren't compatible with async, which would be necessary if we wanted to stream the row.

However, it seems that Utf8JsonReader buffers by property anyway. In other words, according to this sample, you call Read() to get the next property, and if that returns false, that means there aren't enough bytes in the buffer and you need to get more. If that returns true, it seems that you're guaranteed to be able to read the next property; note that there's a GetString() API, but no TryGetString - which I understand to mean that string values are buffered, and you never need to deal with streaming the property value. If that's the case (we can verify), we don't need to worry about this at all.

T FromJson<T>(ref Utf8JsonReader, Action<bool>?)
void ToJson<T>(Utf8JsonWriter, T)

Looks good - though what's the Action<bool> for?

AndriySvyryd · 2023-04-22T07:58:12Z

Looks good - though what's the Action<bool> for?

That's the callback to increase the buffer size.

If that returns true, it seems that you're guaranteed to be able to read the next property;

Utf8JsonReader doesn't buffer at all. If the current token doesn't fit in the passed in buffer you need to increase the buffer size to be able to keep the part that it tried to read so far. In other words, when reading from a stream the buffer must be able to fit the largest token.

roji · 2023-04-22T08:06:46Z

@AndriySvyryd the question is whether a single token is buffered - the Utf8JsonReader API seems to suggest it is... To read a single string property via Utf8JsonReader, I think the user does the following:

if (!reader.Read())
{
    // Not enough bytes for the next token!
    // Return to read more bytes from the stream etc.
}

// Note: `string` is a C# keyword and cannot be used as a variable name.
var value = reader.GetString();

GetString() doesn't have a way of telling you that it didn't have enough bytes (e.g. there's no TryGetString). So if I'm getting it right, within a specific property value (which is what our type mapping API handles), we shouldn't need to worry about this. We can set up a quick experiment to prove this.

Am I misunderstanding the API?

AndriySvyryd · 2023-04-22T08:27:50Z

GetString() doesn't have a way of telling you that it didn't have enough bytes (e.g. there's no TryGetString).

Right, but Read won't return true unless the next token is completely contained in the buffer. And we'll need to call it for collections of primitive types.
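The behaviour under discussion can be checked with a small experiment. The sketch below hands the reader a non-final block and reports whether `Read()` could produce a token; `PartialTokenDemo` and `CanReadToken` are names invented for this illustration.

```csharp
using System.Text;
using System.Text.Json;

public static class PartialTokenDemo
{
    // Returns true if the first token is completely contained in the supplied bytes.
    // isFinalBlock: false tells the reader that more data may follow, so an
    // incomplete token makes Read() return false instead of throwing.
    public static bool CanReadToken(string jsonFragment)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(jsonFragment);
        var reader = new Utf8JsonReader(bytes, isFinalBlock: false, state: default);
        return reader.Read();
    }
}
```

With this helper, a complete string token such as `"hello"` (including both quotes) can be read, while a fragment that stops before the closing quote cannot, which matches the point that `GetString()` never has to deal with a partially buffered token.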

AndriySvyryd · 2023-04-22T08:43:16Z

Perhaps a better signature would be
bool TryReadFromJson<T>(ref Utf8JsonReader, out T)

AndriySvyryd · 2023-04-22T09:32:31Z

Perhaps a better signature would be
bool TryReadFromJson<T>(ref Utf8JsonReader, out T)

No, this implies that the buffer should fit the entire value, not just the largest token.

How about this?

ref struct Utf8JsonReaderManager
{
    readonly Stream? _stream;
    byte[]? _buffer;
    ref Utf8JsonReader _currentReader;

    public ref Utf8JsonReader MoveNext();
}

T FromJson(ref Utf8JsonReaderManager)

roji · 2023-04-22T13:39:18Z

GetString() doesn't have a way of telling you that it didn't have enough bytes (e.g. there's no TryGetString).

Right, but Read won't return true unless the next token is completely contained in the buffer. And we'll need to call it for collections of primitive types.

I think I understand where we have a disconnect... The APIs I was thinking of here were about the element type mapping, i.e. reading a timestamp, int or string (i.e. a primitive JSON token) - those shouldn't be a problem. But you're right that we need to think about the collection, and also about nested collections.

However, if we do want to allow streaming, don't we have a bigger general problem with value converters, regardless of JSON? In other words, since we're implementing serialization/deserialization of the primitive collection as a value converter, we'd need streaming support for value converters, which is something we don't have at the moment... Maybe let's discuss on Monday...

roji · 2023-04-24T21:20:51Z

Design decision: the type mapping API should support streaming. We won't do real streaming (since we don't support that on e.g. DbDataReader because of LINQ expression tree limitations). However, if we implement optimized UTF8 Stream access via DbDataReader.GetStream() (should be supported at least with PostgreSQL, SQL Server), we can read small chunks out of that stream and hand those chunks to the type mapping.
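To illustrate reading small chunks out of a stream and handing them to a reader, here is a sketch that sums a JSON array of integers using a deliberately tiny buffer. It is not the EF Core implementation; `ChunkedJsonReading` and `SumIntArray` are hypothetical names, and the buffer only grows when a single token (plus its preceding separator) does not fit.

```csharp
using System;
using System.IO;
using System.Text.Json;

public static class ChunkedJsonReading
{
    public static int SumIntArray(Stream stream, int initialBufferSize = 8)
    {
        var buffer = new byte[initialBufferSize];
        var length = stream.Read(buffer, 0, buffer.Length); // number of valid bytes in the buffer
        var isFinal = length == 0;
        var state = new JsonReaderState();
        var sum = 0;

        while (true)
        {
            // A new reader is created over each chunk, resuming from the saved state.
            var reader = new Utf8JsonReader(buffer.AsSpan(0, length), isFinal, state);

            // Read() returns false when the next token is not fully inside the chunk.
            while (reader.Read())
            {
                if (reader.TokenType == JsonTokenType.Number)
                {
                    sum += reader.GetInt32();
                }
            }

            if (isFinal)
            {
                return sum;
            }

            // Keep the bytes the reader has not consumed yet.
            var consumed = (int)reader.BytesConsumed;
            var leftover = length - consumed;
            if (leftover == buffer.Length)
            {
                // Nothing was consumed and the buffer is full: the token does not fit, so grow.
                Array.Resize(ref buffer, buffer.Length * 2);
            }
            else
            {
                buffer.AsSpan(consumed, leftover).CopyTo(buffer);
            }

            state = reader.CurrentState;
            var bytesRead = stream.Read(buffer, leftover, buffer.Length - leftover);
            length = leftover + bytesRead;
            isFinal = bytesRead == 0;
        }
    }
}
```

A manager type like the one proposed above would encapsulate the refill-and-recreate step so that individual type mappings only see a reader.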

roji · 2023-04-24T21:48:58Z

ref struct Utf8JsonReaderManager

Yeah, generally looks good - though if we go in this direction, maybe it's possible to simply expose GetNextReader(), and encapsulate the details of reading more into the buffer and creating the new reader inside the manager? In other words, I use my reader until there's no more data; at that point I go to the manager and ask for the next reader - it reads from the stream into the buffer, creates a new reader of that and returns it to me.

If we want to, we should be able to make the manager API shape compatible with real streaming (in case we can make that work one day). That would mean making the manager a regular struct, and having a MoveNextAsync returning ValueTask<bool>, after which you call GetReader which returns the reader. The reader needs to be passed separately from the manager (since the manager is no longer a ref struct), and the caller also needs to tell the manager how much data the reader consumed before calling MoveNextAsync (some sort of Commit method that accepts the previous reader and gets BytesConsumed from it).

Whether it's worth the extra API shape complexity is a good question...

/cc @NinoFloris @vonzshik, this bears a bit of resemblance to our type handling stuff in Npgsql :)

AndriySvyryd · 2023-04-25T00:07:14Z

maybe it's possible to simply expose GetNextReader(), and encapsulate the details of reading more into the buffer and creating the new reader inside the manager? In other words, I use my reader until there's no more data; at that point I go to the manager and ask for the next reader - it reads from the stream into the buffer, creates a new reader of that and returns it to me.

I thought that's what my proposal was. What do you think needs to change in the API shape?

If we want to, we should be able to make the manager API shape compatible with real streaming

For async we'll need a separate method - FromJsonAsync and a different manager type anyway, so we can design it later.

roji · 2023-04-25T07:51:13Z

I thought that's what my proposal was. What do you think needs to change in the API shape?

Ah, I misread your proposal as exposing the fields publicly. All good.

For async we'll need a separate method - FromJsonAsync and a different manager type anyway, so we can design it later.

Sure, though at that point it would be a breaking change for all existing type mappings.

One more idea regardless: we could consider guaranteeing that Read() has already been called on the reader before the manager is passed to the API, and that it returned true. That would allow simple type mappings which need only a single token to always read it from the reader, without dealing with Read, getting a new reader, etc. Complex types (like collections) would need to deal with the complexity, but that will be very rare.

AndriySvyryd · 2023-04-25T22:33:08Z

One more idea regardless: we could consider guaranteeing that Read() has already been called on the reader before the manager is passed to the API, and that it returned true. That would allow simple type mappings which need only a single token to always read it from the reader, without dealing with Read, getting a new reader, etc. Complex types (like collections) would need to deal with the complexity, but that will be very rare.

Yes, we can always do this in Utf8JsonReaderManager when returning a new reader, to make the consuming logic simpler

roji · 2023-05-25T14:11:51Z

Note I moved the server-side conversion part of this to #30727, so this only tracks the serialization/deserialization component.

Part of #30677

There is still work to do here, but this is basically working so I wanted to get it out for review.

Remaining work:
- Negative cases/exception messages
- Re-introduce checks for whether or not query supports primitive collections
- Cases where the primitive collection object exists and should be added to
- Custom element type mapping on the property
- Custom element reader/writer
- Custom collection reader/writer
- Use default element type mapping configured in ConfigureConventions
- Update Cosmos primitive collection mapping
- More query tests!

Part of #30677. Builds on #31272. This is the case where the CLR type contains, for example, an `ObservableCollection<int>`. In this case, we populate this existing collection rather than creating a new instance. The same still needs to be done for relational materialization of primitive collections.

roji added type-enhancement area-query area-type-mapping area-json labels Apr 12, 2023

AndriySvyryd added the needs-design label Apr 14, 2023

This was referenced Apr 18, 2023

JSON type representations and conversions to store types #30727

Closed

Metadata and type mapping support for primitive collections #30730

Closed

Primitive collections #30731

Open

ajcvickers added this to the 8.0.0 milestone Apr 19, 2023

ajcvickers self-assigned this Apr 19, 2023

roji mentioned this issue Apr 20, 2023

Support primitive collections #30738

Merged

This was referenced Apr 21, 2023

Cosmos: Allow to use a custom JSON serializer #17306

Open

Allow configuration of JSON serialization for a given property in the model #28835

Open

roji mentioned this issue Apr 22, 2023

Allow the user to specify JsonWriterOptions/JsonReaderOptions globally #30744

Open

This was referenced Apr 26, 2023

Json: add support for collection of primitive types inside JSON columns #28688

Closed

Support mapping (and querying) of geometry collections #30630

Open

roji added a commit to roji/efcore that referenced this issue Apr 26, 2023

Allow primitive collections of spatial types

f2e1444

Blocked on dotnet#30677 for serializing spatial types as WKT in JSON Closes dotnet#30630

This was referenced May 15, 2023

Converting values to primitive types in primitive collections #30893

Closed

EFCore 7 - JSON types- Inserting custom value for DateTime field in JSON column #30892

Closed

roji mentioned this issue May 25, 2023

Parameter for collection of enum values contains ints even when correlated column has strings #30921

Closed

roji changed the title ~~Add type mapping APIs to customize JSON value serialization/deserialization and querying~~ Add type mapping APIs to customize JSON value serialization/deserialization May 25, 2023

ajcvickers mentioned this issue Jul 15, 2023

Part of JSON primitive collection serialization #31272

Merged

This was referenced Jul 17, 2023

Uses existing JSON primitive collection instead of creating a new one #31282

Merged

Added using existing collection for primitive collections not inside a JSON doc #31285

Merged

ajcvickers mentioned this issue Sep 6, 2023

Create fluent API for JsonValueReaderWriter on a property #31647

Open

ajcvickers closed this as completed Sep 6, 2023

ajcvickers added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label Sep 6, 2023

ajcvickers modified the milestones: 8.0.0, 8.0.0-rc2 Sep 6, 2023

ajcvickers removed the needs-design label Oct 9, 2023

ajcvickers modified the milestones: 8.0.0-rc2, 8.0.0 Nov 14, 2023
