The JSON Schema
Have you ever found yourself in a situation where your application requires pre-defined data to build your business logic on, but you also need the ability to extend that data in a way that keeps it meaningful? Maybe you even need to extend it differently depending on circumstances. It might depend on the user's location or preferences. Or it might depend on the customer and customer-specific data.
And the first thing that comes to mind is JSON.
Why not store the extra data as a JSON blob? It won't take part in any business logic, and that is absolutely fine. But it solves only part of the problem: data format and storage. Usually there are a few other requirements surrounding the data before and after it gets stored.
Before storing the data, we often need to validate it. The data is flexible, but we don't expect it to be random.
Let's say we are capturing parcel route data through checkpoints/warehouses. Apart from the title, date and warehouse name, in certain cases we also want to store the location of the warehouse, identified by its address or (if the address is not available) by its geographical coordinates.
In other words, we have some static data we know will always be there (id, title, warehouse name) and some extra data which may or may not be there (either an address or geographical coordinates).
Now that we know what the data will look like, let's outline its shape. We want the address to always follow the same pattern, something like 1600 Pennsylvania Avenue NW, Washington, DC (street, city and state), where all fields are mandatory. For the geographical coordinates we will use latitude and longitude, e.g. (48.858093, 2.294694). By definition, latitude lies in the range from -90 to 90 and longitude in the range from -180 to 180. Both latitude and longitude are mandatory.
Figure: latitude (-90 to 90) and longitude (-180 to 180)
After storing this data, we want it to stay meaningful and consumable: we need to be sure we still understand what the data means and represents.
Some other service might read the data afterwards and render it or perform operations depending on what the data is. E.g. if the location is a pair of geo coordinates, show a point on a map; if it is an address, show formatted text. Another example of a data consumer might be an aggregation/reporting service that needs to calculate the number of parcels passing through a certain area, which can be determined by an address, by coordinates or just by the warehouse name.
We need a kind of contract for JSON: a declarative way to define its structure, so we can perform validation and ensure the data is stored in a valid shape.
Something like XSD, but for JSON. Luckily, there is an open specification for that. Please welcome JSON Schema.
JSON Schema is a vocabulary that allows you to annotate and validate JSON documents.
JSON Schema is a powerful tool for validating the structure of JSON data.
JSON Schema specifies a JSON-based format to define the structure of JSON data for validation, documentation, and interaction control. It provides a contract for the JSON data required by a given application, and how that data can be modified.
Well, you get the idea.
Let's try to use JSON Schema to define our flexible data.
We'll start with the address. The definition below describes an address as an object that must contain a street address, city and state. All of these fields are strings, all of them are required, and no additional fields may be added to this object.
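A minimal sketch of such an address definition in JSON Schema draft-07. The property names streetAddress, city and state are assumptions for illustration, since the original definition isn't shown:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$comment": "Sketch only: property names are illustrative assumptions.",
  "title": "Address",
  "type": "object",
  "properties": {
    "streetAddress": { "type": "string" },
    "city": { "type": "string" },
    "state": { "type": "string" }
  },
  "required": ["streetAddress", "city", "state"],
  "additionalProperties": false
}
```

Here required makes every field compulsory, and additionalProperties: false rejects any field that isn't listed.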
Now it is time to define the geographical coordinates. We declare latitude and longitude as numeric values in predefined ranges (-90 to 90 and -180 to 180 respectively). Both of them are required and no additional fields are allowed.
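A matching sketch for the coordinates, again with assumed property names (latitude, longitude). In draft-07, minimum and maximum are inclusive bounds, so -90 and 180 are both accepted:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$comment": "Sketch only: property names are illustrative assumptions.",
  "title": "Coordinates",
  "type": "object",
  "properties": {
    "latitude": { "type": "number", "minimum": -90, "maximum": 90 },
    "longitude": { "type": "number", "minimum": -180, "maximum": 180 }
  },
  "required": ["latitude", "longitude"],
  "additionalProperties": false
}
```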
So far so good. Now that we have both definitions, let’s specify that we only want to use one of them.
Bringing the address and geo coordinates definitions together into a single JSON Schema looks as follows.
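One way to assemble them is sketched below. It assumes a top-level location property, which is a hypothetical name. Both shapes go under definitions, and oneOf requires the location to match exactly one of them. Each definition lists its own required fields and forbids additional properties, so an object cannot satisfy both at once. The root does not set additionalProperties, so the other checkpoint fields (Id, Title, WarehouseName) still pass, and location itself stays optional:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "$comment": "Sketch only: 'location' and the field names are illustrative assumptions.",
  "title": "Location",
  "type": "object",
  "definitions": {
    "address": {
      "type": "object",
      "properties": {
        "streetAddress": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" }
      },
      "required": ["streetAddress", "city", "state"],
      "additionalProperties": false
    },
    "coordinates": {
      "type": "object",
      "properties": {
        "latitude": { "type": "number", "minimum": -90, "maximum": 90 },
        "longitude": { "type": "number", "minimum": -180, "maximum": 180 }
      },
      "required": ["latitude", "longitude"],
      "additionalProperties": false
    }
  },
  "properties": {
    "location": {
      "oneOf": [
        { "$ref": "#/definitions/address" },
        { "$ref": "#/definitions/coordinates" }
      ]
    }
  }
}
```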
Alright, it looks like the assembled location JSON Schema ticks all of our data requirements. Both examples illustrated below will be valid.
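Assuming a top-level location property with the field names streetAddress, city, state, latitude and longitude (illustrative names, not necessarily the original ones), the two valid documents might look like this. The first is an address:

```json
{
  "location": {
    "streetAddress": "1600 Pennsylvania Avenue NW",
    "city": "Washington",
    "state": "DC"
  }
}
```

The second is a pair of coordinates:

```json
{
  "location": {
    "latitude": 48.858093,
    "longitude": 2.294694
  }
}
```

Mixing the two shapes in one object, or dropping a required field, would fail the oneOf check.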
Implementation
Now that we have our flexible part sorted out, let's wire it up in a strongly typed language. I will use C# from here on, but most major languages have a decent JSON Schema implementation.
Let's start with a class definition. As we stated earlier, our application relies on an Id, Title and WarehouseName. Some business logic will surround these properties.
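Here is a minimal sketch of the class. The property types (a Guid Id and string Title/WarehouseName) are assumptions, because the original listing isn't shown:

```csharp
using System;

// The static, strongly typed part of a checkpoint that business logic relies on.
// Property types are assumptions for illustration.
public class ParcelCheckpoint
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public string WarehouseName { get; set; }
}
```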
The next step is to make the data extensible. Probably the easiest way to achieve that is to introduce some sort of property bag, such as IDictionary<string, object>.
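The same sketch with a property bag added (types other than IDictionary<string, object> are still assumptions). Initializing the dictionary up front means callers never have to null-check it:

```csharp
using System;
using System.Collections.Generic;

public class ParcelCheckpoint
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public string WarehouseName { get; set; }

    // Property bag for flexible data that business logic does not depend on.
    public IDictionary<string, object> AdditionalData { get; set; } = new Dictionary<string, object>();
}
```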
What about (de)serialization?
When working with JSON in C#, we often reach for Newtonsoft JSON (Json.NET). It is a rich library for working with JSON, and it offers a neat way to (de)serialize the AdditionalData property bag: decorate it with the JsonExtensionData attribute. After importing the package, let's update our class.
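The sketch below shows the updated class. It assumes the Newtonsoft.Json NuGet package, where JsonExtensionData is a real attribute that works with an IDictionary<string, object> member. JSON properties with no matching class member are collected into the bag, and its entries are written back as top-level properties:

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;

public class ParcelCheckpoint
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public string WarehouseName { get; set; }

    // On deserialization, JSON properties with no matching member land here;
    // on serialization, the entries are written back as top-level properties.
    [JsonExtensionData]
    public IDictionary<string, object> AdditionalData { get; set; } = new Dictionary<string, object>();
}
```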
Too easy. Our class is ready to represent JSON. To illustrate, let's write some tests.
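The test below is a sketch written as a plain static method using Trace.Assert, not tied to any particular test framework. The class and method names are illustrative. The class from the previous step is repeated so the snippet compiles on its own. The test checks that extra JSON properties survive a deserialize/serialize round trip:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;

// Same class as in the previous step, repeated so this snippet compiles on its own.
public class ParcelCheckpoint
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public string WarehouseName { get; set; }

    [JsonExtensionData]
    public IDictionary<string, object> AdditionalData { get; set; } = new Dictionary<string, object>();
}

public static class ParcelCheckpointSerializationTests
{
    public static void ExtraDataSurvivesRoundTrip()
    {
        const string json = @"{
            ""Title"": ""Arrived at hub"",
            ""WarehouseName"": ""Central"",
            ""location"": { ""latitude"": 48.858093, ""longitude"": 2.294694 }
        }";

        // Known properties bind to the class, the unknown "location" goes to the bag.
        var checkpoint = JsonConvert.DeserializeObject<ParcelCheckpoint>(json);
        Trace.Assert(checkpoint.WarehouseName == "Central");
        Trace.Assert(checkpoint.AdditionalData.ContainsKey("location"));

        // The bag is written back at the top level, not under "AdditionalData".
        JObject roundTripped = JObject.FromObject(checkpoint);
        Trace.Assert(roundTripped["AdditionalData"] == null);
        Trace.Assert((double)roundTripped["location"]["latitude"] == 48.858093);
    }
}
```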
Looks good so far; it is time to season it with some JSON Schema validation. Luckily, Newtonsoft has that covered with the JSchema type (from the Newtonsoft.Json.Schema package) and a set of operations we can perform with it.
Let's go back to the JSON Schema we constructed earlier and test our class against it. I moved the JSON Schema to the location-address-or-coord.schema.json file so it won't bloat the test.
A lot is happening here. First, we create an instance of ParcelCheckpoint. Then we load the schema from a file and parse it into a JSchema object. Next, we get a JObject from the ParcelCheckpoint instance we created earlier. Finally, we validate one against the other.
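Those steps could be sketched as follows, assuming the Newtonsoft.Json and Newtonsoft.Json.Schema packages. JSchema.Parse and the IsValid(JSchema, out IList<string>) extension method come from Newtonsoft.Json.Schema. CheckpointSchemaValidator is a hypothetical helper name:

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Schema;

public class ParcelCheckpoint
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public string WarehouseName { get; set; }

    [JsonExtensionData]
    public IDictionary<string, object> AdditionalData { get; set; } = new Dictionary<string, object>();
}

// Hypothetical helper wrapping the parse / convert / validate steps.
public static class CheckpointSchemaValidator
{
    public static bool Validate(ParcelCheckpoint checkpoint, string schemaJson, out IList<string> errors)
    {
        // 1. Parse the schema text into a JSchema.
        JSchema schema = JSchema.Parse(schemaJson);
        // 2. Convert the object, including its extension data, into a JObject.
        JObject instance = JObject.FromObject(checkpoint);
        // 3. Validate the JObject against the schema, collecting error messages.
        return instance.IsValid(schema, out errors);
    }
}
```

In the file-based setup, schemaJson would come from File.ReadAllText("location-address-or-coord.schema.json").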
Pretty straightforward. However, it feels like too much ceremony, especially for something we are going to do every now and then, and several practical questions remain open. How do we get the full JSON Schema of the object, including the extra data (e.g. if we want to render an HTML form based on it)? How do we prevent invalid values from being added to an object before validation, or roll them back if the data turns out to be invalid? Is there a way to detect collisions between class properties and the JSON Schema? And so on.
Implementation 2.0
I ran into these questions as well and built a small library that encapsulates all these behaviors in a convenient and practical way for real-life use.
You are more than welcome to check it out ➡️
and extend it ➡️
Let's add the package and refactor our code.
The first stop is to update the ParcelCheckpoint class so that it inherits from V.Udodov.Json.Entity, and to remove the properties that are no longer necessary, so it looks like this:
Now, to add flexible data, we can simply do this:
Too easy. Let's bring JSON Schema in.
Here we are using the same JSON Schema as before. Once we add a flexible property, it is automatically validated against the JSON Schema, if one is defined. To prove it, let's add another test, but this time we will try to supply our object with invalid data.
As we can see from the test, when the data is invalid we get a JsonEntityValidationException, with some handy information inside, right after we attempt to add it.
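To make the validate-before-commit idea concrete, here is a hypothetical sketch of how such a base class could work. It is NOT the V.Udodov.Json.Entity implementation or API. FlexibleEntity, its indexer and the use of ArgumentException are invented for illustration. The key point is that the new value is validated on a copy of the object, so invalid data never reaches the instance and nothing needs rolling back:

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Linq;
using Newtonsoft.Json.Schema;

// Hypothetical stand-in, NOT the V.Udodov.Json.Entity API.
public abstract class FlexibleEntity
{
    // Flexible values, (de)serialized as top-level JSON properties.
    [JsonExtensionData]
    private IDictionary<string, JToken> _extraData = new Dictionary<string, JToken>();

    protected FlexibleEntity(JSchema schema) => Schema = schema;

    [JsonIgnore]
    public JSchema Schema { get; }

    public JToken this[string key]
    {
        get => _extraData.TryGetValue(key, out JToken value) ? value : null;
        set
        {
            // Refuse keys that collide with a strongly typed property.
            if (GetType().GetProperty(key) != null)
                throw new ArgumentException($"'{key}' is already a class property.", nameof(key));

            JToken token = value ?? JValue.CreateNull();
            if (Schema != null)
            {
                // Validate a candidate copy first, so invalid data never reaches the entity.
                JObject candidate = JObject.FromObject(this);
                candidate[key] = token.DeepClone();
                if (!candidate.IsValid(Schema, out IList<string> errors))
                    throw new ArgumentException(string.Join("; ", errors), nameof(value));
            }
            _extraData[key] = token;
        }
    }
}

public class ParcelCheckpoint : FlexibleEntity
{
    public ParcelCheckpoint(JSchema schema = null) : base(schema) { }

    public Guid Id { get; set; }
    public string Title { get; set; }
    public string WarehouseName { get; set; }
}
```

Rejecting keys that match a class property is one simple answer to the collision question raised earlier; a real library may handle it differently.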
Retrieving the full JSON Schema (including the static class properties) is quite easy.
This prints the full JSON Schema to the console.
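The sketch below builds such a combined schema by hand. JSchemaGenerator (Newtonsoft.Json.Schema.Generation) generates a schema for the class, and the flexible schema's top-level properties are then copied into it. FullSchemaBuilder is a hypothetical name. The sketch merges only top-level properties, not root keywords such as required or oneOf, and it reports name collisions instead of silently overwriting them:

```csharp
using System;
using System.Collections.Generic;
using Newtonsoft.Json;
using Newtonsoft.Json.Schema;
using Newtonsoft.Json.Schema.Generation;

public class ParcelCheckpoint
{
    public Guid Id { get; set; }
    public string Title { get; set; }
    public string WarehouseName { get; set; }

    [JsonExtensionData]
    public IDictionary<string, object> AdditionalData { get; set; } = new Dictionary<string, object>();
}

// Hypothetical helper: class-generated schema plus flexible-data properties.
public static class FullSchemaBuilder
{
    public static JSchema Build(Type type, JSchema flexibleSchema)
    {
        // Schema for the static, strongly typed properties.
        JSchema full = new JSchemaGenerator().Generate(type);

        // Copy in the flexible properties, refusing names the class already defines.
        foreach (KeyValuePair<string, JSchema> property in flexibleSchema.Properties)
        {
            if (full.Properties.ContainsKey(property.Key))
                throw new InvalidOperationException($"'{property.Key}' is defined by both the class and the schema.");
            full.Properties.Add(property.Key, property.Value);
        }
        return full;
    }
}
```

Passing the result to Console.WriteLine would print the combined schema as JSON, because JSchema.ToString() returns its JSON representation.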
WTF did I just read?
The idea behind this article is to illustrate how powerful and useful JSON Schema is in a world of strongly typed objects.
There are plenty of examples of this in our daily routine. Instead of adding layers of abstraction and custom configuration, we can make objects a bit looser and take advantage of that.
JSON Schema has been around for a while and is nowadays surrounded by a wide variety of tools.
Once again, you are more than welcome to add your ideas and enhancements.
👋