Can you import this JSON?

In a recent project, I got the opportunity to learn way more about the SQL Server (2016 and later) JSON functions than I would have even imagined. The task was to “get it imported”. Much like a staging table, the goal here was ingestion, not normal form or relational design (although some of that came for the ride). The file is the full content from the source each time, so there is no append logic here.

The JSON document arrived resembling something like this:

{
    "sales": [
        {
            "pointOfSale": "Register 2",
            "guid": "ac93eaa6-e3eb-46f4-a613-c0ee201681d9",
            "clerk": 1122,
            "end": "2020-07-27T10:20:07.863",
            "completed": true,
            "amount": 48.99,
            "schedule": {
                "scheduleName": "First Shift",
                "supervisor": 448
            },
            "departments": [
                {
                    "guid": "50a6091c-e675-4a2b-a4eb-0f6a343fbc36",
                    "name": "Seasonal",
                    "events": [
                        {
                            "type": "Browsing",
                            "duration": 60
                        },
                        {
                            "type": "Inquiry",
                            "duration": 200,
                            "employeeSkills": [
                                "Sales",
                                "Spanish"
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

This file is just one sales array item. The real document would have hundreds more.

How did I break this down?

First, I determined the hierarchy and what sections of the document need tables of their own. Next was to decide on data types and the key for each table, which would also be used for the parent to child relationships. JSON arrays (represented by square brackets) indicate where a child table structure begins. In this document, there is an object hierarchy like
sales > departments > events > employeeSkills.
These will be the four tables I demo here.

Getting the JSON document into a variable

The first task is to get the JSON contents in memory. For the purposes of this post, that document is in a file located on my local file system. I get that into a variable like this:

DECLARE @JsonDoc VARCHAR(MAX) =
(
SELECT 
	BulkColumn
FROM 
	OPENROWSET (BULK 'C:\JsonDemo\Sales.json', SINGLE_CLOB) JsonDoc
)

Here I am dealing with ASCII data. Accordingly, I have chosen a VARCHAR data type and SINGLE_CLOB option in OPENROWSET. If I were handling UNICODE data these would have been NVARCHAR and SINGLE_NCLOB respectively.

Enter OPENJSON

This is an amazing system table-valued function. It takes inputs of the JSON itself and a JSON path expression. In its most basic form of syntax, it returns KEY, VALUE, and TYPE columns. To understand this, I found it helpful to see the output on the first item in our topmost array. In my document, that is sales[0]:

SELECT * 
FROM 
OPENJSON(@JsonDoc, '$.sales[0]')

In JSON schema terms, the type‘s value has the following meaning:

0null
1string
2number
3boolean
4array
5object

Sales table structure

I mapped the string, number, and boolean typed values directly into columns with the best fit SQL Server data types. guid mapped directly into a uniqueidentifier, and lengths needed to be chosen for the other strings. The values from the JSON number type need to be reviewed as they can flow into int, decimal, and other numeric types. For the schedule object, I examined it further and found it to have a 1:1 relationship to the Sales table, so I simply extended Sales and added the schedule object’s columns to the end of the Sales table. To examine the contents of the first schedule, I used this:

SELECT * FROM 
OPENJSON(@JsonDoc, '$.sales[0].schedule')

This could be normalized in a later process. For now, I know I’m dealing with another string and number (which seems to be an int).

The departments type of array indicates another table needed, so I

Here is what the code looks like for Sales:

SELECT *
FROM 
OPENJSON(@JsonDoc, '$.sales')
WITH (
	pointOfSale VARCHAR(50),
	[guid] UNIQUEIDENTIFIER,
	clerk INT,
	[end] DATETIME,
	completed BIT,
	amount DECIMAL(18,2),
	scheduleName VARCHAR(50) '$.schedule.scheduleName',
	supervisor INT '$.schedule.supervisor'
)

Here, I’m using the WITH clause to build the column list. When a name is not found at the first level of the path provided in OPENJSON, then an additional column_path must be provided. The scope of $ in the the column_path begins at the item provided in the OPENJSON. So, in this case, the $ in supervisor starts at the ‘$.sales‘ context.

First level Child Table – Departments

Here, the code introduces the CROSS APPLY concept:

SELECT
departments.[guid],
departments.[name],
CONVERT(UNIQUEIDENTIFIER, JSON_VALUE(sales.value, '$.guid')) AS salesGuid
FROM 
OPENJSON(@JsonDoc, '$.sales') AS sales
CROSS APPLY OPENJSON(sales.[value], '$.departments') 
WITH (
	[guid] UNIQUEIDENTIFIER,
	[name] VARCHAR(50)
) AS departments;

Here, I used the WITH clause for the departments table and then used JSON_VALUE to extract the key (salesGuid) from the parent table.

Second level Child Table – Events

This was mostly the same pattern as the departments table until I realized there is no key. I didn’t recognize this as a problem until I saw the array below, employeeSkills, would need to be linked back to the event. For this purpose, I included the key from OPENJSON as a surrogate. Here, I relied on JSON_VALUE completely rather than the WITH clause column list.

SELECT
CONVERT(INT, [events].[key]) AS eventKey,
CONVERT(VARCHAR(50), JSON_VALUE([events].value, '$.type')) AS [type],
CONVERT(INT, JSON_VALUE([events].value, '$.duration')) AS [duration],
CONVERT(UNIQUEIDENTIFIER, JSON_VALUE(departments.value, '$.guid')) AS departmentsGuid
FROM 
OPENJSON(@JsonDoc, '$.sales') AS sales
CROSS APPLY OPENJSON(sales.[value], '$.departments') AS departments
CROSS APPLY OPENJSON(departments.[value], '$.events') AS [events];

Third level Child Table – EmployeeSkills

This one was interesting. It’s just an array of strings, with no column names. Turns out our old friend “value” is here to help. I didn’t need JSON_VALUE with this part, because there is no JSON object structure.

SELECT
CONVERT(INT, [events].[key]) AS eventKey,
CONVERT(UNIQUEIDENTIFIER, JSON_VALUE(departments.value, '$.guid')) AS departmentsGuid,
CONVERT(VARCHAR(50), [employeeSkills].[value]) AS [type]
FROM 
OPENJSON(@JsonDoc, '$.sales') AS sales
CROSS APPLY OPENJSON(sales.[value], '$.departments') AS departments
CROSS APPLY OPENJSON(departments.[value], '$.events') AS [events]
CROSS APPLY OPENJSON([events].[value], '$.employeeSkills') AS [employeeSkills]

Bringing it together

Running all four queries on this document produces four tables than can be joined together in a relational way.

For the final product, I made each table a SELECT INTO with a preceding DROP IF EXISTS and wrapped it all into a Stored Procedure.

There you have it, with four queries and all that learning, this JSON has been put into a relational form! I hope these examples will help others as they face a similar challenge.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s