Can you import this JSON?

In a recent project, I got the opportunity to learn way more about the SQL Server (2016 and later) JSON functions than I would have ever imagined. The task was to “get it imported”. Much like a staging table, the goal here was ingestion, not normal form or relational design (although some of that came along for the ride). The file is the full content from the source each time, so there is no append logic here.

The JSON document arrived resembling something like this:

{
    "sales": [
        {
            "pointOfSale": "Register 2",
            "guid": "ac93eaa6-e3eb-46f4-a613-c0ee201681d9",
            "clerk": 1122,
            "end": "2020-07-27T10:20:07.863",
            "completed": true,
            "amount": 48.99,
            "schedule": {
                "scheduleName": "First Shift",
                "supervisor": 448
            },
            "departments": [
                {
                    "guid": "50a6091c-e675-4a2b-a4eb-0f6a343fbc36",
                    "name": "Seasonal",
                    "events": [
                        {
                            "type": "Browsing",
                            "duration": 60
                        },
                        {
                            "type": "Inquiry",
                            "duration": 200,
                            "employeeSkills": [
                                "Sales",
                                "Spanish"
                            ]
                        }
                    ]
                }
            ]
        }
    ]
}

This sample shows just one item in the sales array; the real document would have hundreds more.

How did I break this down?

First, I determined the hierarchy and which sections of the document need tables of their own. Next, I decided on data types and the key for each table, which would also be used for the parent-to-child relationships. JSON arrays (represented by square brackets) indicate where a child table structure begins. In this document, there is an object hierarchy like
sales > departments > events > employeeSkills.
These will be the four tables I demo here.

Getting the JSON document into a variable

The first task is to get the JSON contents into memory. For the purposes of this post, that document is a file on my local file system. I get it into a variable like this:

DECLARE @JsonDoc VARCHAR(MAX) =
(
SELECT 
	BulkColumn
FROM 
	OPENROWSET (BULK 'C:\JsonDemo\Sales.json', SINGLE_CLOB) JsonDoc
)

Here I am dealing with ASCII data. Accordingly, I have chosen the VARCHAR data type and the SINGLE_CLOB option in OPENROWSET. If I were handling Unicode data, these would have been NVARCHAR and SINGLE_NCLOB, respectively.
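For reference, a minimal sketch of the Unicode variant (same file and path as above; the file itself would need to be saved with a Unicode encoding for SINGLE_NCLOB to read it correctly):

DECLARE @JsonDoc NVARCHAR(MAX) =
(
SELECT 
	BulkColumn
FROM 
	-- SINGLE_NCLOB expects a Unicode-encoded file
	OPENROWSET (BULK 'C:\JsonDemo\Sales.json', SINGLE_NCLOB) JsonDoc
)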

Enter OPENJSON

This is an amazing system table-valued function. It takes the JSON itself and a JSON path expression as inputs. In its most basic form, it returns key, value, and type columns. To understand this, I found it helpful to see the output for the first item in the topmost array. In my document, that is sales[0]:

SELECT * 
FROM 
OPENJSON(@JsonDoc, '$.sales[0]')

In terms of JSON data types, the type column's value has the following meaning:

0 – null
1 – string
2 – number
3 – boolean
4 – array
5 – object
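For the first sales item in the sample document, that query returns something roughly like this (nested objects and arrays come back as raw JSON text in the value column, abbreviated here):

key	value	type
pointOfSale	Register 2	1
guid	ac93eaa6-e3eb-46f4-a613-c0ee201681d9	1
clerk	1122	2
end	2020-07-27T10:20:07.863	1
completed	true	3
amount	48.99	2
schedule	{ "scheduleName": "First Shift", ... }	5
departments	[ { "guid": "50a6091c-...", ... } ]	4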

Sales table structure

I mapped the string, number, and boolean typed values directly into columns with the best-fit SQL Server data types. The guid mapped directly to a uniqueidentifier, and lengths needed to be chosen for the other strings. Values of the JSON number type need to be reviewed, as they can flow into int, decimal, and other numeric types. On closer examination, the schedule object has a 1:1 relationship to the Sales table, so I simply extended Sales and added the schedule object’s columns to the end of it. To examine the contents of the first schedule, I used this:

SELECT * FROM 
OPENJSON(@JsonDoc, '$.sales[0].schedule')

This could be normalized in a later process. For now, I know I’m dealing with another string and a number (which looks like an int).

The departments value has a type of array, which indicates another table is needed, so I set it aside for its own query (covered below).

Here is what the code looks like for Sales:

SELECT *
FROM 
OPENJSON(@JsonDoc, '$.sales')
WITH (
	pointOfSale VARCHAR(50),
	[guid] UNIQUEIDENTIFIER,
	clerk INT,
	[end] DATETIME,
	completed BIT,
	amount DECIMAL(18,2),
	scheduleName VARCHAR(50) '$.schedule.scheduleName',
	supervisor INT '$.schedule.supervisor'
)

Here, I’m using the WITH clause to build the column list. When a column name is not found at the first level of the path provided to OPENJSON, an explicit column_path must be provided. The $ in a column_path is scoped to each item returned by OPENJSON; in this case, that is each element of the '$.sales' array, so '$.schedule.supervisor' resolves against each individual sale.

First level Child Table – Departments

Here, the code introduces the CROSS APPLY concept:

SELECT
departments.[guid],
departments.[name],
CONVERT(UNIQUEIDENTIFIER, JSON_VALUE(sales.value, '$.guid')) AS salesGuid
FROM 
OPENJSON(@JsonDoc, '$.sales') AS sales
CROSS APPLY OPENJSON(sales.[value], '$.departments') 
WITH (
	[guid] UNIQUEIDENTIFIER,
	[name] VARCHAR(50)
) AS departments;

Here, I used the WITH clause for the departments columns and then used JSON_VALUE to extract the parent key (salesGuid) from the sales item.

Second level Child Table – Events

This was mostly the same pattern as the departments table, until I realized there is no natural key. I didn’t recognize this as a problem until I saw that the array below, employeeSkills, would need to be linked back to the event. For that purpose, I included the key column from OPENJSON as a surrogate. Here, I relied entirely on JSON_VALUE rather than the WITH clause column list.

SELECT
CONVERT(INT, [events].[key]) AS eventKey,
CONVERT(VARCHAR(50), JSON_VALUE([events].value, '$.type')) AS [type],
CONVERT(INT, JSON_VALUE([events].value, '$.duration')) AS [duration],
CONVERT(UNIQUEIDENTIFIER, JSON_VALUE(departments.value, '$.guid')) AS departmentsGuid
FROM 
OPENJSON(@JsonDoc, '$.sales') AS sales
CROSS APPLY OPENJSON(sales.[value], '$.departments') AS departments
CROSS APPLY OPENJSON(departments.[value], '$.events') AS [events];

Third level Child Table – EmployeeSkills

This one was interesting. It’s just an array of strings, with no column names. It turns out our old friend “value” is here to help. I didn’t need JSON_VALUE for the skill values themselves, because there is no JSON object structure to navigate.

SELECT
CONVERT(INT, [events].[key]) AS eventKey,
CONVERT(UNIQUEIDENTIFIER, JSON_VALUE(departments.value, '$.guid')) AS departmentsGuid,
CONVERT(VARCHAR(50), [employeeSkills].[value]) AS [type]
FROM 
OPENJSON(@JsonDoc, '$.sales') AS sales
CROSS APPLY OPENJSON(sales.[value], '$.departments') AS departments
CROSS APPLY OPENJSON(departments.[value], '$.events') AS [events]
CROSS APPLY OPENJSON([events].[value], '$.employeeSkills') AS [employeeSkills]

Bringing it together

Running all four queries on this document produces four tables that can be joined together in a relational way.
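For example, assuming the four result sets were landed into tables named Sales, Departments, Events, and EmployeeSkills (the names are mine, not prescribed anywhere), the relationships line up like this:

SELECT
	s.pointOfSale,
	d.[name] AS department,
	e.[type] AS eventType,
	es.[type] AS skill
FROM dbo.Sales AS s
JOIN dbo.Departments AS d
	ON d.salesGuid = s.[guid]
JOIN dbo.Events AS e
	ON e.departmentsGuid = d.[guid]
-- LEFT JOIN because not every event has employeeSkills
LEFT JOIN dbo.EmployeeSkills AS es
	ON es.departmentsGuid = e.departmentsGuid
	AND es.eventKey = e.eventKey;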

For the final product, I made each table a SELECT INTO with a preceding DROP IF EXISTS and wrapped it all into a Stored Procedure.
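As a minimal sketch of that pattern (table and procedure names are placeholders, and it assumes SQL Server 2016 SP1 or later for CREATE OR ALTER), here is what the Sales portion might look like; the other three tables follow the same shape:

CREATE OR ALTER PROCEDURE dbo.ImportSalesJson
	@JsonDoc VARCHAR(MAX)
AS
BEGIN
	-- Full reload each run: drop and recreate the table
	DROP TABLE IF EXISTS dbo.Sales;

	SELECT *
	INTO dbo.Sales
	FROM OPENJSON(@JsonDoc, '$.sales')
	WITH (
		pointOfSale VARCHAR(50),
		[guid] UNIQUEIDENTIFIER,
		clerk INT,
		[end] DATETIME,
		completed BIT,
		amount DECIMAL(18,2),
		scheduleName VARCHAR(50) '$.schedule.scheduleName',
		supervisor INT '$.schedule.supervisor'
	);

	-- Departments, Events, and EmployeeSkills repeat the same
	-- DROP TABLE IF EXISTS ... SELECT ... INTO pattern.
END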

There you have it: with four queries and all that learning, this JSON has been put into relational form! I hope these examples help others facing a similar challenge.

Clean up your dbatools

It’s easy to get the latest version of the dbatools PowerShell module:
Update-Module -Name dbatools
However, this does not remove the old versions. Those stay around until you manually do something about them.

To see all the versions installed, this command is helpful:
Get-Module -Name dbatools -ListAvailable | Select-Object Name, Version, Path

Why clean them up?
1) Remove clutter and free up disk space
2) Eliminate questions about exactly which version you’re running
3) Remove duplicate entries in the ISE Commands tab

Clean it up
The process is made simple by a script included in dbatools, cleanup.ps1. It is not exposed as a public function, so you need to run it from the module directory.
You could grab the full file path, or change directory into the path listed for the latest installed version in the Get-Module output above, and then execute cleanup.ps1.

There is a simpler and repeatable one-liner method that I found in one of Shawn Melton’s GitHub comments.
Run it in an administrative PowerShell console window (the ISE is not supported):
& "$((Get-InstalledModule dbatools).InstalledLocation)\cleanup.ps1"
It will show you the currently installed versions and then ask you to confirm the removal.
Then you just have to wait for the cleanup to run (some third-party anti-virus products can slow down this step).

Confirm it
Then you just run that Get-Module command again:
Get-Module -Name dbatools -ListAvailable | Select-Object Name, Version, Path

A much cleaner set of modules!

Introducing ISDeploymentCmd

A few months back, after resuming SSIS development full-time on a new team, I began to look for ways to incrementally improve the processes causing pain around deployments. New or improved packages would be sent to the DBA in the form of an ISPAC and then he’d open SSMS and deploy to the target server using the context menu on the catalog.

This had a few challenges and opportunities for improvement. We support both SQL 2012 & 2014, so it is critical to use the right Management Studio for the job; otherwise, the PackageFormatVersion is incorrect. In addition, I prefer scripted deploys, taking the GUI and (some of) the human element out altogether.

I began researching solutions and very quickly remembered that ISDeploymentWizard.exe had a silent mode. In fact it reminds us about this every time we deploy!

I began using ISDeploymentWizard.exe in my releases to Test and Production environments and had some quick success by calling it in a batch file for the DBA to execute. This worked great until I tried to reproduce the process with other team members. I found that people had several versions of SSMS and BIDS/Data Tools/Visual Studio installed, and that moving ISDeploymentWizard.exe around to these different machines got mixed results due to varying versions of the DLL dependencies, such as Microsoft.SqlServer.ManagedDTS. They also needed SQL Data Tools or Developer Edition to have the functionality at all. When trying to run ISDeploymentWizard.exe on a machine without all the dependencies, an error window about the missing assemblies pops up.

I wanted something easy to distribute to all the team members and simple to use. I also wanted it to work on a vanilla install of Windows, so that the DBA could run the process on a deploy machine that didn’t necessarily need a SQL Server install. This ruled out the PowerShell solutions that reference a non-redistributable DLL. Eliminating this dependency would also make things easier in the future on a build agent in an automated process.

As I wanted to share this as a free and open source app, I wrote ISDeploymentCmd, a C# console app, on my own time over a weekend. It’s a wrapper around the public SSIS Catalog Stored Procedures that supports the same command line arguments as ISDeploymentWizard.exe. It also adds some additional capabilities, such as logging output. Now, with a simple find/replace in my previously mentioned batch file scripts, I’ve swapped in this tool and had success both within the developer team and in the integration to other environments. At this point, it has been used all the way to Production successfully several times by multiple teams.
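For anyone curious what sits underneath, the deployment itself boils down to the public catalog procedure. Here is a rough T-SQL illustration of that call (the folder, project, and file path are made up for the example; ISDeploymentCmd handles this plumbing for you):

-- Load the .ispac as binary (hypothetical path)
DECLARE @ProjectBinary VARBINARY(MAX) =
(
	SELECT BulkColumn
	FROM OPENROWSET(BULK 'C:\Deploy\MyProject.ispac', SINGLE_BLOB) AS ispac
);

DECLARE @operation_id BIGINT;

-- Deploy to an existing folder in the SSIS catalog (placeholder names)
EXEC SSISDB.catalog.deploy_project
	@folder_name = N'MyFolder',
	@project_name = N'MyProject',
	@project_stream = @ProjectBinary,
	@operation_id = @operation_id OUTPUT;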

Feel free to check out the project on GitHub and grab a copy of the executable on the releases page to try out yourself.

SSRS and Kerberos with a Cluster

I recently came across an error involving SQL Server Reporting Services (SSRS) that I recognized immediately, but it still took me some time to resolve due to the intricacies of this particular environment.

This involves a scenario with a Web App calling into SSRS which then connects to a separate SQL Cluster.

The admin of the Web App reported this error when calling SSRS:

An error has occurred during report processing. (rsProcessingAborted) Cannot create a connection to data source ‘DataSource’. (rsErrorOpeningConnection)
For more information about this error navigate to the report server on the local server machine, or enable remote errors

As usual, not a lot of details. So I went into the Report Manager web interface (from a workstation) and tested the datasource, which turned out to be using Windows integrated security. I got

Login failed for user ‘NT AUTHORITY\ANONYMOUS LOGON’

I had a feeling right away that this was Kerberos related. The Web App admin agreed and mentioned adding SPNs.

After much research, I found that the needed SPNs are

  • an HTTP SPN for the SSRS server, with no port
  • an MSSQLSvc SPN for the SQL cluster name, with the SQL port

In my environment, the SSRS server is accessed by its machine name, but if it had a separate DNS entry or was load-balanced, that would have changed the name I needed to register.

We also had multiple nodes in the SQL cluster, but the report datasource used the cluster name, so that is the name that matters for the “second hop”.

The solution that worked for us looked like this:

setspn -a HTTP/ReportServer mydomain\sqlreportsvc
setspn -a HTTP/ReportServer.mydomain.local mydomain\sqlreportsvc
setspn -a MSSQLSvc/SqlCluster:1433 mydomain\sqlEngineServiceAcct
setspn -a MSSQLSvc/SqlCluster.mydomain.local:1433 mydomain\sqlEngineServiceAcct

In this example, the SSRS machine name is “ReportServer” in the “mydomain.local” domain, running with a service account of “sqlreportsvc”. The multi-node SQL cluster has a cluster name of “SqlCluster” and runs on port 1433. Names have been changed to protect the innocent.
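As a quick sanity check once the SPNs are in place (not part of the original fix, just a way to verify), you can confirm from the SQL side that a connection is actually negotiating Kerberos:

-- Returns KERBEROS when the double hop is working;
-- NTLM usually means the SPNs still are not being found
SELECT auth_scheme
FROM sys.dm_exec_connections
WHERE session_id = @@SPID;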

Shout out:
I have to say Scott Stauffer’s chapter on Kerberos in SQL Server MVP Deep Dives was an excellent resource while troubleshooting and fixing this issue.

Little fish, big pond

The SQL Server Community feels large. There are so many resources online at my disposal that I might even say it is a bit overwhelming.

There are so many greats that came before me, which leads me to ask: what could I possibly contribute by having yet another blog?

The answer is quite simple – it is a challenge to myself. A challenge to begin sharing and contributing more often. I have been a dark matter DBA for far too long!

The quest begins online now. Eventually, I may put together material that is interesting enough and work up the courage to present in person at an event such as a SQL Saturday, Code Camp, or other user group.

In writing this, I’m looking forward to the grand SQL Server event of them all, PASS Summit. From past experience, I shall return all inspired, fired up, and ready to apply all the great new nuggets I learn there.

To all the SQL Family, I hope to see you around, both online and at live events.