When can I save JSON or XML data in an SQL Table


Question

When using SQL or MySQL (or any relational DB for that matter), I understand that saving the data in regular columns is better for indexing and other purposes...

The thing is, loading and saving JSON data is sometimes a lot simpler, and it makes development easier.

Are there any "golden rules" for saving raw JSON data in the DB?

Is it absolutely a bad practice to do so?

Recommended answer

The main questions are

  • What are you going to do with this data? and
  • How are you filtering/sorting/joining/manipulating this data?

JSON (like XML) is great for data exchange, small storage and generically defined structures, but it cannot participate in the typical actions you run within your RDBMS. In most cases it will be better to transfer your JSON data into normal tables and re-create the JSON when you need it.

The first rule of normalisation dictates never to store more than one piece of information in one column. You see a column "PersonName" with a value like "Mickey Mouse"? You point to this and cry: Change that immediately!

What about XML or JSON? Are these types breaking 1.NF? Well, yes and no...

It is perfectly okay to store a complete structure as one piece of information if it actually is one piece of information. You get a SOAP response and want to store it because you might need it for future reference (but you will not use this data for your own processes)? Just store it as is!

Now imagine a complex structure (XML or JSON) representing a person (with their address, further details...). Now you put this into one column as PersonInCharge. Is this wrong? Shouldn't this rather live in properly designed related tables with a foreign key reference instead of the XML/JSON? Especially if the same person might occur in many different rows, it is definitely wrong to use an XML/JSON approach.

But now imagine the need to store historical data. You want to persist the person's data for a given moment in time. Some days later the person tells you a new address? No problem! The old address lives in an XML/JSON if you ever need it...

Conclusion: If you store the data just to keep it, it's okay. If this data is a unique portion, it's okay...
But if you need the internal parts regularly, or if this would mean redundant duplicate storage, it's not okay...
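The two cases above can be sketched side by side. This is only a minimal illustration, assuming SQL Server 2016 or later; the tables Person, Project and PersonSnapshot and all their columns are hypothetical names and do not come from the answer.

```sql
-- Hypothetical schema, for illustration only (SQL Server 2016+ assumed).
CREATE TABLE Person
(
    PersonID  INT IDENTITY PRIMARY KEY,
    FirstName NVARCHAR(100) NOT NULL,
    LastName  NVARCHAR(100) NOT NULL,
    Street    NVARCHAR(200) NOT NULL,
    City      NVARCHAR(100) NOT NULL
);

-- Current data: reference the person by key instead of embedding XML/JSON.
CREATE TABLE Project
(
    ProjectID        INT IDENTITY PRIMARY KEY,
    ProjectName      NVARCHAR(200) NOT NULL,
    PersonInChargeID INT NOT NULL REFERENCES Person(PersonID)
);

-- Historical data: a snapshot is written once and only read back as a whole.
CREATE TABLE PersonSnapshot
(
    SnapshotID INT IDENTITY PRIMARY KEY,
    PersonID   INT NOT NULL REFERENCES Person(PersonID),
    TakenAt    DATETIME2(0) NOT NULL DEFAULT SYSUTCDATETIME(),
    PersonJson NVARCHAR(MAX) NOT NULL CHECK (ISJSON(PersonJson) = 1)
);
GO

INSERT INTO Person(FirstName, LastName, Street, City)
VALUES (N'Mickey', N'Mouse', N'1 Old Street', N'Toontown');

-- Freeze the person's state before the address changes.
INSERT INTO PersonSnapshot(PersonID, PersonJson)
SELECT p.PersonID,
       (SELECT p.FirstName AS FirstName,
               p.LastName  AS LastName,
               p.Street    AS Street,
               p.City      AS City
        FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)
FROM Person AS p
WHERE p.LastName = N'Mouse';

-- The live row changes, the snapshot keeps the old address.
UPDATE Person SET Street = N'2 New Street' WHERE LastName = N'Mouse';
```

The live data stays normalised and joinable, while the snapshot is treated as one opaque piece of information, which is the situation the conclusion calls okay.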

The following is for SQL Server and might be different on other RDBMSs.

XML is not stored as the text you see, but as a hierarchy tree. Querying this performs astonishingly well! This structure is not parsed on string level!
JSON in SQL Server (2016+) lives in a string and must be parsed. There is no real native JSON type (like there is a native XML type). This might come later, but for now I'd assume that JSON will not be as performant as XML on SQL Server (see section UPDATE 2). Any need to read a value out of JSON will need a hell of a lot of hidden string method calls...
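To make the difference in typing concrete, here is a small sketch assuming SQL Server 2016 or later. The variable names and the sample document are made up for illustration.

```sql
-- XML has a native type: the text is converted to the internal
-- representation when it is assigned to the XML variable.
DECLARE @x XML =
    N'<Root><Product><ProductID>1</ProductID><ProductName>Road Bike</ProductName></Product></Root>';

SELECT @x.value('(/Root/Product/ProductName/text())[1]', 'nvarchar(100)') AS NameFromXml;

-- JSON is just a string: it sits in an NVARCHAR variable or column,
-- and the JSON functions work on that text.
DECLARE @j NVARCHAR(MAX) =
    N'{"Root":{"Product":{"ProductID":1,"ProductName":"Road Bike"}}}';

SELECT ISJSON(@j) AS IsValidJson,
       JSON_VALUE(@j, '$.Root.Product.ProductName') AS NameFromJson;
```

Because the JSON lives in a plain string type, nothing stops invalid JSON from being stored unless a check such as ISJSON is applied explicitly.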

Your lovable DB artist :-D knows that storing JSON as is goes against the common principles of an RDBMS. He knows that

  • a JSON is quite probably breaking 1.NF
  • a JSON might change over time (same column, differing content)
  • a JSON is not easy to read, and it is very hard to filter/search/join or sort by it
  • such operations will shift quite some extra load onto the poor little DB server

There are some workarounds (depending on the RDBMS you are using), but most of them don't work the way you'd like...
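One workaround of this kind on SQL Server is a computed column over JSON_VALUE that can then be indexed. Whether this is among the workarounds the answer has in mind is an assumption; the table Orders and its columns are hypothetical, and the sketch assumes SQL Server 2016 or later.

```sql
-- Hypothetical table, for illustration only.
CREATE TABLE Orders
(
    OrderID      INT IDENTITY PRIMARY KEY,
    OrderJson    NVARCHAR(MAX) NOT NULL CHECK (ISJSON(OrderJson) = 1),
    -- Expose one JSON property as a computed column so it can be indexed.
    -- The CAST keeps the index key well below the key size limit.
    CustomerName AS CAST(JSON_VALUE(OrderJson, '$.Customer.Name') AS NVARCHAR(200))
);

CREATE INDEX IX_Orders_CustomerName ON Orders(CustomerName);
GO

INSERT INTO Orders(OrderJson)
VALUES (N'{"Customer":{"Name":"Mickey"},"Total":42}');

-- The filter can use the index instead of parsing every JSON string.
SELECT OrderID
FROM Orders
WHERE CustomerName = N'Mickey';
```

This only helps for properties that are known in advance and sit at a fixed path. Filtering on arbitrary or nested array content still requires parsing the string.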

Storing the JSON as is can be okay

  • If you do not want to use the data stored within your JSON for expensive operations (filter/join/sort).
    You can store it just like any other content that merely has to exist. We are storing many pictures as BLOBs, but we would not try to filter for all images with a flower...
  • If you do not care at all what's inside (just store it and read it as one piece of information)
  • If the structures are variable, which would make it harder to create physical tables than to work with JSON data.
  • If the structure is deeply nested, so that storage in physical tables would be too much overhead

It is not okay

  • If you want to use the internal data like you would use a relational table's data (filter, indexes, joins...)
  • If you would store duplicates (create redundancy)
  • In general: if you face performance problems (and you certainly will face them in many typical scenarios!)

You might start with the JSON within a string column or as a BLOB and change this to physical tables when you need it. My magic crystal ball tells me this might be tomorrow :-D
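When that day comes, the move from a string column to a physical table could look roughly like this. It is a sketch assuming SQL Server 2016 or later (database compatibility level 130 or higher for OPENJSON); the tables RawProducts and Product and the JSON layout are hypothetical.

```sql
-- Hypothetical starting point: raw JSON kept in a string column.
CREATE TABLE RawProducts
(
    ID       INT IDENTITY PRIMARY KEY,
    SomeJson NVARCHAR(MAX) NOT NULL
);

-- Hypothetical target: one typed row per product.
CREATE TABLE Product
(
    RawID       INT NOT NULL,
    ProductID   INT NOT NULL,
    ProductName NVARCHAR(100) NOT NULL
);
GO

INSERT INTO RawProducts(SomeJson)
VALUES (N'{"Products":[{"ProductID":1,"ProductName":"Road Bike"},{"ProductID":2,"ProductName":"Cross Bike"}]}');

-- Shred each array element into one typed row of the physical table.
INSERT INTO Product(RawID, ProductID, ProductName)
SELECT r.ID, p.ProductID, p.ProductName
FROM RawProducts AS r
CROSS APPLY OPENJSON(r.SomeJson, '$.Products')
     WITH (ProductID   INT           '$.ProductID',
           ProductName NVARCHAR(100) '$.ProductName') AS p;
```

After this one-time shredding, filters, joins and indexes work on Product like on any other relational table.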

Find some ideas about performance and disk space here: https://stackoverflow.com/a/47408528/5089204

UPDATE 2: The following addresses the JSON and XML support in SQL-Server 2016

User @mike123 pointed to an article on an official Microsoft blog which seems to prove in an experiment that querying a JSON is 10x faster than querying an XML in SQL-Server.

Some thoughts on this, as cross-checks with the "experiment":

  • The "experiment" measures a lot, but not the performance of XML vs. JSON. Doing the same action against the same (unchanged) string repeatedly is not a realistic scenario
  • The tested examples are far too simple for a general statement!
  • The value read is always the same and not even used. The optimizer will see this...
  • Not a single word about the mighty XQuery support! Find a product with a given ID within an array? JSON needs to read the whole lot and filter afterwards using WHERE, while XML would allow an internal XQuery predicate. Not to speak of FLWOR...
  • The "experiment's" code, run as is on my system, shows: JSON seems to be 3x faster (but not 10x).
  • Adding /text() to the XPath reduces this to less than 2x. In the related article user "Mister Magoo" pointed this out already, but the click-bait title is still unchanged...
  • With such an easy JSON as given in the "experiment", the fastest pure T-SQL approach was a combination of SUBSTRING and CHARINDEX :-D
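The point about XQuery predicates can be illustrated with a small sketch: finding the product with a given ID inside a list. It assumes SQL Server 2016 or later; the sample documents and the variable @id are made up for illustration and are not the code from the blog article.

```sql
DECLARE @x XML = N'<Products>
  <Product><ProductID>1</ProductID><ProductName>Road Bike</ProductName></Product>
  <Product><ProductID>2</ProductID><ProductName>Cross Bike</ProductName></Product>
</Products>';

DECLARE @j NVARCHAR(MAX) = N'{"Products":[
  {"ProductID":1,"ProductName":"Road Bike"},
  {"ProductID":2,"ProductName":"Cross Bike"}]}';

DECLARE @id INT = 2;

-- XML: the predicate is part of the XQuery path itself.
SELECT @x.value('(/Products/Product[ProductID=sql:variable("@id")]/ProductName/text())[1]',
                'nvarchar(100)') AS NameFromXml;

-- JSON: expand the whole array into rows first, then filter with WHERE.
SELECT p.ProductName AS NameFromJson
FROM OPENJSON(@j, '$.Products')
     WITH (ProductID   INT           '$.ProductID',
           ProductName NVARCHAR(100) '$.ProductName') AS p
WHERE p.ProductID = @id;
```

JSON_VALUE alone cannot express this lookup, because its path only addresses array elements by position, not by content.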

The following code will show a more realistic experiment

  • Using a JSON and an identical XML with more than one Product (a JSON array vs. sibling nodes)
  • JSON and XML change slightly from row to row (10000 running numbers) and are inserted into tables.
  • There is an initial call against both tables to avoid first-call bias
  • All 10000 entries are read and the values retrieved are inserted into another table.
  • Using GO 10 will run through this block ten times to avoid first-call bias

The final result shows clearly that JSON is slower than XML (not by that much, about 1.5x on a still very simple example).

The final statement:

  • With an overly simplified example under undue circumstances, JSON can be faster than XML
  • Dealing with JSON is pure string action, while XML is parsed and transformed. This is rather expensive in the first action, but will speed up everything once it is done.
  • JSON might be better in a one-time action (it avoids the overhead of creating an internal hierarchical representation of an XML)
  • With a still very simple but more realistic example, XML will be faster in simple reading
  • Whenever there is any need to read a specific element out of an array, to filter all entries where a given ProductID is included in the array, or to navigate up and down the path, JSON cannot hold up. It must be parsed out of a string completely, each time you have to reach into it...

Test code

USE master;
GO
--create a clean database
CREATE DATABASE TestJsonXml;
GO
USE TestJsonXml;
GO
--create tables
CREATE TABLE TestTbl1(ID INT IDENTITY,SomeXml XML);
CREATE TABLE TestTbl2(ID INT IDENTITY,SomeJson NVARCHAR(MAX));
CREATE TABLE Target1(SomeString NVARCHAR(MAX));
CREATE TABLE Target2(SomeString NVARCHAR(MAX));
CREATE TABLE Times(Test VARCHAR(10),Diff INT);
GO
--insert 10000 XMLs into TestTbl1
WITH Tally AS(SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL))*2 AS Nmbr FROM master..spt_values AS v1 CROSS APPLY master..spt_values AS v2)
INSERT INTO TestTbl1(SomeXml)
SELECT 
N'<Root>
    <Products>
    <ProductDescription>
        <Features>
            <Maintenance>' + CAST(Nmbr AS NVARCHAR(10)) + ' year parts and labor extended maintenance is available</Maintenance>
            <Warranty>1 year parts and labor</Warranty>
        </Features>
        <ProductID>' + CAST(Nmbr AS NVARCHAR(10)) + '</ProductID>
        <ProductName>Road Bike</ProductName>
    </ProductDescription>
    <ProductDescription>
        <Features>
            <Maintenance>' + CAST(Nmbr + 1 AS NVARCHAR(10)) + ' blah</Maintenance>
            <Warranty>1 year parts and labor</Warranty>
        </Features>
        <ProductID>' + CAST(Nmbr + 1 AS NVARCHAR(10)) + '</ProductID>
        <ProductName>Cross Bike</ProductName>
    </ProductDescription>
    </Products>
</Root>'
FROM Tally;

--insert 10000 JSONs into TestTbl2
WITH Tally AS(SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS Nmbr FROM master..spt_values AS v1 CROSS APPLY master..spt_values AS v2)
INSERT INTO TestTbl2(SomeJson)
SELECT 
N'{
    "Root": {
        "Products": {
            "ProductDescription": [
                {
                    "Features": {
                        "Maintenance": "' + CAST(Nmbr AS NVARCHAR(10)) + ' year parts and labor extended maintenance is available",
                        "Warranty": "1 year parts and labor"
                    },
                    "ProductID": "' + CAST(Nmbr AS NVARCHAR(10)) + '",
                    "ProductName": "Road Bike"
                },
                {
                    "Features": {
                        "Maintenance": "' + CAST(Nmbr + 1 AS NVARCHAR(10)) + ' blah",
                        "Warranty": "1 year parts and labor"
                    },
                    "ProductID": "' + CAST(Nmbr + 1 AS NVARCHAR(10)) + '",
                    "ProductName": "Cross Bike"
                }
            ]
        }
    }
}'
FROM Tally;
GO

--Do some initial action to avoid first-call-bias
INSERT INTO Target1(SomeString)
SELECT SomeXml.value('(/Root/Products/ProductDescription/Features/Maintenance/text())[1]', 'nvarchar(4000)')
FROM TestTbl1;
INSERT INTO Target2(SomeString)
SELECT JSON_VALUE(SomeJson, N'$.Root.Products.ProductDescription[0].Features.Maintenance')
FROM TestTbl2;
GO

--Start the test
DECLARE @StartDt DATETIME2(7), @EndXml DATETIME2(7), @EndJson DATETIME2(7);

--Read all ProductNames of the second product and insert them to Target1
SET @StartDt = SYSDATETIME();
INSERT INTO Target1(SomeString)
SELECT SomeXml.value('(/Root/Products/ProductDescription/ProductName/text())[2]', 'nvarchar(4000)')
FROM TestTbl1
ORDER BY NEWID();
--remember the time spent
INSERT INTO Times(Test,Diff)
SELECT 'xml',DATEDIFF(millisecond,@StartDt,SYSDATETIME());

--Same with JSON into Target2
SET @StartDt = SYSDATETIME();
INSERT INTO Target2(SomeString)
SELECT JSON_VALUE(SomeJson, N'$.Root.Products.ProductDescription[1].ProductName')
FROM TestTbl2
ORDER BY NEWID();
--remember the time spent
INSERT INTO Times(Test,Diff)
SELECT 'json',DATEDIFF(millisecond,@StartDt,SYSDATETIME());

GO 10 --do the block above 10 times

--Show the result
SELECT Test,SUM(Diff) AS SumTime, COUNT(Diff) AS CountTime
FROM Times
GROUP BY Test;
GO
--clean up
USE master;
GO
DROP DATABASE TestJsonXml;
GO

The result (SQL Server 2016 Express on an Acer Aspire v17 Nitro Intel i7, 8GB RAM)

Test    SumTime (ms)
--------------------
json    2706
xml     1604
