When can I save JSON or XML data in an SQL Table

Question

When using SQL or MySQL (or any relational DB for that matter), I understand that saving the data in regular columns is better for indexing and other purposes...

The thing is, loading and saving JSON data is sometimes a lot simpler, and it makes development easier.

Are there any "golden rules" for saving raw JSON data in the DB?

Is it absolutely wrong practice to do so?

Very nice answers were given, but no doubt the most well organized is the answer given by @Shnugo, which deserves the bounty.

I would also like to point out the answers given by @Gordon Linoff and @Amresh Pandey for explaining other special use cases.

Thank god, and good job everyone!

Recommended answer

The main questions are

  • What are you going to do with this data? and
  • How are you filtering/sorting/joining/manipulating this data?

JSON (like XML) is great for data exchange, small storage and generically defined structures, but it cannot participate in the typical actions you run within your RDBMS. In most cases it will be better to transfer your JSON data into normal tables and re-create the JSON when you need it.
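As a minimal sketch of that round trip on SQL Server 2016+ (database compatibility level 130 or higher is assumed for OPENJSON), the following shreds a JSON document into a normal table and re-creates the JSON on demand. The sample document, the table variable @Product and its columns are made up for illustration.

```sql
-- Hypothetical sample document; names are illustrative only
DECLARE @json NVARCHAR(MAX) = N'[
  {"ProductID": 1, "ProductName": "Road Bike",  "Warranty": "1 year parts and labor"},
  {"ProductID": 2, "ProductName": "Cross Bike", "Warranty": "1 year parts and labor"}
]';

-- A normal table (here a table variable) that holds the shredded values
DECLARE @Product TABLE(
    ProductID   INT PRIMARY KEY,
    ProductName NVARCHAR(100),
    Warranty    NVARCHAR(100)
);

-- JSON -> rows: OPENJSON with a WITH clause returns typed columns
INSERT INTO @Product(ProductID, ProductName, Warranty)
SELECT ProductID, ProductName, Warranty
FROM OPENJSON(@json)
     WITH (ProductID   INT           '$.ProductID',
           ProductName NVARCHAR(100) '$.ProductName',
           Warranty    NVARCHAR(100) '$.Warranty');

-- The data can now take part in ordinary relational operations
SELECT ProductID, ProductName
FROM @Product
WHERE ProductName LIKE N'Cross%';

-- rows -> JSON: re-create the document only when it is needed
SELECT ProductID, ProductName, Warranty
FROM @Product
ORDER BY ProductID
FOR JSON PATH;
```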

The first rule of normalisation dictates never to store more than one bit of information in one column. You see a column "PersonName" with a value like "Mickey Mouse"? You point to this and cry: Change that immediately!

What about XML or JSON? Are these types breaking 1.NF? Well, yes and no...

It is perfectly okay to store a complete structure as one bit of information if it actually is one bit of information. You get a SOAP response and want to store it because you might need it for future reference (but you will not use this data for your own processes)? Just store it as is!

Now imagine a complex structure (XML or JSON) representing a person (with their address, further details...). Now you put this into one column as PersonInCharge. Is this wrong? Shouldn't this rather live in properly designed related tables with a foreign key reference instead of the XML/JSON? Especially if the same person might occur in many different rows, it is definitely wrong to use an XML/JSON approach.

But now imagine the need to store historical data. You want to persist the person's data for a given moment in time. Some days later the person tells you a new address? No problem! The old address lives in an XML/JSON if you ever need it...
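The two cases above can live side by side. A hedged sketch, assuming SQL Server 2016+ and a made-up schema (the tables Person and Contract and all column names are hypothetical): the live person is referenced through a foreign key, while a frozen JSON copy records the person's data at the moment the row was written.

```sql
-- Hypothetical schema for illustration
CREATE TABLE Person(
    PersonID INT IDENTITY PRIMARY KEY,
    FullName NVARCHAR(100) NOT NULL,
    City     NVARCHAR(100) NOT NULL
);
CREATE TABLE Contract(
    ContractID       INT IDENTITY PRIMARY KEY,
    -- live data: proper foreign key reference
    PersonInChargeID INT NOT NULL REFERENCES Person(PersonID),
    SignedAt         DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    -- historical data: frozen copy, stored and read as one bit of information
    PersonSnapshot   NVARCHAR(MAX) NOT NULL CHECK (ISJSON(PersonSnapshot) = 1)
);

INSERT INTO Person(FullName, City) VALUES (N'Mickey Mouse', N'Old Town');

-- Freeze the person's current state into the contract row
INSERT INTO Contract(PersonInChargeID, PersonSnapshot)
SELECT p.PersonID,
       (SELECT p.FullName, p.City FOR JSON PATH, WITHOUT_ARRAY_WRAPPER)
FROM Person AS p
WHERE p.FullName = N'Mickey Mouse';

-- A later change touches the live row only; the snapshot keeps the old city
UPDATE Person SET City = N'New Town' WHERE FullName = N'Mickey Mouse';

SELECT c.ContractID, p.City AS CurrentCity, c.PersonSnapshot
FROM Contract AS c
JOIN Person AS p ON p.PersonID = c.PersonInChargeID;
```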

Conclusion: If you store the data just to keep it, it's okay. If this data is a unique portion, it's okay...
But if you need the internal parts regularly, or if this would mean redundant duplicate storage, it's not okay...

The following is for SQL Server and might be different on other RDBMSs.

XML is not stored as the text you see, but as a hierarchy tree. Querying this performs astonishingly well! This structure is not parsed on string level!
JSON in SQL Server (2016+) lives in a string and must be parsed. There is no real native JSON type (like there is a native XML type). This might come later, but for now I'd assume that JSON will not be as performant as XML on SQL Server (see the SQL Server 2016 comparison further down). Any need to read a value out of JSON will need a hell of a lot of hidden string method calls...
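A small sketch of that difference as it shows up in T-SQL on SQL Server 2016: the XML variable is a typed value with its own methods, while the JSON is an ordinary NVARCHAR string handed to functions that parse it on every call. The sample documents are made up for illustration.

```sql
-- Hypothetical sample values
DECLARE @x XML = N'<Product><ProductID>1</ProductID><ProductName>Road Bike</ProductName></Product>';
DECLARE @j NVARCHAR(MAX) = N'{"Product": {"ProductID": 1, "ProductName": "Road Bike"}}';

-- XML: a method call on the native type
-- JSON: a function call that receives a plain string
SELECT @x.value('(/Product/ProductName/text())[1]', 'nvarchar(100)') AS FromXml,
       JSON_VALUE(@j, '$.Product.ProductName')                        AS FromJson;

-- Because JSON is just a string, a broken document is accepted by the variable;
-- validity has to be checked explicitly
DECLARE @broken NVARCHAR(MAX) = N'{"Product": ';
SELECT ISJSON(@broken) AS IsValidJson; -- returns 0
```

On a table column the same check is usually expressed as a constraint such as CHECK (ISJSON(SomeJson) = 1).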

Your lovable DB artist :-D knows that storing JSON as is goes against common principles of RDBMSs. He knows:

  • JSON is quite probably breaking 1.NF
  • JSON might change over time (same column, differing content).
  • JSON is not easy to read, and it is very hard to filter/search/join or sort by it.
  • Such operations will shift quite some extra load onto the poor little DB server

There are some workarounds (depending on the RDBMS you are using), but most of them don't work the way you'd like...
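One such workaround on SQL Server 2016+ is a computed column over JSON_VALUE with an index on top of it. This is a sketch under assumptions: the table ProductDoc, its columns and the index name are hypothetical, and the usual SET options required for indexes on computed columns are taken as given.

```sql
-- Hypothetical table for illustration
CREATE TABLE ProductDoc(
    ID          INT IDENTITY PRIMARY KEY,
    SomeJson    NVARCHAR(MAX) NOT NULL CHECK (ISJSON(SomeJson) = 1),
    -- computed column exposing one scalar value from the JSON;
    -- the CAST keeps the index key small (JSON_VALUE returns NVARCHAR(4000))
    ProductName AS CAST(JSON_VALUE(SomeJson, '$.ProductName') AS NVARCHAR(100))
);

-- Indexing the computed column lets the optimizer seek instead of parsing every row
CREATE INDEX IX_ProductDoc_ProductName ON ProductDoc(ProductName);

INSERT INTO ProductDoc(SomeJson) VALUES
(N'{"ProductID": 1, "ProductName": "Road Bike"}'),
(N'{"ProductID": 2, "ProductName": "Cross Bike"}');

-- Filter on the computed column as if it were a regular column
SELECT ID
FROM ProductDoc
WHERE ProductName = N'Cross Bike';
```

This covers only scalar paths that are known in advance. Values inside arrays, or paths that change over time, still need the JSON to be parsed at query time.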

Storing raw JSON is a reasonable choice:

  • If you do not want to use the data stored within your JSON for expensive operations (filter/join/sort).
    You can store this just like any other exists-only content. We store many pictures as BLOBs, but we would not try to filter for all images with a flower...
  • If you do not care at all what's inside (just store it and read it as one bit of information)
  • If the structures are variable, which would make it harder to create physical tables than to work with JSON data.
  • If the structure is deeply nested, so that storage in physical tables would be too much overhead

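Storing raw JSON is not a good choice:

  • If you want to use the internal data like you would use a relational table's data (filter, indexes, joins...)
  • If you would store duplicates (create redundancy)
  • In general: if you face performance problems (and you will face them in many typical scenarios, for sure!)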

You might start with the JSON in a string column or as a BLOB and change this to physical tables when you need it. My magic crystal ball tells me this might be tomorrow :-D

Find some ideas about performance and disc space here: https://stackoverflow.com/a/47408528/5089204

The following addresses JSON and XML support in SQL-Server 2016

User @mike123 pointed to an article on an official Microsoft blog which seems to prove in an experiment that querying JSON is 10x faster than querying XML in SQL-Server.

Some thoughts about that:

Some cross-checks with the "experiment":

  • The "experiment" measures a lot, but not the performance of XML vs. JSON. Doing the same action against the same (unchanged) string repeatedly is not a realistic scenario.
  • The tested examples are far too simple for a general statement!
  • The value read is always the same and not even used. The optimizer will see this...
  • Not a single word about the mighty XQuery support! Find a product with a given ID within an array? JSON needs to read the whole lot and then filter with WHERE, while XML allows an internal XQuery predicate. Not to mention FLWOR...
  • The "experiment" code, run as is on my system, shows JSON to be about 3x faster (but not 10x).
  • Adding /text() to the XPath reduces this to less than 2x. In the related article user "Mister Magoo" pointed this out already, but the click-bait title is still unchanged...
  • With JSON as simple as the one given in the "experiment", the fastest pure T-SQL approach was a combination of SUBSTRING and CHARINDEX :-D
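The point about XQuery predicates can be sketched as follows. This is an illustration, not the code from the blog article: the sample documents and the variable @ProductID are made up, and SQL Server 2016+ is assumed for OPENJSON.

```sql
DECLARE @ProductID INT = 2;  -- hypothetical search value

DECLARE @x XML = N'<Root><Products>
  <ProductDescription><ProductID>1</ProductID><ProductName>Road Bike</ProductName></ProductDescription>
  <ProductDescription><ProductID>2</ProductID><ProductName>Cross Bike</ProductName></ProductDescription>
</Products></Root>';

DECLARE @j NVARCHAR(MAX) = N'{"Root": {"Products": {"ProductDescription": [
  {"ProductID": 1, "ProductName": "Road Bike"},
  {"ProductID": 2, "ProductName": "Cross Bike"}
]}}}';

-- XML: the predicate [ProductID = ...] filters inside the XQuery expression itself
SELECT @x.value('(/Root/Products/ProductDescription[ProductID = sql:variable("@ProductID")]/ProductName/text())[1]',
                'nvarchar(100)') AS NameFromXml;

-- JSON: shred the whole array into rows first, then filter with WHERE
SELECT p.ProductName AS NameFromJson
FROM OPENJSON(@j, '$.Root.Products.ProductDescription')
     WITH (ProductID   INT           '$.ProductID',
           ProductName NVARCHAR(100) '$.ProductName') AS p
WHERE p.ProductID = @ProductID;
```

With JSON_VALUE alone the array position has to be known up front (for example ProductDescription[1]), which is why the JSON variant falls back to OPENJSON plus WHERE.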

The following code shows a more realistic experiment

  • Using a JSON and an identical XML with more than one Product (a JSON array vs. sibling nodes)
  • JSON and XML vary slightly (10000 running numbers) and are inserted into tables.
  • There is an initial call against both tables to avoid first-call-bias
  • All 10000 entries are read and the retrieved values are inserted into another table.
  • Using GO 10 will run through this block ten times to avoid first-call-bias

The final result clearly shows that JSON is slower than XML (not by much, about 1.5x on a still very simple example).

The final statement:

  • With an overly simplified example under undue circumstances, JSON can be faster than XML
  • Dealing with JSON is pure string action, while XML is parsed and transformed. This is rather expensive in the first action, but will speed up everything once it is done.
  • JSON might be better in a one-time action (it avoids the overhead of creating an internal hierarchical representation of the XML)
  • With a still very simple but more realistic example, XML will be faster in simple reading
  • Whenever there is any need to read a specific element out of an array, to filter all entries where a given ProductID is included in the array, or to navigate up and down the path, JSON cannot keep up. It must be parsed out of a string completely, each time you have to grab into it...

The test code

USE master;
GO
--create a clean database
CREATE DATABASE TestJsonXml;
GO
USE TestJsonXml;
GO
--create tables
CREATE TABLE TestTbl1(ID INT IDENTITY,SomeXml XML);
CREATE TABLE TestTbl2(ID INT IDENTITY,SomeJson NVARCHAR(MAX));
CREATE TABLE Target1(SomeString NVARCHAR(MAX));
CREATE TABLE Target2(SomeString NVARCHAR(MAX));
CREATE TABLE Times(Test VARCHAR(10),Diff INT)
GO
--insert 10000 XMLs into TestTbl1
WITH Tally AS(SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL))*2 AS Nmbr FROM master..spt_values AS v1 CROSS APPLY master..spt_values AS v2)
INSERT INTO TestTbl1(SomeXml)
SELECT 
N'<Root>
    <Products>
    <ProductDescription>
        <Features>
            <Maintenance>' + CAST(Nmbr AS NVARCHAR(10)) + ' year parts and labor extended maintenance is available</Maintenance>
            <Warranty>1 year parts and labor</Warranty>
        </Features>
        <ProductID>' + CAST(Nmbr AS NVARCHAR(10)) + '</ProductID>
        <ProductName>Road Bike</ProductName>
    </ProductDescription>
    <ProductDescription>
        <Features>
            <Maintenance>' + CAST(Nmbr + 1 AS NVARCHAR(10)) + ' blah</Maintenance>
            <Warranty>1 year parts and labor</Warranty>
        </Features>
        <ProductID>' + CAST(Nmbr + 1 AS NVARCHAR(10)) + '</ProductID>
        <ProductName>Cross Bike</ProductName>
    </ProductDescription>
    </Products>
</Root>'
FROM Tally;

--insert 10000 JSONs into TestTbl2
WITH Tally AS(SELECT TOP 10000 ROW_NUMBER() OVER(ORDER BY (SELECT NULL))*2 AS Nmbr FROM master..spt_values AS v1 CROSS APPLY master..spt_values AS v2)
INSERT INTO TestTbl2(SomeJson)
SELECT 
N'{
    "Root": {
        "Products": {
            "ProductDescription": [
                {
                    "Features": {
                        "Maintenance": "' + CAST(Nmbr AS NVARCHAR(10)) + ' year parts and labor extended maintenance is available",
                        "Warranty": "1 year parts and labor"
                    },
                    "ProductID": "' + CAST(Nmbr AS NVARCHAR(10)) + '",
                    "ProductName": "Road Bike"
                },
                {
                    "Features": {
                        "Maintenance": "' + CAST(Nmbr + 1 AS NVARCHAR(10)) + ' blah",
                        "Warranty": "1 year parts and labor"
                    },
                    "ProductID": "' + CAST(Nmbr + 1 AS NVARCHAR(10)) + '",
                    "ProductName": "Cross Bike"
                }
            ]
        }
    }
}'
FROM Tally;
GO

--Do some initial action to avoid first-call-bias
INSERT INTO Target1(SomeString)
SELECT SomeXml.value('(/Root/Products/ProductDescription/Features/Maintenance/text())[1]', 'nvarchar(4000)')
FROM TestTbl1;
INSERT INTO Target2(SomeString)
SELECT JSON_VALUE(SomeJson, N'$.Root.Products.ProductDescription[0].Features.Maintenance')
FROM TestTbl2;
GO

--Start the test
DECLARE @StartDt DATETIME2(7), @EndXml DATETIME2(7), @EndJson DATETIME2(7);

--Read all ProductNames of the second product and insert them to Target1
SET @StartDt = SYSDATETIME();
INSERT INTO Target1(SomeString)
SELECT SomeXml.value('(/Root/Products/ProductDescription/ProductName/text())[2]', 'nvarchar(4000)')
FROM TestTbl1
ORDER BY NEWID();
--remember the time spent
INSERT INTO Times(Test,Diff)
SELECT 'xml',DATEDIFF(millisecond,@StartDt,SYSDATETIME());

--Same with JSON into Target2
SET @StartDt = SYSDATETIME();
INSERT INTO Target2(SomeString)
SELECT JSON_VALUE(SomeJson, N'$.Root.Products.ProductDescription[1].ProductName')
FROM TestTbl2
ORDER BY NEWID();
--remember the time spent
INSERT INTO Times(Test,Diff)
SELECT 'json',DATEDIFF(millisecond,@StartDt,SYSDATETIME());

GO 10 --do the block above 10 times

--Show the result
SELECT Test,SUM(Diff) AS SumTime, COUNT(Diff) AS CountTime
FROM Times
GROUP BY Test;
GO
--clean up
USE master;
GO
DROP DATABASE TestJsonXml;
GO

The result (SQL Server 2016 Express on an Acer Aspire v17 Nitro, Intel i7, 8GB RAM)

Test    SumTime 
------------------
json    2706    
xml     1604    
