将 XML 数据分解为 SQL Server 数据库列的最佳方法 [英] The Best Way to shred XML data into SQL Server database columns

查看:15
本文介绍了将 XML 数据分解为 SQL Server 数据库列的最佳方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

将 XML 数据分解为各种数据库列的最佳方法是什么?到目前为止,我主要使用的是节点和值函数,如下所示:

What is the best way to shred XML data into various database columns? So far I have mainly been using the nodes and value functions like so:

INSERT INTO some_table (column1, column2, column3)
SELECT
Rows.n.value('(@column1)[1]', 'varchar(20)'),
Rows.n.value('(@column2)[1]', 'nvarchar(100)'),
Rows.n.value('(@column3)[1]', 'int'),
FROM @xml.nodes('//Rows') Rows(n)

然而,我发现即使是中等大小的 xml 数据,这也会变得非常慢.

However I find that this is getting very slow for even moderate size xml data.

推荐答案

在遇到一个非常相似的问题时偶然发现了这个问题,我一直在运行一个查询,处理一个 7.5MB 的 XML 文件(约 10,000 个节点)大约 3.5约 4 小时后终于放弃.

Stumbled across this question whilst having a very similar problem, I'd been running a query processing a 7.5MB XML file (~approx 10,000 nodes) for around 3.5~4 hours before finally giving up.

但是,经过更多研究后,我发现使用架构输入 XML 并创建 XML 索引(我将批量插入到表中)后,相同的查询在大约 0.04 毫秒内完成.

However, after a little more research I found that having typed the XML using a schema and created an XML Index (I'd bulk inserted into a table) the same query completed in ~ 0.04ms.

如何提高性能!

创建架构的代码:

IF EXISTS ( SELECT * FROM sys.xml_schema_collections where [name] = 'MyXmlSchema')
DROP XML SCHEMA COLLECTION [MyXmlSchema]
GO

DECLARE @MySchema XML
SET @MySchema = 
(
    SELECT * FROM OPENROWSET
    (
        BULK 'C:\Path\To\Schema\MySchema.xsd', SINGLE_CLOB 
    ) AS xmlData
)

CREATE XML SCHEMA COLLECTION [MyXmlSchema] AS @MySchema 
GO

使用类型化的 XML 列创建表的代码:

Code to create the table with a typed XML column:

CREATE TABLE [dbo].[XmlFiles] (
    [Id] [uniqueidentifier] NOT NULL,

    -- Data from CV element 
    [Data] xml(CONTENT dbo.[MyXmlSchema]) NOT NULL,

CONSTRAINT [PK_XmlFiles] PRIMARY KEY NONCLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

创建索引的代码

CREATE PRIMARY XML INDEX PXML_Data
ON [dbo].[XmlFiles] (Data)

不过,有几件事需要牢记.SQL Server 的架构实现不支持 xsd:include.这意味着如果您有一个引用其他架构的架构,您必须将所有这些复制到一个单一的架构中并添加.

There are a few things to bear in mind though. SQL Server's implementation of Schema doesn't support xsd:include. This means that if you have a schema which references other schema, you'll have to copy all of these into a single schema and add that.

我也会得到一个错误:

XQuery [dbo.XmlFiles.Data.value()]: Cannot implicitly atomize or apply 'fn:data()' to complex content elements, found type 'xs:anyType' within inferred type 'element({http://www.mynamespace.fake/schemas}:SequenceNumber,xs:anyType) ?'.

如果我尝试在我使用节点功能选择的节点上方导航.例如

if I tried to navigate above the node I had selected with the nodes function. E.g.

SELECT
    ,C.value('CVElementId[1]', 'INT') AS [CVElementId]
    ,C.value('../SequenceNumber[1]', 'INT') AS [Level]
FROM 
    [dbo].[XmlFiles]
CROSS APPLY
    [Data].nodes('/CVSet/Level/CVElement') AS T(C)

发现处理此问题的最佳方法是使用 OUTER APPLY 实际上对 XML 执行外部联接".

Found that the best way to handle this was to use the OUTER APPLY to in effect perform an "outer join" on the XML.

SELECT
    ,C.value('CVElementId[1]', 'INT') AS [CVElementId]
    ,B.value('SequenceNumber[1]', 'INT') AS [Level]
FROM 
    [dbo].[XmlFiles]
CROSS APPLY
    [Data].nodes('/CVSet/Level') AS T(B)
OUTER APPLY
    B.nodes ('CVElement') AS S(C)

希望这对某人有所帮助,因为那几乎是我的一天.

Hope that that helps someone as that's pretty much been my day.

这篇关于将 XML 数据分解为 SQL Server 数据库列的最佳方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆