使用外键批量插入嵌套的xml作为第一个表的标识列 [英] Bulk insert nested xml with foreign key as identity column of first table

查看：68 发布时间：2020/9/17 21:02:50 sql-server xml tsql azure-sql-database sqlxml

本文介绍了使用外键批量插入嵌套的xml作为第一个表的标识列的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我的xml如下:

<Records>
  <Record>
    <Name>Best of Pop</Name>
    <Studio>ABC studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>John</ArtistName>
        <Age>36</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Jessica</ArtistName>
        <Age>20</Age>            
      </Artist>
    </Artists>
  </Record>
  <Record>
    <Name>Nursery rhymes</Name>
    <Studio>XYZ studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>Judy</ArtistName>
        <Age>10</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Rachel</ArtistName>
        <Age>15</Age>            
      </Artist>
    </Artists>
  </Record>
</Records>

此文件可能包含数百万条记录.我在 Azure SQL数据库上运行的MS SQL数据库具有以下2个表来存储这些记录:

This file may contain millions of records. My MS SQL database, running on Azure SQL Database, has the following 2 tables to store these records:

Record(RecordId [PK，身份，自动递增，名称，工作室)

Record (RecordId [PK, identity, auto-increment], Name, Studio)

Artist(RecordId [外键指的是Record.RecordId]，ArtistName，Age)

Artist (RecordId [Foreign Key refers Record.RecordId], ArtistName, Age)

是否可以使用xml节点方法将记录批量插入到Record表中，获取RecordId，然后以单个遍历xml的方式将艺术家信息批量插入到Artist表中?

Is it possible to bulk insert records into the Record table, get the RecordIds and then bulk insert the artist information into the Artist table in a single traversal of the xml using the xml nodes approach?

很长一段时间以来，我一直在寻找一种有效的方法来做到这一点，但徒劳无功.

I have been searching for an efficient way to do this for a long time but in vain.

我尝试了与此处，但我无法解决问题.

I have tried approaches similar to the ones described here and here, but I'm not able to get to the solution.

任何指向解决方案方向的指针都会有很大帮助.

Any pointers in the direction of the solution will be of great help.

更新: @srutzky:感谢您的解决方案.这完全符合我的要求.但是有一个陷阱.我必须使用节点方法来解决.我已经更改了查询的第一部分.但是我被困在第二半.这就是我要干的事.

Update: @srutzky: Thanks for the solution. This works exactly as I wanted. But there is one catch. I have to use the nodes approach for the solution. I have changed the first part of the query. But I'm stuck in the 2nd half. Here's what I have got up to.

DECLARE @Record TABLE (RecordId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       Name NVARCHAR(400) UNIQUE,
                       Studio NVARCHAR(400));
DECLARE @Artist TABLE (ArtistId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       RecordId INT NOT NULL,
                       ArtistName NVARCHAR(400), Age INT);

INSERT INTO @Record (Name, Studio)
   SELECT  T.c.value(N'(Name/text())[1]', 'NVARCHAR(400)'),
           T.c.value(N'(Studio/text())[1]', 'NVARCHAR(400)')
 FROM @ImportData.nodes('/Records/Record') T(c);

SELECT * FROM @Record

您能帮我解决第二部分吗?我是这种xml处理方法的新手.

Could you please help me out with the 2nd part? I'm new to this xml handling approach.

UPDATE2 :我明白了.我绞尽脑汁几个小时，尝试了几件事，终于找到了解决方案.

UPDATE2: And I got it.... I racked my brains for a couple of hours, tried a few things and finally arrived at the solution.

DECLARE @Record TABLE (RecordId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       Name NVARCHAR(400) UNIQUE,
                       Studio NVARCHAR(400));
DECLARE @Artist TABLE (ArtistId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       RecordId INT NOT NULL,
                       ArtistName NVARCHAR(400), 
                       Age INT);

INSERT INTO @Record (Name, Studio)
   SELECT  T.c.value(N'(Name/text())[1]', 'NVARCHAR(400)'),
           T.c.value(N'(Studio/text())[1]', 'NVARCHAR(400)')
 FROM @ImportData.nodes('/Records/Record') T(c);

INSERT INTO @Artist (RecordId, ArtistName, Age)
    SELECT  (SELECT RecordId FROM @Record WHERE Name=T.c.value(N'(../../Name/text())[1]', 'NVARCHAR(400)')),
            T.c.value(N'(ArtistName/text())[1]', 'NVARCHAR(400)'),
           T.c.value(N'(Age/text())[1]', 'INT')
 FROM @ImportData.nodes('/Records/Record/Artists/Artist') T(c);

 SELECT * FROM @Record
 SELECT * FROM @Artist

@srutzky:非常感谢您向我指出了正确的方向.欢迎提出任何改进此解决方案的建议.

@srutzky: Thanks a ton for pointing me in the right direction. Any suggestions to improve this solution are welcome.

推荐答案

由于您无法在同一DML语句的两个表中插入(因此，在Triggers和OUTPUT子句，在这两个方面都不会有帮助).但是可以通过两次有效地完成.关键在于<Record>中<Name>元素的事实，因为它允许我们将Record表用作第二遍的查找表(即，当我们获取Artist行时).

This can't be done in a single pass anyway as you can't insert into two tables in the same DML statement (well, outside of Triggers and the OUTPUT clause, neither of which would help here). But it can be done efficiently in two passes. The fact at the <Name> element within <Record> is unique is the key, as that allows us to use the Record table as the lookup table for the second pass (i.e. when we are getting the Artist rows).

首先，您需要(嗯，应该是 )在Record (Name ASC)上创建一个UNIQUE INDEX.在下面的示例中，我正在使用UNIQUE CONSTRAINT，但这仅是由于我使用表变量而不是临时表来使示例代码更易于重新运行(顶部不需要显式的IF EXISTS DROP).该索引将有助于第二遍的性能.

First, you need (well, should) create a UNIQUE INDEX on Record (Name ASC). In my example below I am using a UNIQUE CONSTRAINT, but that is only due to my using a table variable instead of a temp table to make the example code more easily rerunnable (without needing an explicit IF EXISTS DROP at the top). This index will help the performance of the second pass.

该示例使用OPENXML，因为与之相比，使用.nodes()函数更有效，因为同一文档需要遍历两次. OPENXML函数的最后一个参数2指定文档是基于元素"的，因为默认解析正在寻找基于属性".

The example uses OPENXML as that will most likely be more efficient that using the .nodes() function since the same document needs to be traversed twice. The last parameter for the OPENXML function, the 2, specifies that the document is "Element-based" since the default parsing is looking for "Attribute-based".

DECLARE @DocumentID INT, @ImportData XML;

SET @ImportData = N'
<Records>
  <Record>
    <Name>Best of Pop</Name>
    <Studio>ABC studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>John</ArtistName>
        <Age>36</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Jessica</ArtistName>
        <Age>20</Age>            
      </Artist>
    </Artists>
  </Record>
  <Record>
    <Name>Nursery rhymes</Name>
    <Studio>XYZ studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>Judy</ArtistName>
        <Age>10</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Rachel</ArtistName>
        <Age>15</Age>            
      </Artist>
    </Artists>
  </Record>
</Records>';


DECLARE @Record TABLE (RecordId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       Name NVARCHAR(400) UNIQUE,
                       Studio NVARCHAR(400));
DECLARE @Artist TABLE (ArtistId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       RecordId INT NOT NULL,
                       ArtistName NVARCHAR(400), Age INT);

EXEC sp_xml_preparedocument @DocumentID OUTPUT, @ImportData;

-- First pass: extract "Record" rows
INSERT INTO @Record (Name, Studio)
   SELECT Name, Studio
   FROM   OPENXML (@DocumentID, N'/Records/Record', 2) 
             WITH (Name    NVARCHAR(400)  './Name/text()', 
                   Studio  NVARCHAR(400)  './Studio/text()');


-- Second pass: extract "Artist" rows
INSERT INTO @Artist (RecordId, ArtistName, Age)
   SELECT rec.RecordId, art.ArtistName, art.Age
   FROM   OPENXML (@DocumentID, N'/Records/Record/Artists/Artist', 2) 
             WITH (Name        NVARCHAR(400)  '../../Name/text()',
                   ArtistName  NVARCHAR(400)  './ArtistName/text()', 
                   Age         INT  './Age/text()') art
   INNER JOIN @Record rec
           ON rec.[Name] = art.[Name];


EXEC sp_xml_removedocument @DocumentID;
-------------------

SELECT * FROM @Record ORDER BY [RecordID];
SELECT * FROM @Artist ORDER BY [RecordID];

参考:

OPENXML
sp_xml_preparedocument
sp_xml_removedocument

对于使用.nodes()函数而不是OPENXML的新要求，将可以进行以下操作:

With the new requirement to use the .nodes() function instead of OPENXML, the following will work:

DECLARE @ImportData XML;

SET @ImportData = N'
<Records>
  <Record>
    <Name>Best of Pop</Name>
    <Studio>ABC studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>John</ArtistName>
        <Age>36</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Jessica</ArtistName>
        <Age>20</Age>            
      </Artist>
    </Artists>
  </Record>
  <Record>
    <Name>Nursery rhymes</Name>
    <Studio>XYZ studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>Judy</ArtistName>
        <Age>10</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Rachel</ArtistName>
        <Age>15</Age>            
      </Artist>
    </Artists>
  </Record>
</Records>';

IF (OBJECT_ID('tempdb..#Record') IS NOT NULL)
BEGIN
   DROP TABLE #Record;
END;
IF (OBJECT_ID('tempdb..#Artist') IS NOT NULL)
BEGIN
   DROP TABLE #Artist;
END;

CREATE TABLE #Record (RecordId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                      Name NVARCHAR(400) UNIQUE,
                      Studio NVARCHAR(400));
CREATE TABLE #Artist (ArtistId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                      RecordId INT NOT NULL,
                      ArtistName NVARCHAR(400),
                      Age INT);


-- First pass: extract "Record" rows
INSERT INTO #Record (Name, Studio)
   SELECT col.value(N'(./Name/text())[1]', N'NVARCHAR(400)') AS [Name],
          col.value(N'(./Studio/text())[1]', N'NVARCHAR(400)') AS [Studio]
   FROM   @ImportData.nodes(N'/Records/Record') tab(col);


-- Second pass: extract "Artist" rows
;WITH artists AS
(
   SELECT col.value(N'(../../Name/text())[1]', N'NVARCHAR(400)') AS [RecordName],
          col.value(N'(./ArtistName/text())[1]', N'NVARCHAR(400)') AS [ArtistName],
          col.value(N'(./Age/text())[1]', N'INT') AS [Age]
   FROM   @ImportData.nodes(N'/Records/Record/Artists/Artist') tab(col)
)
INSERT INTO #Artist (RecordId, ArtistName, Age)
   SELECT rec.RecordId, art.ArtistName, art.Age
   FROM artists art
   INNER JOIN #Record rec
           ON rec.[Name] = art.RecordName;

-- OR --
-- INSERT INTO #Artist (RecordId, ArtistName, Age)
   SELECT rec.RecordId,
          col.value(N'(./ArtistName/text())[1]', N'NVARCHAR(400)') AS [ArtistName],
          col.value(N'(./Age/text())[1]', N'INT') AS [Age]
   FROM   @ImportData.nodes(N'/Records/Record/Artists/Artist') tab(col)
   INNER JOIN #Record rec
           ON rec.Name = col.value(N'(../../Name/text())[1]', N'NVARCHAR(400)');

-------------------

SELECT * FROM #Record ORDER BY [RecordID];
SELECT * FROM #Artist ORDER BY [RecordID];

有两个选项可以插入到上面显示的#Artist中.第一个使用CTE从INSERT/SELECT查询中提取XML提取.另一个是简化版本，类似于问题的 UPDATE 2 中的查询.

There are two options for inserting into #Artist shown above. The first uses a CTE to abstract the XML extraction away from the INSERT / SELECT query. The other is a simplified version, similar to your query in UPDATE 2 of the question.

这篇关于使用外键批量插入嵌套的xml作为第一个表的标识列的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

使用外键批量插入嵌套的xml作为第一个表的标识列 [英] Bulk insert nested xml with foreign key as identity column of first table

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录关闭

使用外键批量插入嵌套的xml作为第一个表的标识列 [英] Bulk insert nested xml with foreign key as identity column of first table

问题描述

推荐答案

相关文章

数据库最新文章

热门教程

热门工具

登录 关闭

登录关闭