Importing a Large XML File (2.5 GB) into SQL Server

Question

I am trying to import a large XML file into a table on my SQL Server (2014) instance.

I have used the code below for smaller files and assumed it would be fine here, since this is a one-off import. I kicked it off yesterday, and the query was still running when I came into work today, so this is obviously the wrong approach.

Here is the code:

CREATE TABLE files_index_bulk
(
Id INT IDENTITY PRIMARY KEY,
XMLData XML,
LoadedDateTime DATETIME
)


INSERT INTO files_index_bulk(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn, 2) AS BulkColumn, GETDATE() 
FROM OPENROWSET(BULK 'c:\scripts\icecat\files.index.xml', SINGLE_BLOB) AS x;


SELECT * FROM files_index_bulk

Can anyone point out another way of doing this, please? I've looked around at ways of importing large files, and everything keeps coming back to using a bulk load, which I am already doing.

Thanks in advance.

Here is the table I want to pull all the data into.

USE [ICECATtesting]
GO

/****** Object:  Table [dbo].[files_index]    Script Date: 28/04/2017 20:10:44 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[files_index](
    [Product_ID] [int] NULL,
    [path] [varchar](100) NULL,
    [Updated] [varchar](50) NULL,
    [Quality] [varchar](50) NULL,
    [Supplier_id] [int] NULL,
    [Prod_ID] [varchar](1) NULL,
    [Catid] [int] NULL,
    [On_Market] [int] NULL,
    [Model_Name] [varchar](250) NULL,
    [Product_View] [int] NULL,
    [HighPic] [varchar](1) NULL,
    [HighPicSize] [int] NULL,
    [HighPicWidth] [int] NULL,
    [HighPicHeight] [int] NULL,
    [Date_Added] [varchar](150) NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

And here is a snippet of the XML file.

<ICECAT-interface xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://data.icecat.biz/xsd/files.index.xsd">
  <files.index Generated="20170427010009">
  <file path="export/level4/EN/11.xml" Product_ID="11" Updated="20170329110432" Quality="SUPPLIER" Supplier_id="2" Prod_ID="PS300E-03YNL-DU" Catid="151" On_Market="0" Model_Name="Satellite 3000-400" Product_View="587591" HighPic="" HighPicSize="0" HighPicWidth="0" HighPicHeight="0" Date_Added="20050627000000">
  </file>
  <file path="export/level4/EN/12.xml" Product_ID="12" Updated="20170329110432" Quality="ICECAT" Supplier_id="7" Prod_ID="91.42R01.32H" Catid="151" On_Market="0" Model_Name="TravelMate  740LF" Product_View="40042" HighPic="http://images.icecat.biz/img/norm/high/12-31699.jpg" HighPicSize="19384" HighPicWidth="170" HighPicHeight="192" Date_Added="20050627000000">
  </file>
  <file path="export/level4/EN/13.xml" Product_ID="13" Updated="20170329110432" Quality="SUPPLIER" Supplier_id="2" Prod_ID="PP722E-H390W-NL" Catid="151" On_Market="0" Model_Name="Portégé 7220CT / NW2" Product_View="37021" HighPic="http://images.icecat.biz/img/norm/high/13-31699.jpg" HighPicSize="27152" HighPicWidth="280" HighPicHeight="280" Date_Added="20050627000000">
  </file>

Recommended answer

The maximum size of an XML column value in SQL Server is 2 GB, so it is not possible to import a 2.5 GB file into a single XML column.

Update

Since your underlying objective is to transform XML elements within the file into table rows, you don't need to stage the entire file contents in a single XML column. You can avoid the 2 GB limitation, reduce memory requirements, and improve performance by shredding the XML in client code and using a bulk insert technique to insert batches of multiple rows.

The example PowerShell script below uses an XmlTextReader to avoid reading the entire XML into a DOM, and uses SqlBulkCopy to insert batches of many rows at once. The combination of these techniques should allow you to insert millions of rows in minutes rather than hours. The same techniques can be implemented in a custom app or an SSIS script task.

I noticed that a couple of the table columns (Prod_ID and HighPic) are declared as varchar(1), yet the corresponding XML attribute values contain many characters. You'll need to either increase the length of those columns or transform the source values.
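One way to decide how wide those columns need to be is to scan the file once and record the longest value seen for each attribute. The following is only a rough sketch of that idea in Python, not part of the original answer: the function name max_attribute_lengths and the cut-down SAMPLE document are made up for illustration, and the lengths are counted in characters, so non-ASCII values may need more room depending on the column's collation.

```python
import io
import xml.etree.ElementTree as ET

# Cut-down sample based on the snippet in the question (illustrative only).
SAMPLE = b"""<ICECAT-interface>
  <files.index Generated="20170427010009">
    <file Product_ID="11" Prod_ID="PS300E-03YNL-DU" HighPic=""></file>
    <file Product_ID="12" Prod_ID="91.42R01.32H"
          HighPic="http://images.icecat.biz/img/norm/high/12-31699.jpg"></file>
  </files.index>
</ICECAT-interface>"""


def max_attribute_lengths(source, tag="file"):
    """Return {attribute name: longest value length} over all <tag> elements."""
    longest = {}
    # iterparse streams the document instead of building the whole tree first.
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            for name, value in elem.attrib.items():
                longest[name] = max(longest.get(name, 0), len(value))
            # Release the attribute data once measured; the emptied element
            # itself stays attached to its parent, which is fine for a scan.
            elem.clear()
    return longest


lengths = max_attribute_lengths(io.BytesIO(SAMPLE))
print(lengths["Prod_ID"], lengths["HighPic"])
```

For the real file, pass the path to the XML file instead of the in-memory sample, then size Prod_ID and HighPic with some headroom above the reported maximums.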

[String]$global:connectionString = "Data Source=YourServer;Initial Catalog=YourDatabase;Integrated Security=SSPI";
[System.Data.DataTable]$global:dt = New-Object System.Data.DataTable;
[System.Xml.XmlTextReader]$global:xmlReader = New-Object System.Xml.XmlTextReader("C:\FilesToImport\files.xml");
[Int32]$global:batchSize = 10000;

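# Copies the attributes of the current <file> element into a new DataTable row.
# Works on the global reader and table declared above, so it takes no parameters.
Function Add-FileRow() {
    $newRow = $global:dt.NewRow();
    $null = $global:dt.Rows.Add($newRow);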
    $newRow["Product_ID"] = $global:xmlReader.GetAttribute("Product_ID");
    $newRow["path"] = $global:xmlReader.GetAttribute("path");
    $newRow["Updated"] = $global:xmlReader.GetAttribute("Updated");
    $newRow["Quality"] = $global:xmlReader.GetAttribute("Quality");
    $newRow["Supplier_id"] = $global:xmlReader.GetAttribute("Supplier_id");
    $newRow["Prod_ID"] = $global:xmlReader.GetAttribute("Prod_ID");
    $newRow["Catid"] = $global:xmlReader.GetAttribute("Catid");
    $newRow["On_Market"] = $global:xmlReader.GetAttribute("On_Market");
    $newRow["Model_Name"] = $global:xmlReader.GetAttribute("Model_Name");
    $newRow["Product_View"] = $global:xmlReader.GetAttribute("Product_View");
    $newRow["HighPic"] = $global:xmlReader.GetAttribute("HighPic");
    $newRow["HighPicSize"] = $global:xmlReader.GetAttribute("HighPicSize");
    $newRow["HighPicWidth"] = $global:xmlReader.GetAttribute("HighPicWidth");
    $newRow["HighPicHeight"] = $global:xmlReader.GetAttribute("HighPicHeight");
    $newRow["Date_Added"] = $global:xmlReader.GetAttribute("Date_Added");
}

try
{

    # init data table schema
    $da = New-Object System.Data.SqlClient.SqlDataAdapter("SELECT * FROM dbo.files_index WHERE 0 = 1;", $global:connectionString);
    $null = $da.Fill($global:dt);
    $bcp = New-Object System.Data.SqlClient.SqlBulkCopy($global:connectionString);
    $bcp.DestinationTableName = "dbo.files_index";

    $recordCount = 0;

    while($xmlReader.Read() -eq $true)
    {

        if(($xmlReader.NodeType -eq [System.Xml.XmlNodeType]::Element) -and ($xmlReader.Name -eq "file"))
        {
            # Add-FileRow declares no parameters; it reads from the global reader
            Add-FileRow;
            $recordCount += 1;
            if(($recordCount % $global:batchSize) -eq 0) 
            {
                $bcp.WriteToServer($dt);
                $dt.Rows.Clear();
                Write-Host "$recordCount file elements processed so far";
            }
        }

    }

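    # flush the final partial batch
    if($dt.Rows.Count -gt 0)
    {
        $bcp.WriteToServer($dt);
    }

    Write-Host "$recordCount file elements imported";

}
finally
{
    # release the connection and the file handle even if the import fails
    if($null -ne $bcp) { $bcp.Close(); }
    $xmlReader.Close();
}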
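For readers who would rather build the custom app mentioned above, the same stream-and-batch pattern can be sketched in Python. This is an illustrative sketch under stated assumptions, not the answer's method: iter_file_batches and SAMPLE are hypothetical names, the sample is a cut-down version of the question's snippet, and the database write is left out on purpose. Each yielded batch would be handed to whatever bulk insert mechanism your driver offers.

```python
import io
import xml.etree.ElementTree as ET

# Column order copied from the dbo.files_index table in the question.
COLUMNS = ["Product_ID", "path", "Updated", "Quality", "Supplier_id",
           "Prod_ID", "Catid", "On_Market", "Model_Name", "Product_View",
           "HighPic", "HighPicSize", "HighPicWidth", "HighPicHeight",
           "Date_Added"]

# Cut-down sample based on the snippet in the question (illustrative only).
SAMPLE = b"""<ICECAT-interface>
  <files.index Generated="20170427010009">
    <file path="export/level4/EN/11.xml" Product_ID="11" Prod_ID="PS300E-03YNL-DU"></file>
    <file path="export/level4/EN/12.xml" Product_ID="12" Prod_ID="91.42R01.32H"></file>
    <file path="export/level4/EN/13.xml" Product_ID="13" Prod_ID="PP722E-H390W-NL"></file>
  </files.index>
</ICECAT-interface>"""


def iter_file_batches(source, batch_size=10000):
    """Yield lists of row tuples, one tuple per <file> element.

    The document is read as a stream, so memory use is driven by
    batch_size rather than by the size of the file.
    """
    batch = []
    open_elements = []  # stack of elements whose end tag has not been seen yet
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elements.append(elem)
            continue
        open_elements.pop()
        if elem.tag != "file":
            continue
        # Attributes missing from the element come back as None.
        batch.append(tuple(elem.get(name) for name in COLUMNS))
        # Detach the finished element so the in-memory tree does not grow.
        if open_elements:
            open_elements[-1].remove(elem)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch


batches = list(iter_file_batches(io.BytesIO(SAMPLE), batch_size=2))
for rows in batches:
    print(len(rows), "rows ready for a bulk insert")
```

As in the PowerShell script, the batch size trades memory for round trips. The values arrive as strings, so integer columns either need converting in client code or rely on the driver and server to convert them.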
