Microsoft tech references for BULK INSERT may have some gaps ... can't get them to work


Problem Description

Huge edit -- I removed the ';' characters and replaced them with 'GO', and ... the secondary key and URL worked, except I got this: Cannot bulk load. The file "06May2013_usr_tmp_cinmachI.csv" does not exist or you don't have file access rights.

BTW, this can't be true :) -- I'm able to use PowerShell to upload the file, so I'm sure it's not my account credentials. Here is the code I'm using now; again, it WON'T fit into a {} block no matter what I do with this editor, sorry for the inconvenience.

The docs say CREATE MASTER KEY is used to encrypt the SECRET later on, but there's no obvious link; I assumed this is all under the hood -- is that right? If not, maybe that's what's causing my access error.

So, the issue with the data source not existing was errant syntax -- evidently one can't use ';' to terminate these blocks of SQL, but 'GO' will work.

The CSV file does exist:

CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'S0me!nfo'
GO
CREATE DATABASE SCOPED CREDENTIAL AzureStorageCredential
WITH IDENTITY = 'SHARED ACCESS SIGNATURE',
SECRET = 'removed'
GO
CREATE EXTERNAL DATA SOURCE myDataSource
WITH (TYPE = BLOB_STORAGE, LOCATION = 'https://dtstestcsv.blob.core.windows.net/sunsource', CREDENTIAL = AzureStorageCredential)
GO
BULK INSERT dbo.ISSIVISFlatFile
FROM '06May2013_usr_tmp_cinmachI.csv'
WITH
(DATA_SOURCE = 'myDataSource', FORMAT = 'CSV')

Answer

I feel obliged to post at least some info even if it's not a full answer.

I got this error:

Msg 4860, Level 16, State 1, Line 58
Cannot bulk load. The file "container/folder/file.txt" does not exist or you don't have file access rights.

I believe the problem might have been that I generated my SAS key starting from right now, but that start time is UTC, meaning that here in Australia the key would only become valid in ten hours. So I generated a new key whose start date was a month earlier, and it worked.

The SAS (Shared Access Signature) is a big string that is created as follows:

  1. In the Azure portal, go to your storage account
  2. Press Shared Access Signature
  3. Fill in the fields (make sure your start date is a few days prior, and you can leave Allowed IP addresses blank)
  4. Press Generate SAS
  5. Copy the string in the SAS Token field
  6. Remove the leading ? before pasting it into your SQL script (a small sketch of this follows the list)
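If you're worried about accidentally pasting the token with its leading ?, a minimal T-SQL sketch like this can strip it before creating the credential (the credential name matches the script below; the token value is a placeholder):

-- Hypothetical token pasted straight from the portal; a real one is much longer
DECLARE @sas nvarchar(max) = N'?sv=2016-05-31&ss=b&sig=zzzzzzzzzzzz';
-- The SECRET must start with 'sv=', so drop a leading '?' if one slipped in
IF LEFT(@sas, 1) = N'?' SET @sas = SUBSTRING(@sas, 2, LEN(@sas));
DECLARE @sql nvarchar(max) =
    N'CREATE DATABASE SCOPED CREDENTIAL BlobTestAccount ' +
    N'WITH IDENTITY = ''SHARED ACCESS SIGNATURE'', SECRET = ''' + REPLACE(@sas, N'''', N'''''') + N''';';
EXEC sp_executesql @sql;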

Below is my full script with comments.

-- Target staging table
IF object_id('recycle.SampleFile') IS NULL
    CREATE TABLE recycle.SampleFile
    (
    Col1 VARCHAR(MAX)
    );


-- more info here
-- https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/02/23/loading-files-from-azure-blob-storage-into-azure-sql-database/

-- You can use this to conditionally create the master key
select * from sys.symmetric_keys where name like '%DatabaseMasterKey%'


-- Run once to create a database master key
-- Can't create credentials until a master key has been generated
-- Here, zzz is a password that you make up and store for later use
CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'zzz';




-- Create a database credential object that can be reused for external access to Azure Blob
CREATE DATABASE SCOPED CREDENTIAL BlobTestAccount 
WITH 
-- Must be SHARED ACCESS SIGNATURE to access blob storage
IDENTITY= 'SHARED ACCESS SIGNATURE',
-- Generated from Shared Access Signature area in Storage account
-- Make sure the start date is at least a few days before
-- otherwise UTC can mess you up because it might not be valid yet
-- Don't include the ? or the endpoint. It starts with 'sv=', NOT '?' or 'https'
SECRET = 'sv=2016-05-31&zzzzzzzzzzzz'


-- Create the external data source
-- Note location starts with https. I've seen examples without this but that doesn't work
CREATE EXTERNAL DATA SOURCE BlobTest
WITH ( 
    TYPE = BLOB_STORAGE, 
    LOCATION = 'https://yourstorageaccount.blob.core.windows.net',
    CREDENTIAL= BlobTestAccount);



BULK INSERT recycle.SampleFile
FROM 'container/folder/file'
WITH ( DATA_SOURCE = 'BlobTest');




-- If you're fancy you can use these to work out if your things exist first
select * from sys.database_scoped_credentials
select * from sys.external_data_sources

DROP EXTERNAL DATA SOURCE BlobTest;
DROP DATABASE SCOPED CREDENTIAL BlobTestAccount;

One thing that this won't do that ADF (Azure Data Factory) does is pick up a file based on a wildcard.

That is: if I have a file called ABC_20170501_003.TXT, I need to explicitly list that in the bulk insert load script, whereas in ADF I can just specify ABC_20170501 and it automatically wildcards the rest.
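BULK INSERT itself won't take a wildcard, but because the names follow a pattern you can assemble the exact file name and run the load through dynamic SQL. A rough sketch, reusing the table and data source from the script above (the date and the _003 sequence suffix are hypothetical and still have to be known or recorded somewhere):

-- Build the explicit blob path, e.g. container/folder/ABC_20170501_003.TXT
DECLARE @fileName varchar(200) = 'ABC_' + CONVERT(char(8), GETDATE(), 112) + '_003.TXT';
DECLARE @sql nvarchar(max) =
    N'BULK INSERT recycle.SampleFile ' +
    N'FROM ''container/folder/' + @fileName + N''' ' +
    N'WITH (DATA_SOURCE = ''BlobTest'');';
EXEC sp_executesql @sql;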

Unfortunately there is no (easy) way to enumerate files in blob storage from SQL Server. I eventually got around this by using Azure Automation to run a PowerShell script to enumerate the files and register them into a table that SQL Server could see. This seems complicated, but Azure Automation is actually a very useful tool to learn and use, and it works very reliably.
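For illustration, assuming the runbook drops the blob names into a registration table (the table and column names here are hypothetical), the SQL Server side of that workaround could look roughly like this:

-- Hypothetical registration table populated by the Azure Automation runbook
CREATE TABLE recycle.BlobFileRegister
(
    FileName varchar(400) NOT NULL,
    LoadedAt datetime NULL
);

-- Load every registered file that hasn't been processed yet,
-- recording when each one was loaded
DECLARE @fileName varchar(400), @sql nvarchar(max);
DECLARE fileCursor CURSOR LOCAL FAST_FORWARD FOR
    SELECT FileName FROM recycle.BlobFileRegister WHERE LoadedAt IS NULL;
OPEN fileCursor;
FETCH NEXT FROM fileCursor INTO @fileName;
WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'BULK INSERT recycle.SampleFile FROM ''' + @fileName +
               N''' WITH (DATA_SOURCE = ''BlobTest'');';
    EXEC sp_executesql @sql;
    UPDATE recycle.BlobFileRegister SET LoadedAt = GETDATE() WHERE FileName = @fileName;
    FETCH NEXT FROM fileCursor INTO @fileName;
END
CLOSE fileCursor;
DEALLOCATE fileCursor;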

More opinions on ADF:

I couldn't find a way to pass the filename that I loaded (or other info) into the database.

Do not use ADF if you need data to be loaded in the order it appears in the file (i.e. as captured by an identity field). ADF will try to do things in parallel. In fact, my ADF did insert things in order for about a week (i.e. as recorded by the identity), then one day it just started inserting things out of order.

The timeslice concept is useful in limited circumstances (when you have cleanly delineated data in cleanly delineated files that you want to drop neatly into a table). In any other circumstances it is complicated, unwieldy and difficult to understand and use. In my experience, real-world data needs more complicated rules to work out and apply the correct merge keys.

I don't know the cost difference between importing files via ADF and via BULK INSERT, but ADF is slow. I don't have the patience to hack through Azure blades to find metrics right now, but you're talking 5 minutes in ADF vs 5 seconds in BULK INSERT.

Try Azure Data Factory V2. It is vastly improved, and you are no longer bound to timeslices.
