Partitioning Data in SQL On-Demand with Blob Storage as Data Source


Question

In Amazon Redshift there is a way to create a partition key when using your S3 bucket as a data source. Link.

I am attempting to do something similar in Azure Synapse using the SQL On-Demand service.

Currently I have a storage account that is partitioned such that it follows this scheme:

-Sales (folder)
  - 2020-10-01 (folder)
    - File 1
    - File 2
  - 2020-10-02 (folder)
    - File 3
    - File 4

To create a view and pull in all 4 files I ran the command:

CREATE VIEW testview3 AS
SELECT *
FROM OPENROWSET (
    BULK 'Sales/*/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    DATA_SOURCE = 'AzureBlob',
    FIELDTERMINATOR = ',',
    FIRSTROW = 2
) AS tv1;

If I run a query of SELECT * FROM [myview] I receive data from all 4 files.

How can I go about creating a partition key so that I could run a query such as

SELECT * FROM [myview] WHERE folderdate > '2020-10-01'

so that I can only analyze data from Files 3 and 4?

I know I can edit my OPENROWSET BULK statement but I want to be able to get all the data from my container at first and then constrain searches as needed.

Recommended answer

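Serverless SQL can parse partitioned folder structures using the filename function (when you want to load a specific file or files) and the filepath function (when you want to load all files under a given path). Called with an index, filepath(n) returns the part of the path matched by the n-th wildcard in the BULK pattern. More information on syntax and usage is available in the online documentation.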

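In your case, you can restrict the query to folders dated after '2020-10-01' (that is, Files 3 and 4) with a filepath predicate such as filepath(1) > '2020-10-01'. With the pattern 'Sales/*/*.csv', filepath(1) is the date folder name.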
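
One way to apply this to the view from the question is to expose filepath(1) as a regular column, so that callers can filter on it without touching the OPENROWSET statement. The following is a minimal sketch, not a tested deployment: the view name sales_by_folder and the column alias folderdate are illustrative assumptions, while the data source AzureBlob, the BULK pattern, and the CSV options are copied from the question.

```sql
-- Sketch: same OPENROWSET as in the question, plus a column holding the
-- folder name matched by the first wildcard in 'Sales/*/*.csv'.
-- sales_by_folder and folderdate are hypothetical names.
CREATE VIEW sales_by_folder AS
SELECT
    tv1.filepath(1) AS folderdate,  -- e.g. '2020-10-01' or '2020-10-02'
    tv1.*
FROM OPENROWSET (
    BULK 'Sales/*/*.csv',
    FORMAT = 'CSV',
    PARSER_VERSION = '2.0',
    DATA_SOURCE = 'AzureBlob',
    FIELDTERMINATOR = ',',
    FIRSTROW = 2
) AS tv1;

-- Unfiltered: reads every folder under Sales.
SELECT * FROM sales_by_folder;

-- Filtered: only rows from folders whose name sorts after '2020-10-01'.
SELECT * FROM sales_by_folder WHERE folderdate > '2020-10-01';
```

The folderdate column is a string, so the comparison is lexicographic. That matches date order here only because the folder names use the fixed-width yyyy-MM-dd format. Filtering on a filepath-derived column should also let the serverless pool skip files in non-matching folders, but it is worth confirming this against the amount of data processed that is reported for your own queries.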
