Iterate specific files in blob storage

Question

I have 20 files in a folder in Azure Blob Storage with similar groups of names, e.g. ACT_ADDRESS.csv and NSW_ADDRESS.csv are in the same folder as ACT_LOCALITY.csv and NSW_LOCALITY.csv. I am trying to find a way to get Data Factory to pick up the files using a wildcard, e.g. *ADDRESS.csv, then iterate through each of the *ADDRESS.csv files and load them into a SQL DB. I have tried both a Lookup and a GetMetadata activity, but neither seems to return the file names. I recall the Lookup with *Address.csv did seem to get the data, but it kept cycling through the folder and loading millions of records into the table.
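
As a quick illustration of the selection being asked for (plain Python, not Data Factory), the standard-library `fnmatch.fnmatchcase` applies a shell-style wildcard to the four file names given above. The other 16 files are not named in the question, so they are left out. `fnmatchcase` is case-sensitive, so in this sketch `*ADDRESS.csv` and `*Address.csv` select different files.

```python
from fnmatch import fnmatchcase

# The four file names mentioned in the question (the other 16 are not listed).
files = ["ACT_ADDRESS.csv", "NSW_ADDRESS.csv", "ACT_LOCALITY.csv", "NSW_LOCALITY.csv"]


def match_wildcard(names, pattern):
    """Return the names that match a shell-style wildcard (case-sensitive)."""
    return [n for n in names if fnmatchcase(n, pattern)]


address_files = match_wildcard(files, "*ADDRESS.csv")
print(address_files)                            # ['ACT_ADDRESS.csv', 'NSW_ADDRESS.csv']
print(match_wildcard(files, "*LOCALITY.csv"))   # ['ACT_LOCALITY.csv', 'NSW_LOCALITY.csv']
print(match_wildcard(files, "*Address.csv"))    # [] because matching is case-sensitive
```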

The pipeline looks like it should be fairly simple, but maybe there is a step I am missing.

I couldn't find any examples on the forums, so any help would be great.

Thanks

Binway

Recommended answer

Hi Binway,

To get the file names, select "Child Items" in the argument section of the "Dataset" tab of the Get Metadata activity.

Once you have childItems (an array with one entry per item in the folder, each with a name and a type), you can use a ForEach activity to iterate through all the files.
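
The Get Metadata + ForEach flow can be mimicked in plain Python to show the data shapes involved. This is a sketch, not Data Factory code: `child_items` is a hand-written stand-in for the Child Items output (entries with a `name` and a `type` of `File` or `Folder`), and the helper functions are hypothetical. In a pipeline, one common way to keep only the address files is a Filter activity between Get Metadata and ForEach, with Items set to something like `@activity('Get Metadata1').output.childItems` and Condition `@endswith(item().name, 'ADDRESS.csv')`. The ForEach then iterates `@activity('Filter1').output.value`. The activity names "Get Metadata1" and "Filter1" are hypothetical.

```python
# Hypothetical stand-in for a Get Metadata "Child Items" result: one entry per
# item in the folder, each with a "name" and a "type" ("File" or "Folder").
child_items = [
    {"name": "ACT_ADDRESS.csv", "type": "File"},
    {"name": "ACT_LOCALITY.csv", "type": "File"},
    {"name": "NSW_ADDRESS.csv", "type": "File"},
    {"name": "NSW_LOCALITY.csv", "type": "File"},
    {"name": "archive", "type": "Folder"},
]


def filter_child_items(items, suffix):
    """Mimic a Filter activity: keep files (not folders) whose name ends with suffix."""
    return [i for i in items if i["type"] == "File" and i["name"].endswith(suffix)]


def for_each(items, action):
    """Mimic a ForEach activity: run action once per item and collect the results."""
    return [action(item["name"]) for item in items]


selected = filter_child_items(child_items, "ADDRESS.csv")
# The lambda stands in for a Copy activity that loads one file per iteration.
loaded = for_each(selected, lambda name: f"copy {name} -> SQL")
print(loaded)  # ['copy ACT_ADDRESS.csv -> SQL', 'copy NSW_ADDRESS.csv -> SQL']
```

Filtering before the loop keeps each ForEach iteration pointed at exactly one file (for example, by passing `@item().name` into a parameterised dataset used by the Copy activity), rather than at a wildcard that matches the whole set on every iteration.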

As an alternative, you can always use a custom activity (.NET). To read more about custom activities, please refer to this doc.
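
The custom-activity route means writing the iterate-and-load logic yourself. The answer mentions .NET. The sketch below uses Python instead so it stays self-contained, and it rests on several assumptions. An in-memory `sqlite3` database stands in for the SQL DB. The `blobs` dict stands in for files downloaded from the container. The `address` table and its `address_id`/`street` columns are hypothetical, since the question does not give a schema.

```python
import csv
import io
import sqlite3
from fnmatch import fnmatchcase

# Hypothetical blob contents keyed by name; a real custom activity would
# download these from the storage container instead.
blobs = {
    "ACT_ADDRESS.csv": "address_id,street\n1,Main St\n2,High St\n",
    "NSW_ADDRESS.csv": "address_id,street\n3,George St\n",
    "ACT_LOCALITY.csv": "locality_id,name\n10,Canberra\n",
}


def load_matching(conn, blobs, pattern, table):
    """Insert rows from every blob whose name matches pattern, once per file.

    `table` is interpolated into the SQL, so it must be a trusted name.
    """
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(source_file TEXT, address_id INTEGER, street TEXT)"
    )
    total = 0
    for name in sorted(blobs):
        if not fnmatchcase(name, pattern):
            continue  # skip non-matching files such as *_LOCALITY.csv
        reader = csv.DictReader(io.StringIO(blobs[name]))
        rows = [(name, int(r["address_id"]), r["street"]) for r in reader]
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?, ?)", rows)
        total += len(rows)
    conn.commit()
    return total


conn = sqlite3.connect(":memory:")
print(load_matching(conn, blobs, "*ADDRESS.csv", "address"))  # 3
```

Recording `source_file` on each row makes it easy to check afterwards that every matching file was loaded exactly once.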

Hope this helps.

