How to get Azure Data Factory to Loop Through Files in a Folder

Published: 2020/9/16 23:23:34 | Tags: azure, azure-data-factory

Question

I am looking at the link below.

https://azure.microsoft.com/zh-CN/updates/data-factory-supports-wildcard-file-filter-for-copy-activity/

We are supposed to be able to use wildcard characters in folder paths and file names. If we click on the activity and then click 'Source', we see this view.

I would like to loop through months and days, so it should be something like this view.

Of course, that doesn't actually work. I'm getting errors that read: ErrorCode: 'PathNotFound'. Message: 'The specified path does not exist.'. How can I get the tool to recursively iterate through all files in all folders, given a specific pattern of strings in a file path and file name? Thanks.

Recommended answer

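"I would like to loop through months and days"

To do this, you can pass two parameters from the pipeline to the activity so that the path can be built dynamically from them. ADF V2 allows you to pass parameters.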

Let's go through the process step by step:

1. Create a pipeline with two string parameters, month and day (see the "parameters" block in the pipeline JSON below).

Note: These parameters can also be passed from the output of other activities if needed. Reference: Parameters in ADF
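If the goal is to cover a range of dates, one option is to generate one month/day parameter set per calendar day and start a pipeline run for each set. The sketch below only produces the parameter values; how the runs are started (a trigger, a ForEach activity, an SDK call) is left out. The function name month_day_parameters is hypothetical, and the zero-padded values ("09", "02") are an assumption about how the folders are named, so adjust the formatting to match the real layout.

```python
from __future__ import annotations

from datetime import date, timedelta


def month_day_parameters(start: date, end: date) -> list[dict[str, str]]:
    """Return one {"month", "day"} parameter set per calendar day in [start, end]."""
    if end < start:
        raise ValueError("end must not be before start")
    params = []
    current = start
    while current <= end:
        # Zero-padding is an assumption; match it to your actual folder names.
        params.append({"month": f"{current.month:02d}", "day": f"{current.day:02d}"})
        current += timedelta(days=1)
    return params


# Example: four days that cross a month boundary.
runs = month_day_parameters(date(2020, 8, 30), date(2020, 9, 2))
for run in runs:
    print(run)
```

Each dictionary has the same keys as the pipeline parameters, so it can be handed to whatever mechanism starts the run.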

2. Create two datasets.

2.1 Sink dataset: Blob Storage here. Link it to your Linked Service and provide the container name (make sure it exists). If needed, the container name can also be passed as a parameter.

2.2 Source dataset: Blob Storage here again, or whatever your scenario needs. Link it to your Linked Service and provide the container name (make sure it exists). If needed, the container name can also be passed as a parameter.

Note:
    1. The folder path decides where the data is copied. If the container does not exist, the activity will create it for you, and if the file already exists it will be overwritten by default.

    2. Pass parameters to the dataset if you want to build the output path dynamically. Here I have created two dataset parameters named monthcopy and datacopy.

3. Create a Copy activity in the pipeline.

Wildcard Folder Path:

    @{concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',string(pipeline().parameters.month),'/',string(pipeline().parameters.day),'/*')}

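where:
    The path resolves to yyyy/month/day/*, where yyyy is the year of yesterday's UTC date (adddays(utcnow(),-1)), month and day are the pipeline parameters, and the trailing * matches any folder one level down.

JSON template for the pipeline: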

{
    "name": "pipeline2",
    "properties": {
        "activities": [
            {
                "name": "Copy Data1",
                "type": "Copy",
                "dependsOn": [],
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false,
                    "secureInput": false
                },
                "userProperties": [],
                "typeProperties": {
                    "source": {
                        "type": "DelimitedTextSource",
                        "storeSettings": {
                            "type": "AzureBlobStorageReadSettings",
                            "recursive": true,
                            "wildcardFolderPath": {
                                "value": "@{concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',string(pipeline().parameters.month),'/',string(pipeline().parameters.day),'/*')}",
                                "type": "Expression"
                            },
                            "wildcardFileName": "*.csv",
                            "enablePartitionDiscovery": false
                        },
                        "formatSettings": {
                            "type": "DelimitedTextReadSettings"
                        }
                    },
                    "sink": {
                        "type": "DelimitedTextSink",
                        "storeSettings": {
                            "type": "AzureBlobStorageWriteSettings"
                        },
                        "formatSettings": {
                            "type": "DelimitedTextWriteSettings",
                            "quoteAllText": true,
                            "fileExtension": ".csv"
                        }
                    },
                    "enableStaging": false
                },
                "inputs": [
                    {
                        "referenceName": "DelimitedText1",
                        "type": "DatasetReference"
                    }
                ],
                "outputs": [
                    {
                        "referenceName": "DelimitedText2",
                        "type": "DatasetReference",
                        "parameters": {
                            "monthcopy": {
                                "value": "@pipeline().parameters.month",
                                "type": "Expression"
                            },
                            "datacopy": {
                                "value": "@pipeline().parameters.day",
                                "type": "Expression"
                            }
                        }
                    }
                ]
            }
        ],
        "parameters": {
            "month": {
                "type": "string"
            },
            "day": {
                "type": "string"
            }
        },
        "annotations": []
    }
}
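To see what the wildcardFolderPath expression in the pipeline produces for a given run, it can help to evaluate it by hand. The sketch below mirrors the expression in plain Python and models the "one level" behavior of the trailing * described above by matching segment by segment. It is not ADF code: the function names are hypothetical, and the matcher only illustrates the pattern itself, not the effect of "recursive": true.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


def wildcard_folder_path(month: str, day: str, now: datetime) -> str:
    """Mirror concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',month,'/',day,'/*')."""
    year = (now - timedelta(days=1)).strftime("%Y")
    return f"{year}/{month}/{day}/*"


def matches_one_level(pattern: str, folder: str) -> bool:
    """Compare path segments pairwise so that '*' never spans a '/' (assumed behavior)."""
    pattern_parts = pattern.split("/")
    folder_parts = folder.split("/")
    if len(pattern_parts) != len(folder_parts):
        return False
    return all(fnmatchcase(f, p) for f, p in zip(folder_parts, pattern_parts))


now = datetime(2020, 9, 16, 23, 23, tzinfo=timezone.utc)
pattern = wildcard_folder_path("09", "16", now)
print(pattern)
print(matches_one_level(pattern, "2020/09/16/sales"))     # one level below the day folder
print(matches_one_level(pattern, "2020/09/16/sales/eu"))  # two levels below the day folder
```

Because the year comes from yesterday's UTC date, a run on January 1 still resolves to the previous year, which is worth keeping in mind when passing month and day for dates around a year boundary.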

JSON template for the source dataset (DelimitedText1, referenced under "inputs" in the pipeline above):

{
    "name": "DelimitedText1",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "corpdata"
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "quoteChar": "\""
        },
        "schema": []
    }
}

JSON template for the sink dataset (DelimitedText2, referenced under "outputs" in the pipeline above):

{
    "name": "DelimitedText2",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "monthcopy": {
                "type": "string"
            },
            "datacopy": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": {
                    "value": "@concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),dataset().monthcopy,'/',dataset().datacopy)",
                    "type": "Expression"
                },
                "container": "copycorpdata"
            },
            "columnDelimiter": ",",
            "escapeChar": "\\",
            "quoteChar": "\""
        },
        "schema": []
    }
}
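The folderPath expression in DelimitedText2 can be checked the same way. The sketch below mirrors it in Python (the function name is hypothetical). Note that, as written, the expression joins the year and monthcopy without a separator, so the output folder differs in shape from the yyyy/month/day layout used on the source side. Whether that was intended is not stated in the answer; if a year/month/day layout is wanted, a '/' would have to be added after the year.

```python
from datetime import datetime, timedelta, timezone


def sink_folder_path(monthcopy: str, datacopy: str, now: datetime) -> str:
    """Mirror concat(formatDateTime(adddays(utcnow(),-1),'yyyy'), monthcopy, '/', datacopy)."""
    year = (now - timedelta(days=1)).strftime("%Y")
    # No '/' between the year and monthcopy, exactly as in the dataset expression.
    return f"{year}{monthcopy}/{datacopy}"


resolved = sink_folder_path("09", "16", datetime(2020, 9, 17, tzinfo=timezone.utc))
print(resolved)
```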
