How to read files with .xlsx and .xls extension in Azure Data Factory?
Question
I am trying to read an Excel file with the .xlsx extension from Azure Blob Storage in my Azure Data Factory dataset. It throws the following error:
Error found when processing 'Csv/Tsv Format Text' source 'Filename.xlsx' with row number 3: found more columns than expected column count: 1.
What are the right column and row delimiters for Excel files to be read in Azure Data Factory?
Recommended answer
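Excel files have a proprietary format and are not simple delimited files. The error above arises because the dataset is parsing the workbook as 'Csv/Tsv Format Text', so no choice of column or row delimiter will make it readable. As indicated here, Azure Data Factory does not have a direct option to import Excel files, e.g. you cannot create a Linked Service to an Excel file and read it easily. Your options are: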
- Export or convert the data to flat files, e.g. before transfer to the cloud, since .csv, tab-delimited, pipe-delimited etc. are easier to read than Excel files. This is your simplest option, although it obviously requires a change in process.
- Try shredding the XML: create a custom task to open the Excel file as XML and extract your data as suggested here.
- SSIS packages are now supported in Azure Data Factory (with the Execute SSIS package activity) and have better support for Excel files, e.g. a Connection Manager. So it may be an option to create an SSIS package to deal with the Excel file and host it in ADFv2. Warning! I have not tested this, I am only speculating that it is possible. There is also the overhead of creating an Integration Runtime (IR) for running SSIS in ADFv2.
- Try some other custom activity, e.g. there is a custom U-SQL Extractor for shredding XML on github here.
- Try reading the Excel file using Databricks, some examples here, although spinning up a Spark cluster to read a few Excel files does seem somewhat overkill. This might be a good option if Spark is already in your architecture.
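The first two options can be combined in a small pre-processing step. An .xlsx file is a ZIP archive of XML parts, so the sheet can be shredded with the Python standard library and written out as a .csv that a delimited-text dataset can read. The following is only a minimal sketch, not something from the original answer: the function names are hypothetical, it assumes the sheet of interest is stored at `xl/worksheets/sheet1.xml` (a robust version would resolve the path through `xl/workbook.xml`), and it does not apply to legacy .xls files, which are a binary format rather than zipped XML.

```python
import csv
import re
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the SpreadsheetML parts inside an .xlsx archive.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
T_TAG = "{%s}t" % NS["m"]


def column_index(cell_ref):
    """Convert the letters of a cell reference such as 'C7' to a 0-based column index."""
    letters = re.match(r"[A-Z]+", cell_ref).group(0)
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def read_shared_strings(archive):
    """Return the workbook's shared string table (empty if the part is absent)."""
    if "xl/sharedStrings.xml" not in archive.namelist():
        return []
    root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # A string item may be split into several runs, so join all of its <t> nodes.
    return ["".join(t.text or "" for t in si.iter(T_TAG))
            for si in root.findall("m:si", NS)]


def xlsx_sheet_to_rows(xlsx_source, sheet_path="xl/worksheets/sheet1.xml"):
    """Return one worksheet as a list of rows, each a list of strings.

    sheet_path is an assumption: it is the conventional location of the
    first sheet, but a workbook is free to use another part name.
    """
    with zipfile.ZipFile(xlsx_source) as archive:
        strings = read_shared_strings(archive)
        sheet = ET.fromstring(archive.read(sheet_path))

    rows = []
    for row in sheet.iterfind("m:sheetData/m:row", NS):
        values = []
        for cell in row.findall("m:c", NS):
            ref = cell.get("r")
            idx = column_index(ref) if ref else len(values)
            # Empty cells are omitted from the XML, so pad the gap.
            values.extend([""] * (idx - len(values)))
            cell_type = cell.get("t")
            v = cell.find("m:v", NS)
            if cell_type == "s" and v is not None:
                text = strings[int(v.text)]  # index into the shared strings
            elif cell_type == "inlineStr":
                text = "".join(t.text or "" for t in cell.iter(T_TAG))
            else:
                # Numbers, booleans and cached formula results; dates appear
                # as raw serial numbers because styles are not interpreted.
                text = v.text if v is not None and v.text else ""
            values.append(text)
        rows.append(values)
    return rows


def rows_to_csv(rows, out):
    """Write rows to a text stream as comma-delimited lines."""
    csv.writer(out, lineterminator="\n").writerows(rows)
```

A call such as `rows_to_csv(xlsx_sheet_to_rows("Filename.xlsx"), open("Filename.csv", "w", newline=""))` would then produce a file whose column delimiter is a comma and whose row delimiter is a line feed. Where this step would run (before upload, or inside a custom activity) is left open here.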
Let us know how you get on.