How to read files with .xlsx and .xls extension in Azure Data Factory?


Question


I am trying to read an Excel file with the .xlsx extension from Azure Blob Storage in my Azure Data Factory dataset. It throws the following error:

Error found when processing 'Csv/Tsv Format Text' source 'Filename.xlsx' with row number 3: found more columns than expected column count: 1.


What are the right column and row delimiters for reading Excel files in Azure Data Factory?
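The error is what you would expect when a delimited-text reader is pointed at a workbook: an .xlsx file is a ZIP archive of XML parts and a legacy .xls file is a binary OLE2 compound file, so neither consists of delimited lines. The following minimal Python sketch checks which kind of file you actually have. `describe_input` is a hypothetical helper, and the signature check is a heuristic rather than full format validation.

```python
import zipfile

# Magic bytes at the start of a legacy .xls (OLE2 compound document) file.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def describe_input(path):
    """Guess whether a file is .xlsx (ZIP of XML), .xls (OLE2), or plain text."""
    with open(path, "rb") as f:
        head = f.read(8)
    # .xlsx files are ZIP archives; ZIP local file headers start with "PK\x03\x04".
    if head.startswith(b"PK\x03\x04") and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            if "xl/workbook.xml" in zf.namelist():
                return "xlsx (ZIP archive of XML parts)"
        return "zip (not an Excel workbook)"
    if head == OLE2_SIGNATURE:
        return "xls (binary OLE2 compound file)"
    return "possibly delimited text"
```

If this reports `xlsx` or `xls`, no delimiter setting on a delimited-text dataset will parse the file correctly.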

Answer


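Excel files have a proprietary format and are not simple delimited files: an .xlsx file is a ZIP archive of XML parts and an .xls file is a binary format, so no combination of column and row delimiters will make the delimited-text (Csv/Tsv) format read them. As indicated here, Azure Data Factory does not have a direct option to import Excel files, e.g. you cannot create a Linked Service to an Excel file and read it easily. Your options are: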

  1. Export or convert the data to flat files (e.g. .csv, tab-delimited, pipe-delimited) before transferring it to the cloud, as these are far easier to read than Excel files. This is your simplest option, although it obviously requires a change in process.
  2. Try shredding the XML: create a custom task that opens the Excel file as XML and extracts your data, as suggested here.
  3. SSIS packages are now supported in Azure Data Factory (with the Execute SSIS Package activity) and have better support for Excel files, e.g. the Excel Connection Manager. So it may be an option to create an SSIS package to deal with the Excel file and host it in ADFv2. Warning: I have not tested this; I am only speculating that it is possible. There is also the overhead of creating an Azure-SSIS Integration Runtime (IR) to run SSIS packages in ADFv2.
  4. Try some other custom activity, e.g. there is a custom U-SQL extractor for shredding XML on GitHub here.
  5. Try reading the Excel files using Databricks (some examples here), although spinning up a Spark cluster to read a few Excel files does seem like overkill. This might be a good option if Spark is already in your architecture.
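Options 1 and 2 can be combined: shred the workbook's XML and write a flat .csv that a delimited-text dataset can then read. Below is a minimal Python sketch using only the standard library. It assumes the data lives in the worksheet part `xl/worksheets/sheet1.xml`, which is typical for Excel-saved workbooks but not guaranteed (a robust version would resolve sheets through `xl/workbook.xml` and its relationships file). `xlsx_sheet_to_csv` and its helpers are hypothetical names. It copies stored cell values only: dates come out as Excel serial numbers, formulas as their cached result, and fully empty rows that Excel omits from the XML are not reproduced. Legacy binary .xls files are not handled.

```python
import csv
import re
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used by worksheet and shared-string parts.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def column_index(cell_ref):
    """Convert a cell reference such as 'C7' to a zero-based column index (2)."""
    letters = re.match(r"[A-Z]+", cell_ref).group(0)
    index = 0
    for ch in letters:
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index - 1


def rich_text(elem):
    """Text of an <si> or <is> element: a single <t>, or rich-text runs <r><t>."""
    t = elem.find("m:t", NS)
    if t is not None:
        return t.text or ""
    return "".join(r.text or "" for r in elem.findall("m:r/m:t", NS))


def read_shared_strings(zf):
    """Return the workbook's shared-string table, or [] if the part is absent."""
    try:
        root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    except KeyError:
        return []
    return [rich_text(si) for si in root.findall("m:si", NS)]


def xlsx_sheet_to_csv(xlsx_path, csv_path, sheet_part="xl/worksheets/sheet1.xml"):
    """Shred one worksheet's XML into a CSV file and return the rows."""
    with zipfile.ZipFile(xlsx_path) as zf:
        shared = read_shared_strings(zf)
        root = ET.fromstring(zf.read(sheet_part))
    rows = []
    for row in root.iter("{%s}row" % NS["m"]):
        values = {}
        for cell in row.findall("m:c", NS):
            kind = cell.get("t")
            if kind == "inlineStr":
                is_elem = cell.find("m:is", NS)
                text = rich_text(is_elem) if is_elem is not None else ""
            else:
                v = cell.find("m:v", NS)
                text = v.text or "" if v is not None else ""
                if kind == "s" and text:  # index into the shared-string table
                    text = shared[int(text)]
            ref = cell.get("r")  # optional in the spec; fall back to next column
            col = column_index(ref) if ref else (max(values) + 1 if values else 0)
            values[col] = text
        width = max(values) + 1 if values else 0
        rows.append([values.get(i, "") for i in range(width)])
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
    return rows
```

For example, `xlsx_sheet_to_csv("Filename.xlsx", "Filename.csv")` writes the CSV and returns the rows as lists of strings; the resulting file can then be read by an ordinary delimited-text dataset. Where to run it (a custom activity, a pre-upload step, etc.) is left open here.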


Let us know how you get on.
