Fixed-length multi-record-type file ingestion into ADL

Question

Hi all,

New member here. I am a business analyst who wants to start using Azure for analytics.

I want to start simple first, basically creating files by filtering raw source files.

Unfortunately, I have to work not with CSVs or XML/JSON but with mainframe files: fixed-record-length files with multiple record types, representing sales transactions.

I was initially told by a Microsoft rep (who was helping us with a POC) that the above was not possible with Azure. Not knowing any better, I loaded the file into a SQL Server table and broke out each record type into its own table. Then, by performing left outer joins across the tables on the business key, I created denormalized records for each transaction at the required granularity, extracted them to a file, and put that file into Azure Data Lake. At that point, I was fine using U-SQL to get the filtered results I needed.
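For illustration, the split-then-join workaround described above can be sketched outside SQL Server in a few lines of Python. This is only a rough sketch under assumed conditions: the two-character record type codes (`HD`, `IT`), the position of the business key, and the sample lines are all hypothetical and do not come from the actual mainframe layout.

```python
from collections import defaultdict

# Hypothetical layout: record type in columns 0-1, business key in columns 2-9.
TYPE_SLICE = slice(0, 2)
KEY_SLICE = slice(2, 10)


def split_by_type(lines):
    """Group raw fixed-length lines by record type (one 'staging table' per type)."""
    tables = defaultdict(list)
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            tables[line[TYPE_SLICE]].append(line)
    return tables


def left_outer_join(left_rows, right_rows):
    """Pair each left row with every right row sharing its business key.

    Left rows without a match are kept and paired with None.
    """
    by_key = defaultdict(list)
    for row in right_rows:
        by_key[row[KEY_SLICE]].append(row)
    joined = []
    for row in left_rows:
        matches = by_key.get(row[KEY_SLICE]) or [None]
        for match in matches:
            joined.append((row, match))
    return joined


# Made-up sample: two transaction headers, two item lines for the first one.
raw = [
    "HD00000001STORE042",
    "IT00000001SKU123  0002",
    "IT00000001SKU456  0001",
    "HD00000002STORE007",
]
tables = split_by_type(raw)
denormalized = left_outer_join(tables["HD"], tables["IT"])
```

Each element of `denormalized` pairs a header line with one of its item lines, and a header without items is kept with `None` on the right, which mirrors the left outer join on the business key.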

It should not be that difficult to get the data from its raw state into something I can write U-SQL against. My question is: what Azure tools or functionality are available to make short work of "schematizing" these kinds of non-tagged, non-delimited files?
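As a rough idea of what "schematizing" such a file could look like, here is a minimal Python sketch that maps each record type to a list of field offsets and emits one record type as delimited text. The record type codes, field names, and offsets are invented for the example; a real layout would come from the mainframe copybook or file specification.

```python
import csv
import io

# Hypothetical layouts: record type -> list of (field name, start, end) offsets.
LAYOUTS = {
    "HD": [("txn_id", 2, 10), ("store", 10, 18)],
    "IT": [("txn_id", 2, 10), ("sku", 10, 18), ("qty", 18, 22)],
}


def parse_record(line):
    """Return (record_type, fields) for one fixed-length line."""
    rec_type = line[0:2]
    layout = LAYOUTS.get(rec_type)
    if layout is None:
        raise ValueError(f"unknown record type: {rec_type!r}")
    fields = {name: line[start:end].strip() for name, start, end in layout}
    return rec_type, fields


def to_csv(lines, rec_type):
    """Render all records of one type as CSV text with a header row."""
    out = io.StringIO()
    names = [name for name, _, _ in LAYOUTS[rec_type]]
    writer = csv.DictWriter(out, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        kind, fields = parse_record(line)
        if kind == rec_type:
            writer.writerow(fields)
    return out.getvalue()


# Made-up sample lines matching the hypothetical layouts above.
sample = [
    "HD00000001STORE042",
    "IT00000001SKU123  0002",
]
items_csv = to_csv(sample, "IT")
```

Keeping the layout as data rather than hard-coded slices means a new record type only needs a new entry in `LAYOUTS`, and the delimited output is the kind of file that is straightforward to query afterwards.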

Thanks and regards,

Paul

Recommended answer

Hi all,

I did some research and found what I need: using Azure Data Factory and writing a custom extractor.

Thanks,
Paul

